# Matrix Algebra

A *Naive* Introduction to Algorithms: Part I

*BB*

### Introduction

These preliminary algorithmic tours are meant to be a very general introduction to a vast, vast subject. At the same time, I wanted to go far enough and deep enough into the chosen topics so as to provide an overall picture of how, at the core, modern statistical algorithms are, in essence, a compendium of very simple, small pieces which fit together to make a slightly more complex whole. For example, the inverse of a matrix requires one to calculate the determinant (well, usually), which of course requires multiplication, and so on...

For these overviews, I omit detailed proofs, although when warranted, formulas and/or pseudocode are provided. However, most attention is paid to the actual implementation of the algorithm in R. Similarly, math fundamentals/theory are referenced, but not in any great detail. In terms of coverage, I specifically focus on the underpinnings of major statistical algorithms (Linear Algebra/OLS, Optimization/Logistic Regression). Again, it is my hope that this 'tour', so to speak, provides a roadmap for future study, at the very least. After this part is complete, I intend to post more short pieces that consist mainly of algorithms/code, with only the writing/comments that are truly necessary.

## Basic Matrix Operations

As you are probably well aware, the modern computational methods we use today would not be possible without linear algebra. At the most molecular level, most matrix operations are just simple arithmetic that the computer performs over and over again. Although nothing is particularly complex, both constructing the iteration and/or recursion scheme and the consequent debugging can be tricky, given that one missing piece will break everything, or, even worse, results will be off in a manner that is difficult to recognize.

## Basics

### Multiplication

Matrix multiplication uses a triple for-loop to sum up products across rows and columns. \[ \begin{aligned} &\textbf{for}\ i=1:m\\ &\hspace{1cm} \textbf{for}\ j = 1:n\\ &\hspace{2cm} \textbf{for}\ k = 1:r\\ &\hspace{3cm} M_{i,j} = M_{i,j} + X_{i,k} * Y_{k,j}\\ &\hspace{2cm} \textbf{end loop}\\ &\hspace{1cm} \textbf{end loop}\\ &\textbf{end loop} \end{aligned} \]

The subscript \(i\) names the row, \(j\) names the column, and \(k\) runs over the dot product. Note that the ordering of the loops is arbitrary.

```
matmult <- function(X,Y) {
  m <- matrix(0,dim(X)[1],dim(Y)[2])
  # Row
  for (i in 1:dim(X)[1]) {
    # Column
    for (j in 1:dim(Y)[2]) {
      # Column of X or Row of Y
      for (k in 1:dim(Y)[1]) {
        m[i,j] <- m[i,j] + X[i,k] * Y[k,j]
      }
    }
  }
  m
}

x <- matrix(rnorm(16),4,4)
y <- matrix(rnorm(16),4,4)
matmult(x,y)
```

```
##           [,1]       [,2]       [,3]       [,4]
## [1,] -2.858235 -2.5435280  1.8303066  1.0208735
## [2,]  1.547737 -2.3085681 -0.6122848 -1.3666563
## [3,] -3.256425 -0.2429263  0.6441392 -1.0394348
## [4,]  1.524830  0.4802570 -0.8023529  0.2774851
```

```
x %*% y # base R
```

```
##           [,1]       [,2]       [,3]       [,4]
## [1,] -2.858235 -2.5435280  1.8303066  1.0208735
## [2,]  1.547737 -2.3085681 -0.6122848 -1.3666563
## [3,] -3.256425 -0.2429263  0.6441392 -1.0394348
## [4,]  1.524830  0.4802570 -0.8023529  0.2774851
```

```
# different dims
z <- matrix(rnorm(8),4,2)
matmult(x,z)
```

```
##            [,1]        [,2]
## [1,] -0.9093389 -1.03023634
## [2,]  1.1929292  0.04823813
## [3,] -0.9385204 -2.35411594
## [4,]  0.2981690  1.58123597
```

```
z %*% x # error in base R
```

```
## Error in z %*% x: non-conformable arguments
```

```
matmult(z,x) # ditto
```

```
## Error in X[i, k]: subscript out of bounds
```

It is important to note that matrix multiplication is almost always handled in a lower-level compiled language for speed purposes. R, of course, uses the BLAS/LAPACK (and formerly LINPACK) libraries, which have roughly 50 years of use behind them. Also, while the basic matrix multiplication algorithm is \(O(N^3)\), there are other algorithms (e.g., Strassen) that offer some improvement...
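To get a feel for the gap those compiled routines close, here is a rough benchmark sketch; the exact timings will vary by machine, but `%*%` should beat the triple loop by orders of magnitude (the loop version is repeated in condensed form so the chunk is self-contained):

```r
# naive triple loop (condensed) vs. BLAS-backed %*%
matmult <- function(X, Y) {
  m <- matrix(0, dim(X)[1], dim(Y)[2])
  for (i in 1:dim(X)[1])
    for (j in 1:dim(Y)[2])
      for (k in 1:dim(Y)[1])
        m[i,j] <- m[i,j] + X[i,k] * Y[k,j]
  m
}
set.seed(1)
A <- matrix(rnorm(100^2), 100, 100)
B <- matrix(rnorm(100^2), 100, 100)
t_loop <- system.time(C1 <- matmult(A, B))[["elapsed"]]
t_blas <- system.time(C2 <- A %*% B)[["elapsed"]]
all.equal(C1, C2)                # same answer...
c(loop = t_loop, blas = t_blas)  # ...very different speeds
```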

### Determinant

The determinant is a single value that summarizes a square matrix; geometrically, it is the (signed) factor by which the matrix scales volume, and it is zero exactly when the matrix is singular. It is used in numerous other matrix algebra applications, including the inverse, which we cover next. \[ det(A) = \sum_{j=1}^n (-1)^{1+j}a_{1j}M_{1j} \]

The basic idea (for an \(n \times n\) matrix) is to break everything down into the 2x2 case through expansion by minors. The 2x2 case is just the difference of the products of the two diagonals, \(|A| = \begin{vmatrix} a &b\\ c &d\end{vmatrix} = ad - bc \).

```
D2x2 <- function(X) {
  out <- (X[1,1] * X[2,2] - X[2,1] * X[1,2])
  out
}
```

Then, rows and columns are "blocked" off depending on position (the full row and column corresponding to the entry is what is blocked off), and the determinants of the resulting sub-matrices are calculated. These are called "minors" and, once calculated, comprise a matrix of minors, with each element representing the determinant of a sub-matrix. Reading that last sentence over confuses me, and I just wrote it, so let's take a look.

```
matdet <- function(x) {
  D2x2 <- function(X) {
    out <- (X[1,1] * X[2,2] - X[2,1] * X[1,2])
    out
  }
  if (length(x)==1) {
    return(x)
  } else if (any(dim(x) == 2)) {
    dtemp <- D2x2(x)
    return(dtemp)
  } else {
    for (j in 1:dim(x)[2]) {
      multiplier <- ((-1)^(1+j))*(x[1,j]) # alternating sign
      if (j==1) {
        # 'locking' first row by default for simplicity
        d <- (multiplier*(matdet(x[-1,-j])))
      } else {
        d <- d + (multiplier*(matdet(x[-1,-j])))
      }
    }
  }
  return(d)
}

matdet(x)
```

```
## [1] -6.037856
```

```
det(x) # base R
```

```
## [1] -6.037856
```

Like most of the algorithms in this intro, there are other and faster ways to compute the determinant (most libraries use an LU factorization instead), but the simple recursive version is fairly easy to understand. As much as I would like to walk through each operation line by line, space/time considerations don't allow that. Still, if the above is unclear, I'd recommend following the code either through debugging or taking apart the function in the global environment. Then, look at other algorithms/implementations and compare...

### Matrix Inverse

From the determinant, we move to the inverse. There is no such thing as matrix division, but there is multiplication by an inverse. The inverse can be found by first finding the adjoint matrix, which is just the transpose of the matrix of cofactors (this is a common theme: each operation/matrix/whatever has its own name, which at first can be overwhelming), and then dividing each element of the adjoint by the overall determinant (or, equivalently, multiplying by the inverse of the determinant). \[ \ Adj(A) = F^T \] \[ \ A^{-1} = \frac{Adj(A)}{Det(A)} \] Take a look below...

```
inv <- function(x) {
  inversemat <- matrix(0,dim(x)[1],dim(x)[2])
  for (i in 1:dim(x)[1]) {
    for (j in 1:dim(x)[2]) {
      multiplier <- ((-1)^(i+j))
      inversemat[i,j] <- multiplier*(matdet(x[-i,-j]))
    }
  }
  detout <- matdet(x)
  stopifnot(abs(detout) > 1e-04) # stop if determinant is (near) zero
  # using base R for transpose here..
  inversemat <- (1/detout) * t(inversemat)
  return(inversemat)
}
```
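As a quick sanity check on the adjoint-over-determinant recipe, here is the 2x2 case worked by hand and compared against base R's `solve()` (used here so this chunk doesn't depend on the functions above):

```r
# Adjoint formula by hand for a 2x2, checked against base R's solve()
A    <- matrix(c(4, 7, 2, 6), 2, 2)
adjA <- matrix(c(A[2,2], -A[2,1], -A[1,2], A[1,1]), 2, 2) # transpose of the cofactors
detA <- A[1,1] * A[2,2] - A[1,2] * A[2,1]                 # ad - bc
Ainv <- adjA / detA
all.equal(Ainv, solve(A))  # TRUE
round(Ainv %*% A, 10)      # the identity matrix
```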

### Ordinary Least Squares

So, what was the point of all that? Well, for one, it is now possible to solve the famous OLS equation, \( \hat\beta = (X^TX)^{-1}X^Ty \), from scratch...

```
X <- as.matrix(mtcars[,5:8],32,3) # just throwing in a few columns
X <- cbind(1,X) # intercept
y <- matrix(mtcars[,1]) # matrix so dim() calls in matmult() work correctly
# steps are arbitrary, just a way to break it down...
(step1 <- matmult(t(X),X))
```

```
##         [,1]      [,2]      [,3]      [,4]    [,5]
## [1,]  32.000  115.0900  102.9520   571.160  14.000
## [2,] 115.090  422.7907  358.7190  2056.914  54.030
## [3,] 102.952  358.7190  360.9011  1828.095  36.558
## [4,] 571.160 2056.9140 1828.0946 10293.480 270.670
## [5,]  14.000   54.0300   36.5580   270.670  14.000
```

```
(step2 <- inv(step1))
```

```
##            [,1]        [,2]        [,3]        [,4]        [,5]
## [1,] 14.0506300 -1.34446369 -0.37276217 -0.48277552  1.44520301
## [2,] -1.3444637  0.23810018  0.07872045  0.01476050 -0.06536778
## [3,] -0.3727622  0.07872045  0.08571491 -0.01260577  0.08884587
## [4,] -0.4827755  0.01476050 -0.01260577  0.02870522 -0.09624656
## [5,]  1.4452030 -0.06536778  0.08884587 -0.09624656  0.50728638
```

```
(step3 <- matmult(t(X),y))
```

```
##           [,1]
## [1,]   642.900
## [2,]  2380.277
## [3,]  1909.753
## [4,] 11614.745
## [5,]   343.800
```

```
(step4 <- matmult(step2,step3))
```

```
##            [,1]
## [1,] 10.6166864
## [2,]  1.6913239
## [3,] -4.4456101
## [4,]  0.9980007
## [5,] -0.2730062
```

```
step1R <- t(X) %*% X
step2R <- solve(step1R)
step3R <- t(X) %*% y
step4R <- step2R %*% step3R
step4R
```

```
##            [,1]
## [1,] 10.6166864
## [2,]  1.6913239
## [3,] -4.4456101
## [4,]  0.9980007
## [5,] -0.2730062
```

```
(lm.fit(X,y))$coefficients
```

```
##                  drat         wt       qsec         vs
## 10.6166864  1.6913239 -4.4456101  0.9980007 -0.2730062
```

## QR Decomposition

### QR Overview

Although the ability to calculate the inverse of a matrix is all you need for OLS, the technique R actually uses is slightly more involved: the default least squares fitting method in R is the QR decomposition. There are a number of different types of matrix decompositions (a decomposition breaks a matrix \(A\) down into a product of other matrices, e.g., \(A = CWF\)), including LU, Eigen, Cholesky, etc., but QR tends to be the default because it is both a) stable/accurate and b) relatively fast. The R functions `qr` and `qr.coef` (see also `qr.fitted` and `qr.resid`) allow one to utilize this method directly, but the `lm()` function also runs QR Fortran routines under the hood by default. In fact, this is currently the only method available for `lm()` or any of its other variants.

Like most statistical algorithms, there are a number of different variants of the QR decomposition and even more sources. For this overview, I closely follow the discussion of the QR decomposition/algorithm in Brown's (2014) *Linear Models in Matrix Form*. (I highly recommend this book as a gentle but still rigorous text for getting one's feet wet in terms of linear algebra/models; readers in search of a serious, comprehensive guide to matrix algorithms and proofs are directed to Golub and Van Loan (2013) or one of the SIAM publications.) Although this is one of the rare books that actually presents R code (as opposed to MATLAB) at the end of each section, for learning/challenge/fun(??) reasons, I code the algorithms directly from the formulas, so there will undoubtedly be differences in approach.

Like other decompositions, the QR factors a matrix into \(Q\), an orthonormal matrix (one whose column vectors form an orthonormal set: each column vector has length one and is orthogonal to all the other column vectors), and \(R\), an upper-triangular matrix. Moreover,

An orthonormal matrix returns an identity matrix when it is premultiplied by its transpose, and an upper triangular matrix has 0's below the main diagonal. These features make the QR decomposition a highly efficient and accurate method for solving a variety of statistical problems.
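Both of those properties are easy to check numerically with base R's built-in `qr()` before we roll our own:

```r
# Verify Q'Q = I, R upper triangular, and QR = A using base R's qr()
set.seed(42)
A <- matrix(rnorm(16), 4, 4)
d <- qr(A)
Q <- qr.Q(d)
R <- qr.R(d)
all.equal(t(Q) %*% Q, diag(4))     # premultiplying Q by its transpose gives I
all(abs(R[lower.tri(R)]) < 1e-12)  # 0's below the main diagonal
all.equal(Q %*% R, A)              # and the factors recover A
```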

### Householder Transform

The QR decomposition can be performed with a number of different orthogonalization techniques. For dense matrices (Givens rotations require fewer operations for sparse matrices), Householder transformations (after Dr. Alston Scott Householder, pretty smart dude) tend to produce highly reliable results:

Householder transformations are used to selectively zero out blocks of entries in vectors or columns of matrices in a manner that is extremely stable with respect to round-off error.

The basic formula is \( H = I - 2uu^T \).
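Before wiring \(H\) into a full routine, here is a small numeric check of the formula: for any unit vector \(u\), \(H\) is symmetric, equals its own inverse (hence orthogonal), and reflects \(u\) to \(-u\).

```r
# H = I - 2uu^T for a unit vector u
u <- c(3, 4) / 5                  # a unit vector
H <- diag(2) - 2 * tcrossprod(u)  # tcrossprod(u) is u %*% t(u)
all.equal(H %*% H, diag(2))       # H is its own inverse
all.equal(as.vector(H %*% u), -u) # and it reflects u to -u
```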

In terms of the process, following Brown (2014), Burden and Faires (2011) (a standard numerical analysis textbook), Golub and Van Loan (2013), etc., one first creates an identity matrix of order \(k+1\) (to include the intercept) and then calculates \(u\) via the following ratio, \[ u=\frac{v\pm\|v\|p_{i}}{\|(v\pm\|v\|p_{i})\|} \] where the plus-minus in the numerator depends on whether the first non-zero element of \(v\) is greater than (plus) or less than (minus) zero. The ratio itself consists of the column vector \(v\) of the original matrix, plus (minus) the Euclidean norm of that vector times the corresponding column \(p_i\) of the identity matrix, all over the Euclidean norm of the numerator. If this leads to a 'wtf!!??' moment, the implementation in R may add clarification.

```
# Euclidean norm
eucnorm <- function(x) sqrt(sum(x^2))

calculate_u <- function(A,p,i) {
  # don't forget to add the drop=FALSE (like I did at first!!)
  v <- A[,i,drop=FALSE]
  if (i>1) v[1:(i-1)] <- 0 # zero out entries above the pivot
  # find the first non-zero element
  x <- Find(function(x) x != 0, x = v)
  if (x > 0) {
    # if positive, then addition
    num <- v + (eucnorm(v) * p[,i,drop=FALSE])
  } else {
    # if negative, then subtraction
    num <- v - (eucnorm(v) * p[,i,drop=FALSE])
  }
  num / eucnorm(num)
}
```

Each resulting \(H_{i}\) is in turn used to pre-multiply the (partially transformed) matrix, zeroing out the subdiagonal of successive columns, to get \(R\). \(Q\) is found as the product \(\prod H_{i}\). Again, code makes this considerably more clear...

```
M <- matrix(rnorm(16),4,4)

QR <- function(A) {
  # requiring at least as many rows as columns isn't strictly necessary,
  # but makes things easier for the common case..
  stopifnot(is.matrix(A), dim(A)[1] >= dim(A)[2])
  I <- diag(dim(A)[1])
  Hlist <- list()
  Rlist <- list()
  for (i in 1:dim(A)[2]) {
    if (i==1) {
      u <- calculate_u(A,I,i)
      Hlist[[i]] <- I - 2 * matmult(u,t(u))
      Rlist[[i]] <- matmult(Hlist[[i]],A)
    } else {
      u <- calculate_u(Rlist[[(i-1)]],I,i)
      Hlist[[i]] <- (I - 2 * matmult(u,t(u)))
      Rlist[[i]] <- matmult(Hlist[[i]],Rlist[[(i-1)]])
    }
  }
  Q <- Reduce(matmult,Hlist) ## yes, cheating with Reduce() here...
  R <- Rlist[[dim(A)[2]]]
  QRlist <- list(Q=Q,R=R)
  QRlist
}

(out <- QR(M))
```

```
## $Q
##            [,1]       [,2]       [,3]       [,4]
## [1,] -0.3382630  0.2907179  0.5490472 -0.7068298
## [2,] -0.0948519 -0.3569862 -0.6820139 -0.6312060
## [3,]  0.7592288  0.5737387 -0.1675652 -0.2575218
## [4,]  0.5478621 -0.6773979  0.4531293 -0.1888204
##
## $R
##               [,1]          [,2]          [,3]       [,4]
## [1,] -1.362792e+00  1.230086e+00  1.155561e+00 -0.8892821
## [2,]  1.221896e-16 -1.287082e+00 -1.064953e+00 -0.4146394
## [3,] -6.577927e-17  1.750513e-16 -2.292122e+00  1.1328554
## [4,] -3.760067e-18  2.423225e-16  4.440892e-16 -1.5915067
```

So now what? Well, for one, you can reconstruct the original matrix nearly perfectly...

```
M
```

```
##            [,1]       [,2]       [,3]       [,4]
## [1,]  0.4609820 -0.7902707 -1.9589672  1.9271836
## [2,]  0.1292634  0.3427946  1.8338251  0.4643161
## [3,] -1.0346706  0.1954678  0.6504099 -0.6930427
## [4,] -0.7466218  1.5457845  0.3158573  0.6075107
```

```
head(matmult(out$Q,out$R))
```

```
##            [,1]       [,2]       [,3]       [,4]
## [1,]  0.4609820 -0.7902707 -1.9589672  1.9271836
## [2,]  0.1292634  0.3427946  1.8338251  0.4643161
## [3,] -1.0346706  0.1954678  0.6504099 -0.6930427
## [4,] -0.7466218  1.5457845  0.3158573  0.6075107
```

```
all.equal(M,matmult(out$Q,out$R))
```

```
## [1] TRUE
```

The QR algorithm can also easily be extended to obtain the eigenvalues of a matrix, which is, in fact, in many cases its main purpose. What is an eigenvalue you ask?

To explain eigenvalues, we first explain eigenvectors. Almost all vectors change direction when they are multiplied by A. Certain exceptional vectors x are in the same direction as Ax. Those are the "eigenvectors". Multiply an eigenvector by A, and the vector Ax is a number \(\lambda\) times the original x.

The basic equation is \( Ax = \lambda x \). The number \(\lambda\) is an eigenvalue of A.

The eigenvalue \(\lambda\) tells whether the special vector x is stretched or shrunk or reversed or left unchanged when it is multiplied by A.
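The defining equation is easy to verify numerically with base R's `eigen()`:

```r
# Quick numeric check of A x = lambda x with base R's eigen()
A <- matrix(c(2, 1, 1, 2), 2, 2)
e <- eigen(A)
lambda <- e$values[1]  # largest eigenvalue
x <- e$vectors[, 1]    # its eigenvector
all.equal(as.vector(A %*% x), lambda * x)  # TRUE
e$values                                   # 3 and 1
```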

In case you couldn't tell, defining eigenvalues (and eigenvectors) in an intuitive sense, without relying on the geometric interpretation or mathematical operations, is rather difficult, if not impossible. For the completely uninitiated, I'd recommend watching several YouTube videos (if the author's, uh, enthusiasm isn't too much for you, try this channel...) and then checking out any solid linear algebra textbook (see Banerjee and Roy (2014), Gruber (2013), or even the cheekily titled *No Bullshit Guide to Linear Algebra* by Ivan Savov (2016)). However, once you venture into a specific domain, they can take on a more concrete meaning. For example, in my field, quantitative psychology, eigenvalues play a large role in dimension reduction techniques like factor analysis and PCA. In this case, when performing a factor analysis on a number of psychometric items, one would look for the largest eigenvalues, as they hold more information than the smaller ones (this, alas, is a massive subject in its own right). Before I demonstrate code, it's also important to note that eigenvalues are usually, but not always, computed for symmetric matrices. In the non-symmetric case, things get dicey quickly, and software like R resorts to the use of complex numbers.

Here, in the following code, we get small taste of iteration until a certain tolerance level is reached, something that is a fundamental aspect for optimization algorithms.

```
QR_alg <- function(A, iter) {
  QR_i <- QR(A)
  RQ <- QR_i[[2]] %*% QR_i[[1]]
  e <- rep(1, dim(RQ)[2])
  i <- 0
  while (i < iter) {
    i <- i + 1
    QR_i <- QR(RQ)
    RQ <- QR_i[[2]] %*% QR_i[[1]]
    x <- isTRUE(all.equal(diag(RQ),e))
    if (x) {
      break
    }
    e <- diag(RQ)
  }
  out <- list(RQ=RQ,eigenvals=e,iter=i)
  out
}

head(X) # recall our subset from mtcars from earlier
```

```
##                     drat    wt  qsec vs
## Mazda RX4         1 3.90 2.620 16.46  0
## Mazda RX4 Wag     1 3.90 2.875 17.02  0
## Datsun 710        1 3.85 2.320 18.61  1
## Hornet 4 Drive    1 3.08 3.215 19.44  1
## Hornet Sportabout 1 3.15 3.440 17.02  0
## Valiant           1 2.76 3.460 20.22  1
```

```
X_SS <- matmult(t(X),X) # sum of squares
QR_alg(X_SS,50)$eigenvals
```

```
## [1] 1.106900e+04 4.103495e+01 1.077402e+01 2.294991e+00 6.965035e-02
```

```
# as far as I know, no eigenvalues via QR in base R..
eigen(X_SS,only.values = TRUE)
```

```
## $values
## [1] 1.106900e+04 4.103495e+01 1.077402e+01 2.294991e+00 6.965035e-02
##
## $vectors
## NULL
```

Note that this code could (read: *should*) be extended to include shifts, which typically speed up convergence considerably: at each iteration, a shift \(s\) is subtracted from the diagonal before factoring, and added back afterwards, i.e., \(A_{i} - sI = QR\) followed by \(A_{i+1} = RQ + sI\).
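The shifted iteration can be sketched as follows, using base R's `qr()` so the chunk stands alone; the shift \(s\) is simply taken from the bottom-right diagonal entry (the simplest common choice).

```r
# Hedged sketch of the shifted QR iteration, using base R's qr()
A <- matrix(c(2, 1, 1, 3), 2, 2)  # eigenvalues are (5 +/- sqrt(5))/2
for (i in 1:20) {
  s <- A[nrow(A), ncol(A)]            # shift from the bottom-right entry
  d <- qr(A - s * diag(nrow(A)))      # factor the shifted matrix...
  A <- qr.R(d) %*% qr.Q(d) + s * diag(nrow(A)) # ...then shift back
}
round(diag(A), 6)  # the two eigenvalues, (5 +/- sqrt(5))/2
```

With the shift, the off-diagonal entries die off much faster than in the unshifted iteration above.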

Finally, we can get our least squares estimates via,
\[
b = R^{-1}Q^{T}y
\]
It is important to mention that you will need to 'cull' rows/columns from the output of the `QR()` code provided above in order to make \(R\) square; in other words, keep only the rows corresponding to the non-zero upper-triangular part. Similarly, for \(Q\), take just as many columns as there were variables in the original matrix (plus the vector of ones). This is the 'skinny' (or 'thin') QR.

```
X <- as.matrix(mtcars[,5:8],32,3)
X <- cbind(1,X)
y <- matrix(mtcars[,1])
Q <- QR(X)[[1]]
R <- QR(X)[[2]]
step1QR <- inv(R[1:5,1:5]) # making R square, as it 'should be'
step2QR <- t(Q[,1:5]) # other columns are rubbish
step3QR <- matmult(step1QR,step2QR)
matmult(step3QR,y)
```

```
##            [,1]
## [1,] 10.6166864
## [2,]  1.6913239
## [3,] -4.4456101
## [4,]  0.9980007
## [5,] -0.2730062
```

```
lm.fit(X,y)$coefficients
```

```
##                  drat         wt       qsec         vs
## 10.6166864  1.6913239 -4.4456101  0.9980007 -0.2730062
```

Well, that's more than enough for now. This took quite a while to do, despite its incomplete nature. Moreover, numerous little bugs popped up in the more complicated algorithms when creating examples (and there are probably more I haven't found!); after all, the 'devil is in the details'. One of my favorite things about the *heavy duty* linear algebra and optimization textbooks is the warning at the front that the code should not be "used as-is" in production applications, and most certainly not where lives are at stake! Makes me feel special reading big boy(girl) books...
