The page you are reading is part of a draft (v2.0) of the "No bullshit guide to math and physics."

The text has since gone through many edits and is now available in print and electronic formats. The current edition of the book is v4.0, which is a substantial improvement over this draft version in terms of content and language (I hired a professional editor).

I'm leaving the old wiki content up for the time being, but I highly encourage you to check out the finished book. You can check out an extended preview here (PDF, 106 pages, 5MB).


Matrix Decompositions

It is often useful to express a given matrix $M$ as the product of different, simpler, matrices. These matrix decompositions (factorizations) can help us understand the structure of matrices by looking at their constituents. In this section we'll discuss various matrix factorizations and specify what types of matrices they are applicable to.

Most of the material covered here is not usually part of a first-year course on linear algebra. Nevertheless, I want you to know about the different matrix decompositions because many linear algebra applications depend on these techniques.

Eigendecomposition

The eigenvalue decomposition is a way to break up a matrix into its natural basis. In its natural basis, a diagonalizable matrix can be written as: \[ M = Q \Lambda Q^{-1}, \] where $Q$ is a matrix of eigenvectors $Q=[\vec{e}_1,\vec{e}_2,\ldots,\vec{e}_n]$ and $\Lambda$ is a diagonal matrix $\Lambda_{ii} = \lambda_i$, where $\lambda_1,\lambda_2,\ldots,\lambda_n$ are the eigenvalues of the matrix $M$.

When the matrix $M$ is normal ($MM^T = M^T M$), we can choose $Q$ to be an orthogonal matrix $O$ which satisfies $O^T O = I$. Calculating the inverse of an orthogonal matrix is easy: $O^{-1}=O^T$, so the diagonalization for normal matrices becomes: \[ M = O \Lambda O^T. \]

If the matrix $M$ is symmetric, then all its eigenvalues will be real numbers.
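
For readers who want to verify these claims numerically, here is a minimal sketch using NumPy (the example matrix is an arbitrary choice, not taken from the text):

  import numpy as np

  # an arbitrary symmetric (hence normal) example matrix
  M = np.array([[4.0, 2.0],
                [2.0, 3.0]])

  evals, Q = np.linalg.eig(M)        # eigenvalues and eigenvectors (columns of Q)
  Lambda = np.diag(evals)            # diagonal matrix of eigenvalues

  # reconstruct M = Q Lambda Q^{-1}
  print(np.allclose(M, Q @ Lambda @ np.linalg.inv(Q)))   # True

  # M is symmetric, so its eigenvalues are real and Q comes out orthogonal
  # (up to numerical precision):
  print(np.allclose(Q.T @ Q, np.eye(2)))                  # True: Q^T Q = I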

Similarity transformation

Consider the matrix $N \in \mathbb{R}^{n\times n}$ and an invertible matrix $P \in \mathbb{R}^{n\times n}$. In a similarity transformation the matrix $N$ is multiplied by $P$ from the left and by the inverse of $P$ on the right: \[ M = P N P^{-1}. \]

Because $P$ is an invertible matrix, its columns form a basis for the space $\mathbb{R}^n$. Thus we can interpret $P$ as a change of basis from the standard basis to the basis of the columns of $P$. The matrix $P^{-1}$ corresponds to the inverse change of basis.

The matrices $M$ and $N$ correspond to the same linear transformation but with respect to different bases. We say matrix $N$ is similar to the matrix $M$. Similar matrices have the same eigenvalues $\textrm{eig}(N)=\textrm{eig}(M)$, and therefore have the same trace $\textrm{Tr}(M)=\textrm{Tr}(N)=\sum_i \lambda_i$ and the same determinant $|M|=|N|=\prod_i \lambda_i$.

Note that the eigendecomposition of a matrix is a type of similarity transformation where the change of basis matrix is constructed from the set of eigenvectors.
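
A quick numerical check that similar matrices share the same eigenvalues, trace, and determinant (a sketch, assuming NumPy; the matrices $N$ and $P$ are arbitrary examples):

  import numpy as np

  N = np.array([[2.0, 1.0],
                [0.0, 3.0]])
  P = np.array([[1.0, 1.0],
                [0.0, 1.0]])               # any invertible matrix will do

  M = P @ N @ np.linalg.inv(P)             # similarity transformation

  print(np.sort(np.linalg.eigvals(M)), np.sort(np.linalg.eigvals(N)))  # same eigenvalues
  print(np.trace(M), np.trace(N))                                      # same trace
  print(np.linalg.det(M), np.linalg.det(N))                            # same determinant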

Singular value decomposition

We can generalize the concept of eigenvalues to non-square matrices. Consider an $m \times n$ matrix $M$. We can always write it as a diagonal matrix $\Sigma$ surrounded by matrices of left eigenvectors and right eigenvectors: \[ M = U\Sigma V, \] where

* $\Sigma \in \mathbb{R}^{m\times n}$ is a diagonal matrix containing the square roots $\sigma_i$
  of the eigenvalues $\lambda_i$ of the matrix $MM^T$ (or the matrix $M^TM$,
  since $M^TM$ and $MM^T$ have the same eigenvalues):
  \[
   \sigma_i = \sqrt{ \lambda_i }, 
    \textrm{ where } \{ \lambda_i \} = \textrm{eig}(MM^T) = \textrm{eig}(M^T M).
  \]
* $U$ is an orthogonal matrix whose columns are the $m$-dimensional eigenvectors
  of $MM^T$.
  \[ 
   U=     
   \begin{bmatrix}
    |  &  & | \nl
    \hat{u}_{\lambda_1}  &  \cdots &  \hat{u}_{\lambda_m} \nl
    |  &  & | 
    \end{bmatrix},
    \textrm{ where } \{ (\lambda_i,\hat{u}_i) \} = \textrm{eigv}(MM^T).
  \]
* $V$ is an orthogonal matrix whose //rows// are the $n$-dimensional
  eigenvectors of $M^T M$.
  \[
   V=
     \begin{bmatrix}
     - & \hat{v}_{1}  &  - \nl
      & \vdots &  \nl
     - & \hat{v}_{n} & -
     \end{bmatrix},
    \textrm{ where } \{ (\lambda_i,\hat{v}_i) \} = \textrm{eigv}(M^T M).
  \]

Written more explicitly, the singular value decomposition of the matrix $M$ is \[ M= \underbrace{ \begin{bmatrix} | & & | \nl \hat{u}_{\lambda_1} & \cdots & \hat{u}_{\lambda_m} \nl | & & | \end{bmatrix} }_U \underbrace{ \begin{bmatrix} \sigma_1 & 0 & \cdots \nl 0 & \sigma_2 & \cdots \nl 0 & 0 & \cdots \end{bmatrix} }_\Sigma \underbrace{ \begin{bmatrix} \ \ - & \hat{v}_{1} & - \ \ \nl & \vdots & \nl - & \hat{v}_{n} & - \end{bmatrix} }_V. \] The above formula allows us to see the structure of the matrix $M$. We can interpret the operation $\vec{y} = M\vec{x}$ as a three step process:

  1. Convert the input vector $\vec{x}$ from the standard basis to the basis $\{ \vec{v}_i \}$
  2. Scale each component by the corresponding singular value $\sigma_i$
  3. Convert the output from the $\{ \vec{u}_i \}$ basis back to the standard basis
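
The following sketch (assuming NumPy; the $2\times 3$ matrix is an arbitrary example) verifies the decomposition and the relation $\sigma_i = \sqrt{\lambda_i}$. Note that NumPy's svd function returns $V$ with the right singular vectors as rows, matching the convention used above:

  import numpy as np

  M = np.array([[1.0, 0.0, 2.0],
                [0.0, 3.0, 0.0]])          # an arbitrary 2x3 example matrix

  U, s, V = np.linalg.svd(M)               # rows of V are the eigenvectors of M^T M

  Sigma = np.zeros(M.shape)
  Sigma[:len(s), :len(s)] = np.diag(s)     # embed the singular values in an m x n matrix

  print(np.allclose(M, U @ Sigma @ V))                            # True: M = U Sigma V
  print(np.allclose(np.sort(s**2), np.linalg.eigvalsh(M @ M.T)))  # sigma_i^2 = eig(M M^T)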

LU decomposition

It is much easier to compute the inverse of a triangular matrix than it is for a general matrix. Thus, it is useful to write a matrix as the product of two triangular matrices for computational purposes. We call this the $LU$ decomposition: \[ A = LU, \] where $U$ is an upper triangular matrix and $L$ is a lower triangular matrix.

The main application of this decomposition is to obtain more efficient solutions to equations of the form $A\vec{x}=\vec{b}$. Because $A=LU$, we can solve this equation in two steps: first $L^{-1}LU\vec{x}=U\vec{x}=L^{-1}\vec{b}$, and then $U^{-1}U\vec{x}=\vec{x}=U^{-1}L^{-1}\vec{b}$. We have split the work of finding the inverse $A^{-1}$ into two simpler subtasks: finding $L^{-1}$ and $U^{-1}$.

The $LU$ decomposition is mainly used in computer algorithms, but it is also possible to find the $LU$ decomposition of a matrix by hand. Recall the algorithm for finding the inverse of a matrix in which you start from the array $[A|I]$ and do row operations until you get the array into the reduced row echelon form $[I|A^{-1}]$. Consider the midpoint of the algorithm, when the left-hand side of the array is the row echelon form (REF). Since the matrix $A$ in its REF is upper triangular, the array will contain $[U|L^{-1}]$. The $U$ part of the decomposition is on the left-hand side, and the $L$ part is obtained by finding the inverse of the right hand side of the array.
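
As a sketch of how the decomposition is used to solve $A\vec{x}=\vec{b}$ in practice (assuming SciPy; note that scipy.linalg.lu pivots rows, so it actually returns a factorization $A = PLU$ with a permutation matrix $P$, a detail not discussed above):

  import numpy as np
  from scipy.linalg import lu, solve_triangular

  A = np.array([[4.0, 3.0],
                [6.0, 3.0]])
  b = np.array([10.0, 12.0])

  P, L, U = lu(A)                                # A = P L U  (P is a permutation matrix)

  # solve A x = b in two triangular steps:
  y = solve_triangular(L, P.T @ b, lower=True)   # forward substitution:  L y = P^T b
  x = solve_triangular(U, y)                     # back substitution:     U x = y
  print(np.allclose(A @ x, b))                   # True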

Cholesky decomposition

For a symmetric and positive semidefinite matrix $A$, the $LU$ decomposition takes a simpler form. Such matrices can be written as the product of a triangular matrix with its transpose: \[ A = LL^T, \quad \textrm{or} \quad A=U^TU, \] where $U$ is an upper triangular matrix and $L$ is a lower triangular matrix.
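
A short check with NumPy (the positive definite matrix below is an arbitrary example):

  import numpy as np

  A = np.array([[4.0, 2.0],
                [2.0, 5.0]])             # symmetric and positive definite

  L = np.linalg.cholesky(A)              # lower triangular factor
  print(L)                               # [[2, 0], [1, 2]]
  print(np.allclose(A, L @ L.T))         # True: A = L L^T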

QR decomposition

Any real square matrix $A \in \mathbb{R}^{n\times n}$ can be decomposed as a product of an orthogonal matrix $O$ and an upper triangular matrix $U$: \[ A = OU. \] For historical reasons, the orthogonal matrix is usually denoted $Q$ instead of $O$, and the upper triangular matrix is denoted $R$ instead of $U$ (think “right-triangular” since its nonzero entries lie on and to the right of the main diagonal). The decomposition then becomes \[ A = QR, \] which is why it is known as the QR decomposition.

The $QR$ decomposition is equivalent to the Gram-Schmidt orthogonalization procedure.

Example

Consider the decomposition of \[ A = \begin{bmatrix} 12 & -51 & 4 \nl 6 & 167 & -68 \nl -4 & 24 & -41 \end{bmatrix} = OR. \]

We are looking for an orthogonal matrix $O$ (i.e., a matrix that obeys $O^{T}\,O = I$) and an upper triangular matrix $R$. We can obtain an orthogonal matrix by making its columns orthonormal vectors (Gram–Schmidt procedure) and recording the Gram–Schmidt coefficients in the matrix $R$.

Let us now illustrate how this procedure can be used to compute the factorization $A=OR$. The first step is to change the second column in $A$ so that it becomes orthogonal to the first (by subtracting a multiple of the first column). Next we change the third column in $A$ so that it is orthogonal to the first two columns (by subtracting multiples of the first two columns). In doing so we obtain a matrix which has the same column space as $A$ but which has orthogonal columns: \[ \begin{bmatrix} | & | & | \nl \mathbf u_1 & \mathbf u_2 & \mathbf u_3 \nl | & | & | \end{bmatrix} = \begin{bmatrix} 12 & -69 & -58/5 \nl 6 & 158 & 6/5 \nl -4 & 30 & -33 \end{bmatrix}. \] To obtain an orthogonal matrix we must normalize each column to be of unit length: \[ O = \begin{bmatrix} | & | & | \nl \frac{\mathbf u_1}{\|\mathbf u_1\|} & \frac{\mathbf u_2}{\|\mathbf u_2\|} & \frac{\mathbf u_3}{\|\mathbf u_3\|} \nl | & | & | \end{bmatrix} = \begin{bmatrix} 6/7 & -69/175 & -58/175 \nl 3/7 & 158/175 & 6/175 \nl -2/7 & 6/35 & -33/35 \end{bmatrix}. \]
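
Here is a minimal NumPy sketch of the Gram–Schmidt step just described; it reproduces the columns $\mathbf u_1, \mathbf u_2, \mathbf u_3$ and the matrix $O$ from the example:

  import numpy as np

  A = np.array([[ 12.0, -51.0,   4.0],
                [  6.0, 167.0, -68.0],
                [ -4.0,  24.0, -41.0]])

  # classical Gram-Schmidt: subtract from each column its projections
  # onto the previously orthogonalized columns
  U = A.copy()
  for j in range(1, 3):
      for k in range(j):
          U[:, j] -= (A[:, j] @ U[:, k]) / (U[:, k] @ U[:, k]) * U[:, k]

  O = U / np.linalg.norm(U, axis=0)      # normalize each column to unit length
  print(U)   # columns: [12,6,-4], [-69,158,30], [-58/5,6/5,-33]
  print(O)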

We can find the matrix $R$ as follows: \[ O^{T} A = O^{T}O\,R = R, \qquad R = O^{T}A = \begin{bmatrix} 14 & 21 & -14 \nl 0 & 175 & -70 \nl 0 & 0 & 35 \end{bmatrix}. \] The columns of $R$ contain the mixture coefficients required to obtain the columns of $A$ from the columns of $O$. For example, the second column of $A$ is equal to $21\frac{\mathbf u_1}{\|\mathbf u_1\|}+175\frac{\mathbf u_2}{\|\mathbf u_2\|}$.
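
To double-check the arithmetic, a short verification (assuming NumPy; the matrices are copied from the example above):

  import numpy as np

  A = np.array([[ 12.0, -51.0,   4.0],
                [  6.0, 167.0, -68.0],
                [ -4.0,  24.0, -41.0]])
  O = np.array([[ 6/7, -69/175, -58/175],
                [ 3/7, 158/175,   6/175],
                [-2/7,   6/35,  -33/35]])

  R = O.T @ A
  print(np.round(R, 10))                  # [[14, 21, -14], [0, 175, -70], [0, 0, 35]]
  print(np.allclose(A, O @ R))            # True: A = O R

  # np.linalg.qr computes the same factorization, possibly with the signs
  # of some columns of Q (and corresponding rows of R) flipped
  Q, R2 = np.linalg.qr(A)
  print(np.allclose(A, Q @ R2))           # True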

Discussion

You will no doubt agree with me that spending time learning about these different decompositions was educational. If you are interested in pursuing the subject of matrix factorizations (decompositions) further, you will find that we have only scratched the surface here. I encourage you to research this subject on your own. There are countless areas of application for matrix methods. I will just mention three topics from machine learning: nonnegative matrix factorization, latent semantic indexing, and latent Dirichlet allocation.

Links

[ Retro movie showing steps in SVD ]
http://www.youtube.com/watch?v=R9UoFyqJca8

[ More info from wikipedia ]
http://en.wikipedia.org/wiki/Matrix_decomposition
http://en.wikipedia.org/wiki/Singular_value_decomposition

[ A detailed example of the QR factorization of a matrix ]
http://www.math.ucla.edu/~yanovsky/Teaching/Math151B/handouts/GramSchmidt.pdf

[ Cholesky decomposition ]
http://en.wikipedia.org/wiki/Cholesky_decomposition

 