

Eigenvalues and eigenvectors

The set of eigenvectors of a matrix is a special set of input vectors for which the action of the matrix reduces to a simple scaling. Decomposing a matrix in terms of its eigenvalues and its eigenvectors gives valuable insights into the properties of the matrix.

Certain matrix calculations, like computing the power of a matrix, become much easier when we use the eigendecomposition of the matrix. For example, suppose you are given a square matrix $A$ and you want to compute $A^5$. To make this example more concrete, let's use the matrix \[ A = \begin{bmatrix} 1 & 1 \nl 1 & 0 \end{bmatrix}. \]

We want to compute \[ A^5 = \begin{bmatrix} 1 & 1 \nl 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 \nl 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 \nl 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 \nl 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 \nl 1 & 0 \end{bmatrix}. \] That is a lot of matrix multiplications. You'll have to multiply and add entries for a while! Imagine how many multiplications you would need if I had asked for $A^{55}$ instead.

Let's be smart about this. Every matrix corresponds to some linear operation. This means that it is a legitimate question to ask “what does the matrix $A$ do?” and once we figure out what it does, we can compute $A^{55}$ by simply doing what $A$ does $55$ times.

The best way to see what a matrix does is to look inside of it and see what it is made of: what is its own natural basis (its eigenbasis), and what are its own values (its eigenvalues)?

Deep down inside, the matrix $A$ is really a product of three matrices: \[ \begin{bmatrix} 1 & 1 \nl 1 & 0 \end{bmatrix} = \underbrace{\begin{bmatrix} 0.850.. & -0.525.. \nl 0.525.. & 0.850.. \end{bmatrix} }_Q \ \underbrace{\! \begin{bmatrix} 1.618.. & 0 \nl 0 &-0.618.. \end{bmatrix} }_{\Lambda} \underbrace{ \begin{bmatrix} 0.850.. & 0.525.. \nl -0.525.. & 0.850.. \end{bmatrix} }_{Q^{-1}}. \] \[ A = Q\Lambda Q^{-1} \] I am serious. You can multiply these three matrices together and you will get $A$. Notice that the “middle matrix” $\Lambda$ (the capital Greek letter lambda) has entries only on the diagonal. The matrix $\Lambda$ is sandwiched between the matrix $Q$ on the left and $Q^{-1}$ (the inverse of $Q$) on the right. This way of writing $A$ will allow us to compute $A^5$ in a civilized manner: \[ \begin{eqnarray} A^5 & = & A A A A A \nl & = & Q\Lambda \underbrace{Q^{-1}Q}_{I}\Lambda \underbrace{Q^{-1}Q}_{I}\Lambda \underbrace{Q^{-1}Q}_{I}\Lambda \underbrace{Q^{-1}Q}_{I}\Lambda Q^{-1} \nl & = & Q\Lambda I \Lambda I \Lambda I \Lambda I \Lambda Q^{-1} \nl & = & Q\Lambda \Lambda \Lambda \Lambda \Lambda Q^{-1} \nl & = & Q\Lambda^5 Q^{-1}. \end{eqnarray} \]

Since the matrix $\Lambda$ is diagonal, it is really easy to compute its fifth power $\Lambda^5$: \[ \begin{bmatrix} 1.618.. & 0 \nl 0 &-0.618.. \end{bmatrix}^5 = \begin{bmatrix} (1.618..)^5 & 0 \nl 0 &(-0.618..)^5 \end{bmatrix} = \begin{bmatrix} 11.090.. & 0 \nl 0 &-0.090.. \end{bmatrix}\!. \]

Thus we have \[ \begin{bmatrix} 1 & 1 \nl 1 & 0 \end{bmatrix}^5 \! = \underbrace{\begin{bmatrix} 0.850..\! & -0.525.. \nl 0.525..\! & 0.850.. \end{bmatrix} }_Q \! \begin{bmatrix} 11.090.. \! & 0 \nl 0 \! &-0.090.. \end{bmatrix} \! \underbrace{ \begin{bmatrix} 0.850.. & 0.525.. \nl -0.525.. & 0.850.. \end{bmatrix} }_{Q^{-1}}\!. \] We still have to multiply these three matrices together, but we have reduced the work from four matrix multiplications to just two.

The answer is \[ A^5 = Q\Lambda^5 Q^{-1} = \begin{bmatrix} 8 & 5 \nl 5 & 3 \end{bmatrix}. \]

Using the same technique, we can just as easily compute $A^{55}$: \[ A^{55} = Q\Lambda^{55} Q^{-1} = \begin{bmatrix} 225851433717 & 139583862445 \nl 139583862445 & 86267571272 \end{bmatrix}. \]

We could even compute $A^{5555}$ if we wanted to, but you get the point. If you look at $A$ in the right basis, repeated multiplication only involves computing the powers of its eigenvalues.
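If you want to check these numbers yourself, here is a minimal sketch using Python with NumPy (the choice of library is mine, not part of the text; any linear algebra package would do):

<code python>
import numpy as np

A = np.array([[1, 1],
              [1, 0]])

# Eigendecomposition: evals holds the eigenvalues, the columns of Q are eigenvectors.
evals, Q = np.linalg.eig(A)

# A^55 = Q Lambda^55 Q^{-1}: only the eigenvalues need to be raised to the 55th power.
A55 = Q @ np.diag(evals**55) @ np.linalg.inv(Q)

# Compare with the answer obtained by repeated matrix multiplication.
print(np.allclose(A55, np.linalg.matrix_power(A, 55)))  # True
</code>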

Definitions

* $A$: an $n\times n$ square matrix.
  When necessary, we will denote the individual entries of $A$ as $a_{ij}$.
* $\textrm{eig}(A)\equiv(\lambda_1, \lambda_2, \ldots, \lambda_n )$: 
  the list of //eigenvalues// of $A$, usually denoted by the Greek letter lambda.
  Note that some eigenvalues could be repeated.
* $p(\lambda)=\det(A - \lambda I)$: 
  the //characteristic polynomial// for the matrix $A$. The eigenvalues are the roots of this polynomial.
* $\{ \vec{e}_{\lambda_1}, \vec{e}_{\lambda_2}, \ldots, \vec{e}_{\lambda_n} \}$: 
  the set of //eigenvectors// of $A$. Each eigenvector is associated with a corresponding eigenvalue.
* $\Lambda  \equiv {\rm diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$: 
  the diagonal version of $A$. The matrix $\Lambda$ contains the eigenvalues of $A$ on the diagonal:
  \[
   \Lambda = 
   \begin{bmatrix}
   \lambda_1	&  \cdots  &  0 \nl
   \vdots 	&  \ddots  &  0  \nl
   0  	&   0      &  \lambda_n
   \end{bmatrix}.
  \]
  The matrix $\Lambda$ corresponds to the matrix representation of $A$ with respect to its eigenbasis.
* $Q$: a matrix whose columns are the eigenvectors of $A$:
  \[
   Q 
   \equiv
   \begin{bmatrix}
   |  &  & | \nl
   \vec{e}_{\lambda_1}  &  \cdots &  \vec{e}_{\lambda_n} \nl
   |  &  & | 
   \end{bmatrix}
    =  \ 
   _{B_s}\![I]_{B_\lambda}.
  \]
  The matrix $Q$ corresponds to the //change of basis matrix// 
  from the eigenbasis $B_\lambda = \{ \vec{e}_{\lambda_1}, \vec{e}_{\lambda_2}, \vec{e}_{\lambda_3}, \ldots \}$
  to the standard basis $B_s = \{\hat{\imath}, \hat{\jmath}, \hat{k}, \ldots \}$.
* $A=Q\Lambda Q^{-1}$: the //eigendecomposition// of the matrix $A$.
* $\Lambda = Q^{-1}AQ$: the //diagonalization// of the matrix $A$.


Eigenvalues

The eigenvalue equation is \[ A\vec{e}_\lambda =\lambda\vec{e}_\lambda, \] where $\lambda$ is an eigenvalue and $\vec{e}_\lambda$ is an eigenvector of the matrix $A$. If we multiply $A$ by an eigenvector $\vec{e}_\lambda$, we get back the same vector scaled by the constant $\lambda$.

To find the eigenvalues of a matrix we start from the eigenvalue equation $A\vec{e}_\lambda =\lambda\vec{e}_\lambda$, insert the identity matrix $I$, and rewrite it as a null-space problem: \[ A\vec{e}_\lambda =\lambda I\vec{e}_\lambda \qquad \Rightarrow \qquad \left(A - \lambda I\right)\vec{e}_\lambda = \vec{0}. \] This equation has a nonzero solution $\vec{e}_\lambda$ whenever $|A - \lambda I|=0$. The eigenvalues of $A \in \mathbb{R}^{n \times n}$, denoted $(\lambda_1, \lambda_2, \ldots, \lambda_n )$, are the roots of the characteristic polynomial: \[ p(\lambda)=\det(A - \lambda I) \equiv |A-\lambda I|=0. \] When we calculate this determinant, we obtain an expression involving the entries $a_{ij}$ and the variable $\lambda$. If $A$ is an $n \times n $ matrix, the characteristic polynomial is of degree $n$ in the variable $\lambda$.

We denote the list of eigenvalues as $\textrm{eig}(A)=( \lambda_1, \lambda_2, \ldots, \lambda_n )$. If $\lambda_i$ is a repeated root of the characteristic polynomial $p(\lambda)$, we say that it is a degenerate eigenvalue. For example, the identity matrix $I \in \mathbb{R}^{2\times 2}$ has the characteristic polynomial $p_I(\lambda)=(\lambda-1)^2$, which has a repeated root at $\lambda=1$. We say the eigenvalue $\lambda=1$ has algebraic multiplicity $2$. It is important to keep track of degenerate eigenvalues, so we'll specify the multiplicity of an eigenvalue by repeatedly including it in the list of eigenvalues: $\textrm{eig}(I)=(\lambda_1, \lambda_2) = (1,1)$.
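For instance, here is a short SymPy sketch (my illustration, not part of the original text) that computes the characteristic polynomial and its roots for the matrix $A$ from the beginning of the section:

<code python>
from sympy import Matrix, symbols, solve, eye

lam = symbols('lamda')          # the variable lambda
A = Matrix([[1, 1],
            [1, 0]])

# Characteristic polynomial p(lambda) = det(A - lambda*I).
p = (A - lam*eye(2)).det().expand()
print(p)                        # lamda**2 - lamda - 1

# The eigenvalues are the roots of the characteristic polynomial.
print(solve(p, lam))            # roots 1/2 +/- sqrt(5)/2, i.e., 1.618... and -0.618...
</code>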

Eigenvectors

The eigenvectors associated with eigenvalue $\lambda_i$ of matrix $A$ are the vectors in the null space of the matrix $(A-\lambda_i I )$.

To find the eigenvectors associated with the eigenvalue $\lambda_i$, you have to solve for the components $e_{\lambda,x}$ and $e_{\lambda,y}$ of the vector $\vec{e}_\lambda=(e_{\lambda,x},e_{\lambda,y})$ that satisfy the equation: \[ A\vec{e}_\lambda =\lambda\vec{e}_\lambda, \] or equivalently \[ (A-\lambda I ) \vec{e}_\lambda = 0\qquad \Rightarrow \qquad \begin{bmatrix} a_{11}-\lambda & a_{12} \nl a_{21} & a_{22}-\lambda \end{bmatrix} \begin{bmatrix} e_{\lambda,x} \nl e_{\lambda,y} \end{bmatrix} = \begin{bmatrix} 0 \nl 0 \end{bmatrix}. \]

If $\lambda_i$ is a repeated root (degenerate eigenvalue), the null space of $(A-\lambda_i I )$ could contain multiple linearly independent eigenvectors. The dimension of the null space of $(A-\lambda_i I )$ is called the geometric multiplicity of the eigenvalue $\lambda_i$.
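As a concrete sketch (the $2\times 2$ matrix below is my own illustrative choice, not one from the text), SymPy can find the eigenvectors by computing these null spaces directly:

<code python>
from sympy import Matrix, eye

# An illustrative matrix with eigenvalues 5 and 2.
A = Matrix([[4, 1],
            [2, 3]])

# The eigenvectors for an eigenvalue are the nonzero vectors in the null space of (A - lambda*I).
print((A - 5*eye(2)).nullspace())   # [Matrix([[1], [1]])]
print((A - 2*eye(2)).nullspace())   # [Matrix([[-1/2], [1]])]

# eigenvects() returns (eigenvalue, algebraic multiplicity, eigenvectors) for each eigenvalue.
print(A.eigenvects())
</code>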

Eigendecomposition

If an $n \times n$ matrix $A$ is diagonalizable, this means that we can find $n$ linearly independent eigenvectors for that matrix. The eigenvectors that come from different eigenspaces are guaranteed to be linearly independent (see exercises). We can also pick a set of linearly independent eigenvectors within each of the degenerate eigenspaces. Combining the eigenvectors from all the eigenspaces, we get a set of $n$ linearly independent eigenvectors, which form a basis for $\mathbb{R}^n$. We call this basis the eigenbasis.

Let's put the $n$ eigenvectors next to each other as the columns of a matrix: \[ Q = \begin{bmatrix} | & & | \nl \vec{e}_{\lambda_1} & \cdots & \vec{e}_{\lambda_n} \nl | & & | \end{bmatrix}. \]

We can decompose $A$ into its eigenvalues and its eigenvectors: \[ A = Q \Lambda Q^{-1} = \begin{bmatrix} | & & | \nl \vec{e}_{\lambda_1} & \cdots & \vec{e}_{\lambda_n} \nl | & & | \end{bmatrix} \begin{bmatrix} \lambda_1 & \cdots & 0 \nl \vdots & \ddots & 0 \nl 0 & 0 & \lambda_n \end{bmatrix} \begin{bmatrix} \ \nl \ \ \ \ \ \ Q^{-1} \ \ \ \ \ \ \nl \ \end{bmatrix}. \] The matrix $\Lambda$ is a diagonal matrix of eigenvalues and the matrix $Q$ is the “change of basis” matrix which contains the corresponding eigenvectors as columns.

Note that only the direction of each eigenvector is important, not its length. Indeed, if $\vec{e}_\lambda$ is an eigenvector (with eigenvalue $\lambda$), then so is $\alpha \vec{e}_\lambda$ for any nonzero $\alpha \in \mathbb{R}$. Thus we are free to use any nonzero multiple of the vectors $\vec{e}_{\lambda_i}$ as the columns of the matrix $Q$.

Example

Find the eigenvalues, the eigenvectors and the diagonalization of the matrix: \[ A=\begin{bmatrix} 1 & 2 & 0 \nl 0 & 3 & 0 \nl 2 & -4 & 2 \end{bmatrix}. \]

The eigenvalues of the matrix are (in decreasing order) \[ \lambda_1 = 3, \quad \lambda_2 = 2, \quad \lambda_3= 1. \] When an $n \times n$ matrix has $n$ distinct eigenvalues, it is diagonalizable since it will have $n$ linearly independent eigenvectors. Since the matrix $A$ has $3$ distinct eigenvalues, it is diagonalizable.

The eigenvalues of $A$ are the values that will appear in the diagonal of $\Lambda$, so by finding the eigenvalues of $A$ we already know its diagonalization. We could stop here, but instead, let's continue and find the eigenvectors of $A$.

The eigenvectors of $A$ are found by solving for the null space of the matrices $(A-3I)$, $(A-2I)$, and $(A-I)$ respectively: \[ \vec{e}_{\lambda_1} = \begin{bmatrix} -1 \nl -1 \nl 2 \end{bmatrix}, \quad \vec{e}_{\lambda_2} = \begin{bmatrix} 0 \nl 0 \nl 1 \end{bmatrix}, \quad \vec{e}_{\lambda_3} = \begin{bmatrix} -1 \nl 0 \nl 2 \end{bmatrix}. \] Check that $A \vec{e}_{\lambda_k} = \lambda_k \vec{e}_{\lambda_k}$ for each of the above vectors. Let $Q$ be the matrix with these eigenvectors as its columns: \[ Q= \begin{bmatrix} -1 & 0 & -1 \nl -1 & 0 & 0 \nl 2 & 1 & 2 \end{bmatrix}, \qquad \textrm{and} \qquad Q^{-1} = \begin{bmatrix} 0 & -1 & 0 \nl 2 & 0 & 1 \nl -1 & 1 & 0 \end{bmatrix}. \] These matrices form the eigendecomposition of the matrix $A$: \[ A = Q\Lambda Q^{-1} = \begin{bmatrix} 1 & 2 & 0 \nl 0 & 3 & 0 \nl 2 & -4 & 2 \end{bmatrix} = \begin{bmatrix} -1 & 0 & -1 \nl -1 & 0 & 0 \nl 2 & 1 & 2 \end{bmatrix} \!\! \begin{bmatrix} 3 & 0 & 0 \nl 0 & 2 & 0 \nl 0 & 0 & 1\end{bmatrix} \!\! \begin{bmatrix} 0 & -1 & 0 \nl 2 & 0 & 1 \nl -1 & 1 & 0 \end{bmatrix}\!. \]

To find the diagonalization of $A$, we must move $Q$ and $Q^{-1}$ to the other side of the equation. More specifically, we multiply the equation $A=Q\Lambda Q^{-1}$ by $Q^{-1}$ on the left and by $Q$ on the right to obtain the diagonal matrix: \[ \Lambda = Q^{-1}AQ = \begin{bmatrix} 0 & -1 & 0 \nl 2 & 0 & 1 \nl -1 & 1 & 0 \end{bmatrix} \!\! \begin{bmatrix} 1 & 2 & 0 \nl 0 & 3 & 0 \nl 2 & -4 & 2 \end{bmatrix} \!\! \begin{bmatrix} -1 & 0 & -1 \nl -1 & 0 & 0 \nl 2 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 3 & 0 & 0 \nl 0 & 2 & 0 \nl 0 & 0 & 1\end{bmatrix}\!. \]
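Here is a short NumPy check of this example (the code is just a numerical sanity check, not part of the derivation):

<code python>
import numpy as np

A = np.array([[1,  2, 0],
              [0,  3, 0],
              [2, -4, 2]])

Q = np.array([[-1, 0, -1],     # columns are the eigenvectors found above
              [-1, 0,  0],
              [ 2, 1,  2]])
Lam = np.diag([3, 2, 1])       # eigenvalues on the diagonal

Qinv = np.linalg.inv(Q)
print(np.allclose(A, Q @ Lam @ Qinv))    # True: A = Q Lambda Q^{-1}
print(np.allclose(Lam, Qinv @ A @ Q))    # True: Lambda = Q^{-1} A Q
</code>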

Explanations

Eigenspaces

Recall the definition of the null space of a matrix $M$: \[ \mathcal{N}(M) \equiv \{ \vec{v} \in \mathbb{R}^n \ | \ M\vec{v} = 0 \}. \] The dimension of the null space is the number of linearly independent vectors you can find in the null space. If $M$ sends exactly two linearly independent vectors $\vec{v}$ and $\vec{w}$ to the zero vector: \[ M\vec{v} = 0, \qquad M\vec{w} = 0, \] then the null space is two-dimensional. We can always choose the vectors $\vec{v}$ and $\vec{w}$ to be orthogonal $\vec{v}\cdot\vec{w}=0$ and thus obtain an orthogonal basis for the null space.

Each eigenvalue $\lambda_i$ has an eigenspace associated with it. The eigenspace is the null space of the matrix $(A-\lambda_i I)$: \[ E_{\lambda_i} \equiv \mathcal{N}\left( A-\lambda_i I \right) = \{ \vec{v} \in \mathbb{R}^n \ | \ \left( A-\lambda_i I \right)\vec{v} = 0 \}. \] For degenerate eigenvalues (repeated roots of the characteristic polynomial) the null space of $\left( A-\lambda_i I \right)$ could contain multiple eigenvectors.

Change of basis

The matrix $Q$ can be interpreted as a change of basis matrix. Given a vector written in terms of the eigenbasis $[\vec{v}]_{B_{\lambda}}=(v^\prime_1,v^\prime_2,v^\prime_3)_{B_{\lambda}} = v^\prime_1\vec{e}_{\lambda_1}+ v^\prime_2\vec{e}_{\lambda_2}+v^\prime_3\vec{e}_{\lambda_3}$, we can use the matrix $Q$ to convert it to the standard basis $[\vec{v}]_{B_{s}} = (v_1, v_2,v_3) = v_1\hat{\imath} + v_2\hat{\jmath}+v_3\hat{k}$ as follows: \[ [\vec{v}]_{B_{s}} = \ Q [\vec{v}]_{B_{\lambda}} = \ _{B_{s}\!}[I]_{B_{\lambda}} [\vec{v}]_{B_{\lambda}}. \]

The change of basis in the other direction is given by the inverse matrix: \[ [\vec{v}]_{B_{\lambda}} = \ Q^{-1} [\vec{v}]_{B_{s}} = \ _{B_{\lambda}\!}\left[I\right]_{B_{s}} [\vec{v}]_{B_{s}}. \]
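To make this concrete, here is a small sketch that reuses the matrix $Q$ from the example above; the coordinate vector $(2,1,-1)$ is an arbitrary choice for illustration:

<code python>
import numpy as np

Q = np.array([[-1, 0, -1],     # columns: the eigenvectors expressed in the standard basis
              [-1, 0,  0],
              [ 2, 1,  2]])

v_lambda = np.array([2, 1, -1])    # coordinates of some vector with respect to the eigenbasis

v_s = Q @ v_lambda                 # convert to standard-basis coordinates
print(v_s)                         # [-1 -2  3]
print(np.linalg.inv(Q) @ v_s)      # recovers [ 2.  1. -1.]
</code>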

Interpretations

The eigendecomposition $A = Q \Lambda Q^{-1}$ allows us to interpret the action of $A$ on an arbitrary input vector $\vec{v}$ as the following three steps: \[ [\vec{w}]_{B_{s}} = \ _{B_{s}\!}[A]_{B_{s}} [\vec{v}]_{B_{s}} = Q\Lambda Q^{-1} [\vec{v}]_{B_{s}} = \ \underbrace{\!\!\ _{B_{s}\!}[I]_{B_{\lambda}} \ \underbrace{\!\!\ _{B_{\lambda}\!}[\Lambda]_{B_{\lambda}} \underbrace{\ _{B_{\lambda}\!}[I]_{B_{s}} [\vec{v}]_{B_{s}} }_1 }_2 }_3. \]

  1. In the first step, we convert the vector $\vec{v}$ from the standard basis to the eigenbasis.
  2. In the second step, the action of $A$ on vectors expressed with respect to its eigenbasis corresponds to a multiplication by the diagonal matrix $\Lambda$.
  3. In the third step, we convert the output $\vec{w}$ from the eigenbasis back to the standard basis.

Another way of interpreting the above steps is to say that, deep down inside, the matrix $A$ is actually the diagonal matrix $\Lambda$. To see the diagonal form of the matrix, we have to express the input and output vectors with respect to the eigenbasis: \[ [\vec{w}]_{B_{\lambda}} = \ _{B_{\lambda}\!}[\Lambda]_{B_{\lambda}} [\vec{v}]_{B_{\lambda}}. \]

It is extremely important that you understand the equation $A=Q\Lambda Q^{-1}$ intuitively in terms of the three-step procedure. To help you understand, we'll analyze in detail what happens when we multiply $A$ by one of its eigenvectors. Let's pick $\vec{e}_{\lambda_1}$ and verify the equation $A\vec{e}_{\lambda_1} = Q\Lambda Q^{-1}\vec{e}_{\lambda_1} = \lambda_1\vec{e}_{\lambda_1}$ by following the vector through the three steps: \[ \ _{B_{s}\!}[A]_{B_{s}} [\vec{e}_{\lambda_1}]_{B_{s}} = Q\Lambda Q^{-1} [\vec{e}_{\lambda_1}]_{B_{s}} = \ \underbrace{\!\!\ _{B_{s}\!}[I]_{B_{\lambda}} \ \underbrace{\!\!\ _{B_{\lambda}\!}[\Lambda]_{B_{\lambda}} \underbrace{\ _{B_{\lambda}\!}[I]_{B_{s}} [\vec{e}_{\lambda_1}]_{B_{s}} }_{ (1,0,\ldots)^T_{B_\lambda} } }_{ (\lambda_1,0,\ldots)^T_{B_\lambda} } }_{ \lambda_1 [\vec{e}_{\lambda_1}]_{B_{s}} } = \lambda_1 [\vec{e}_{\lambda_1}]_{B_{s}}. \] In the first step, we convert the vector $[\vec{e}_{\lambda_1}]_{B_{s}}$ to the eigenbasis and obtain $(1,0,\ldots,0)^T_{B_\lambda}$. The result of the second step is $(\lambda_1,0,\ldots,0)^T_{B_\lambda}$, because multiplying $\Lambda$ by the vector $(1,0,\ldots,0)^T_{B_\lambda}$ “selects” the first column of $\Lambda$. In the third step, we convert $(\lambda_1,0,\ldots,0)^T_{B_\lambda}=\lambda_1(1,0,\ldots,0)^T_{B_\lambda}$ back to the standard basis to obtain $\lambda_1[\vec{e}_{\lambda_1}]_{B_{s}}$.
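The following sketch traces these three steps numerically for the eigenvector $\vec{e}_{\lambda_1}$ of the example matrix above (again just an illustration in NumPy):

<code python>
import numpy as np

A = np.array([[1,  2, 0],
              [0,  3, 0],
              [2, -4, 2]])
Q = np.array([[-1, 0, -1],
              [-1, 0,  0],
              [ 2, 1,  2]])
Lam = np.diag([3, 2, 1])
Qinv = np.linalg.inv(Q)

e1 = Q[:, 0]               # the eigenvector for lambda_1 = 3, in the standard basis

step1 = Qinv @ e1          # (1, 0, 0): e1 expressed in the eigenbasis
step2 = Lam @ step1        # (3, 0, 0): Lambda scales the first coordinate by lambda_1
step3 = Q @ step2          # 3 * e1, expressed back in the standard basis

print(step1, step2, step3)
print(A @ e1)              # same as step3
</code>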

Invariant properties of matrices

The determinant and the trace of a matrix are strictly functions of the eigenvalues. The determinant of $A$ is the product of its eigenvalues: \[ \det(A) \equiv |A| =\prod_i \lambda_i = \lambda_1\lambda_2\cdots\lambda_n, \] and the trace is their sum: \[ {\rm Tr}(A)=\sum_i a_{ii}=\sum_i \lambda_i = \lambda_1 + \lambda_2 + \cdots + \lambda_n. \]

Here are the steps we followed to obtain these equations: \[ |A|=|Q\Lambda Q^{-1}| =|Q||\Lambda| |Q^{-1}| =|Q||Q^{-1}||\Lambda| =|Q| \frac{1}{|Q|}|\Lambda| =|\Lambda| =\prod_i \lambda_i, \] \[ {\rm Tr}(A)={\rm Tr}(Q\Lambda Q^{-1}) ={\rm Tr}(\Lambda Q^{-1}Q) ={\rm Tr}(\Lambda)=\sum_i \lambda_i. \]

In fact the above calculations remain valid when the matrix undergoes any similarity transformation. A similarity transformation is essentially a “change of basis”-type of calculation: the matrix $A$ gets multiplied by an invertible matrix $P$ from the left and by the inverse of $P$ on the right: $A \to PA P^{-1}$. Therefore, the determinant and the trace of a matrix are two properties that do not depend on the choice of basis used to represent the matrix! We say the determinant and the trace are invariant properties of the matrix.
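Here is a quick numerical sanity check of both claims, using the example matrix from earlier and a random invertible matrix $P$ (the specific numbers are unimportant):

<code python>
import numpy as np

A = np.array([[1.,  2, 0],
              [0.,  3, 0],
              [2., -4, 2]])
evals = np.linalg.eigvals(A)

print(np.isclose(np.linalg.det(A), np.prod(evals)))   # True: det(A) = product of eigenvalues
print(np.isclose(np.trace(A), np.sum(evals)))         # True: Tr(A) = sum of eigenvalues

# Determinant and trace are unchanged by a similarity transformation A -> P A P^{-1}.
rng = np.random.default_rng(42)
P = rng.normal(size=(3, 3))              # a random (almost surely invertible) matrix
B = P @ A @ np.linalg.inv(P)
print(np.isclose(np.linalg.det(B), np.linalg.det(A)))  # True
print(np.isclose(np.trace(B), np.trace(A)))            # True
</code>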

Relation to invertibility

Let us briefly revisit three of the equivalent conditions we stated in the invertible matrix theorem. For a matrix $A \in \mathbb{R}^{n \times n}$, the following statements are equivalent:

  1. $A$ is invertible
  2. $|A|\neq 0$
  3. The null space of $A$ contains only the zero vector: $\mathcal{N}(A)=\{\vec{0}\}$

Using the formula $|A|=\prod_{i=1}^n \lambda_i$, it is easy to see why the last two statements are equivalent. If $|A|\neq 0$, then none of the $\lambda_i$s is zero, since otherwise the product of the eigenvalues would be zero. Because $\lambda=0$ is not an eigenvalue of $A$, there is no nonzero vector $\vec{v}$ such that $A\vec{v} = 0\vec{v}=\vec{0}$. Therefore, the only vector in the null space is the zero vector: $\mathcal{N}(A)=\{ \vec{0} \}$.

We can also follow the reasoning in the other direction. If the null space of $A$ contains only the zero vector, then there is no nonzero vector $\vec{v}$ such that $A\vec{v} = \vec{0}$, which means $\lambda=0$ is not an eigenvalue of $A$, and hence the product $\lambda_1\lambda_2\cdots \lambda_n \neq 0$.

On the other hand, if there exists a nonzero vector $\vec{v}$ such that $A\vec{v} = \vec{0}$, then $A$ has a nontrivial null space, $\lambda=0$ is an eigenvalue of $A$, and thus $|A|=0$.

Normal matrices

A matrix $A$ is normal if it satisfies the equation $A^TA = A A^T$. All normal matrices are diagonalizable. Furthermore, when the eigenvalues of a normal matrix are real (symmetric matrices, which satisfy $A^T=A$, are the most important example), the diagonalizing matrix $Q$ can be chosen to be an orthogonal matrix $O$.

The eigenvectors corresponding to different eigenvalues of a normal matrix are orthogonal. Furthermore we can always choose the eigenvectors within the same eigenspace to be orthogonal. By collecting the eigenvectors from all of the eigenspaces of the matrix $A \in \mathbb{R}^{n \times n}$, it is possible to obtain a complete basis $\{\vec{e}_1,\vec{e}_2,\ldots, \vec{e}_n\}$ of orthogonal eigenvectors: \[ \vec{e}_{i} \cdot \vec{e}_{j} = \left\{ \begin{array}{ll} \|\vec{e}_i\|^2 & \text{ if } i =j, \nl 0 & \text{ if } i \neq j. \end{array}\right. \] By normalizing each of these vectors we can find a set of eigenvectors $\{\hat{e}_1,\hat{e}_2,\ldots, \hat{e}_n \}$ which is an orthonormal basis for the space $\mathbb{R}^n$: \[ \hat{e}_{i} \cdot \hat{e}_{j} = \left\{ \begin{array}{ll} 1 & \text{ if } i =j, \nl 0 & \text{ if } i \neq j. \end{array}\right. \]

Consider now the matrix $O$ constructed by using these orthonormal vectors as the columns: \[ O= \begin{bmatrix} | & & | \nl \hat{e}_{1} & \cdots & \hat{e}_{n} \nl | & & | \end{bmatrix}. \]

The matrix $O$ is an orthogonal matrix, which means that it satisfies $OO^T=I=O^TO$. In other words, the inverse of $O$ is obtained by taking the transpose $O^T$. To see that this is true, consider the following product: \[ O^T O = \begin{bmatrix} - & \hat{e}_{1} & - \nl & \vdots & \nl - & \hat{e}_{n} & - \end{bmatrix} \begin{bmatrix} | & & | \nl \hat{e}_{1} & \cdots & \hat{e}_{n} \nl | & & | \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \nl 0 & \ddots & 0 \nl 0 & 0 & 1 \end{bmatrix} = I. \] Each of the ones on the diagonal arises from the dot product of a unit-length eigenvector with itself. The off-diagonal entries are zero because the vectors are orthogonal. By definition, the inverse $O^{-1}$ is the matrix which when multiplied by $O$ gives $I$, so we have $O^{-1} = O^T$.

Using the orthogonal matrix $O$ and its inverse $O^T$, we can write the eigendecomposition of a matrix $A$ as follows: \[ A = O \Lambda O^{-1} = O \Lambda O^T = \begin{bmatrix} | & & | \nl \hat{e}_{1} & \cdots & \hat{e}_{n} \nl | & & | \end{bmatrix} \begin{bmatrix} \lambda_1 & \cdots & 0 \nl \vdots & \ddots & 0 \nl 0 & 0 & \lambda_n \end{bmatrix} \begin{bmatrix} - & \hat{e}_{1} & - \nl & \vdots & \nl - & \hat{e}_{n} & - \end{bmatrix}\!. \]

The key advantage of using a diagonalization procedure with an orthogonal matrix $O$ is that computing the inverse is simplified significantly since $O^{-1}=O^T$.
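As a sketch, the most common case in practice is a real symmetric matrix (which is automatically normal); NumPy's eigh routine returns an orthonormal set of eigenvectors directly. The matrix below is my own illustrative choice:

<code python>
import numpy as np

# A symmetric matrix (A^T = A), hence normal.
A = np.array([[2., 1.],
              [1., 2.]])

evals, O = np.linalg.eigh(A)    # columns of O are orthonormal eigenvectors

print(np.allclose(O.T @ O, np.eye(2)))              # True: O^T O = I, so O^{-1} = O^T
print(np.allclose(A, O @ np.diag(evals) @ O.T))     # True: A = O Lambda O^T
</code>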

Discussion

Non-diagonalizable matrices

Not all matrices are diagonalizable. For example, the matrix \[ B= \begin{bmatrix} 3 & 1 \nl 0 & 3 \end{bmatrix} \] has $\lambda = 3$ as a repeated eigenvalue, but the null space of $(B-3I)$ is spanned by a single vector $(1,0)^T$. The matrix $B$ has only one linearly independent eigenvector for the eigenvalue $\lambda=3$. We're one eigenvector short, and it is not possible to obtain a complete basis of eigenvectors. Therefore we cannot build the diagonalizing change of basis matrix $Q$. We say $B$ is not diagonalizable.
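A quick SymPy check of this claim (my illustration, not part of the original text):

<code python>
from sympy import Matrix, eye

B = Matrix([[3, 1],
            [0, 3]])

# lambda = 3 is a double root of the characteristic polynomial,
# but the null space of (B - 3I) is spanned by a single vector:
print((B - 3*eye(2)).nullspace())   # [Matrix([[1], [0]])]

print(B.is_diagonalizable())        # False
</code>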

Matrix power series

One of the most useful concepts of calculus is the idea that functions can be represented as Taylor series. The Taylor series of the exponential function $f(x) =e^x$ is \[ e^x = \sum_{k=0}^\infty \frac{x^k}{k!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \frac{x^5}{5!} + \ldots. \] Nothing stops us from using the same Taylor series expression to define the exponential function of a matrix: \[ e^A = \sum_{k=0}^\infty \frac{A^k}{k!} = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \frac{A^4}{4!} + \frac{A^5}{5!} + \ldots . \] Okay, there is one thing stopping us, and that is having to compute an infinite sum of progressively longer matrix products! But wait, remember how we used the diagonalization of $A=Q\Lambda Q^{-1}$ to easily compute $A^{55}=Q\Lambda^{55} Q^{-1}$? We can use that trick here too and obtain the exponential of a matrix in a much simpler form: \[ \begin{align*} e^A & = \sum_{k=0}^\infty \frac{A^k}{k!} = \sum_{k=0}^\infty \frac{(Q\Lambda Q^{-1})^k}{k!} \nl & = \sum_{k=0}^\infty \frac{Q\:\Lambda^k\:Q^{-1} }{k!} \nl & = Q\left[ \sum_{k=0}^\infty \frac{ \Lambda^k }{k!}\right]Q^{-1} \nl & = Q\left( I + \Lambda + \frac{\Lambda^2}{2!} + \frac{\Lambda^3}{3!} + \frac{\Lambda^4}{4!} + \ldots \right)Q^{-1} \nl & = Qe^\Lambda Q^{-1} = \begin{bmatrix} \ \nl \ \ \ \ \ \ Q \ \ \ \ \ \ \ \nl \ \end{bmatrix} \begin{bmatrix} e^{\lambda_1} & \cdots & 0 \nl \vdots & \ddots & 0 \nl 0 & 0 & e^{\lambda_n} \end{bmatrix} \begin{bmatrix} \ \nl \ \ \ \ \ \ Q^{-1} \ \ \ \ \ \ \nl \ \end{bmatrix}\!. \end{align*} \]

We can use this approach to define “matrix functions” of the form \[ F: \mathbb{M}(n,n) \to \mathbb{M}(n,n), \] simply by using the Taylor series of the corresponding ordinary function. Computing the matrix function $F(M)$ of an input matrix $M=Q\Lambda Q^{-1}$ is equivalent to applying the function $f$ to the eigenvalues of $M$: $F(M)=Q\:f(\Lambda)\:Q^{-1}$, where $f(\Lambda)$ denotes the diagonal matrix with the entries $f(\lambda_i)$ on the diagonal.
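For the exponential specifically, here is a short numerical sketch comparing $Qe^\Lambda Q^{-1}$ with SciPy's general-purpose matrix exponential (using SciPy here is my choice, not something required by the text):

<code python>
import numpy as np
from scipy.linalg import expm

A = np.array([[1., 1.],
              [1., 0.]])

evals, Q = np.linalg.eig(A)
expA = Q @ np.diag(np.exp(evals)) @ np.linalg.inv(Q)   # e^A = Q e^Lambda Q^{-1}

print(np.allclose(expA, expm(A)))                      # True
</code>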

Review

In this section we learned how to decompose matrices in terms of their eigenvalues and eigenvectors. Let's briefly review everything that we discussed. The fundamental equation is $A\vec{e}_{\lambda_i} = \lambda_i\vec{e}_{\lambda_i}$, where the vector $\vec{e}_{\lambda_i}$ is an eigenvector of the matrix $A$ and the number $\lambda_i$ is an eigenvalue of $A$. The word eigen is the German word for self.

The characteristic polynomial comes about from a simple manipulation of the eigenvalue equation: \[ \begin{eqnarray} A\vec{e}_{\lambda_i} & = &\lambda_i\vec{e}_{\lambda_i} \nl A\vec{e}_{\lambda_i} - \lambda_i \vec{e}_{\lambda_i} & = & \vec{0} \nl (A-{\lambda_i} I)\vec{e}_{\lambda_i} & = & \vec{0}. \end{eqnarray} \]

There are two ways to obtain the zero vector in this equation: either $\vec{e}_\lambda$ is itself the zero vector (which is not an interesting solution), or $\vec{e}_\lambda$ lies in the null space of $(A-\lambda I)$. The problem of finding the eigenvalues therefore reduces to finding the values of $\lambda$ for which the matrix $(A-\lambda I)$ is not invertible, i.e., for which it has a nontrivial null space. The easiest way to check whether a matrix is invertible is to compute its determinant, so the eigenvalues are the solutions of the equation $|A-\lambda I| = 0$.

There will be multiple eigenvalues and eigenvectors that satisfy this equation, so we keep a whole list of eigenvalues $(\lambda_1, \lambda_2, \ldots, \lambda_n )$ and corresponding eigenvectors $\{ \vec{e}_{\lambda_1}, \vec{e}_{\lambda_2}, \ldots \}$.

Applications

Many scientific applications use the eigendecomposition of a matrix as a building block. We'll mention a few of these applications without going into too much detail:
* principal component analysis,
* PageRank,
* energy calculations in quantum mechanics, and information theory.

Analyzing a matrix in terms of its eigenvalues and its eigenvectors is a very powerful way to “see inside the matrix” and understand what the matrix does. In the next section we'll analyze several different types of matrices and discuss their properties in terms of their eigenvalues.

Links

[ Good visual examples from wikipedia ]
http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors

Exercises

Q1

Prove that a collection of nonzero eigenvectors corresponding to distinct eigenvalues is linearly independent.

Hint: Proof by contradiction. Assume that we have $n$ distinct eigenvalues $\lambda_i$ and corresponding nonzero eigenvectors $\{ \vec{e}_i \}$ which are linearly dependent: $\sum_{i=1}^n \alpha_i \vec{e}_i = \vec{0}$ with some $\alpha_i \neq 0$. If such a nontrivial linear combination really could give the zero vector, then the following equation would also be true: $(A-\lambda_n I )\left(\sum \alpha_i\vec{e}_i\right) = (A-\lambda_n I )\vec{0}=\vec{0}$. Expand the expression on the left and show that this leads to a contradiction.
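To make the expansion step explicit (this is one way to complete the hint, not the only one), note that \[ (A-\lambda_n I )\left(\sum_{i=1}^n \alpha_i\vec{e}_i\right) = \sum_{i=1}^n \alpha_i\left(A\vec{e}_i - \lambda_n\vec{e}_i\right) = \sum_{i=1}^{n-1} \alpha_i\left(\lambda_i - \lambda_n\right)\vec{e}_i, \] since the $i=n$ term vanishes. Because the eigenvalues are distinct, each factor $(\lambda_i - \lambda_n)$ with $i<n$ is nonzero, so this produces a linear dependence involving only the first $n-1$ eigenvectors. Repeating the argument with $(A-\lambda_{n-1}I)$, $(A-\lambda_{n-2}I)$, and so on eventually forces every coefficient $\alpha_i$ to be zero, which contradicts the assumption that some $\alpha_i \neq 0$.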

Q2

Show that an $n \times n$ matrix has at most $n$ distinct eigenvalues.

Q3
 