The page you are reading is part of a draft (v2.0) of the "No bullshit guide to math and physics."

The text has since gone through many edits and is now available in print and electronic format. The current edition of the book is v4.0, which is a substantial improvement in terms of content and language (I hired a professional editor) over the draft version.

I'm leaving the old wiki content up for the time being, but I highly encourage you to check out the finished book. You can check out an extended preview here (PDF, 106 pages, 5MB).


Matrix inverse

Recall that the problem of solving a system of linear equations \[ \begin{align*} x_1 + 2x_2 & = 5, \nl 3x_1 + 9x_2 & = 21, \end{align*} \] can be written in the form of a matrix-times-vector product: \[ \begin{bmatrix} 1 & 2 \nl 3 & 9 \end{bmatrix} \begin{bmatrix} x_1 \nl x_2 \end{bmatrix} = \begin{bmatrix} 5 \nl 21 \end{bmatrix}, \] or more compactly as \[ A\vec{x}=\vec{b}. \] Here $A$ is a $2 \times 2$ matrix, $\vec{x}$ is the vector of unknowns (a $2 \times 1$ matrix), and $\vec{b}$ is a vector of constants (a $2 \times 1$ matrix).

Consider now the matrix equation $A\vec{x}=\vec{b}$ that corresponds to the above system of linear equations. We can solve this equation for $\vec{x}$ by multiplying (from the left) both sides of the equation by the inverse $A^{-1}$. We obtain: \[ A^{-1} A \vec{x} = I \vec{x} = \vec{x} = A^{-1}\vec{b}. \] Thus, solving a system of linear equations is equivalent to finding the inverse of the matrix of coefficients and then computing the product: \[ \vec{x} = \begin{bmatrix}x_1 \nl x_2 \end{bmatrix} = A^{-1} \vec{b} = \begin{bmatrix} 3 & -\frac{2}{3} \nl -1 & \frac{1}{3} \end{bmatrix} \begin{bmatrix}5 \nl 21 \end{bmatrix} = \begin{bmatrix}1 \nl 2 \end{bmatrix}. \]
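For example, using the sympy syntax that we describe in detail at the end of this section (see “Using a computer” below), this calculation takes just a couple of lines:

>>> from sympy.matrices import Matrix
>>> A = Matrix( [ [1,2], [3,9] ] )
>>> b = Matrix( [5, 21] )                # column vector of constants
>>> A.inv() * b                          # computes x = A^{-1} b
    [1]
    [2]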

As you can see, computing the inverse of a matrix is a pretty useful skill to have. In this section, we will learn about several approaches for computing the inverse of a matrix. Note that the matrix inverse is unique, so no matter which method you use to find the inverse, you will always get the same answer. Knowing this is very useful: you can verify that your calculations are correct by computing the inverse in two different ways.

Existence of an inverse

Not all matrices can be inverted. Given any matrix $A \in \mathbb{R}^{n \times n }$ we can check whether $A$ is invertible or not by computing the determinant of $A$: \[ A^{-1} \ \textrm{ exists if and only if } \ \textrm{det}(A) \neq 0. \]
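For example, a quick determinant check in sympy (described in more detail at the end of this section) tells us whether a given matrix is invertible:

>>> from sympy.matrices import Matrix
>>> Matrix( [ [1,2], [3,9] ] ).det()     # 1*9 - 2*3, nonzero: invertible
    3
>>> Matrix( [ [1,2], [2,4] ] ).det()     # second row is twice the first
    0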

Adjugate matrix approach

The inverse of a $2\times2$ matrix can be computed as follows: \[ \left[ \begin{array}{cc} a&b\nl c&d\end{array} \right]^{-1}=\frac{1}{ad-bc} \left[ \begin{array}{cc} d&-b\nl -c&a \end{array}\right]. \]
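You can double-check this formula symbolically in sympy. A quick sketch, using the adjugate-based inverse method (the exact output formatting depends on your sympy version):

>>> from sympy import symbols
>>> from sympy.matrices import Matrix
>>> a, b, c, d = symbols('a b c d')
>>> M = Matrix( [ [a,b], [c,d] ] )
>>> M.inv(method='ADJ')                  # inverse via the adjugate formula
    [ d/(a*d - b*c), -b/(a*d - b*c)]
    [-c/(a*d - b*c),  a/(a*d - b*c)]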

This is the $2 \times 2$ version of a general formula for obtaining the inverse based on the adjugate matrix: \[ A^{-1} = \frac{1}{ \textrm{det}(A) } \textrm{adj}(A). \] What is the adjugate, you ask? It is kind of complicated, so we will go step by step and define a few prerequisite concepts before we get to the adjugate matrix.

In what follows we will work with a matrix $A \in \mathbb{R}^{n \times n}$ and refer to its entries as $a_{ij}$, where $i$ is the row index and $j$ is the column index as usual. We will illustrate the steps in the $3 \times 3$ case: \[ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \nl a_{21} & a_{22} & a_{23} \nl a_{31} & a_{32} & a_{33} \end{pmatrix}. \]

We first need to define a few new terms for dealing with determinants:

  1. For each entry $a_{ij}$ we compute the //minor// $M_{ij}$,
     which is the determinant of the matrix that remains when
     we remove row $i$ and column $j$ from the matrix $A$.
     For example, the minor that corresponds to the entry $a_{12}$
     is given by:
     \[
      M_{12} = 
        \left| \begin{matrix} a_{21} & a_{23} \nl a_{31} & a_{33}  \end{matrix} \right|.
     \]
  2. For each entry $a_{ij}$ we define the //sign// of the entry to be:
     \[
       \textrm{sign}(a_{ij}) = (-1)^{i+j}.
     \]
  3. We define the //cofactor// $c_{ij}$ for each entry as
     the product of its sign and its minor: $c_{ij} =\textrm{sign}(a_{ij})M_{ij}$.

The above concepts should be familiar to you from the section on determinants. Indeed, we can now write down a precise formula for computing the determinant. The most common way to compute a determinant is to expand along the top row, which gives the following formula: \[ \textrm{det}(A) = \sum_{j=1}^n a_{1j} \textrm{sign}(a_{1j}) M_{1j} = \sum_{j=1}^n a_{1j} c_{1j}. \] Of course, we could have chosen any other row or column to expand along. Expanding along the first column gives: \[ \textrm{det}(A) = \sum_{i=1}^n a_{i1} \textrm{sign}(a_{i1}) M_{i1} = \sum_{i=1}^n a_{i1} c_{i1}. \] Perhaps now you can see where the name cofactor comes from: the cofactor $c_{ij}$ is what multiplies the entry $a_{ij}$ in the determinant formula.
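To make the expansion formula concrete, here is a short Python sketch (the function name cofactor_det is our own, not a library function) that computes the determinant by expanding along the top row:

def cofactor_det(A):
    """Compute det(A) by cofactor expansion along the first row.
    A is a square matrix given as a list of rows."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # the minor: delete the first row and the j-th column (0-indexed)
        minor = [row[:j] + row[j+1:] for row in A[1:]]
        sign = (-1)**j                   # sign of entry a_{1,j+1} is (-1)^(1+(j+1))
        total += A[0][j] * sign * cofactor_det(minor)
    return total

print( cofactor_det([[1,2],[3,9]]) )     # prints 3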

OK, let us get back to our description of the adjugate matrix. The adjugate of a matrix is defined as the transpose of the matrix of cofactors $C$. The matrix of cofactors is a matrix of the same dimensions as the original matrix $A$, which is built by replacing each entry $a_{ij}$ by its cofactor $c_{ij}$: \[ C = \begin{pmatrix} c_{11} & c_{12} & c_{13} \nl c_{21} & c_{22} & c_{23} \nl c_{31} & c_{32} & c_{33} \end{pmatrix} = \begin{pmatrix} +\left| \begin{matrix} a_{22} & a_{23} \nl a_{32} & a_{33} \end{matrix} \right| & -\left| \begin{matrix} a_{21} & a_{23} \nl a_{31} & a_{33} \end{matrix} \right| & +\left| \begin{matrix} a_{21} & a_{22} \nl a_{31} & a_{32} \end{matrix} \right| \nl & & \nl -\left| \begin{matrix} a_{12} & a_{13} \nl a_{32} & a_{33} \end{matrix} \right| & +\left| \begin{matrix} a_{11} & a_{13} \nl a_{31} & a_{33} \end{matrix} \right| & -\left| \begin{matrix} a_{11} & a_{12} \nl a_{31} & a_{32} \end{matrix} \right| \nl & & \nl +\left| \begin{matrix} a_{12} & a_{13} \nl a_{22} & a_{23} \end{matrix} \right| & -\left| \begin{matrix} a_{11} & a_{13} \nl a_{21} & a_{23} \end{matrix} \right| & +\left| \begin{matrix} a_{11} & a_{12} \nl a_{21} & a_{22} \end{matrix} \right| \end{pmatrix}. \]

So to compute $\textrm{adj}(A)$ we simply take the transpose of $C$. Combining all of the above steps into the formula for the inverse $A^{-1} = \frac{1}{ \textrm{det}(A) } \textrm{adj}(A)= \frac{1}{ \textrm{det}(A) } C^T$ we obtain the final formula: \[ A^{-1} = \frac{1}{ \left|\begin{matrix} a_{11} & a_{12} & a_{13} \nl a_{21} & a_{22} & a_{23} \nl a_{31} & a_{32} & a_{33} \end{matrix} \right|} \begin{pmatrix} +\left| \begin{matrix} a_{22} & a_{23} \nl a_{32} & a_{33} \end{matrix} \right| & -\left| \begin{matrix} a_{12} & a_{13} \nl a_{32} & a_{33} \end{matrix} \right| & +\left| \begin{matrix} a_{12} & a_{13} \nl a_{22} & a_{23} \end{matrix} \right| \nl & & \nl -\left| \begin{matrix} a_{21} & a_{23} \nl a_{31} & a_{33} \end{matrix} \right| & +\left| \begin{matrix} a_{11} & a_{13} \nl a_{31} & a_{33} \end{matrix} \right| & -\left| \begin{matrix} a_{11} & a_{13} \nl a_{21} & a_{23} \end{matrix} \right| \nl & & \nl +\left| \begin{matrix} a_{21} & a_{22} \nl a_{31} & a_{32} \end{matrix} \right| & -\left| \begin{matrix} a_{11} & a_{12} \nl a_{31} & a_{32} \end{matrix} \right| & +\left| \begin{matrix} a_{11} & a_{12} \nl a_{21} & a_{22} \end{matrix} \right| \end{pmatrix}. \]

I know this is very complicated, but I had to show you. In practice you will rarely have to compute this by hand; you will use a computer instead.
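Indeed, sympy can compute the adjugate for us. A quick sketch, assuming a sympy version that provides the adjugate method on Matrix objects:

>>> from sympy.matrices import Matrix
>>> A = Matrix( [ [1,2], [3,9] ] )
>>> A.adjugate()
    [ 9, -2]
    [-3,  1]
>>> A.adjugate() / A.det()               # the formula A^{-1} = adj(A)/det(A)
    [ 3, -2/3]
    [-1,  1/3]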

Reduced row echelon algorithm

Another way to obtain the inverse of a matrix is to record all the row operations $\mathcal{R}_1,\mathcal{R}_2,\ldots$ needed to transform the matrix $A$ into the identity matrix: \[ \mathcal{R}_k(\ldots \mathcal{R}_2( \mathcal{R}_1( A ) )\ldots) = I = A^{-1}A. \] Recall that the matrix $A$ can be thought of as “doing” something to vectors. The identity operation corresponds to multiplication by the identity matrix $I\vec{v}=\vec{v}$. The above formula is an operational definition of the inverse $A^{-1}$ as the sequence of operations needed to “undo” the actions of $A$: \[ A^{-1}\vec{w} = \mathcal{R}_k(\ldots \mathcal{R}_2( \mathcal{R}_1( \vec{w} ) )\ldots). \]

This way of finding the inverse $A^{-1}$ may sound waaaaay too complicated to ever be useful. It would be if it weren't for the existence of a very neat trick for recording the row operations $\mathcal{R}_1$, $\mathcal{R}_2$,$\ldots$,$\mathcal{R}_k$.

We initialize an $n \times 2n$ array with the entries of the matrix $A$ on the left side and the identity matrix on the right-hand side: \[ [\;A\; | \ I\:\ ]. \] If you perform the RREF algorithm on this array (Gauss–Jordan elimination), you will end up with the inverse $A^{-1}$ on the right-hand side of the array: \[ [ \ \:I\ | \; A^{-1} ]. \]

Example

We now illustrate the procedure by computing the inverse of the following matrix: \[ A = \begin{bmatrix} 1 & 2 \nl 3 & 9 \end{bmatrix}. \]

We start by writing the matrix $A$ next to the identity matrix $I$: \[ \left[ \begin{array}{ccccc} 1 & 2 &|& 1 & 0 \nl 3 & 9 &|& 0 & 1 \end{array} \right]. \]

We now perform the Gauss–Jordan elimination procedure on the resulting $2 \times 4$ array.

  1. The first step is to subtract three times the first row
     from the second row, written compactly as $R_2 \gets R_2 -3R_1$, to obtain:
     \[
     \left[ 
     \begin{array}{ccccc}
     1 & 2  &|&  1  & 0  \nl
     0 & 3  &|&  -3 & 1  
     \end{array} \right].
     \]
  2. Second, we perform $R_2 \gets \frac{1}{3}R_2$ and get:
     \[
     \left[ 
     \begin{array}{ccccc}
     1 & 2  &|&  1  & 0  \nl
     0 & 1  &|&  -1 & \frac{1}{3}  
     \end{array} \right].
     \]
  3. Finally, we perform $R_1 \gets R_1 - 2R_2$ to obtain:
     \[
     \left[ 
     \begin{array}{ccccc}
     1 & 0  &|&  3  & -\frac{2}{3}  \nl
     0 & 1  &|&  -1 & \frac{1}{3}  
     \end{array} \right].
     \]

The inverse of $A$ can be found on the right-hand side of the above array: \[ A^{-1} = \begin{bmatrix} 3 & -\frac{2}{3} \nl -1 & \frac{1}{3} \end{bmatrix}. \]
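We can verify this whole computation in one step using sympy's rref method on the augmented array (row_join places two matrices side by side; rref returns the reduced matrix together with a list of pivot columns):

>>> from sympy.matrices import Matrix, eye
>>> A = Matrix( [ [1,2], [3,9] ] )
>>> AUG = A.row_join( eye(2) )           # build the augmented array [ A | I ]
>>> AUG.rref()[0]                        # the right half is A^{-1}
    [1, 0,  3, -2/3]
    [0, 1, -1,  1/3]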

This algorithm works because we identify the sequence of row operations $\mathcal{R}_k(\ldots \mathcal{R}_2( \mathcal{R}_1( \ . \ ) )\ldots)$ with the inverse matrix $A^{-1}$: for any vector $\vec{v}$ we have \[ \vec{w}=A\vec{v} \quad \Rightarrow \quad \mathcal{R}_k(\ldots \mathcal{R}_2( \mathcal{R}_1( \vec{w} ) )\ldots) = \vec{v}. \] The sequence of row operations has the same effect as the inverse operation $A^{-1}$. The right half of the above array is used to record the cumulative effect of all the row operations. In order to understand why this is possible, we must learn a little more about the row operations and discuss their connection with elementary matrices.

Using elementary matrices

Each of the above row operations $\mathcal{R}_i$ can be represented as multiplication from the left by an elementary matrix $E_{\mathcal{R}}$: \[ \vec{y} = \mathcal{R}_i(\vec{x}) \qquad \Leftrightarrow \qquad \vec{y} = E_{\mathcal{R}}\vec{x}. \] Applying all the operations $\mathcal{R}_1,\mathcal{R}_2,\ldots$ needed to transform the matrix $A$ into the identity matrix corresponds to a repeated product: \[ A^{-1}\vec{w} = \mathcal{R}_k(\ldots \mathcal{R}_2( \mathcal{R}_1( \vec{w} ) )\ldots) \quad \Leftrightarrow \quad A^{-1}\vec{w} = E_{k}\cdots E_{2}E_{1}\vec{w} = (E_{k}\cdots E_{2}E_{1})\vec{w}. \]

Thus we have obtained an expression for the inverse $A^{-1}$ as a product of elementary matrices: \[ A^{-1}\vec{w} = \mathcal{R}_k(\ldots \mathcal{R}_2( \mathcal{R}_1( \vec{w} ) )\ldots) = E_{k}\cdots E_{2}E_{1} \vec{w}. \]

There are three types of elementary matrices in correspondence with the three row operations we are allowed to use when transforming a matrix to its RREF form. We illustrate them here, with examples from the $2 \times 2$ case:

  * Adding $m$ times row two to row one, $\mathcal{R}_\alpha:R_1 \gets R_1 +m R_2$,
    corresponds to the matrix:
    \[
     E_\alpha = 
     \begin{bmatrix}
      1 & m \nl
      0 & 1 
      \end{bmatrix}.
    \]
  * Swapping rows one and two, $\mathcal{R}_\beta:R_1 \leftrightarrow R_2$,
    corresponds to the matrix:
    \[
     E_\beta = 
     \begin{bmatrix}
      0 & 1 \nl
      1 & 0 
      \end{bmatrix}.
    \]
  * Multiplying row one by a constant $m$, $\mathcal{R}_\gamma:R_1 \gets m R_1$,
    corresponds to the matrix:
    \[
     E_\gamma = 
     \begin{bmatrix}
      m & 0 \nl
      0 & 1 
      \end{bmatrix}.
    \]
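To see an elementary matrix in action, multiply it onto a matrix from the left. A short sympy sketch (here we pick $m=5$ for the scaling example):

>>> from sympy.matrices import Matrix
>>> A = Matrix( [ [1,2], [3,9] ] )
>>> E_beta = Matrix( [ [0,1], [1,0] ] )  # the row swap R1 <-> R2
>>> E_beta * A
    [3, 9]
    [1, 2]
>>> E_gamma = Matrix( [ [5,0], [0,1] ] ) # the scaling R1 <- 5 R1
>>> E_gamma * A
    [5, 10]
    [3,  9]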

We will now illustrate the formula $A^{-1}=E_{k}\cdots E_{2}E_{1}$ on the matrix $A$ which we discussed above: \[ A = \begin{bmatrix} 1 & 2 \nl 3 & 9 \end{bmatrix}. \] Recall that the row operations we applied to transform it into the identity matrix were:

  1. $\mathcal{R}_1$: $R_2 \gets R_2 -3R_1$.
  2. $\mathcal{R}_2$: $R_2 \gets \frac{1}{3}R_2$.
  3. $\mathcal{R}_3$: $R_1 \gets R_1 - 2R_2$.

We now revisit these steps, performing each row operation as a multiplication from the left by the corresponding elementary matrix:

  1. The first step, $R_2 \gets R_2 -3R_1$, corresponds to:
     \[ \begin{bmatrix} 1 & 0 \nl -3 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 \nl 3 & 9 \end{bmatrix} = E_1 A = \begin{bmatrix} 1 & 2 \nl 0 & 3 \end{bmatrix}. \]
  2. The second step is $R_2 \gets \frac{1}{3}R_2$:
     \[ \begin{bmatrix} 1 & 0 \nl 0 & \frac{1}{3} \end{bmatrix} \begin{bmatrix} 1 & 2 \nl 0 & 3 \end{bmatrix} = E_2 E_1 A = \begin{bmatrix} 1 & 2 \nl 0 & 1 \end{bmatrix}. \]
  3. The final step is $R_1 \gets R_1 - 2R_2$:
     \[ \begin{bmatrix} 1 & -2 \nl 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 \nl 0 & 1 \end{bmatrix} = E_3E_2E_1 A = \begin{bmatrix} 1 & 0 \nl 0 & 1 \end{bmatrix} = I. \]

Therefore we have the formula: \[ A^{-1} = E_3E_2E_1 = \begin{bmatrix} 1 & -2 \nl 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \nl 0 & \frac{1}{3} \end{bmatrix} \begin{bmatrix} 1 & 0 \nl -3 & 1 \end{bmatrix} = \begin{bmatrix} 3 & -\frac{2}{3} \nl -1 & \frac{1}{3} \end{bmatrix}\!. \] Verify that this gives the correct $A^{-1}$ by carrying out the matrix products.
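Here is this verification carried out in sympy (Rational keeps the fraction $\frac{1}{3}$ exact):

>>> from sympy import Rational
>>> from sympy.matrices import Matrix, eye
>>> E1 = Matrix( [ [1,0], [-3,1] ] )
>>> E2 = Matrix( [ [1,0], [0,Rational(1,3)] ] )
>>> E3 = Matrix( [ [1,-2], [0,1] ] )
>>> E3*E2*E1                             # the product is A^{-1}
    [ 3, -2/3]
    [-1,  1/3]
>>> E3*E2*E1 * Matrix( [ [1,2], [3,9] ] ) == eye(2)   # check A^{-1} A = I
    True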

Note also that $A=(A^{-1})^{-1}=(E_3E_2E_1)^{-1}=E_1^{-1}E_2^{-1}E_3^{-1}$, which means that we can write $A$ as a product of elementary matrices: \[ A = E_1^{-1}E_2^{-1}E_3^{-1} = \begin{bmatrix} 1 & 0 \nl 3 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \nl 0 & 3 \end{bmatrix} \begin{bmatrix} 1 & 2 \nl 0 & 1 \end{bmatrix}. \] Note how the inverses of the elementary matrices are trivial to compute: they simply correspond to the opposite operations.

The elementary matrix approach teaches us that every invertible matrix $A$ can be decomposed as the product of elementary matrices. Furthermore, the inverse matrix $A^{-1}$ consists of the inverses of the elementary matrices that make up $A$ (in the reverse order).

By inspection

Sometimes it is possible to find the matrix inverse $A^{-1}$ by looking at the structure of the matrix $A$. For example, suppose we have the matrix $A = O\Lambda O^T$, where $\Lambda$ is a diagonal matrix and $O$ is an orthogonal matrix ($O^{-1}=O^T$). Then $A^{-1} = O \Lambda^{-1} O^T$, and since $\Lambda$ is diagonal, it is easy to compute its inverse. One can verify that $AA^{-1} = O\Lambda O^T O \Lambda^{-1} O^T = O\Lambda \Lambda^{-1} O^T = O I O^T = I$ as required.
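Here is a numeric sketch of this trick using numpy. We pick a rotation matrix for $O$ (rotations are orthogonal) and a diagonal $\Lambda$, then check the inspection-based inverse against numpy.linalg.inv:

>>> import numpy as np
>>> t = 0.3                                        # any angle will do
>>> O = np.array( [ [np.cos(t), -np.sin(t)],
...                 [np.sin(t),  np.cos(t)] ] )    # orthogonal: O^T O = I
>>> L = np.diag( [2.0, 5.0] )                      # diagonal Lambda
>>> A = O.dot(L).dot(O.T)
>>> A_inv = O.dot( np.diag([1/2.0, 1/5.0]) ).dot(O.T)  # invert Lambda entrywise
>>> np.allclose( A_inv, np.linalg.inv(A) )
    True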

Using a computer

Every computer algebra system, like Maple, MATLAB, Octave, or Mathematica, will provide a way to specify matrices and a function for computing the matrix inverse. In Python you can use sympy.matrices.Matrix or numpy.mat to define matrix objects. Let us illustrate the two approaches below.

You should use sympy whenever you are solving simple problems, because it will perform the calculations symbolically and tell you the exact fractions in the answer:

>>> from sympy.matrices import Matrix
>>> A = Matrix( [ [1,2], [3,4] ] )       # define a Matrix object 
    [1, 2]
    [3, 4]
>>> A.inv()                              # call the inv method on A
    [ -2,    1]
    [3/2, -1/2]

Note how we defined the matrix as a list $[ ]$ of rows, each row also being represented as a list $[ ]$.

The notation for matrices as lists of lists is very tedious to use for practical calculations. Imagine you had a matrix with three columns and ten rows – you would have to write a lot of square brackets! There is another convention for specifying matrices which is more convenient. If you have access to numpy on your computer, you can specify matrices in this alternate notation:

>>> import numpy
>>> M = numpy.mat('1 2; 3 9')
    matrix([[1, 2],
            [3, 9]])

The matrix is specified as a string in which the rows of the matrix are separated by a semicolon ;. Now that you have a numpy matrix object, you can compute its inverse as follows:

>>> M.I
    matrix([[ 3.        , -0.66666667],
            [-1.        ,  0.33333333]])
  
>>> # or equivalently using
>>> numpy.linalg.inv(M) 
    matrix([[ 3.        , -0.66666667],
            [-1.        ,  0.33333333]])

Note that the numpy inverse algorithm is based on floating point numbers which have finite precision. Floating point calculations can be very precise, but they are not exact: \[ 0.\underbrace{33333333333 \ldots 33333}_{ n \textrm{ digits of precision} } \neq \frac{1}{3}. \] To represent $\frac{1}{3}$ exactly, you would need an infinitely long decimal expansion which is not possible using floating point numbers.
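If you want exact fractions in your answer, use sympy's Rational numbers instead of floats:

>>> from sympy import Rational
>>> Rational(1,3)                        # stored exactly as the fraction 1/3
    1/3
>>> 1.0/3.0                              # a float, rounded to finite precision
    0.3333333333333333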

We can build a sympy.matrices.Matrix by supplying a numpy.mat matrix as an input:

>>> A = Matrix( numpy.mat('1 2; 3 9') ) 
>>> A.inv()
    [ 3, -2/3]
    [-1,  1/3]

We have combined the compact numpy.mat notation “1 2; 3 9” for specifying matrices with the symbolic (exact) inverse algorithm that sympy provides. Thus, we have the best of both worlds.

Discussion

In terms of finding the inverse of a matrix using pen and paper (like on a final exam, for example), I recommend the RREF algorithm the most: \[ [\;A\;|\ I\:\ ] \quad \xrightarrow{\ \textrm{RREF}\ } \quad [\ \:I\ |\; A^{-1}\;], \] unless of course you have a $2 \times 2$ matrix, in which case the formula is easier to use.

Exercises

Simple

Compute $A^{-1}$ where \[ A = \begin{bmatrix} 1 & 1 \nl 1 & 2 \end{bmatrix} \qquad \textrm{Ans: } A^{-1} = \begin{bmatrix} 2 & -1 \nl -1 & 1 \end{bmatrix}. \]

Determinant of the adjugate matrix

Show that for an $n \times n$ invertible matrix $A$, we have: $\left| \textrm{adj}(A) \right| = \left(\left| A \right|\right)^{n-1}$. Hint: Recall that $\left| \alpha A \right|=\alpha^n \left| A \right|$.

 