Recall that the problem of solving a system of linear equations \[ \begin{align*} x_1 + 2x_2 & = 5, \nl 3x_1 + 9x_2 & = 21, \end{align*} \] can be written in the form of a matrix-times-vector product: \[ \begin{bmatrix} 1 & 2 \nl 3 & 9 \end{bmatrix} \begin{bmatrix} x_1 \nl x_2 \end{bmatrix} = \begin{bmatrix} 5 \nl 21 \end{bmatrix}, \] or more compactly as \[ A\vec{x}=\vec{b}. \] Here $A$ is a $2 \times 2$ matrix, $\vec{x}$ is the vector of unknowns (a $2 \times 1$ matrix), and $\vec{b}$ is a vector of constants (a $2 \times 1$ matrix).
Consider now the matrix equation that corresponds to this system of linear equations. We can solve this equation for $\vec{x}$ by multiplying both sides of the equation by the inverse $A^{-1}$ (from the left). We obtain: \[ A^{-1} A \vec{x} = I \vec{x} = \vec{x} = A^{-1}\vec{b}. \] Thus, solving a system of linear equations is equivalent to finding the inverse of the matrix of coefficients and then computing the product: \[ \vec{x} = \begin{bmatrix}x_1 \nl x_2 \end{bmatrix} = A^{-1} \vec{b} = \begin{bmatrix} 3 & -\frac{2}{3} \nl -1 & \frac{1}{3} \end{bmatrix} \begin{bmatrix}5 \nl 21 \end{bmatrix} = \begin{bmatrix}1 \nl 2 \end{bmatrix}. \]
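If you would like to check this computation in Python, here is a minimal sketch (the names A_inv, b, and x are our own choices; Fraction keeps the arithmetic exact):

from fractions import Fraction as F

# entries of A^{-1} from the calculation above
A_inv = [[F(3), F(-2, 3)],
         [F(-1), F(1, 3)]]
b = [F(5), F(21)]                # the vector of constants

# compute x = A^{-1} b entry by entry
x = [A_inv[0][0]*b[0] + A_inv[0][1]*b[1],
     A_inv[1][0]*b[0] + A_inv[1][1]*b[1]]
print(x)   # [Fraction(1, 1), Fraction(2, 1)], i.e., x1 = 1 and x2 = 2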
As you can see, computing the inverse of a matrix is a pretty useful skill to have. In this section, we will learn about several approaches for computing the inverse of a matrix. Note that the matrix inverse is unique, so no matter which method you use to find the inverse, you will always get the same answer. Knowing this is very useful: you can verify that your calculations are correct by computing the inverse in two different ways.
Not all matrices can be inverted. Given any matrix $A \in \mathbb{R}^{n \times n }$ we can check whether $A$ is invertible or not by computing the determinant of $A$: \[ A^{-1} \ \textrm{ exists if and only if } \ \textrm{det}(A) \neq 0. \]
The inverse of a $2\times2$ matrix can be computed as follows: \[ \left[ \begin{array}{cc} a&b\nl c&d\end{array} \right]^{-1}=\frac{1}{ad-bc} \left[ \begin{array}{cc} d&-b\nl -c&a \end{array}\right]. \]
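Here is a short Python sketch of this formula (the function name inverse2x2 is our own); it also performs the determinant check described above, and uses Fraction to keep the answer exact:

from fractions import Fraction

def inverse2x2(a, b, c, d):
    """Invert [[a, b], [c, d]] using the two-by-two formula."""
    det = a*d - b*c
    if det == 0:
        raise ValueError("det(A) = 0, so A is not invertible")
    f = Fraction(1, det)
    return [[ f*d, -f*b],
            [-f*c,  f*a]]

print(inverse2x2(1, 2, 3, 9))   # [[3, -2/3], [-1, 1/3]], printed as Fractions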
This is the $2 \times 2$ version of a general formula for obtaining the inverse based on the adjugate matrix: \[ A^{-1} = \frac{1}{ \textrm{det}(A) } \textrm{adj}(A). \] What is the adjugate you ask? It is kind of complicated, so we need to go step by step. We need to define a few prerequisite concepts before we can get to the adjugate matrix.
In what follows we will work with a matrix $A \in \mathbb{R}^{n \times n}$ and refer to its entries as $a_{ij}$, where $i$ is the row index and $j$ is the column index as usual. We will illustrate the steps in the $3 \times 3$ case: \[ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \nl a_{21} & a_{22} & a_{23} \nl a_{31} & a_{32} & a_{33} \end{pmatrix}. \]
We first need to define a few new terms for dealing with determinants:
- For each entry $a_{ij}$, we define its //minor// $M_{ij}$ to be the determinant of the matrix that remains when we remove row $i$ and column $j$ from the matrix $A$. For example, the minor that corresponds to the entry $a_{12}$ is given by: \[ M_{12} = \left| \begin{matrix} a_{21} & a_{23} \nl a_{31} & a_{33} \end{matrix} \right|. \] - For each entry $a_{ij}$ we define the //sign// of the entry to be: \[ \textrm{sign}(a_{ij}) = (-1)^{i+j}. \] - Define the //cofactor// $c_{ij}$ for each entry as the product of its sign and its minor: $c_{ij} =\textrm{sign}(a_{ij})M_{ij}$.
The above concepts should be familiar to you from the section on determinants. Indeed, we can now write down a precise formula for computing the determinant. The most common way to take a determinant is to expand along the top row, which gives the following formula: \[ \textrm{det}(A) = \sum_{j=1}^n a_{1j} \textrm{sign}(a_{1j}) M_{1j} = \sum_{j=1}^n a_{1j} c_{1j}. \] Of course, we could have chosen any other row or column to expand along. Expanding along the first column gives: \[ \textrm{det}(A) = \sum_{i=1}^n a_{i1} \textrm{sign}(a_{i1}) M_{i1} = \sum_{i=1}^n a_{i1} c_{i1}. \] Perhaps now you can see where the name cofactor comes from: the cofactor $c_{ij}$ is what multiplies the entry $a_{ij}$ in the determinant formula.
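To make the recursive structure of this formula concrete, here is a sketch in plain Python (the function name det is our own) that expands along the top row:

def det(M):
    """Determinant of a square matrix M (a list of rows) via cofactor expansion."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j+1:] for row in M[1:]]  # remove row 1 and column j+1
        sign = (-1)**j          # equals (-1)^{1+j} in one-based indexing
        total += sign * M[0][j] * det(minor)
    return total

print(det([[1, 2], [3, 9]]))                     # 3
print(det([[1, 0, 2], [2, 1, 0], [0, 3, 1]]))    # 13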
OK, let us get back to our description of the adjugate matrix. The adjugate of a matrix is defined as the transpose of the matrix of cofactors $C$. The matrix of cofactors is a matrix of the same dimensions as the original matrix $A$, which is built by replacing each entry $a_{ij}$ by its cofactor $c_{ij}$: \[ C = \begin{pmatrix} c_{11} & c_{12} & c_{13} \nl c_{21} & c_{22} & c_{23} \nl c_{31} & c_{32} & c_{33} \end{pmatrix} = \begin{pmatrix} +\left| \begin{matrix} a_{22} & a_{23} \nl a_{32} & a_{33} \end{matrix} \right| & -\left| \begin{matrix} a_{21} & a_{23} \nl a_{31} & a_{33} \end{matrix} \right| & +\left| \begin{matrix} a_{21} & a_{22} \nl a_{31} & a_{32} \end{matrix} \right| \nl & & \nl -\left| \begin{matrix} a_{12} & a_{13} \nl a_{32} & a_{33} \end{matrix} \right| & +\left| \begin{matrix} a_{11} & a_{13} \nl a_{31} & a_{33} \end{matrix} \right| & -\left| \begin{matrix} a_{11} & a_{12} \nl a_{31} & a_{32} \end{matrix} \right| \nl & & \nl +\left| \begin{matrix} a_{12} & a_{13} \nl a_{22} & a_{23} \end{matrix} \right| & -\left| \begin{matrix} a_{11} & a_{13} \nl a_{21} & a_{23} \end{matrix} \right| & +\left| \begin{matrix} a_{11} & a_{12} \nl a_{21} & a_{22} \end{matrix} \right| \end{pmatrix}. \]
So to compute $\textrm{adj}(A)$ we simply take the transpose of $C$. Combining all of the above steps into the formula for the inverse $A^{-1} = \frac{1}{ \textrm{det}(A) } \textrm{adj}(A)= \frac{1}{ \textrm{det}(A) } C^T$ we obtain the final formula: \[ A^{-1} = \frac{1}{ \left|\begin{matrix} a_{11} & a_{12} & a_{13} \nl a_{21} & a_{22} & a_{23} \nl a_{31} & a_{32} & a_{33} \end{matrix} \right|} \begin{pmatrix} +\left| \begin{matrix} a_{22} & a_{23} \nl a_{32} & a_{33} \end{matrix} \right| & -\left| \begin{matrix} a_{12} & a_{13} \nl a_{32} & a_{33} \end{matrix} \right| & +\left| \begin{matrix} a_{12} & a_{13} \nl a_{22} & a_{23} \end{matrix} \right| \nl & & \nl -\left| \begin{matrix} a_{21} & a_{23} \nl a_{31} & a_{33} \end{matrix} \right| & +\left| \begin{matrix} a_{11} & a_{13} \nl a_{31} & a_{33} \end{matrix} \right| & -\left| \begin{matrix} a_{11} & a_{13} \nl a_{21} & a_{23} \end{matrix} \right| \nl & & \nl +\left| \begin{matrix} a_{21} & a_{22} \nl a_{31} & a_{32} \end{matrix} \right| & -\left| \begin{matrix} a_{11} & a_{12} \nl a_{31} & a_{32} \end{matrix} \right| & +\left| \begin{matrix} a_{11} & a_{12} \nl a_{21} & a_{22} \end{matrix} \right| \end{pmatrix}. \]
I know this is very complicated, but I had to show you. In practice you will rarely have to compute an inverse by hand using this formula; you will use a computer instead.
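For instance, sympy (which we use at the end of this section) provides each of these ingredients directly; a minimal sketch, assuming the method names minor_submatrix, cofactor, and adjugate from recent sympy versions, with an example matrix of our own choosing:

from sympy import Matrix

A = Matrix([[1, 2, 0],
            [0, 3, 1],
            [4, 0, 1]])

M12  = A.minor_submatrix(0, 1).det()   # the minor M_{12} (sympy indices are 0-based)
c12  = A.cofactor(0, 1)                # the cofactor c_{12} = (-1)^{1+2} M_{12}
adjA = A.adjugate()                    # the transpose of the matrix of cofactors
print(M12, c12)                        # -4 4

# verify the formula A^{-1} = adj(A)/det(A)
print(adjA / A.det() == A.inv())       # True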
Another way to obtain the inverse of a matrix is to record all the row operations $\mathcal{R}_1,\mathcal{R}_2,\ldots$ needed to transform the matrix $A$ into the identity matrix: \[ \mathcal{R}_k(\ldots \mathcal{R}_2( \mathcal{R}_1( A ) )\ldots) = I = A^{-1}A. \] Recall that the matrix $A$ can be thought of as “doing” something to vectors. The identity operation corresponds to multiplication by the identity matrix $I\vec{v}=\vec{v}$. The above formula is an operational definition of the inverse $A^{-1}$ as the sequence of operations needed to “undo” the actions of $A$: \[ A^{-1}\vec{w} = \mathcal{R}_k(\ldots \mathcal{R}_2( \mathcal{R}_1( \vec{w} ) )\ldots). \]
This way of finding the inverse $A^{-1}$ may sound waaaaay too complicated to ever be useful. It would be if it weren't for the existence of a very neat trick for recording the row operations $\mathcal{R}_1$, $\mathcal{R}_2$,$\ldots$,$\mathcal{R}_k$.
We initialize an $n \times 2n$ array with the entries of the matrix $A$ on the left side and the identity matrix on the right-hand side: \[ [\;A\; | \ I\:\ ]. \] If you perform the RREF algorithm on this array (Gauss–Jordan elimination), you will end up with the inverse $A^{-1}$ on the right-hand side of the array: \[ [ \ \:I\ | \; A^{-1} ]. \]
We now illustrate the procedure by computing the inverse of the following matrix: \[ A = \begin{bmatrix} 1 & 2 \nl 3 & 9 \end{bmatrix}. \]
We start by writing the matrix $A$ next to the identity matrix $I$: \[ \left[ \begin{array}{ccccc} 1 & 2 &|& 1 & 0 \nl 3 & 9 &|& 0 & 1 \end{array} \right]. \]
We now perform the Gauss–Jordan elimination procedure on the resulting $2 \times 4$ array.
- First, we subtract three times the first row from the second row, or written compactly $R_2 \gets R_2 - 3R_1$, to obtain:
\[ \left[ \begin{array}{ccccc} 1 & 2 &|& 1 & 0 \nl 0 & 3 &|& -3 & 1 \end{array} \right]. \] - Second we perform $R_2 \gets \frac{1}{3}R_2$ and get: \[ \left[ \begin{array}{ccccc} 1 & 2 &|& 1 & 0 \nl 0 & 1 &|& -1 & \frac{1}{3} \end{array} \right]. \] - Finally we perform $R_1 \gets R_1 - 2R_2$ to obtain: \[ \left[ \begin{array}{ccccc} 1 & 0 &|& 3 & -\frac{2}{3} \nl 0 & 1 &|& -1 & \frac{1}{3} \end{array} \right]. \]
The inverse of $A$ can be found on the right-hand side of the above array: \[ A^{-1} = \begin{bmatrix} 3 & -\frac{2}{3} \nl -1 & \frac{1}{3} \end{bmatrix}. \]
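The whole procedure takes only a few lines in sympy; a sketch (the variable names are ours), assuming the methods row_join and rref:

from sympy import Matrix, eye

A = Matrix([[1, 2],
            [3, 9]])

AI = A.row_join(eye(2))     # build the augmented array [ A | I ]
R, pivots = AI.rref()       # run the Gauss-Jordan elimination
A_inv = R[:, 2:]            # the right-hand side is A^{-1}

print(A_inv == A.inv())     # True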
This algorithm works because we identify the sequence of row operations $\mathcal{R}_k(\ldots \mathcal{R}_2( \mathcal{R}_1( \ . \ ) )\ldots)$ with the inverse matrix $A^{-1}$, since for any vector $\vec{v}$ we have \[ \vec{w}=A\vec{v} \quad \Rightarrow \quad \mathcal{R}_k(\ldots \mathcal{R}_2( \mathcal{R}_1( \vec{w} ) )\ldots) = \vec{v}. \] The sequence of row operations has the same effect as the inverse operation $A^{-1}$. The right half of the above array is used to record the cumulative effect of all the row operations. In order to understand why this is possible, we must learn a little more about row operations and discuss their connection with elementary matrices.
Each of the above row operations $\mathcal{R}_i$ can be represented as multiplication from the left by an elementary matrix $E_{\mathcal{R}}$: \[ \vec{y} = \mathcal{R}_i(\vec{x}) \qquad \Leftrightarrow \qquad \vec{y} = E_{\mathcal{R}}\vec{x}. \] Applying all the operations $\mathcal{R}_1,\mathcal{R}_2,\ldots$ needed to transform the matrix $A$ into the identity matrix corresponds to a repeated product: \[ A^{-1}\vec{w} = \mathcal{R}_k(\ldots \mathcal{R}_2( \mathcal{R}_1( \vec{w} ) )\ldots) \quad \Leftrightarrow \quad A^{-1}\vec{w} = (E_{k}\cdots E_{2}E_{1})\vec{w}. \]
Thus we have obtained an expression for the inverse $A^{-1}$ as a product of elementary matrices: \[ A^{-1}\vec{w} = \mathcal{R}_k(\ldots \mathcal{R}_2( \mathcal{R}_1( \vec{w} ) )\ldots) = E_{k}\cdots E_{2}E_{1} \vec{w}. \]
There are three types of elementary matrices in correspondence with the three row operations we are allowed to use when transforming a matrix to its RREF form. We illustrate them here, with examples from the $2 \times 2$ case:
* Add $m$ times the second row to the first row: $\mathcal{R}_\alpha: R_1 \gets R_1 + mR_2$ corresponds to the matrix:
\[ E_\alpha = \begin{bmatrix} 1 & m \nl 0 & 1 \end{bmatrix}. \] * Swap rows one and two: $\mathcal{R}_\beta:R_1 \leftrightarrow R_2$ is the matrix: \[ E_\beta = \begin{bmatrix} 0 & 1 \nl 1 & 0 \end{bmatrix}. \] * Multiply row one by a constant $m$: $\mathcal{R}_\gamma:R_1 \gets m R_1$ is \[ E_\gamma = \begin{bmatrix} m & 0 \nl 0 & 1 \end{bmatrix}. \]
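A quick sympy sketch confirms that multiplying from the left by each elementary matrix performs the corresponding row operation; the value m = 5 and the test matrix are arbitrary choices of ours:

from sympy import Matrix

m = 5
E_alpha = Matrix([[1, m], [0, 1]])    # R1 <- R1 + m*R2
E_beta  = Matrix([[0, 1], [1, 0]])    # R1 <-> R2
E_gamma = Matrix([[m, 0], [0, 1]])    # R1 <- m*R1

B = Matrix([[1, 2], [3, 9]])
print(E_alpha*B)   # Matrix([[16, 47], [3, 9]])  (row 1 plus 5 times row 2)
print(E_beta*B)    # Matrix([[3, 9], [1, 2]])    (rows swapped)
print(E_gamma*B)   # Matrix([[5, 10], [3, 9]])   (row 1 multiplied by 5)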
We will now illustrate the formula $A^{-1}=E_{k}\cdots E_{2}E_{1}$ on the matrix $A$ which we discussed above: \[ A = \begin{bmatrix} 1 & 2 \nl 3 & 9 \end{bmatrix}. \] Recall the row operations we had to apply in order to transform it to the identity: $R_2 \gets R_2 - 3R_1$, then $R_2 \gets \frac{1}{3}R_2$, and finally $R_1 \gets R_1 - 2R_2$.
We now revisit these steps, performing each row operation as a multiplication from the left by the corresponding elementary matrix:
\[ \begin{bmatrix} 1 & 0 \nl -3 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 \nl 3 & 9 \end{bmatrix} = E_1 A = \begin{bmatrix} 1 & 2 \nl 0 & 3 \end{bmatrix}, \]
\[ \begin{bmatrix} 1 & 0 \nl 0 & \frac{1}{3} \end{bmatrix} \begin{bmatrix} 1 & 2 \nl 0 & 3 \end{bmatrix} = E_2 E_1 A = \begin{bmatrix} 1 & 2 \nl 0 & 1 \end{bmatrix}. \]
\[ \begin{bmatrix} 1 & -2 \nl 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 \nl 0 & 1 \end{bmatrix} = E_3E_2E_1 A = \begin{bmatrix} 1 & 0 \nl 0 & 1 \end{bmatrix} = I. \]
Therefore we have the formula: \[ A^{-1} = E_3E_2E_1 = \begin{bmatrix} 1 & -2 \nl 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \nl 0 & \frac{1}{3} \end{bmatrix} \begin{bmatrix} 1 & 0 \nl -3 & 1 \end{bmatrix} = \begin{bmatrix} 3 & -\frac{2}{3} \nl -1 & \frac{1}{3} \end{bmatrix}\!. \] Verify that this gives the correct $A^{-1}$ by carrying out the matrix products.
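Or let sympy carry out the products for you; a short sketch in which Rational keeps the fraction 1/3 exact:

from sympy import Matrix, Rational

E1 = Matrix([[1, 0], [-3, 1]])              # R2 <- R2 - 3*R1
E2 = Matrix([[1, 0], [0, Rational(1, 3)]])  # R2 <- (1/3)*R2
E3 = Matrix([[1, -2], [0, 1]])              # R1 <- R1 - 2*R2

A = Matrix([[1, 2], [3, 9]])
print(E3*E2*E1)              # Matrix([[3, -2/3], [-1, 1/3]])
print(E3*E2*E1 == A.inv())   # True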
Note also that $A=(A^{-1})^{-1}=(E_3E_2E_1)^{-1}=E_1^{-1}E_2^{-1}E_3^{-1}$, which means that we can write $A$ as a product of elementary matrices: \[ A = E_1^{-1}E_2^{-1}E_3^{-1} = \begin{bmatrix} 1 & 0 \nl 3 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \nl 0 & 3 \end{bmatrix} \begin{bmatrix} 1 & 2 \nl 0 & 1 \end{bmatrix}. \] Note how the inverses of the elementary matrices are trivial to compute: they simply correspond to the opposite operations.
The elementary matrix approach teaches us that every invertible matrix $A$ can be decomposed as the product of elementary matrices. Furthermore, the inverse matrix $A^{-1}$ consists of the inverses of the elementary matrices that make up $A$ (in the reverse order).
Sometimes it is possible to find the matrix inverse $A^{-1}$ by looking at the structure of the matrix $A$. For example, suppose that $A = O\Lambda O^T$, where $\Lambda$ is a diagonal matrix and $O$ is an orthogonal matrix ($O^{-1}=O^T$). Then $A^{-1} = O \Lambda^{-1} O^T$, and since $\Lambda$ is diagonal, it is easy to compute its inverse. One can verify that $AA^{-1} = O\Lambda O^T O \Lambda^{-1} O^T = O\Lambda \Lambda^{-1} O^T = O I O^T = I$, as required.
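Here is a sketch of this idea in sympy; the particular orthogonal matrix $O$ and diagonal matrix $\Lambda$ below are example choices of ours:

from sympy import Matrix, Rational, eye

# an orthogonal matrix O, chosen so that O^T O = I holds exactly
O = Matrix([[Rational(3, 5), Rational(-4, 5)],
            [Rational(4, 5), Rational( 3, 5)]])
L = Matrix([[2, 0], [0, 5]])             # a diagonal matrix Lambda

A = O * L * O.T
L_inv = Matrix([[Rational(1, 2), 0],
                [0, Rational(1, 5)]])    # invert the diagonal entries
A_inv = O * L_inv * O.T                  # A^{-1} = O Lambda^{-1} O^T

print(O.T * O == eye(2))     # True: O is indeed orthogonal
print(A * A_inv == eye(2))   # True: the structure-based inverse works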
Every computer algebra system like Maple, MATLAB, Octave or Mathematica will provide
a way to specify matrices and a function for computing the matrix inverse.
In Python
you can use sympy.matrices.Matrix
or numpy.mat
to define matrix objects.
Let us illustrate the two approaches below.
You should use sympy
whenever you are solving simple problems because
it will perform the calculations symbolically and tell you the exact fractions in the answer:
>>> from sympy.matrices import Matrix
>>> A = Matrix( [ [1,2], [3,4] ] )    # define a Matrix object
>>> A
[1, 2]
[3, 4]
>>> A.inv()                           # call the inv method on A
[ -2,    1]
[3/2, -1/2]
Note how we defined the matrix as a list $[ ]$ of rows, each row also being represented as a list $[ ]$.
The notation for matrices as lists of lists is very tedious to use
for practical calculations. Imagine you had a matrix with three columns
and ten rows – you would have to write a lot of square brackets!
There is another convention for specifying matrices which is more convenient.
If you have access to numpy
on your computer, you can specify matrices
in the alternate notation:
>>> import numpy
>>> M = numpy.mat('1 2; 3 9')
>>> M
matrix([[1, 2],
        [3, 9]])
The matrix is specified as a string in which the rows of the matrix are separated by semicolons.
Now that you have a numpy
matrix object, you can
compute its inverse as follows:
>>> M.I
matrix([[ 3.        , -0.66666667],
        [-1.        ,  0.33333333]])
>>> # or equivalently using
>>> numpy.linalg.inv(M)
matrix([[ 3.        , -0.66666667],
        [-1.        ,  0.33333333]])
Note that the numpy
inverse algorithm is based on
floating point numbers which have finite precision.
Floating point calculations can be very precise,
but they are not exact:
\[
0.\underbrace{33333333333 \ldots 33333}_{ n \textrm{ digits of precision} }
\neq
\frac{1}{3}.
\]
To represent $\frac{1}{3}$ exactly, you would need an infinitely long decimal expansion
which is not possible using floating point numbers.
We can build a sympy.matrices.Matrix
by supplying a numpy.mat
matrix
as an input:
>>> A = Matrix( numpy.mat('1 2; 3 9') )
>>> A.inv()
[ 3, -2/3]
[-1,  1/3]
We have combined the compact numpy.mat notation “1 2; 3 9” for specifying matrices
with the symbolic (exact) inverse algorithm that sympy provides.
Thus, we have the best of both worlds.
In terms of finding the inverse of a matrix using pen and paper (like on a final exam, for example), I recommend the RREF algorithm the most: \[ [\;A\;|\;I\;] \quad \xrightarrow{\ \textrm{RREF}\ } \quad [\;I\;|\;A^{-1}\;], \] unless of course you have a $2 \times 2$ matrix, in which case the formula is easier to use.
Compute $A^{-1}$ where \[ A = \begin{bmatrix} 1 & 1 \nl 1 & 2 \end{bmatrix} \qquad \textrm{Ans: } A^{-1} = \begin{bmatrix} 2 & -1 \nl -1 & 1 \end{bmatrix}. \]
Show that for an $n \times n$ invertible matrix $A$, we have: $\left| \textrm{adj}(A) \right| = \left(\left| A \right|\right)^{n-1}$. Hint: Recall that $\left| \alpha A \right|=\alpha^n \left| A \right|$.