The page you are reading is part of a draft (v2.0) of the "No bullshit guide to math and physics."

The text has since gone through many edits and is now available in print and electronic formats. The current edition of the book is v4.0, which is a substantial improvement over the draft version in terms of content and language (I hired a professional editor).

I'm leaving the old wiki content up for the time being, but I highly encourage you to check out the finished book. You can check out an extended preview here (PDF, 106 pages, 5MB).


Introduction to linear algebra

Linear algebra is the math of vectors and matrices. A vector $\vec{v} \in \mathbb{R}^n$ is an array of $n$ numbers. For example, a three-dimensional vector is a triple of the form: \[ \vec{v} = (v_1,v_2,v_3) \ \in \ (\mathbb{R},\mathbb{R},\mathbb{R}) \equiv \mathbb{R}^3. \] To specify the vector $\vec{v}$, we need to specify the values for its three components $v_1$, $v_2$ and $v_3$.

A matrix $M \in \mathbb{R}^{m\times n}$ is a table of numbers with $m$ rows and $n$ columns. Consider as an example the following $3\times 3$ matrix: \[ A = \left[\begin{array}{ccc} a_{11} & a_{12} & a_{13} \nl a_{21} & a_{22} & a_{23} \nl a_{31} & a_{32} & a_{33} \end{array}\right] \ \in \ \left[\begin{array}{ccc} \mathbb{R} & \mathbb{R} & \mathbb{R} \nl \mathbb{R} & \mathbb{R} & \mathbb{R} \nl \mathbb{R} & \mathbb{R} & \mathbb{R} \end{array}\right] \equiv \mathbb{R}^{3\times 3}. \] To specify the matrix $A$, we need to specify the values of its nine components $a_{11}$, $a_{12}$, $\ldots$, $a_{33}$.

We will study the mathematical operations that we can perform on vectors and matrices and their applications. Many problems in science, business, and technology are described naturally in terms of vectors and matrices, so it is important for you to understand how to work with them.

Context

To illustrate what is new about vectors and matrices, let us review the properties of something old and familiar: the real numbers $\mathbb{R}$. The basic operations on numbers are:

  • addition (denoted $+$)
  • subtraction, the inverse of addition (denoted $-$)
  • multiplication (denoted $\times$ or implicit)
  • division, the inverse of multiplication (denoted $\div$ or as a fraction)

You have been using these operations all your life, so you know how to use them when solving equations.

You also know about functions $f: \mathbb{R} \to \mathbb{R}$, which take real numbers as inputs and give real numbers as outputs. Recall that the inverse function of $f$ is defined as the function $f^{-1}$ which undoes the effect of $f$ to get back the original input variable: \[ f^{-1}\left( f(x) \right)=x. \] For example, when $f(x)=\ln(x)$ the inverse is $f^{-1}(x)=e^x$, and when $g(x)=\sqrt{x}$ the inverse is $g^{-1}(x)=x^2$.
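
If you want to see this “undo” relationship in action, here is a minimal numerical check sketched in Python with the numpy library; the test value 2.5 is an arbitrary choice for illustration.

  import numpy as np

  x = 2.5                 # an arbitrary test input
  y = np.log(x)           # apply f(x) = ln(x)
  print(np.exp(y))        # applying f^{-1}(x) = e^x recovers 2.5 (up to rounding)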

Vectors $\vec{v}$ and matrices $A$ are the new objects of study, so our first step should be to similarly define the basic operations which we can perform on them.

For vectors we have the following operations:

  • addition (denoted $+$)
  • subtraction, the inverse of addition (denoted $-$)
  • dot product (denoted $\cdot$)
  • cross product (denoted $\times$)

For matrices we have the following operations:

  • addition (denoted $+$)
  • subtraction, the inverse of addition (denoted $-$)
  • matrix product (denoted implicitly, e.g. $AB$); the matrix-matrix product includes the matrix-vector product $A\vec{x}$ as a special case
  • matrix inverse (denoted $A^{-1}$)
  • matrix trace (denoted $\textrm{Tr}(A)$)
  • matrix determinant (denoted $\textrm{det}(A)$ or $|A|$)

Matrix-vector product

The matrix-vector product $A\vec{x}$ is a linear combination of the columns of the matrix $A$. For example, consider the product of a $3 \times 2$ matrix $A$ and a $2 \times 1$ vector $\vec{x}$. The output of the product $A\vec{x}$ will be denoted $\vec{y}$ and is a $3 \times 1$ vector given by: \[ \begin{align*} \vec{y} &= A \vec{x}, \nl \begin{bmatrix} y_1 \nl y_2 \nl y_3 \end{bmatrix} & = \begin{bmatrix} a_{11} & a_{12} \nl a_{21} & a_{22} \nl a_{31} & a_{32} \end{bmatrix} \begin{bmatrix} x_1 \nl x_2 \end{bmatrix} = x_1\! \begin{bmatrix} a_{11} \nl a_{21} \nl a_{31} \end{bmatrix} + x_2\! \begin{bmatrix} a_{12} \nl a_{22} \nl a_{32} \end{bmatrix} = \begin{bmatrix} x_1a_{11} + x_2a_{12} \nl x_1a_{21} + x_2a_{22} \nl x_1a_{31} + x_2a_{32} \end{bmatrix}. \end{align*} \] The key thing to observe in the above formula is that the matrix-vector product is a linear combination of the columns of the matrix. We have $\vec{y}=A\vec{x}=x_1A_{[:,1]} + x_2A_{[:,2]}$, where $A_{[:,1]}$ and $A_{[:,2]}$ are the first and second columns of $A$.
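
Here is a quick numerical sanity check of this “linear combination of columns” picture, sketched in Python with numpy; the entries of $A$ and $\vec{x}$ are made up for illustration.

  import numpy as np

  A = np.array([[1., 2.],
                [3., 4.],
                [5., 6.]])               # a 3x2 matrix (entries made up)
  x = np.array([2., -1.])                # a 2-vector

  print(A @ x)                           # built-in matrix-vector product: [0. 2. 4.]
  print(x[0]*A[:, 0] + x[1]*A[:, 1])     # linear combination of the columns: [0. 2. 4.]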

Linear combinations as matrix products

Consider now some set of vectors $\{ \vec{e}_1, \vec{e}_2 \}$ and a third vector $\vec{y}$ which is a linear combination of the vectors $\vec{e}_1$ and $\vec{e}_2$: \[ \vec{y} = \alpha \vec{e}_1 \ + \ \beta \vec{e}_2. \] The numbers $\alpha, \beta \in \mathbb{R}$ are called coefficients of the linear combination.

The matrix-vector product is defined expressly for the purpose of studying linear combinations. We can describe the above linear combination as the following matrix-vector product: \[ \vec{y} = \begin{bmatrix} | & | \nl \vec{e}_1 & \vec{e}_2 \nl | & | \end{bmatrix} \begin{bmatrix} \alpha \nl \beta \end{bmatrix} = E\vec{x}. \] The matrix $E$ has $\vec{e}_1$ and $\vec{e}_2$ as columns. The dimensions of the matrix $E$ will be $d \times 2$, where $d$ is the dimension of the vectors $\vec{e}_1$, $\vec{e}_2$ and $\vec{y}$.
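
The same construction can be sketched in numpy (the vectors and coefficients below are arbitrary): stacking $\vec{e}_1$ and $\vec{e}_2$ as the columns of a matrix turns the linear combination into a matrix-vector product.

  import numpy as np

  e1 = np.array([1., 0., 1.])            # two made-up 3-dimensional vectors
  e2 = np.array([0., 2., 1.])
  alpha, beta = 3., -1.                  # coefficients of the linear combination

  E = np.column_stack([e1, e2])          # E has e1 and e2 as its columns (3x2)
  print(E @ np.array([alpha, beta]))     # [ 3. -2.  2.]
  print(alpha*e1 + beta*e2)              # same result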

Matrices as vector functions

OK, my dear readers, we have now reached the key notion in the study of linear algebra. One could even say the main idea.

I know you are ready to handle it because you are now familiar with functions of a real variable $f:\mathbb{R} \to \mathbb{R}$, and you just saw the definition of the matrix-vector product in which the variables were chosen to subliminally remind you of the standard convention for calling the function input $x$ and the function output $y=f(x)$. Without further ado, I present to you: the notion of a vector function, which is also known as a linear transformation.

Multiplication by a matrix $A \in \mathbb{R}^{m \times n}$ can be thought of as computing a vector function of the form: \[ T_A:\mathbb{R}^n \to \mathbb{R}^m, \] which takes $n$-vectors as inputs and gives $m$-vectors as outputs. Instead of writing $T_A(\vec{x})=\vec{y}$ for the vector function $T_A$ applied to the vector $\vec{x}$, we can simply write $A\vec{x}=\vec{y}$, where the “application of the function $T_A$” corresponds to the product of the matrix $A$ and the vector $\vec{x}$.

When the matrix $A\in \mathbb{R}^{n \times n}$ is invertible, there exists an inverse matrix $A^{-1}$ which undoes the effect of $A$ to give back the original input vector: \[ A^{-1}\!\left( A(\vec{x}) \right)=A^{-1}A\vec{x}=\vec{x}. \]

For example, the transformation which multiplies the first components of input vectors by $3$ and multiplies the second components by $5$ is described by the matrix \[ A = \begin{bmatrix} 3 & 0 \nl 0 & 5 \end{bmatrix}\!, \ \qquad A(\vec{x})= \begin{bmatrix} 3 & 0 \nl 0 & 5 \end{bmatrix} \begin{bmatrix} x_1 \nl x_2 \end{bmatrix} = \begin{bmatrix} 3x_1 \nl 5x_2 \end{bmatrix}. \] Its inverse is \[ A^{-1} = \begin{bmatrix} \frac{1}{3} & 0 \nl 0 & \frac{1}{5} \end{bmatrix}, \ \qquad A^{-1}\!\left( A(\vec{x}) \right)= \begin{bmatrix} \frac{1}{3} & 0 \nl 0 & \frac{1}{5} \end{bmatrix} \begin{bmatrix} 3x_1 \nl 5x_2 \end{bmatrix} = \begin{bmatrix} x_1 \nl x_2 \end{bmatrix} =\vec{x}. \] Note how the inverse matrix corresponds to the multiplication of the first component by $\frac{1}{3}$ and the second component by $\frac{1}{5}$, which has the effect of undoing the action of $A$.
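
A quick numerical check of this undo behaviour, sketched with numpy (the test vector is arbitrary):

  import numpy as np

  A = np.array([[3., 0.],
                [0., 5.]])
  x = np.array([7., 11.])                # an arbitrary test vector

  A_inv = np.linalg.inv(A)
  print(A_inv)                           # [[0.333... 0.], [0. 0.2]]
  print(A_inv @ (A @ x))                 # [ 7. 11.] -- the original x is recovered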

Things get a little more complicated when matrices mix the different components of the input vector, as in the following example: \[ B = \begin{bmatrix} 1 & 2 \nl 0 & 3 \end{bmatrix}, \ \qquad \text{which acts as } \ \ B(\vec{x})= \begin{bmatrix} 1 & 2 \nl 0 & 3 \end{bmatrix} \begin{bmatrix} x_1 \nl x_2 \end{bmatrix} = \begin{bmatrix} x_1 +2x_2 \nl 3x_2 \end{bmatrix}. \] To understand the output of the matrix $B$ on the vector $\vec{x}$, you must recall the definition of the matrix-vector product.

The inverse of the matrix $B$ is the matrix \[ B^{-1} = \begin{bmatrix} 1 & \frac{-2}{3} \nl 0 & \frac{1}{3} \end{bmatrix}. \] Multiplication by the matrix $B^{-1}$ is the “undo action” for the multiplication by $B$: \[ B^{-1}\!\left( B(\vec{x}) \right)= \begin{bmatrix} 1 & \frac{-2}{3} \nl 0 & \frac{1}{3} \end{bmatrix} \begin{bmatrix} 1 & 2 \nl 0 & 3 \end{bmatrix} \begin{bmatrix} x_1 \nl x_2 \end{bmatrix} = \begin{bmatrix} 1 & \frac{-2}{3} \nl 0 & \frac{1}{3} \end{bmatrix} \begin{bmatrix} x_1 +2x_2 \nl 3x_2 \end{bmatrix} = \begin{bmatrix} x_1 \nl x_2 \end{bmatrix} =\vec{x}. \]

We will discuss matrix inverses and how to compute them in more detail later, but for now it is important that you know they exist and what they do. By definition, the inverse matrix $A^{-1}$ undoes the effects of the matrix $A$: \[ A^{-1}A\vec{x} =\mathbb{I}\vec{x} =\vec{x} \qquad \Rightarrow \qquad A^{-1}A = \begin{bmatrix} 1 & 0 \nl 0 & 1 \end{bmatrix}= \mathbb{I}. \] The cumulative effect of applying $A$ and then $A^{-1}$ is the identity matrix $\mathbb{I}$, which has ones on the diagonal and zeros everywhere else.
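
For instance, we can ask numpy to compute the inverse of the matrix $B$ from the previous example and confirm that $B^{-1}B$ gives the identity matrix; this is a sketch for verification only.

  import numpy as np

  B = np.array([[1., 2.],
                [0., 3.]])
  B_inv = np.linalg.inv(B)
  print(B_inv)                           # [[1. -0.667], [0. 0.333]] (approximately)
  print(B_inv @ B)                       # the 2x2 identity matrix, up to rounding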

An analogy

You can think of linear transformations as “vector functions” and describe their properties in analogy with the regular functions you are familiar with. The action of a function on a number is similar to the action of a matrix on a vector: \[ \begin{align*} \textrm{function } f:\mathbb{R}\to \mathbb{R} & \ \Leftrightarrow \ \textrm{linear transformation } T_A:\mathbb{R}^{n} \to \mathbb{R}^{m} \nl \textrm{input } x\in \mathbb{R} & \ \Leftrightarrow \ \textrm{input } \vec{x} \in \mathbb{R}^n \nl \textrm{output } f(x) & \ \Leftrightarrow \ \textrm{output } T_A(\vec{x})=A\vec{x} \in \mathbb{R}^m \nl \textrm{function composition } g\circ f = g(f(x)) & \ \Leftrightarrow \ \textrm{matrix product } T_B(T_A(\vec{x})) = BA \vec{x} \nl \textrm{function inverse } f^{-1} & \ \Leftrightarrow \ \textrm{matrix inverse } A^{-1} \nl \textrm{zeros of } f & \ \Leftrightarrow \ \mathcal{N}(A) \equiv \textrm{null space of } A \nl \textrm{range of } f & \ \Leftrightarrow \ \mathcal{C}(A) \equiv \textrm{column space of } A = \textrm{range of } T_A \end{align*} \]

The end goal of this book is to develop your intuition about vectors, matrices, and linear transformations. Our journey towards this goal will take us through many interesting new concepts along the way. We will develop new computational techniques and learn new ways of thinking that will open many doors for understanding science. Let us look in a little more detail at what lies ahead in the book.

Computational linear algebra

The first steps towards understanding linear algebra will be quite tedious. You have to develop the basic skills for manipulating vectors and matrices. Matrices and vectors have many entries and performing operations on them will involve a lot of arithmetic steps—there is no way to circumvent this complexity. Make sure you understand the basic algebra rules: how to add, subtract and multiply vectors and matrices, because they are a prerequisite for learning about the cool stuff later on.

The good news is that, except for the homework assignments and the problems on your final exam, you will not have to do matrix algebra by hand. In the real world, we use computers to take care of the tedious calculations, but that doesn't mean that you should not learn how to perform matrix algebra. The more you develop your matrix algebra intuition, the deeper you will be able to go into the advanced material.

Geometrical linear algebra

So far we described vectors and matrices as arrays of numbers. This is fine for the purpose of doing algebra on vectors and matrices, but it is not sufficient to understand their geometrical properties. The components of a vector $\vec{v} \in \mathbb{R}^n$ can be thought of as measuring distances along a coordinate system with $n$ axes. The vector $\vec{v}$ can therefore be said to “point” in a particular direction with respect to the coordinate system. The fun part of linear algebra starts when you learn about the geometrical interpretation of each of the algebraic operations on vectors and matrices.

Consider some unit-length vector $\hat{r}$ that specifies a direction of interest. Suppose we are given some other vector $\vec{v}$, and we are asked to find how much of $\vec{v}$ is in the $\hat{r}$ direction. The answer is computed using the dot product: $v_r = \vec{v} \cdot \hat{r} = \|\vec{v}\|\cos\theta$, where $\theta$ is the angle between $\vec{v}$ and $\hat{r}$. The technical term for the quantity $v_r$ is “the projection of $\vec{v}$ in the $\hat{r}$ direction.” By projection we mean that we ignore all parts of $\vec{v}$ that are not in the $\hat{r}$ direction. Projections are used in mechanics to calculate the $x$ and $y$ components of forces in force diagrams. In Chapter~\ref{chapter:geometrical_linear_algebra} we'll learn how to think intuitively about projections in terms of dot products.
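
As a small numerical illustration (with a made-up vector and direction), the projection is a single dot product in numpy:

  import numpy as np

  v = np.array([4., 3.])                       # an arbitrary vector
  r_hat = np.array([1., 1.]) / np.sqrt(2)      # a unit-length direction of interest
  v_r = np.dot(v, r_hat)                       # projection of v in the r_hat direction
  print(v_r)                                   # 4.9497... = 7/sqrt(2)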

As another example of the geometrical aspect of vector operations, consider the following situation. Suppose I give you two vectors $\vec{u}$ and $\vec{v}$ and I ask you to find a third vector $\vec{w}$ that is perpendicular to both $\vec{u}$ and $\vec{v}$. A priori this sounds like a complicated question to answer, but in fact the required vector $\vec{w}$ can easily be obtained by computing the cross product $\vec{w}=\vec{u}\times\vec{v}$.
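
Here is a sketch of that calculation in numpy, with two made-up vectors; the zero dot products confirm the perpendicularity.

  import numpy as np

  u = np.array([1., 2., 0.])             # two made-up vectors
  v = np.array([0., 1., 3.])

  w = np.cross(u, v)                     # a vector perpendicular to both u and v
  print(w)                               # [ 6. -3.  1.]
  print(np.dot(w, u), np.dot(w, v))      # 0.0 0.0 -- perpendicular to both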

You will also learn how to describe lines and planes in space using vectors. Given the equations of two lines (or planes), there is a procedure for finding their solution, that is, the point (or line) where they intersect.

The determinant of a matrix also carries a geometric interpretation. It tells you something about the relative orientation of the vectors that make up the rows of the matrix. If the determinant of a matrix is zero, it means that the rows are not linearly independent: at least one of the rows can be written in terms of the other rows. Linear independence, as we will learn shortly, is an important property for vectors to have, and the determinant is a convenient way to test whether a set of vectors has this property.
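
A small numpy sketch of this determinant test, using a made-up matrix whose third row is the sum of the first two:

  import numpy as np

  # the third row is the sum of the first two, so the rows are linearly dependent
  M = np.array([[1., 2., 3.],
                [4., 5., 6.],
                [5., 7., 9.]])
  print(np.linalg.det(M))                # 0.0 (up to floating-point rounding)

  M[2, 2] = 10.                          # perturb one entry; the rows become independent
  print(np.linalg.det(M))                # -3.0 (approximately)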

It is really important that you try to visualize every new concept you learn about. You should always keep a picture in your head of what is going on. The relationships between two-dimensional vectors can easily be drawn on paper, while three-dimensional vectors can be visualized by pointing pens and pencils in different directions. Though our ability to draw and visualize only extends up to three dimensions, the notion of a vector does not stop there. We could have four-dimensional vectors in $\mathbb{R}^4$ or even ten-dimensional vectors in $\mathbb{R}^{10}$. All the intuition you build up in two and three dimensions is still applicable to vectors with more dimensions.

Theoretical linear algebra

The most important aspect of linear algebra is that you will learn how to reason about vectors and matrices in a very abstract way. By thinking abstractly, you will be able to extend your geometric intuition for two- and three-dimensional problems to problems in higher dimensions. A lot of knowledge buzz awaits you as you learn about new concepts, pick up new computational skills, and develop new ways of thinking.

You are probably familiar with the standard coordinate system made up of two orthogonal axes: the $x$ axis and the $y$ axis. A vector $\vec{v}$ can be specified in terms of its coordinates $(v_x,v_y)$ with respect to these axes, that is, we can write any vector $\vec{v} \in \mathbb{R}^2$ as $\vec{v} = v_x \hat{\imath} + v_y \hat{\jmath}$, where $\hat{\imath}$ and $\hat{\jmath}$ are unit vectors that point along the $x$ and $y$ axes, respectively. It turns out that we can use many other kinds of coordinate systems to represent vectors. A basis for $\mathbb{R}^2$ is any set of two vectors $\{ \hat{e}_1, \hat{e}_2 \}$ that allows us to write all vectors $\vec{v} \in \mathbb{R}^2$ as a linear combination of the basis vectors: $\vec{v} = v_1 \hat{e}_1 + v_2 \hat{e}_2$. The same vector $\vec{v}$ corresponds to two different coordinate pairs depending on which basis is used for the description: $\vec{v}=(v_x,v_y)$ in the basis $\{ \hat{\imath}, \hat{\jmath}\}$ and $\vec{v}=(v_1,v_2)$ in the $\{ \hat{e}_1, \hat{e}_2 \}$ basis. We will discuss bases and their properties in great detail in the coming chapters.
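
As a rough numpy sketch of this change-of-basis idea (the alternative basis below is made up for illustration), finding the new coordinates amounts to solving a small linear system:

  import numpy as np

  v = np.array([5., 1.])                       # coordinates (v_x, v_y) in the basis {i, j}

  # an alternative (orthonormal) basis for R^2, made up for illustration
  e1 = np.array([1., 1.]) / np.sqrt(2)
  e2 = np.array([1., -1.]) / np.sqrt(2)

  E = np.column_stack([e1, e2])
  v1, v2 = np.linalg.solve(E, v)               # coefficients such that v = v1*e1 + v2*e2
  print(v1, v2)                                 # 4.2426... 2.8284...
  print(v1*e1 + v2*e2)                          # [5. 1.] -- the same vector v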

The notions of eigenvalues and eigenvectors for matrices will allow you to describe their actions in the most natural way. The set of eigenvectors of a matrix is a special set of input vectors for which the action of the matrix is described as a scaling. When a matrix is multiplied by one of its eigenvectors, the output is a vector in the same direction scaled by a constant, which we call an eigenvalue. Thinking of matrices in terms of their eigenvalues and eigenvectors is a very powerful technique for describing their properties.
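
Here is a minimal numpy sketch of this scaling behaviour, using a made-up $2\times 2$ matrix:

  import numpy as np

  A = np.array([[2., 1.],
                [1., 2.]])                # a made-up symmetric matrix
  eigvals, eigvecs = np.linalg.eig(A)
  print(eigvals)                          # eigenvalues 3 and 1 (order may vary)

  i = np.argmax(eigvals)                  # pick the eigenvector for the eigenvalue 3
  v = eigvecs[:, i]
  print(A @ v)                            # same direction as v ...
  print(eigvals[i] * v)                   # ... scaled by the eigenvalue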

In the text above I explained that computing the product of a matrix and a vector, $A\vec{x}=\vec{y}$, can be thought of as a vector function, with input $\vec{x}$ and output $\vec{y}$. More specifically, any linear transformation can be represented as multiplication by a matrix $A$. Conversely, each $m\times n$ matrix $A \in \mathbb{R}^{m\times n}$ can be thought of as some linear transformation (vector function) $T_A \colon \mathbb{R}^n \to \mathbb{R}^m$. This relationship between matrices and linear transformations will allow us to identify certain matrix properties with properties of the corresponding linear transformations. For example, the column space of a matrix $A$ (the set of vectors that can be written as a linear combination of the columns of the matrix) corresponds to the image space $\textrm{Im}(T_A)$ (the set of possible outputs of the transformation $T_A$).

Part of what makes linear algebra so powerful is that linear algebra techniques can be applied to all kinds of “vector-like” objects. The abstract concept of a vector space captures precisely what it means for some class of mathematical objects to be “vector-like”. For example, the set of polynomials of degree at most two, $P_2(x)$, which consists of all functions of the form $f(x)=a_0 + a_1x + a_2x^2$, is “vector-like” because it is possible to describe each polynomial in terms of its coefficients $(a_0,a_1,a_2)$. Furthermore, the sum of two polynomials and the multiplication of a polynomial by a constant both correspond to vector-like calculations on their coefficients. This means that we can use concepts from linear algebra like linear independence, dimension, and basis when dealing with polynomials.
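
To make the “vector-like” idea concrete, here is a sketch in numpy with two made-up polynomials represented by their coefficient vectors:

  import numpy as np

  # represent f(x) = a0 + a1*x + a2*x^2 by its coefficient vector (a0, a1, a2)
  f = np.array([1., 2., 0.])              # the polynomial 1 + 2x
  g = np.array([0., 1., 3.])              # the polynomial x + 3x^2

  h = f + 2*g                              # coefficients of the polynomial f(x) + 2*g(x)
  print(h)                                 # [1. 4. 6.]  i.e.  1 + 4x + 6x^2

  x = 2.0                                  # double-check by evaluating at x = 2
  print(h[0] + h[1]*x + h[2]*x**2)         # 33.0
  print((1 + 2*x) + 2*(x + 3*x**2))        # 33.0 -- f(2) + 2*g(2) agrees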

Useful linear algebra

One of the most useful skills you will learn in linear algebra is the ability to solve systems of linear equations. Many real-world problems can be expressed as linear relationships between multiple unknown quantities. To find these unknowns you will often have to solve $n$ equations in $n$ unknowns. You can use basic techniques such as substitution, elimination, and subtraction to solve these equations, but the procedure will be very slow and tedious. If the system of equations is linear, then it can be expressed as an augmented matrix built from the coefficients in the equations. You can then use the Gauss-Jordan elimination algorithm to solve for the $n$ unknowns. The key benefit of this approach is that it allows you to focus on the coefficients and not worry about the variable names. This saves a lot of time when you have to solve many equations with many unknowns. Another approach is to express the system as a matrix equation and then solve the matrix equation by computing the matrix inverse.
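
In practice, this is exactly the kind of work we hand off to a numerical library; here is a sketch in numpy with a made-up $2\times 2$ system:

  import numpy as np

  # the system  1x + 2y = 5
  #             3x + 9y = 21
  # written as the matrix equation A v = b
  A = np.array([[1., 2.],
                [3., 9.]])
  b = np.array([5., 21.])

  print(np.linalg.solve(A, b))            # [1. 2.]  i.e.  x = 1, y = 2
  print(np.linalg.inv(A) @ b)             # [1. 2.]  -- the matrix-inverse approach agrees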

You will also learn how to decompose a matrix into a product of simpler matrices in various ways. Matrix decompositions are often performed for computational reasons: certain problems are easier to solve on a computer when the matrix is expressed in terms of its simpler constituents.

Other decompositions, like the decomposition of a matrix into its eigenvalues and eigenvectors, give you valuable insights into the properties of the matrix. Google's original PageRank algorithm for ranking webpages by importance can be formalized as the search for an eigenvector of a matrix. The matrix in question contains the information about all the hyperlinks that exist between webpages. The eigenvector we are looking for tells you the relative importance of each page. So when I say that learning about eigenvectors is valuable, I am not kidding: a 300-billion-dollar company was built starting from an eigenvector idea.
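
To make the eigenvector idea concrete without pretending to reproduce Google's actual algorithm (which, among other things, adds a damping factor), here is a toy sketch in numpy: a made-up three-page web encoded as a column-stochastic link matrix, with the importance vector computed by repeated multiplication (power iteration), which converges to the top eigenvector.

  import numpy as np

  # a made-up "web" of three pages: page 1 links to pages 2 and 3,
  # page 2 links to page 3, and page 3 links back to page 1.
  # column j of M splits page j's importance equally among the pages it links to.
  M = np.array([[0. , 0. , 1. ],
                [0.5, 0. , 0. ],
                [0.5, 1. , 0. ]])

  r = np.ones(3) / 3                      # start with equal importance for every page
  for _ in range(100):                    # repeated multiplication (power iteration)
      r = M @ r
  print(r)                                # approx. [0.4 0.2 0.4]: pages 1 and 3 rank highest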

The techniques of linear algebra find application in many areas of science and technology. We will discuss applications such as finding approximate solutions (curve fitting), modelling of real-world problems, and constrained optimization problems using linear programming.

Discussion

In terms of difficulty of the content, I would say that you should get ready for some serious uphills. As your personal “mountain guide” to the “mountain” of linear algebra, it is my obligation to warn you about the difficulties that lie ahead so that you will be mentally prepared.

The computational aspects will be difficult in a boring and repetitive kind of way as you have to go through thousands of steps where you multiply things together and add up the results. The theoretical aspects will be difficult in a very different kind of way: you will learn about various theoretical properties of vectors, matrices and operations and how to use these properties to prove things. This is what real math is like, using axioms and basic facts about the mathematical objects in order to prove statements.

In summary, a lot of work and toil awaits you as you learn about the concepts from linear algebra, but the effort is definitely worth it. All the effort you put into understanding vectors and matrices will lead to mind-expanding insights. You will reap the benefits of your effort for the rest of your life; understanding linear algebra will open many doors for you.

Links

[ Wikibook on the subject (for additional reading) ]
http://en.wikibooks.org/wiki/Linear_Algebra

[ Wikipedia overview on matrices ]
http://en.wikipedia.org/wiki/Matrix_(mathematics)

[ List of applications of linear algebra ]
http://aix1.uottawa.ca/~jkhoury/app.htm

 