The page you are reading is part of a draft (v2.0) of the "No bullshit guide to math and physics."

The text has since gone through many edits and is now available in print and electronic format. The current edition of the book is v4.0, which is a substantial improvement over this draft in terms of content and language (I hired a professional editor).

I'm leaving the old wiki content up for the time being, but I highly encourage you to check out the finished book. An extended preview is available here (PDF, 106 pages, 5MB).


Linear transformations

In this section we'll study functions that take vectors as inputs and produce vectors as outputs. In order to describe a function $T$ that takes $n$-dimensional vectors as inputs and produces $m$-dimensional vectors as outputs, we will use the notation: \[ T \colon \mathbb{R}^n \to \mathbb{R}^m. \] In particular, we'll restrict our attention to the class of linear transformations, which includes most of the useful transformations from analytic geometry: stretching, projections, reflections, and rotations. Linear transformations are used to describe and model many real-world phenomena in physics, chemistry, biology, and computer science.

Definitions

Linear transformations are mappings between vector inputs and vector outputs. We will use the following concepts and notation:

* $V =\mathbb{R}^n$: An $n$-dimensional vector space.
  $V$ is just a nickname we give to $\mathbb{R}^n$, which is the input vector space of $T$.
* $W = \mathbb{R}^m$: An $m$-dimensional vector space, which is the output space of $T$.
* ${\rm dim}(U)$: The dimension of the vector space $U$.
* $T:V \to W$: A linear transformation that takes vectors $\vec{v} \in V$ as inputs
  and produces vectors $\vec{w} \in W$ as outputs: $T(\vec{v}) = \vec{w}$.
* $\textrm{Im}(T)$: The //image space// of the linear transformation $T$, which is the
  set of vectors that $T$ can output for some input $\vec{v}\in V$.
  The mathematical definition of the image space is
  \[
    \textrm{Im}(T) 
     = \{ \vec{w} \in W \ | \ \vec{w}=T(\vec{v}), \textrm{ for some } \vec{v}\in V \}.
  \]
  The image space is the vector equivalent of the //image// of a function of a single variable
  with which you are familiar: $\{ y \in \mathbb{R} \ | \ y=f(x), \textrm{ for some } x \in \mathbb{R} \}$.
* $\textrm{Null}(T)$: The //null space// of the linear transformation $T$. 
  This is the set of vectors that get mapped to the zero vector by $T$. 
  Mathematically we write:
  \[
    \textrm{Null}(T) \equiv \{\vec{v}\in V   \ | \  T(\vec{v}) = \vec{0} \},
  \]
  and we have $\textrm{Null}(T) \subseteq V$. 
  The null space is the vector equivalent of the set of //roots// of a function,
  i.e., the values of $x$ where $f(x)=0$.

If we fix bases for the input and the output spaces, then a linear transformation can be represented as a matrix product:

* $B_V=\{ \vec{b}_1, \vec{b}_2, \ldots, \vec{b}_n\}$: A basis for the input vector space $V$.
  Any vector $\vec{v} \in V$ can be written as
  \[
    \vec{v} = v_1 \vec{b}_1 + v_2 \vec{b}_2 + \cdots + v_n \vec{b}_n,
  \]
  where $v_1,v_2,\ldots,v_n$ are real numbers, which we call the 
  //coordinates of the vector $\vec{v}$ with respect to the basis $B_V$//.
* $B_W=\{\vec{c}_1, \vec{c}_2, \ldots, \vec{c}_m\}$: A basis for the output vector space $W$.
* $M_T \in \mathbb{R}^{m\times n}$: A matrix representation of the linear transformation $T$:
  \[
     \vec{w} = T(\vec{v})  \qquad \Leftrightarrow \qquad \vec{w} = M_T \vec{v}.
  \]
  Multiplication of the vector $\vec{v}$ by the matrix $M_T$ (from the left) 
  is //equivalent// to applying the linear transformation $T$.
  Note that the matrix representation $M_T$ is //with respect to// the bases $B_{V}$ and $B_{W}$.
  If we need to show the choice of input and output bases explicitly, 
  we will write them in subscripts $\;_{B_W}[M_T]_{B_V}$.
* $\mathcal{C}(M_T)$: The //column space// of a matrix $M_T$ consists of all possible linear
  combinations of the columns of the matrix $M_T$.
  Given $M_T$, the representation of some linear transformation $T$,
  the column space of $M_T$ is equal to the image space of $T$: 
  $\mathcal{C}(M_T) = \textrm{Im}(T)$.
* $\mathcal{N}(M_T)$: The //null space// of a matrix $M_T$ is the set of
  vectors that the matrix $M_T$ sends to the zero vector:
  \[
    \mathcal{N}(M_T) \equiv \{ \vec{v} \in V \ | \ M_T\vec{v} = \vec{0} \}.
  \]
  The null space of $M_T$ is equal to the null space of $T$: 
  $\mathcal{N}(M_T) = \textrm{Null}(T)$.

Properties of linear transformations

Linearity

The fundamental property of a linear transformation is, you guessed it, its linearity. If $\vec{v}_1$ and $\vec{v}_2$ are two input vectors and $\alpha$ and $\beta$ are two constants, then: \[ T(\alpha\vec{v}_1+\beta\vec{v}_2)= \alpha T(\vec{v}_1)+\beta T(\vec{v}_2). \]
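
Here is a minimal numerical sketch of the linearity property, assuming Python with NumPy; the matrix below is an arbitrary stand-in for some linear transformation $T\colon \mathbb{R}^3 \to \mathbb{R}^2$, chosen only for illustration.

<code python>
import numpy as np

# Check T(a*v1 + b*v2) == a*T(v1) + b*T(v2) for an arbitrary matrix M_T
# playing the role of the linear transformation T: R^3 -> R^2.
M_T = np.array([[1.0, 2.0, 0.0],
                [0.0, 1.0, 3.0]])

def T(v):
    return M_T @ v

v1 = np.array([1.0, 0.0, 2.0])
v2 = np.array([0.0, 1.0, 1.0])
alpha, beta = 3.0, -2.0

lhs = T(alpha*v1 + beta*v2)
rhs = alpha*T(v1) + beta*T(v2)
print(np.allclose(lhs, rhs))   # True, as required by linearity
</code>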

Transformations as black boxes

Suppose someone gives you a black box which implements the transformation $T$. You are not allowed to look inside the box and see how $T$ acts, but you are allowed to probe the transformation by choosing various input vectors and observing what comes out.

Suppose we have a linear transformation $T$ of the form $T \colon \mathbb{R}^n \to \mathbb{R}^m$. It turns out that probing this transformation with $n$ carefully chosen input vectors and observing the outputs is sufficient to characterize it completely!

To see why this is true, consider a basis $\{ \vec{v}_1, \vec{v}_2, \ldots , \vec{v}_n \}$ for the $n$-dimensional input space $V = \mathbb{R}^n$. Any input vector can be written as a linear combination of the basis vectors: \[ \vec{v} = \alpha_1 \vec{v}_1 + \alpha_2 \vec{v}_2 + \cdots + \alpha_n \vec{v}_n. \] In order to characterize $T$, all we have to do is input each of $n$ basis vectors $\vec{v}_i$ into the black box that implements $T$ and record the $T(\vec{v}_i)$ that comes out. Using these observations and the linearity of $T$ we can now predict the output of $T$ for arbitrary input vectors: \[ T(\vec{v}) = \alpha_1 T(\vec{v}_1) + \alpha_2 T(\vec{v}_2) + \cdots + \alpha_n T(\vec{v}_n). \]
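
The following sketch (assuming Python with NumPy) illustrates the probing procedure; the hidden matrix inside `black_box` is an arbitrary choice made for this illustration.

<code python>
import numpy as np

# Pretend black_box is an unknown linear map T: R^3 -> R^2 that we can only call.
# The hidden matrix is an arbitrary stand-in used for this illustration.
_hidden = np.array([[2.0, 0.0, 1.0],
                    [1.0, 3.0, 0.0]])

def black_box(v):
    return _hidden @ v

# Probe T with the n = 3 standard basis vectors and record the outputs.
outputs = [black_box(e) for e in np.eye(3)]

# For any v = a1*e1 + a2*e2 + a3*e3, linearity gives
# T(v) = a1*T(e1) + a2*T(e2) + a3*T(e3).
v = np.array([5.0, -1.0, 2.0])
predicted = sum(a * out for a, out in zip(v, outputs))
print(np.allclose(predicted, black_box(v)))   # True
</code>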

This black box model can be used in many areas of science, and is perhaps one of the most important ideas in linear algebra. The transformation $T$ could be the description of a chemical process, an electrical circuit, or some phenomenon in biology. So long as we know that $T$ is (or can be approximated by) a linear transformation, we can obtain a complete description of it by probing it with a small number of inputs. This is in contrast to non-linear transformations, which can correspond to arbitrarily complex input-output relationships and would require significantly more probing to characterize precisely.

Input and output spaces

We said that the transformation $T$ is a map from $n$-vectors to $m$-vectors: \[ T \colon \mathbb{R}^n \to \mathbb{R}^m. \] Mathematically, we say that the domain of the transformation $T$ is $\mathbb{R}^n$ and the codomain is $\mathbb{R}^m$. The image space $\textrm{Im}(T)$ consists of all the possible outputs that the transformation $T$ can have. In general $\textrm{Im}(T) \subseteq \mathbb{R}^m$. A transformation $T$ for which $\textrm{Im}(T)=\mathbb{R}^m$ is called onto or surjective.

Furthermore, we will identify the null space as the subspace of the domain $\mathbb{R}^n$ that gets mapped to the zero vector by $T$: $\textrm{Null}(T) \equiv \{\vec{v} \in \mathbb{R}^n \ | \ T(\vec{v}) = \vec{0} \}$.

Linear transformations as matrix multiplications

There is an important relationship between linear transformations and matrices. If you fix a basis for the input vector space and a basis for the output vector space, a linear transformation $T(\vec{v})=\vec{w}$ can be represented as matrix multiplication $M_T\vec{v}=\vec{w}$ for some matrix $M_T$.

We have the following equivalence: \[ \vec{w} = T(\vec{v}) \qquad \Leftrightarrow \qquad \vec{w} = M_T \vec{v}. \] Using this equivalence, we can re-interpret several of the facts we know about matrices as properties of linear transformations. The equivalence is useful in the other direction too, since it allows us to use the language of linear transformations to talk about the properties of matrices.

The idea of representing the action of a linear transformation as a matrix product is extremely important since it allows us to transform the abstract description of what the transformation $T$ does into the practical description: “take the input vector $\vec{v}$ and multiply it on the left by a matrix $M_T$.”

We'll now illustrate the “linear transformation $\Leftrightarrow$ matrix” equivalence with an example. Define $T=\Pi_{P_{xy}}$ to be the orthogonal projection onto the $xy$-plane $P_{xy}$. In words, the action of this projection is simply to “kill” the $z$-component of the input vector. The matrix that corresponds to this projection is \[ T(\:(v_x,v_y,v_z)\:) = (v_x,v_y,0) \qquad \Leftrightarrow \qquad M_{T}\vec{v} = \begin{bmatrix} 1 & 0 & 0 \nl 0 & 1 & 0 \nl 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} v_x \nl v_y \nl v_z \end{bmatrix} = \begin{bmatrix} v_x \nl v_y \nl 0 \end{bmatrix}. \]
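
As a quick numerical sanity check (a sketch assuming Python with NumPy), we can apply this projection matrix to a sample vector:

<code python>
import numpy as np

# The projection onto the xy-plane, expressed as a matrix product.
M_T = np.array([[1, 0, 0],
                [0, 1, 0],
                [0, 0, 0]])

v = np.array([3, 4, 5])
print(M_T @ v)   # [3 4 0] -- the z-component has been "killed"
</code>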

Finding the matrix

In order to find the matrix representation of the transformation $T \colon \mathbb{R}^n \to \mathbb{R}^m$, it is sufficient to “probe it” with the $n$ vectors in the standard basis for $\mathbb{R}^n$: \[ \hat{e}_1 \equiv \begin{bmatrix} 1 \nl 0 \nl \vdots \nl 0 \end{bmatrix} \!\!, \ \ \ \hat{e}_2 \equiv \begin{bmatrix} 0 \nl 1 \nl \vdots \nl 0 \end{bmatrix}\!\!, \ \ \ \ \ldots, \ \ \ \hat{e}_n \equiv \begin{bmatrix} 0 \nl \vdots \nl 0 \nl 1 \end{bmatrix}\!\!. \] To obtain $M_T$, we combine the outputs $T(\hat{e}_1)$, $T(\hat{e}_2)$, $\ldots$, $T(\hat{e}_n)$ as the columns of a matrix: \[ M_T = \begin{bmatrix} | & | & \mathbf{ } & | \nl T(\hat{e}_1) & T(\hat{e}_2) & \dots & T(\hat{e}_n) \nl | & | & \mathbf{ } & | \end{bmatrix}. \]

Observe that the matrix constructed in this way has the right dimensions: when it multiplies an $n$-vector it produces an $m$-vector. We have $M_T \in \mathbb{R}^{m \times n}$, since the outputs of $T$ are $m$-vectors and since we used $n$ “probe” vectors.

In order to help you visualize this new “column thing”, we can analyze the matrix product $M_T \hat{e}_2$. The probe vector $\hat{e}_2\equiv (0,1,0,\ldots,0)^T$ will “select” only the second column from $M_T$ and thus we will obtain the correct output: $M_T \hat{e}_2 = T(\hat{e}_2)$. Similarly, applying $M_T$ to the other basis vectors selects each of the columns of $M_T$.

Any input vector can be written as a linear combination of the standard basis vectors $\vec{v} = v_1 \hat{e}_1 + v_2 \hat{e}_2 + \cdots + v_n\hat{e}_n$. Therefore, by linearity, we can compute the output $T(\vec{v})$: \[ \begin{align*} T(\vec{v}) &= v_1 T(\hat{e}_1) + v_2 T(\hat{e}_2) + \cdots + v_n T(\hat{e}_n) \nl & = v_1\!\begin{bmatrix} | \nl T(\hat{e}_1) \nl | \end{bmatrix} + v_2\!\begin{bmatrix} | \nl T(\hat{e}_2) \nl | \end{bmatrix} + \cdots + v_n\!\begin{bmatrix} | \nl T(\hat{e}_n) \nl | \end{bmatrix} \nl & = \begin{bmatrix} | & | & \mathbf{ } & | \nl T(\hat{e}_1) & T(\hat{e}_2) & \dots & T(\hat{e}_n) \nl | & | & \mathbf{ } & | \end{bmatrix} \begin{bmatrix} | \nl \vec{v} \nl | \end{bmatrix} \nl & = M_T \vec{v}. \end{align*} \]
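
The column-by-column construction is easy to carry out in code. Here is a sketch (assuming Python with NumPy) that rebuilds the projection matrix from the earlier example by probing $T$ with the standard basis, and then checks that multiplying by $\hat{e}_2$ selects the second column:

<code python>
import numpy as np

# Build M_T column by column by probing T with the standard basis vectors.
# Here T is the xy-projection from the earlier example.
def T(v):
    return np.array([v[0], v[1], 0.0])

n = 3
M_T = np.column_stack([T(e) for e in np.eye(n)])
print(M_T)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 0.]]

# Multiplying by e_2 "selects" the second column of M_T:
e2 = np.array([0.0, 1.0, 0.0])
print(np.allclose(M_T @ e2, T(e2)))   # True
</code>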

Input and output spaces

Observe that the outputs of $T$ consist of all possible linear combinations of the columns of the matrix $M_T$. Thus, we can identify the image space of the transformation, $\textrm{Im}(T) = \{ \vec{w} \in W \ | \ \vec{w}=T(\vec{v}), \textrm{ for some } \vec{v}\in V \}$, with the column space $\mathcal{C}(M_T)$ of the matrix $M_T$.

Perhaps not surprisingly, there is also an equivalence between the null space of the transformation $T$ and the null space of the matrix $M_T$: \[ \textrm{Null}(T) \equiv \{\vec{v}\in \mathbb{R}^n | T(\vec{v}) = \vec{0} \} = \mathcal{N}(M_T) \equiv \{\vec{v}\in \mathbb{R}^n | M_T\vec{v} = \vec{0} \}. \]

The null space $\mathcal{N}(M_T)$ of a matrix consists of all vectors that are orthogonal to the rows of the matrix $M_T$. The vectors in the null space of $M_T$ have a zero dot product with each of the rows of $M_T$. This orthogonality can also be phrased in the opposite direction. Any vector in the row space $\mathcal{R}(M_T)$ of the matrix is orthogonal to the null space $\mathcal{N}(M_T)$ of the matrix.

These observations allow us to identify the domain of the transformation $T$ as the orthogonal sum of the null space and the row space of the matrix $M_T$: \[ \mathbb{R}^n = \mathcal{N}(M_T) \oplus \mathcal{R}(M_T). \] This split implies the conservation of dimensions formula \[ {\rm dim}(\mathbb{R}^n) = n = {\rm dim}({\cal N}(M_T))+{\rm dim}({\cal R}(M_T)), \] which says that the dimensions of the null space and the row space of a matrix $M_T$ must add up to the dimension of the input space.
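
We can verify the dimension formula numerically. The sketch below assumes Python with NumPy and SciPy, and uses an arbitrarily chosen rank-one matrix:

<code python>
import numpy as np
from scipy.linalg import null_space

# dim(N(M_T)) + dim(R(M_T)) = n, checked for an example 2x3 matrix.
M_T = np.array([[1.0, 2.0, 3.0],
                [2.0, 4.0, 6.0]])   # second row = 2 * first row, so rank 1

n = M_T.shape[1]
rank = np.linalg.matrix_rank(M_T)        # dim of the row space
nullity = null_space(M_T).shape[1]       # dim of the null space

print(rank, nullity, rank + nullity == n)   # 1 2 True
</code>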

We can summarize everything we know about the input-output relationship of the transformation $T$ as follows: \[ T \colon \mathcal{R}(M_T) \to \mathcal{C}(M_T), \qquad T \colon \mathcal{N}(M_T) \to \{ \vec{0} \}. \] Input vectors $\vec{v} \in \mathcal{R}(M_T)$ get mapped to output vectors $\vec{w} \in \mathcal{C}(M_T)$. Input vectors $\vec{v} \in \mathcal{N}(M_T)$ get mapped to the zero vector.

Composition

The consecutive application of two linear operations on an input vector $\vec{v}$ corresponds to the following matrix product: \[ S(T(\vec{v})) = M_S M_T \vec{v}. \] Note that the matrix $M_T$ “touches” the vector first, followed by the multiplication with $M_S$.

For such a composition to be well defined, the dimension of the output space of $T$ must be the same as the dimension of the input space of $S$. In terms of the matrices, this corresponds to the condition that the inner dimensions in the matrix product $M_S M_T$ must agree.
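
A short sketch (assuming Python with NumPy, with arbitrarily chosen matrices) confirms that applying $T$ and then $S$ gives the same result as multiplying by the product matrix $M_S M_T$:

<code python>
import numpy as np

# T: R^3 -> R^2 and S: R^2 -> R^2, so the composition S(T(v)) corresponds
# to the 2x3 matrix product M_S @ M_T.
M_T = np.array([[1.0, 0.0, 2.0],
                [0.0, 1.0, 1.0]])
M_S = np.array([[0.0, -1.0],
                [1.0,  0.0]])    # rotation by 90 degrees in the plane

v = np.array([1.0, 2.0, 3.0])
print(np.allclose(M_S @ (M_T @ v), (M_S @ M_T) @ v))   # True
</code>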

Choice of basis

In the above, we assumed that the standard bases were used both for the inputs and the outputs of the linear transformation. Thus, the coefficients in the matrix $M_T$ we obtained were with respect to the standard bases.

In particular, we assumed that the outputs of $T$ were given to us as column vectors in terms of the standard basis for $\mathbb{R}^m$. If the outputs were given to us in some other basis $B_W$, then the coefficients of the matrix $M_T$ would be in terms of $B_W$.

A non-standard basis $B_V$ could also be used for the input space $\mathbb{R}^n$, in which case to construct the matrix $M_T$ we would have to “probe” $T$ with each of the vectors $\vec{b}_i \in B_V$. Furthermore, in order to compute $T$ as “the matrix product with the matrix produced by $B_V$-probing,” we would have to express the input vector $\vec{v}$ in terms of its coefficients with respect to $B_V$.

Because of this freedom regarding the choice of which basis to use, it would be wrong to say that a linear transformation is a matrix. Indeed, the same linear transformation $T$ would correspond to different matrices if different bases are used. We say that the linear transformation $T$ corresponds to a matrix $M$ for a given choice of input and output bases. We write $_{B_W}[M_T]_{B_V}$, in order to show the explicit dependence of the coefficients in the matrix $M_T$ on the choice of bases. With the exception of problems which involve the “change of basis,” you can always assume that the standard bases are used.
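
To make the basis dependence concrete, here is a sketch (assuming Python with NumPy) that computes the matrix of the $xy$-projection with respect to two arbitrarily chosen non-standard bases $B_V$ and $B_W$, using the change-of-basis formula $_{B_W}[M_T]_{B_V} = B_W^{-1} M_T B_V$, where the columns of $B_V$ and $B_W$ hold the basis vectors written in the standard basis:

<code python>
import numpy as np

# Matrix of the xy-projection with respect to the standard bases.
M_std = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0]])

# Arbitrary non-standard bases; columns are basis vectors in standard coordinates.
B_V = np.array([[1.0, 1.0, 0.0],
                [0.0, 1.0, 1.0],
                [0.0, 0.0, 1.0]])
B_W = np.array([[2.0, 0.0, 0.0],
                [0.0, 1.0, 1.0],
                [0.0, 0.0, 1.0]])

# Same transformation T, different matrix of coefficients:
M_BW_BV = np.linalg.inv(B_W) @ M_std @ B_V
print(M_BW_BV)
</code>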

Invertible transformations

We will now revisit the properties of invertible matrices and connect them with the notion of an invertible transformation. We can think of the multiplication by a matrix $M$ as “doing” something to vectors, and thus the matrix $M^{-1}$ must be doing the opposite thing, to put the vector back in its place again: \[ M^{-1} M \vec{v} = \vec{v}. \]

For simple $M$'s you can “see” what $M$ does. For example, the matrix \[ M = \begin{bmatrix}2 & 0 \nl 0 & 1 \end{bmatrix} \] corresponds to a stretching of space by a factor of 2 in the $x$-direction, while the $y$-direction remains untouched. The inverse transformation corresponds to a shrinkage by a factor of 2 in the $x$-direction: \[ M^{-1} = \begin{bmatrix}\frac{1}{2} & 0 \nl 0 & 1 \end{bmatrix}. \] In general, it is hard to see exactly what a matrix $M$ does, since each component of the output is some arbitrary linear combination of the coefficients of the input vector.

The key thing to remember is that $M$ is invertible precisely when knowledge of the output $\vec{w} = M\vec{v}$ allows you to get back to the original $\vec{v}$ you started from: $M^{-1}\vec{w} = \vec{v}$.
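
A minimal sketch of this “undo” behaviour (assuming Python with NumPy), using the stretch matrix from above:

<code python>
import numpy as np

# Undoing a transformation with the inverse matrix.
M = np.array([[2.0, 0.0],
              [0.0, 1.0]])
M_inv = np.linalg.inv(M)

v = np.array([3.0, 4.0])
w = M @ v                          # stretch by 2 in the x-direction: [6. 4.]
print(np.allclose(M_inv @ w, v))   # True -- knowing w lets us recover v
</code>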

By the correspondence $\vec{w} = T(\vec{v}) \Leftrightarrow \vec{w} = M_T\vec{v}$, we can identify the class of invertible linear transformations $T$ for which there exists a $T^{-1}$ such that $T^{-1}(T(\vec{v}))=\vec{v}$. This gives us another interpretation for some of the equivalence statements in the invertible matrix theorem:

- $T\colon \mathbb{R}^n \to \mathbb{R}^n$ is invertible.
  $\quad \Leftrightarrow \quad$
  $M_T \in \mathbb{R}^{n \times n}$ is invertible.
- $T$ is //injective// (one-to-one function). 
  $\quad \Leftrightarrow \quad$
  $M_T\vec{v}_1 \neq M_T\vec{v}_2$ for all $\vec{v}_1 \neq \vec{v}_2$.
- The linear transformation $T$ is //surjective// (onto).
  $\quad \Leftrightarrow \quad$
  $\mathcal{C}(M_T) = \mathbb{R}^n$.
- The linear transformation $T$ is //bijective// (one-to-one correspondence). 
  $\quad \Leftrightarrow \quad$
  For each $\vec{w} \in \mathbb{R}^n$, there exists a unique $\vec{v} \in \mathbb{R}^n$,
  such that $M_T\vec{v} = \vec{w}$.
- The null space of $T$ is zero-dimensional: $\textrm{Null}(T) =\{ \vec{0} \}$.
  $\quad \Leftrightarrow \quad$
  $\mathcal{N}(M_T) = \{ \vec{0} \}$.

When $M$ is not invertible, it must send some nonzero vector to the zero vector: $M\vec{v} = \vec{0}$ for some $\vec{v} \neq \vec{0}$. When this happens, there is no way to get back the $\vec{v}$ you started from, i.e., there is no matrix $M^{-1}$ such that $M^{-1} \vec{0} = \vec{v}$, since $B \vec{0} = \vec{0}$ for all matrices $B$.
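
The projection matrix from earlier is a concrete non-invertible example; the sketch below (assuming Python with NumPy) shows the information loss:

<code python>
import numpy as np

# The xy-projection sends every vector of the form (0, 0, z) to the zero
# vector, so the original z-component cannot be recovered.
M = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]])

print(M @ np.array([0.0, 0.0, 5.0]))   # [0. 0. 0.] -- information is lost
print(np.linalg.matrix_rank(M))        # 2 < 3, so M is not invertible
# np.linalg.inv(M) raises LinAlgError: Singular matrix
</code>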


Affine transformations

An affine transformation is a function $A:\mathbb{R}^n \to \mathbb{R}^m$ which is the combination of a linear transformation $T$ followed by a translation by a fixed vector $\vec{b}$: \[ \vec{y} = A(\vec{x}) = T(\vec{x}) + \vec{b}. \] By the $T \Leftrightarrow M_T$ equivalence we can write the formula for an affine transformation as \[ \vec{y} = A(\vec{x}) = M_T\vec{x} + \vec{b}, \] where the linear transformation is performed as a matrix product $M_T\vec{x}$ and then we add a vector $\vec{b}$. This is the vector generalization of the affine function equation $y=f(x)=mx+b$.
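
A short sketch (assuming Python with NumPy, with an arbitrarily chosen matrix $M_T$ and vector $\vec{b}$) of the affine formula $\vec{y} = M_T\vec{x} + \vec{b}$:

<code python>
import numpy as np

# An affine transformation: a linear part followed by a translation.
M_T = np.array([[1.0, 2.0],
                [0.0, 1.0]])
b = np.array([5.0, -1.0])

def A(x):
    return M_T @ x + b

print(A(np.array([1.0, 1.0])))   # [8. 0.]
</code>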

Discussion

The most general linear transformation

In this section we learned that a linear transformation can be represented as matrix multiplication. Are there other ways to represent linear transformations? To study this question, let's analyze from first principles the most general form that a linear transformation $T\colon \mathbb{R}^n \to\mathbb{R}^m$ can take. We will use $V=\mathbb{R}^3$ and $W=\mathbb{R}^2$ to keep things simple.

Let us first consider the first coefficient $w_1$ of the output vector $\vec{w} = T(\vec{v})$, when the input vector is $\vec{v}$. The fact that $T$ is linear means that $w_1$ can be an arbitrary mixture (linear combination) of the input vector coefficients $v_1,v_2,v_3$: \[ w_1 = \alpha_1 v_1 + \alpha_2 v_2 + \alpha_3 v_3. \] Similarly, the second component must be some other arbitrary linear combination of the input coefficients: $w_2 = \beta_1 v_1 + \beta_2 v_2 + \beta_3 v_3$. Thus, the most general linear transformation $T \colon V \to W$ can be written as: \[ \begin{align*} w_1 &= \alpha_1 v_1 + \alpha_2 v_2 + \alpha_3 v_3, \nl w_2 &= \beta_1 v_1 + \beta_2 v_2 + \beta_3 v_3. \end{align*} \]

This is precisely the kind of expression that can be expressed as a matrix product: \[ T(\vec{v}) = \begin{bmatrix} w_1 \nl w_2 \nl \end{bmatrix} = \begin{bmatrix} \alpha_1 & \alpha_2 & \alpha_3 \nl \beta_1 & \beta_2 & \beta_3 \end{bmatrix} \begin{bmatrix} v_1 \nl v_2 \nl v_3 \nl \end{bmatrix} = M_T \vec{v}. \]
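
A symbolic sketch (assuming Python with SymPy) confirms that the component-wise formulas for $w_1$ and $w_2$ are exactly what the matrix product $M_T\vec{v}$ computes:

<code python>
import sympy as sp

a1, a2, a3, b1, b2, b3, v1, v2, v3 = sp.symbols(
    'alpha1 alpha2 alpha3 beta1 beta2 beta3 v1 v2 v3')

M_T = sp.Matrix([[a1, a2, a3],
                 [b1, b2, b3]])
v = sp.Matrix([v1, v2, v3])

# Each row of the product reproduces one of the general formulas above.
print(M_T * v)   # Matrix([[alpha1*v1 + alpha2*v2 + alpha3*v3],
                 #         [beta1*v1 + beta2*v2 + beta3*v3]])
</code>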

In fact, the matrix product is defined the way it is precisely because it allows us to express linear transformations so easily.

Links

[ Nice visual examples of 2D linear transformations ]
http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors

[ More on null space and range space and dimension counting ]
http://en.wikibooks.org/wiki/Linear_Algebra/Rangespace_and_Nullspace

[ Rotations as three shear operations ]
http://datagenetics.com/blog/august32013/index.html

 