The page you are reading is part of a draft (v2.0) of the "No bullshit guide to math and physics."

The text has since gone through many edits and is now available in print and electronic format. The current edition of the book is v4.0, which is a substantial improvement in terms of content and language (I hired a professional editor) over the draft version.

I'm leaving the old wiki content up for the time being, but I highly encourage you to check out the finished book. You can check out an extended preview here (PDF, 106 pages, 5MB).


Applications

There are lots of good examples here:
http://isites.harvard.edu/fs/docs/icb.topic1011412.files/applications.pdf

  • Using the RREF to solve systems of equations
  • Machine learning (matrix decompositions and eigenvectors, e.g., Google's PageRank)

Linear programming

Solving systems of equations

Example from circuits

When you learn about circuits, you will use Ohm's law $V=RI$, which tells you the drop in potential that occurs when a current $I$ runs through a resistor of resistance $R$, measured in Ohms [$\Omega$]. Voltage is measured in Volts [V] and current is measured in Ampères [A], so [$\Omega$]=[V/A].

Given a complicated electric circuit in which several voltage sources (batteries) and resistors (light bulbs) are connected, it can be quite difficult to “solve for” all the voltages and currents in the circuit. More precisely, it can be hard if you don't know about linear algebra.

If you know linear algebra you can solve the circuit using row operations (Gauss-Jordan elimination) in one or two minutes. Let me show you an example. Using Kirchhoff's voltage law (KVL) for each loop (the KVL states that the sum of the voltage gains and drops along any loop in the circuit must add up to zero), we obtain the following equations: \[ \begin{align*} +10 - R_1I_1 + 5 - R_2(I_1-I_2) &= 0, \nl +R_2(I_1-I_2) - R_3 I_2 + 20 &= 0. \end{align*} \] You can rearrange these into the form: \[ \begin{align*} (R_1+R_2) I_1 - R_2 I_2 &= 15, \nl R_2I_1 - (R_2 + R_3)I_2 &= -20. \end{align*} \] You can now use standard techniques from linear algebra (row operations) to solve this system of equations in just a few seconds.
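For concreteness, here is a minimal sketch in Python that solves the rearranged system numerically; the resistor values below are hypothetical (the text does not specify them), chosen only to make the example runnable.

```python
import numpy as np

# Hypothetical resistor values in ohms (the text does not specify R1, R2, R3).
R1, R2, R3 = 1.0, 2.0, 3.0

# Coefficient matrix and right-hand side of the rearranged KVL equations:
#   (R1 + R2) I1 -       R2 I2 =  15
#         R2 I1 - (R2 + R3) I2 = -20
A = np.array([[R1 + R2, -R2],
              [R2, -(R2 + R3)]])
b = np.array([15.0, -20.0])

I1, I2 = np.linalg.solve(A, b)
print(I1, I2)  # the two loop currents, in amperes
```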

Sidenote: the notion of linear independence of the equations you need to solve manifests in an interesting way with circuits. We must choose KVL equations that describe the currents flowing in linearly independent loops. For example, a circuit with two loops that share some elements actually contains three loops: the first loop, the second loop, and the outer loop formed by taking both together. It would seem, then, that we have a system of three equations in two unknowns. However, the three equations are not independent: the KVL equation for the outer loop is equal to the sum of the KVL equations for the first two loops.

Least squares approximate solution

Recall that an equation of the form $A\vec{x}=\vec{b}$ could have exactly one solution (if $A$ is invertible), infinitely many solutions (if $A$ has a null space), or no solutions at all.

Let's analyze the case where there is no exact solution, but where we can still come up with an approximate solution.

This is one of the cool direct applications of linear algebra to machine learning. Suppose you are given the data \[ D = \left[\;\;\;\; \begin{array}{rcl} - & \vec{r}_1 & - \nl - & \vec{r}_2 & - \nl - & \vec{r}_3 & - \nl & \vdots & \nl - & \vec{r}_N & - \end{array} \;\;\;\;\right]. \] Each row is an $(n+1)$-vector $\vec{r}_i=(a_{i1}, a_{i2}, \ldots, a_{in}, b_i)$ consisting of some observation data. The data set consists of $N$ such rows, in which both $\vec{a}_i=(a_{i1}, a_{i2}, \ldots, a_{in})$ and $b_i$ are known. We want to predict a future $b_j$ given a future $\vec{a}_j$, given that we have seen $\{\vec{r}_i\}_{i=1,\ldots,N}$.

One simple model for $b_i$ given $\vec{a}_i = (a_{i1}, a_{i2}, \ldots, a_{in})$ is a linear model with $n$ parameters $m_1,m_2,\ldots,m_n$: \[ y_m(x_1,x_2,\ldots,x_n) = m_1x_1 + m_2x_2 + \cdots + m_nx_n = \vec{m} \cdot \vec{x}. \] If the model is good, then $y_m(\vec{a}_i)$ approximates $b_i$ well. But how well?

Enter the error term: \[ e_i(\vec{m}) = \left| y_m(\vec{a}_i) - b_i \right|^2, \] the squared absolute value of the difference between the model's prediction and the actual output, hence the name error term. Our goal is to make the sum $S$ of all the error terms as small as possible: \[ S(\vec{m}) = \sum_{i=1}^{N} e_i(\vec{m}). \] Note that the “total squared error” is a function of the model parameters $\vec{m}$. At this point we have reached a level of complexity that becomes difficult to follow. Linear algebra to the rescue! We can express all the model's predictions $y_m(\vec{a}_i)$ in “one shot” in terms of the following matrix equation: \[ A\vec{m} = \vec{b}, \] where $A$ is an $N \times n$ matrix (containing the $a_{ij}$ part of the data), $\vec{m}$ is an $n \times 1$ vector (the model parameters, which are the unknowns), and $\vec{b}$ is an $N \times 1$ vector (containing the $b_{i}$ part of the data).

To find $\vec{m}$, we must solve this matrix equation. However, $A$ is not a square matrix: $A$ is a tall, skinny matrix ($N \gg n$), so there is no $A^{-1}$. Okay, so we don't have an $A^{-1}$ to throw at the equation $A\vec{m}=\vec{b}$ to cancel the $A$, but what else could we throw at it? Let's throw $A^T$ at it! \[ \begin{align*} \underbrace{A^T A}_{M} \vec{m} & = A^T \vec{b} \nl M \vec{m} & = A^T \vec{b}. \end{align*} \] Now the thing to observe is that if the $n \times n$ matrix $M = A^TA$ is invertible, then we can find an approximation $\vec{m}^*$ using \[ \vec{m}^* = M^{-1} A^T \vec{b} = (A^T A)^{-1}A^T \vec{b}. \] This solution to the problem is known as the “least squares fit” solution. The name comes from the fact that this solution is equal to the output of the following optimization problem: \[ \vec{m}^* = \mathop{\textrm{argmin}}_{\vec{m}} S(\vec{m}). \]

Proof: http://en.wikipedia.org/wiki/Linear_least_squares_(mathematics)

Technical detail: the matrix $M=A^TA$ is invertible if and only if the columns of $A$ are linearly independent.

When you fit a “linear regression” model to a data matrix $X$ and labels $\vec{y}$, the best linear model (in the sense of least squared error) is $\vec{m} = (X^T X)^{-1} X^T \vec{y}$.
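Here is a minimal sketch of this recipe in Python; the data matrix and labels below are made-up numbers, included only so the example runs.

```python
import numpy as np

# Made-up data matrix X (N = 5 observations, n = 2 features) and labels y.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([5.1, 4.0, 11.2, 9.9, 15.0])

# Least squares fit via the normal equations: m = (X^T X)^{-1} X^T y.
m_normal = np.linalg.inv(X.T @ X) @ X.T @ y

# The same fit using numpy's built-in (and numerically preferable) solver.
m_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(m_normal)
print(m_lstsq)  # both print (approximately) the same model parameters
```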

Error correcting codes

The encoding operation is a matrix-vector product, where the vector coefficients are the raw data bits you want to transmit and the matrix is called an encoding matrix.
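As an illustration (not from the original text), here is a minimal sketch of this idea in Python using the generator matrix of the standard Hamming(7,4) code; the data bits are arbitrary.

```python
import numpy as np

# Generator matrix of the Hamming(7,4) code in systematic form [I | P]:
# the first four columns copy the data bits, the last three are parity bits.
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

data = np.array([1, 0, 1, 1])   # four raw data bits (arbitrary example)
codeword = data @ G % 2         # encoding = vector-matrix product, mod 2
print(codeword)                 # seven bits: the data bits followed by parity bits
```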

Cryptography

Network coding

Pirate material

In the BitTorrent scheme, a large file $F$ is split into tiny pieces $F=\{ m_1, m_2, m_3, \ldots, m_N\}$ and the different pieces are shared by the peers of the network. The download is complete when you have collected all the pieces $m_1$ through $m_N$. Of course, you can remain connected afterwards to keep sharing the pieces with other peers.

Suppose that a network coding scheme is used instead, and people share mixtures of packets. For example, you could receive $m_1 \oplus m_2$ (xor of $m_1$ and $m_2$) from one peer, $m_1 \oplus m_2 \oplus m_3$ from another peer and $m_2$ from a third peer.

Can you recover the first three pieces of the file $\{ m_1, m_2, m_3\}$? Yes you can, thanks to the self-inverse property of XOR ($x \oplus x = 0$).

\[ m_1 = (m_1 \oplus m_2) \oplus (m_2) \] and then once you have $m_1$ and $m_2$ you can do \[ m_3 = (m_1 \oplus m_2 \oplus m_3) \oplus (m_1) \oplus (m_2). \]
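Here is a minimal sketch of this recovery in Python; the piece contents are made-up integers standing in for real data blocks.

```python
# Made-up integer "pieces" standing in for real data blocks.
m1, m2, m3 = 0b1010, 0b0110, 0b1111

# What the three peers send you:
p1 = m1 ^ m2        # m1 xor m2
p2 = m1 ^ m2 ^ m3   # m1 xor m2 xor m3
p3 = m2             # m2, sent directly

# Recovery, following the equations above:
r2 = p3
r1 = p1 ^ r2        # (m1 xor m2) xor m2 = m1
r3 = p2 ^ r1 ^ r2   # (m1 xor m2 xor m3) xor m1 xor m2 = m3
assert (r1, r2, r3) == (m1, m2, m3)
```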

Q: In general, if you receive $M$ arbitrary combinations of packets, how do you know you can extract the packets?

A: You can if the mixing matrix is invertible over the binary field (i.e., when the arithmetic is done modulo 2).
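As a sketch, using the three packets from the example above, you can check invertibility over the binary field by computing the determinant modulo 2:

```python
from sympy import Matrix

# Mixing matrix of the three received packets, in terms of (m1, m2, m3):
#   m1 + m2        -> (1, 1, 0)
#   m1 + m2 + m3   -> (1, 1, 1)
#   m2             -> (0, 1, 0)
A = Matrix([[1, 1, 0],
            [1, 1, 1],
            [0, 1, 0]])

# The packets are recoverable iff A is invertible over the binary field,
# i.e. iff det(A) is odd.
print(A.det() % 2)  # -> 1, so every piece can be extracted
```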

Probability density

The squared magnitude of the wavefunction $\psi(\vec{r})$ (the “power” of the signal) is a probability density:

\[ \Pr\{ \text{finding electron at} \ \ \vec{r} \ \} = |\psi(\vec{r})|^2. \] Verify that it is well normalized: \[ \begin{align*} P_{total} &= \int\!\!\int\!\!\int |\psi(\vec{r})|^2 \ d^3\vec{r} \nl &= \int_0^\infty\int_0^{2\pi}\int_0^\pi |\psi(r,\vartheta,\varphi)|^2 \ r^2 \ \sin \varphi \, d\varphi \, d\vartheta \, dr \nl &= \int_0^\infty \frac{4}{a^3} \exp\left(-\frac{2 r}{a}\right) r^2 \ dr = \frac{4}{a^3}\cdot\frac{a^3}{4} = 1. \end{align*} \]
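A quick way to check the last integral is with a computer algebra system. Here is a minimal sketch using sympy; the form of the integrand is taken from the radial integral above (the angular integration has already been carried out).

```python
from sympy import symbols, integrate, exp, oo

r, a = symbols('r a', positive=True)

# Radial normalization integral from the derivation above.
P_total = integrate(4/a**3 * exp(-2*r/a) * r**2, (r, 0, oo))
print(P_total)  # -> 1
```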

 