What is linearity? What does a linear expression look like? Consider the following arbitrary function which contains terms with different powers of the input variable $x$: \[ f(x) = \frac{a}{x^3} \; + \; \frac{b}{x^2} \; + \; \frac{c}{x} \; + \; d \; + \; \underbrace{mx}_{\textrm{linear term}} \; + \; e x^2 \; + \; fx^3. \] The term $mx$ is the only linear term—it contains $x$ to the first power. All other terms are non-linear.
A single-variable function takes as input a real number $x$ and outputs a real number $y$. The signature of this class of functions is \[ f \colon \mathbb{R} \to \mathbb{R}. \]
The most general linear function from $\mathbb{R}$ to $\mathbb{R}$ looks like this: \[ y \equiv f(x) = mx, \] where $m \in \mathbb{R}$ is some constant, which we call the coefficient of $x$. The action of a linear function is to multiply the input by a constant—this is not too complicated, right?
Given the linear functions $f(x)=2x$ and $g(y)=3y$, what is the equation of the function $h(x) \equiv g\circ f \:(x) = g(f(x))$? The composition of the functions $f(x)=2x$ and $g(y)=3y$ is the function $h(x) =g(f(x))= 3(2x)=6x$. Note the composition of two linear functions is also a linear function whose coefficient is equal to the product of the coefficients of the two constituent functions.
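Here is a minimal sketch in Python (not part of the original text) that checks this composition numerically:

```python
# Composition of two linear functions: h(x) = g(f(x)).
f = lambda x: 2 * x          # f(x) = 2x
g = lambda y: 3 * y          # g(y) = 3y
h = lambda x: g(f(x))        # h(x) = 3*(2x) = 6x

print(h(1), h(5))            # prints: 6 30, matching h(x) = 6x
```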
A function is linear if, for any two inputs $x_1$ and $x_2$ and constants $\alpha$ and $\beta$, the following equation is true: \[ f(\alpha x_1 + \beta x_2) = \alpha f(x_1) + \beta f(x_2). \] A linear combination of inputs gets mapped to the same linear combination of outputs.
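As a quick illustration (a sketch with arbitrarily chosen sample values, not taken from the text), we can test this property numerically for a linear function and a non-linear one:

```python
# Check the linearity property f(a*x1 + b*x2) == a*f(x1) + b*f(x2).
# The sample values are chosen so the floating-point arithmetic is exact.
alpha, beta, x1, x2 = 2.0, -1.5, 3.0, 7.0

f = lambda x: 5 * x          # linear: f(x) = m*x with m = 5
g = lambda x: x ** 2         # non-linear, for contrast

print(f(alpha * x1 + beta * x2) == alpha * f(x1) + beta * f(x2))   # True
print(g(alpha * x1 + beta * x2) == alpha * g(x1) + beta * g(x2))   # False
```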
Consider the equation of a line: \[ l(x) = mx+b, \] where the constant $m$ corresponds to the slope of the line and the constant $b = l(0)$ is the $y$-intercept of the line. A line $l(x)=mx+b$ with $b\neq 0$ is not a linear function. This is a bit weird, but if you don't trust me you just have to check: \[ l(\alpha x_1 + \beta x_2) = m(\alpha x_1 + \beta x_2)+b \neq \alpha(m x_1 + b) + \beta(m x_2 + b) = \alpha l(x_1) + \beta l(x_2). \] A function with a linear part plus some constant is called an affine transformation. These are cool too, but a bit off topic, since the focus of our attention is on linear functions.
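Coming back to the check above, the two sides differ by exactly \[ l(\alpha x_1 + \beta x_2) - \bigl[\alpha\, l(x_1) + \beta\, l(x_2)\bigr] = b - (\alpha+\beta)b = (1-\alpha-\beta)\,b, \] which is zero for all choices of $\alpha$ and $\beta$ only when $b = 0$. This is precisely why a line with $b \neq 0$ fails the linearity condition.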
The study of linear algebra is the study of all things linear. In particular, we will learn how to work with functions that take multiple variables as inputs. Consider the set of functions that take as inputs two real numbers and give a real number as output: \[ f \colon \mathbb{R}\times\mathbb{R} \to \mathbb{R}. \] The most general linear function of two variables is \[ f(x,y) = m_xx + m_yy. \] You can think of $m_x$ as the $x$-slope and $m_y$ as the $y$-slope of the function. We say $m_x$ is the $x$-coefficient and $m_y$ is the $y$-coefficient in the linear expression $m_xx + m_yy$.
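The sketch below (Python, with arbitrary slope values that are not from the text) shows how the two coefficients act as independent slopes:

```python
# A linear function of two variables: f(x, y) = m_x*x + m_y*y.
m_x, m_y = 2.0, -3.0
f = lambda x, y: m_x * x + m_y * y

print(f(1.0, 0.0))   # 2.0  : stepping one unit in x changes f by m_x
print(f(0.0, 1.0))   # -3.0 : stepping one unit in y changes f by m_y
print(f(1.0, 1.0))   # -1.0 : the two contributions simply add
```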
A linear expression in the variables $x_1$, $x_2$, and $x_3$ has the form:
\[
a_1 x_1 + a_2 x_2 + a_3 x_3,
\]
where $a_1$, $a_2$, and $a_3$ are arbitrary constants.
Note the new terminology: we say “expr is linear in $v$” to refer to expressions in which the variable $v$ appears only raised to the first power in expr.
A linear equation in the variables $x_1$, $x_2$, and $x_3$ has the form \[ a_1 x_1 + a_2 x_2 + a_3 x_3 = c. \] This equation is linear because it contains no non-linear terms in the variables $x_i$. Note that the equation $\frac{1}{a_1} x_1 + a_2^6 x_2 + \sqrt{a_3} x_3 = c$ contains non-linear expressions in the constants $a_1$, $a_2$, and $a_3$, but is still linear in the variables $x_1$, $x_2$, and $x_3$.
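If you want to double-check this kind of statement symbolically, here is a small sketch using SymPy (the choice of library and the `positive=True` assumption on the constants are mine, not from the text):

```python
import sympy as sp

x1, x2, x3, a1, a2, a3, c = sp.symbols('x1 x2 x3 a1 a2 a3 c', positive=True)
expr = x1/a1 + a2**6 * x2 + sp.sqrt(a3) * x3 - c

# The expression has degree 1 in each of x1, x2, x3,
# even though its coefficients are non-linear in a1, a2, a3.
print([sp.degree(expr, v) for v in (x1, x2, x3)])   # [1, 1, 1]
```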
Linear equations are very versatile. Suppose you know that the following equation is an accurate model of some real-world phenomenon: \[ 4k -2m + 8p = 10, \] where $k$, $m$, and $p$ correspond to three variables of interest. You can think of this equation as describing the variable $m$ as a function of the variables $k$ and $p$: \[ m(k,p) = 2k + 4p - 5. \] Using this function, you can predict the value of $m$ given knowledge of the quantities $k$ and $p$.
Another option would be to think of $k$ as a function of $m$ and $p$: $k(m,p) = \frac{5}{2} +\frac{m}{2} - 2p$. This model would be useful if you know the quantities $m$ and $p$ and you want to predict the value of the variable $k$.
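As a sanity check (a sketch using SymPy, which is my choice of tool here and not part of the text), we can solve the same model equation for either variable:

```python
import sympy as sp

k, m, p = sp.symbols('k m p')
model = sp.Eq(4*k - 2*m + 8*p, 10)

print(sp.solve(model, m)[0])   # 2*k + 4*p - 5
print(sp.solve(model, k)[0])   # m/2 - 2*p + 5/2
```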
The most general linear equation in $x$ and $y$, \[ Ax + By = C, \] corresponds to the equation of a line in the Cartesian plane. When $B \neq 0$, the equation can be rewritten in the form $y=mx+b$, where the slope is $m=\frac{-A}{B}$ and the $y$-intercept is $b=\frac{C}{B}$. In the special case when $B=0$, the equation corresponds to a vertical line with equation $x=\frac{C}{A}$.
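The formulas for the slope and the $y$-intercept follow from isolating $y$ in the equation: \[ By = C - Ax \qquad \Rightarrow \qquad y = -\frac{A}{B}\,x + \frac{C}{B}, \] so $m = -\frac{A}{B}$ and $b = \frac{C}{B}$.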
The most general linear equation in $x$, $y$, and $z$, \[ Ax + By + Cz = D, \] corresponds to the equation of a plane in three-dimensional space. Assuming $C\neq 0$, we can rewrite this equation so that $z$ (the “height” of the plane) is a function of the coordinates $x$ and $y$: $z(x,y) = b + m_x x + m_y y$. The slope of the plane in the $x$-direction is $m_x= - \frac{A}{C}$, and the slope in the $y$-direction is $m_y = - \frac{B}{C}$. The $z$-intercept of the plane is $b=\frac{D}{C}$.
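Similarly, the formulas for the slopes and the $z$-intercept follow from isolating $z$: \[ Cz = D - Ax - By \qquad \Rightarrow \qquad z = \frac{D}{C} - \frac{A}{C}\,x - \frac{B}{C}\,y, \] which matches the form $z(x,y) = b + m_x x + m_y y$.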
When we use a linear function as a mathematical model for a non-linear real-world phenomenon, we say the function represents a linear model or a first-order approximation. Let's analyze in a little more detail what that means.
In calculus, we learn that functions can be represented as infinite Taylor series: \[ f(x) = \textrm{taylor}(f(x)) = a_0 + a_1x + a_2x^2 + a_3x^3 + \cdots = \sum_{n=0}^\infty a_n x^n, \] where the coefficients $a_n$ depend on the $n$th derivative of the function $f(x)$. The Taylor series is equal to the function $f(x)$ only if infinitely many terms in the series are calculated. If we sum only a finite number of terms of the series, we obtain a Taylor series approximation. The first-order Taylor series approximation to $f(x)$ is \[ f(x) \approx \textrm{taylor}_1(f(x)) = a_0 + a_1x = f(0) + f'(0)x. \] The above equation describes the best approximation to $f(x)$ near $x=0$ by a line of the form $l(x)=mx+b$. To build a linear model of a function $f(x)$, all you need to measure is its initial value $f(0)$ and its rate of change $f'(0)$.
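To make this concrete, here is a short sketch using SymPy (the example function $f(x)=e^x$ is an arbitrary choice of mine, not from the text) that builds the linear model $f(0) + f'(0)x$ and compares it to the function near $x=0$:

```python
import sympy as sp

x = sp.symbols('x')
f = sp.exp(x)                        # an arbitrary non-linear example function

# First-order Taylor approximation around x = 0:  f(0) + f'(0)*x
linear_model = f.subs(x, 0) + sp.diff(f, x).subs(x, 0) * x
print(linear_model)                  # x + 1

# Near x = 0 the line closely tracks the curve.
x0 = sp.Rational(1, 10)
print(sp.N(f.subs(x, x0)), sp.N(linear_model.subs(x, x0)))
# 1.10517091808239 1.10000000000000
```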
For a function $F(x,y,z)$ that takes many variables as inputs, the first-order Taylor series approximation is \[ F(x,y,z) \approx b + m_x x + m_y y + m_z z. \] Except for the constant term, the function has the form of a linear expression. The first-order approximation to a function of $n$ variables $F(x_1,x_2,\ldots, x_n)$ has the form $b + m_1x_1 + m_2x_2 + \cdots + m_nx_n$.
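The same recipe extends to several variables: the constant $b$ is the value of the function at the origin and each coefficient $m_i$ is a partial derivative evaluated at the origin. A rough sketch, again with an arbitrary example function of my own choosing:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
F = sp.sin(x) + x*y + sp.exp(z)      # an arbitrary non-linear example

origin = {x: 0, y: 0, z: 0}
b = F.subs(origin)                               # value at the origin
m_x, m_y, m_z = (sp.diff(F, v).subs(origin) for v in (x, y, z))  # partial derivatives
print(b, m_x, m_y, m_z)              # 1 1 0 1, so F is approximately 1 + x + z near the origin
```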
In linear algebra, we learn about many new mathematical objects and define functions that operate on these objects. In all the different scenarios we will see, the notion of linearity $f(\alpha x_1 + \beta x_2) = \alpha f(x_1) + \beta f(x_2)$ plays a key role.
We begin our journey of all things linear in the next section with the study of systems of linear equations.