The page you are reading is part of a draft (v2.0) of the "No bullshit guide to math and physics."

The text has since gone through many edits and is now available in print and electronic format. The current edition of the book is v4.0, which is a substantial improvement in terms of content and language (I hired a professional editor) from the draft version.

I'm leaving the old wiki content up for the time being, but I highly encourage you to check out the finished book. You can check out an extended preview here (PDF, 106 pages, 5MB).




MATH and PHYSICS

The MATH and PHYSICS minireference is a comprehensive course on Calculus, Mechanics, Linear algebra, Electricity and Magnetism. All the important topics are explained in an intuitive and concise manner so that you, the student, can learn more efficiently. When there are only a couple of days left before the exam, what would you rather read: 400 pages in a regular textbook or 50 pages in the minireference?


Math fundamentals

As soon as the keyword “math” comes up during a conversation, people start to feel uneasy. There are a number of common strategies that people use to escape this subject of conversation. The most common approach is to say something like “I always hated math” or “I am terrible at math”, which is a clear social cue that a change of subject is requested. Another approach is to be generally sympathetic to the idea of mathematics, so long as it appears in the third person: “she solved the equation” is fine, but “I solved the equation” is unthinkable. The usual justification for this mathematics-for-other-people approach is that math is highly specialized knowledge with no real value for the general audience. A variant of the above is to believe that a special kind of brain is required in order to do math.

Mathematical knowledge is actually really cool. Knowing math is like having analytic superpowers. You can use the power of abstraction to see the math behind any real-world situation, and once in the math world you can jot down some numbers and functions on a piece of paper and calculate the answer. Unfortunately, this is not the image that most people have of mathematics. Math is usually taught with a lot of focus placed on the mechanical steps. Mindless number crunching and following steps without understanding what you are doing is not cool. If this is how you learned about the basic ideas of math, I can't blame you if you hate it, as it is kind of boring.

Oftentimes, my students ask me to review some basic notion from high school math that is needed for a more advanced topic. This chapter is a collection of short review articles that cover a lot of useful topics from high school math.

Topics

This chapter should help you learn most of the useful concepts from the high school math curriculum, and in particular all the prerequisite topics for University-level math and physics courses.

Solving equations

Most math skills boil down to being able to manipulate and solve equations. To solve an equation means to find the value of the unknown in the equation.

Check this shit out: \[ x^2-4=45. \]

To solve the above equation is to answer the question “What is $x$?” More precisely, we want to find the number which can take the place of $x$ in the equation so that the equality holds. In other words, we are asking \[ \text{"Which number times itself minus four gives 45?"} \]

That is quite a mouthful, don't you think? To remedy this verbosity, mathematicians often use specialized mathematical symbols. The problem is that the specialized symbols used by mathematicians tend to confuse people. Sometimes even the simplest concepts are inaccessible if you don't know what the symbols mean.

What are your feelings about math, dear reader? Are you afraid of it? Do you have anxiety attacks because you think it will be too difficult for you? Chill! Relax my brothers and sisters. There is nothing to it. Nobody can magically guess what the solution is immediately. You have to break the problem down into simpler steps.

To find $x$, we can manipulate the original equation until we transform it to a different equation (as true as the first) that looks like this: \[ x = \text{just some numbers}. \]

That's what it means to solve. The equation is solved because you could type the numbers on the right hand side of the equation into a calculator and get the exact value of $x$.

To get $x$, all you have to do is make the right manipulations on the original equation to get it to the final form. The only requirement is that the manipulations you make transform one true equation into another true equation.

Before we continue our discussion, let us take the time to clarify what the equality symbol $=$ means. It means that everything to the left of $=$ is equal to everything to the right of $=$. To keep the equality true, whatever you do to the left side you must also do to the right side.

In our example from earlier, the first simplifying step will be to add the number four to both sides of the equation: \[ x^2-4 +4 =45 +4, \] which simplifies to \[ x^2 =49. \] You must agree that the expression looks simpler now. How did I know to do this operation? I was trying to “undo” the effects of the operation $-4$. We undo an operation by applying its inverse. In the case where the operation is subtraction of some amount, the inverse operation is the addition of the same amount.

Now we are getting closer to our goal, namely to isolate $x$ on one side of the equation and have just numbers on the other side. What is the next step? Well if you know about functions and their inverses, then you would know that the inverse of $x^2$ ($x$ squared) is to take the square root $\sqrt{ }$ like this: \[ \sqrt{x^2} = \sqrt{49}. \] Notice that I applied the inverse operation on both sides of the equation. If we don't do the same thing on both sides we would be breaking the equality!

We are done now, since we have isolated $x$ with just numbers on the other side: \[ x = \pm 7. \]

What is up with the $\pm$ symbol? It means that both $x=7$ and $x=-7$ satisfy the above equation. Seven squared is 49, and so is $(-7)^2 = 49$ because two negatives cancel out.
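
If you want to double-check an answer like this with a computer, the sympy computer algebra system (which we will use again at the end of this chapter) can solve the equation for us. A minimal sketch:

 >>> from sympy import symbols, solve
 >>> x = symbols('x')
 >>> solve(x**2 - 4 - 45, x)      # the equation written in the form ... = 0
 [-7, 7]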

If you feel comfortable with the notions of high school math and you could have solved the equation $x^2-4=45$ on your own, then you should consider skipping ahead to Chapter 2. If on the other hand you are wondering how the squiggle killed the power two, then this chapter is for you! In the next sections we will review all the essential concepts from high school math which you will need for the rest of the book. First let me tell you about the different kinds of numbers.

Numbers

We will start the exposition like a philosophy paper and define precisely what we are going to be talking about. At the beginning of all matters we have to define the players in the world of math: numbers.

Definitions

Numbers are the basic objects which you can type into a calculator and which you use to calculate things. Mathematicians like to classify the different kinds of number-like objects into sets:

  • The Naturals: $\mathbb{N} = \{0,1,2,3,4,5,6,7, \ldots \}$,
  • The Integers: $\mathbb{Z} = \{\ldots, -3,-2,-1,0,1,2,3 , \ldots \}$,
  • The Rationals: $\mathbb{Q} = \{-1,0,0.125,1,1.5, \frac{5}{3}, \frac{22}{7}, \ldots \} $,
  • The Reals: $\mathbb{R} = \{-1,0,1,e,\pi, -1.539..,\ 4.94.., \ \ldots \}$,
  • The Complex numbers: $\mathbb{C} = \{ -1, 0, 1, i, 1+i, 2+3i, \ldots \}$.

These categories of numbers should be somewhat familiar to you. Think of them as neat classification labels for everything that you would normally call a number. Each item in the above list is a set. A set is a collection of items of the same kind. Each collection has a name and a precise definition. We don't need to go into the details of sets and set notation for our purposes, but you have to be aware of the different categories. Note also that each of the sets in the above list contains all the sets above it.

Why do you need so many different sets of numbers? The answer is partly historical and partly mathematical. Each set of numbers is associated with progressively more advanced mathematical problems.

The simplest kind of numbers are the natural numbers $\mathbb{N}$, which are sufficient for all your math needs if all you are going to do is count things. How many goats? Five goats here and six goats there so the total is 11. The sum of any two natural numbers is also a natural number.

However, as soon as you start to use subtraction (the inverse operation of addition), you start to run into negative numbers, which are numbers outside of the set of natural numbers. If the only mathematical operations you will ever use are addition and subtraction then the set of integers $\mathbb{Z} = \{ \ldots, -2, -1, 0, 1, 2, \ldots \}$ would be sufficient. Think about it. Any integer plus or minus any other integer is still an integer.

You can do a lot of interesting math with integers. There is an entire field in math called number theory which deals with integers. However, if you restrict yourself to integers you would be limiting yourself somewhat. You can't use the notion of 2.5 goats for example. You would get totally confused by the menu at Rotisserie Romados which offers $\frac{1}{4}$ of a chicken.

If you want to use division in your mathematical calculations then you will need the rationals $\mathbb{Q}$. The rationals are the set of quotients of two integers: \[ \mathbb{Q} = \{ \text{ all } z \text{ such that } z=\frac{x}{y}, x \text{ is in } \mathbb{Z}, y \text{ is in } \mathbb{N}, y \neq 0 \}. \] You can add, subtract, multiply and divide rational numbers and the result will always be a rational number. However even rationals are not enough for all of math!
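
If you want to play with rational numbers on a computer, Python's built-in fractions module keeps them as exact quotients of integers. This is just an illustrative aside, not something you need for the math:

 >>> from fractions import Fraction
 >>> Fraction(1, 4) + Fraction(5, 3)     # the sum of two rationals is a rational
 Fraction(23, 12)
 >>> Fraction(22, 7) / Fraction(5, 3)    # so is the quotient of two rationals
 Fraction(66, 35)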

In geometry, we can obtain quantities like $\sqrt{2}$ (the diagonal of a square with side 1) and $\pi$ (the ratio between a circle's circumference and its diameter) which are irrational. There are no integers $x$ and $y$ such that $\sqrt{2}=\frac{x}{y}$, therefore, $\sqrt{2}$ is not part of $\mathbb{Q}$. We say that $\sqrt{2}$ is irrational. An irrational number has an infinitely long, non-repeating decimal expansion. For example, $\pi = 3.14159265358979\ldots$ where the dots indicate that the decimal expansion of $\pi$ continues all the way to infinity.

If you add the irrational numbers to the rationals you get all the useful numbers, which we call the set of real numbers $\mathbb{R}$. The set $\mathbb{R}$ contains the integers, the fractions $\mathbb{Q}$, as well as irrational numbers like $\sqrt{2}=1.4142135..$. You will see that using the reals you can compute pretty much anything you want. From here on in the text, if I say number I will mean an element of the set of real numbers $\mathbb{R}$.

The only thing you can't do with the reals is take the square root of a negative number—you need the complex numbers for that. We defer the discussion on $\mathbb{C}$ until Chapter 3.

Operations on numbers

Addition

You can add and subtract numbers. I will assume you are familiar with this kind of stuff. \[ 2+5=7,\ 45+56=101,\ 65-66=-1,\ 9999 + 1 = 10000,\ \ldots \]

The visual way to think of addition is the number line. Adding numbers is like adding sticks together: the resulting stick has length equal to the sum of the two constituent sticks.

Addition is commutative, which means that $a+b=b+a$. It is also associative, which means that if you have a long summation like $a+b+c$ you can compute it in any order $(a+b)+c$ or $a+(b+c)$ and you will get the same answer.

Subtraction is the inverse operation of addition.

Multiplication

You can also multiply numbers together. \[ ab = \underbrace{a+a+\cdots+a}_{b \ times}=\underbrace{b+b+\cdots+b}_{a \ times}. \] Note that multiplication can be defined in terms of repeated addition.

The visual way to think about multiplication is through the concept of area. The area of a rectangle of base $a$ and height $b$ is equal to $ab$. A rectangle which has height equal to its base is a square, and this is why we call $aa=a^2$ “$a$ squared.”

Multiplication of numbers is also commutative $ab=ba$, and associative $abc=(ab)c=a(bc)$. In modern notation, no special symbol is used to denote multiplication; we simply put the two factors next to each other and say that the multiplication is implicit. Some other ways to denote multiplication are $a\cdot b$, $a\times b$ and, on computer systems, $a*b$.

Division

Division is the inverse of multiplication. \[ a/b = \frac{a}{b} = \text{ one } b^{th} \text{ of } a. \] Whatever $a$ is, you need to divide it into $b$ equal pieces and take one such piece. Some texts denote division by $a\div b$.

Note that you cannot divide by $0$. Try it on your calculator or computer. It will say error divide by zero, because it simply doesn't make sense. What would it mean to divide something into zero equal pieces?

Exponentiation

Very often we have to multiply a number by itself many times. We call this operation exponentiation and denote it with a superscript: \[ a^b = \underbrace{aaa\cdots a}_{b\ times}. \]

We can also have negative exponents. The negative in the exponent does not mean “subtract”, but rather “divide by”: \[ a^{-b}=\frac{1}{a^b}=\frac{1}{\underbrace{aaa\cdots a}_{b\ times}}. \]

An exponent which is a fraction means that it is some sort of square-root-like operation: \[ a^{\frac{1}{2}} \equiv \sqrt{a} \equiv \sqrt[2]{a}, \qquad a^{\frac{1}{3}} \equiv \sqrt[3]{a}, \qquad a^{\frac{1}{4}} \equiv \sqrt[4]{a} = a^{\frac{1}{2}\frac{1}{2}}=\left(a^{\frac{1}{2}}\right)^{\frac{1}{2}} = \sqrt{\sqrt{a}}. \] Square root $\sqrt{x}$ is the inverse operation of $x^2$. Similarly, for any $n$ we define the function $\sqrt[n]{x}$ (the $n$th root of $x$) to be the inverse function of $x^n$.

It is worth clarifying what “taking the $n$th root” means and what this operation can be used for. The $n$th root of $a$ is a number which, when multiplied together $n$ times, will give $a$. So for example a cube root satisfies \[ \sqrt[3]{a} \sqrt[3]{a} \sqrt[3]{a} = \left( \sqrt[3]{a} \right)^3 = a = \sqrt[3]{a^3}. \] Do you see now why $\sqrt[3]{x}$ and $x^3$ are inverse operations?

The fractional exponent notation makes the meaning of roots much more explicit: \[ \sqrt[n]{a} \equiv a^{\frac{1}{n}}, \] which means that the $n$th root is equal to one $n$th of a number with respect to multiplication. Thus, if we want the whole number, we have to multiply the number $a^{\frac{1}{n}}$ times itself $n$ times: \[ \underbrace{a^{\frac{1}{n}}a^{\frac{1}{n}}a^{\frac{1}{n}}a^{\frac{1}{n}} \cdots a^{\frac{1}{n}}a^{\frac{1}{n}}}_{n\ times} = \left(a^{\frac{1}{n}}\right)^n = a^{\frac{n}{n}} = a^1 = a. \] The $n$-fold product of the $\frac{1}{n}$ fractional exponent of any number produces the number with exponent one, therefore the inverse operation of $\sqrt[n]{x}$ is $x^n$.

The commutative law of multiplication $ab=ba$ implies that we can see any fraction $\frac{a}{b}$ in two different ways $\frac{a}{b}=a\frac{1}{b}=\frac{1}{b}a$. First we multiply by $a$ and then divide the result by $b$, or first we divide by $b$ and then we multiply the result by $a$. This means that when we have a fraction in the exponent, we can write the answer in two equivalent ways: \[ a^{\frac{2}{3} }=\sqrt[3]{a^2} = (\sqrt[3]{a})^2, \qquad a^{-\frac{1}{2}}=\frac{1}{a^{\frac{1}{2}}} = \frac{1}{\sqrt{a}}, \qquad a^{\frac{m}{n}} = \left(\sqrt[n]{a}\right)^m = \sqrt[n]{a^m}. \]

Make sure the above notation makes sense to you. As an exercise, try to compute $5^{\frac{4}{3}}$ on your calculator and check that you get approximately 8.54987973.. as an answer.
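
If you don't have a calculator handy, any Python prompt will do the job. A quick check, rounded to five decimals to keep the output short:

 >>> round(5**(4/3), 5)          # fractional exponent
 8.54988
 >>> round((5**4)**(1/3), 5)     # same thing: the cube root of 5 to the 4th
 8.54988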

Operator precedence

There is a standard convention for the order in which mathematical operations have to be performed. The three basic operations have the following precedence:

  1. Exponents and roots.
  2. Products and divisions.
  3. Additions and subtractions.

This means that the expression $5\times3^2+13$ is interpreted as “first take the square of $3$, then multiply by $5$ and then add $13$.” If you want the operations to be carried out in a different order, say you wanted to multiply $5$ times $3$ first and then take the square, you should use parentheses: $(5\times 3)^2 + 13$, which now shows that the square acts on $(5 \times 3)$ as a whole and not on $3$ alone.
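
Programming languages follow the same precedence convention, so you can try this out at any Python prompt. A small sketch:

 >>> 5*3**2 + 13       # exponent first, then the product, then the sum
 58
 >>> (5*3)**2 + 13     # parentheses force the product to happen first
 238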

Other operations

We can define all kinds of operations on numbers. The above three are special since they have a very simple intuitive feel to them, but we can define arbitrary transformations on numbers. We call those functions. Before we learn about functions, let us talk about variables first.

Functions

Your function vocabulary determines how well you will be able to express yourself mathematically in the same way that your English vocabulary determines how well you can express yourself in English.

The purpose of the following pages is to embiggen your vocabulary a bit so you won't be caught with your pants down when the teacher tries to pull some trick on you at the final. I give you the minimum necessary, but I recommend you explore these functions on your own via wikipedia and by plotting their graphs on Wolfram alpha.

To “know” a function you have to understand and connect several different aspects of it. First you have to know its mathematical properties (what does it do, what is its inverse) and at the same time have a good idea of its graph, i.e., what it looks like if you plot $x$ versus $f(x)$ in the Cartesian plane. It is also a really good idea if you can remember the function values for some important inputs.

Definition

A function is a mathematical object that takes inputs and gives outputs. We use the notation \[ f \colon X \to Y, \] to denote a function from the set $X$ to the set $Y$. In this book, we will study mostly functions which take real numbers as inputs and give real numbers as outputs: $f\colon\mathbb{R} \to \mathbb{R}$.

We now define some technical terms used to describe the input and output sets.

  • The domain of a function is the set of allowed input values.
  • The image or range of the function $f$ is the set of all possible output values of the function.
  • The codomain of a function is the type of outputs that the function has.

To illustrate the subtle difference between the image of a function and its codomain, let us consider the function $f(x)=x^2$. The quadratic function is of the form $f\colon\mathbb{R} \to \mathbb{R}$. The domain is $\mathbb{R}$ (it takes real numbers as inputs) and the codomain is $\mathbb{R}$ (the outputs are real numbers too), however, not all outputs are possible. Indeed, the image of the function $f$ consists only of the non-negative numbers $\mathbb{R}_+$. Note that the word “range” is also sometimes used to refer to the function's codomain.

A function is not a number; it is a mapping from numbers to numbers. For a given input $x$, the output value of $f$ for that input is denoted $f(x)$. Here is a graphical representation of a function with domain $A$ and codomain $B$.

The function corresponds to the arrow in the above picture.

We say that “$f$ maps $x$ to $y=f(x)$” and use the following terminology to classify the type of mapping that a function performs:

  • A function is one-to-one or injective if it maps different inputs to different outputs.
  • A function is onto or surjective if it covers the entire output set, i.e., if the image of the function is equal to the function's codomain.
  • A function is bijective if it is both injective and surjective. In this case $f$ is a one-to-one correspondence between the input set and the output set: for each of the possible outputs $y \in Y$ there exists (surjective part) exactly one input $x \in X$ (injective part), such that $f(x)=y$.

The term injective is a 1940s allusion inviting us to think of injective functions as some form of fluid flow. Since fluids cannot be compressed, the output space must be at least as large as the input space. A modern synonym for injective functions is to say that they are two-to-two. If you imagine two specks of paint inserted somewhere in the “input fluid”, then an injective function will lead to two distinct specks of paint in the “output fluid.” In contrast, functions which are not injective could map several different inputs to the same output. For example $f(x)=x^2$ is not injective since the inputs $2$ and $-2$ both get mapped to output value $4$.

Function names

Mathematicians have defined symbols for the most important functions used in everyday life: $+$, $-$, $\times$ (usually omitted) and $\div$ (usually denoted as a fraction). We also use the weird surd notation to denote the $n$th root $\sqrt[n]{\ }$ and the superscript notation to denote exponents. All other functions are identified and used by their name. If I want to compute the cosine of the angle $60^\circ$ (a function which describes the ratio between the length of the adjacent side of a right-angle triangle and the hypotenuse), then I would write $\cos(60^\circ)$, which means that we want the value of the $\cos$ function for the input $60^\circ$.

Incidentally, for that specific angle the function $\cos$ has a nice value: $\cos(60^\circ)=\frac{1}{2}$. This means that seeing $\cos(60^\circ)$ somewhere in an equation is the same as seeing $0.5$ there. For other values of the function like say $\cos(33.13^\circ)$, you will need to use a calculator. A scientific calculator will have a $\cos$ button on it for that purpose.

Handles on functions

When you learn about functions you learn about different “handles” onto these mathematical objects. Most often you will have the function equation, which is a precise way to calculate the output when you know the input. This is an important handle, especially when you will be doing arithmetic, but it is much more important to “feel” the function.

How do you get a feel for some function?

One way is to look at a list of input-output pairs $\{ \{ \text{input}=x_1, \text{output}=f(x_1) \},$ $\{ \text{input}=x_2,$ $\text{output}=f(x_2) \},$ $\{ \text{input}=x_3, \text{output}=f(x_3) \}, \ldots \}$. A more compact notation for the input-output pairs is $\{ (x_1,f(x_1)),$ $(x_2,f(x_2)),$ $(x_3,f(x_3)), \ldots \}$. You can make a little table of values for yourself: pick some random inputs and record the output of the function in the second column: \[ \begin{align*} \textrm{input}=x \qquad &\rightarrow \qquad f(x)=\textrm{output} \nl 0 \qquad &\rightarrow \qquad f(0) \nl 1 \qquad &\rightarrow \qquad f(1) \nl 55 \qquad &\rightarrow \qquad f(55) \nl x_4 \qquad &\rightarrow \qquad f(x_4) \end{align*} \]

Apart from random numbers it is also generally a good idea to check the value of the function at $x=0$, $x=1$, $x=100$, $x=-1$ and any other important looking $x$ value.
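
If you have a computer nearby, a short Python loop can build such a table for you. The function f below is a made-up example, $f(x)=x^2-4$, chosen just for illustration:

 >>> def f(x):
 ...     return x**2 - 4          # an example function; swap in any other
 ...
 >>> for x in [0, 1, -1, 55, 100]:
 ...     print(x, "->", f(x))
 ...
 0 -> -4
 1 -> -3
 -1 -> -3
 55 -> 3021
 100 -> 9996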

One of the best ways to feel a function is to look at its graph. A graph is a line on a piece of paper that passes through all input-output pairs of the function. What? What line? What points? OK, let's backtrack a little. Imagine that you have a piece of paper and that you have drawn a coordinate system on it.

The horizontal axis, which is also called the abscissa, will be used to measure $x$. The vertical axis will be used to measure $f(x)$, but because writing out $f(x)$ all the time is long and tedious, we will invent a short single-letter alias to denote the output value of $f$ as follows: \[ y \equiv f(x) = \text{output}. \]

Now you can take each of the input-output pairs for the function $f$ and think of them as points $(x,y)$ in the coordinate system. Thus the graph of a function is a graphical representation of everything the function does. If you understand the simple “drawing” on this page, you will basically understand everything there is to know about the function.
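
If you prefer to let the computer draw the graph, the numpy and matplotlib libraries (not part of this book, so treat this as an optional aside) can plot the input-output pairs for you. A sketch for the same made-up example function $f(x)=x^2-4$:

 >>> import numpy as np
 >>> import matplotlib.pyplot as plt
 >>> xs = np.linspace(-3, 3, 100)        # 100 closely spaced inputs between -3 and 3
 >>> lines = plt.plot(xs, xs**2 - 4)     # the points (x, f(x)) joined by a line
 >>> plt.show()                          # opens a window showing the graph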

Another way to feel functions is through the properties of the function: either the way it is defined, or its relation to other functions. This boils down to memorizing facts about the function and its relations to other functions. An example of a mathematical fact is $\sin(30^\circ)=\frac{1}{2}$. An example of a mathematical relation is the equation $\sin^2 x + \cos^2 x =1$, which is a link between the $\sin$ and the $\cos$ functions.

The last part may sound contrary to my initial promise about the book saying that I will not make you memorize stuff for nothing. Well, this is not for nothing. The more you know about any function, the more “paths” you have in your brain that connect to that function. Real math knowledge is not memorization but an establishment of a graph of associations between different areas of knowledge in your brain. Each concept is a node in this graph, and each fact you know about this concept is an edge in the graph. Analytical thought is the usage of this graph to produce calculations and mathematical arguments (proofs). For example, knowing the fact $\sin(30^\circ)=\frac{1}{2}$ about $\sin$ and the relationship $\sin^2 x + \cos^2 x = 1$ between $\sin$ and $\cos$, you could show that $\cos(30^\circ)=\frac{\sqrt{3}}{2}$. Note that the notation $\sin^2(x)$ means $(\sin(x))^2$.

To develop mathematical skills, it is therefore important to practice this path-building between related concepts by solving exercises and reading and writing mathematical proofs. My textbook can only show you the paths between the concepts, it is up to you to practice the exercises in the back of each chapter to develop the actual skills.

Example: Quadratic function

Consider the function from the real numbers ($\mathbb{R}$) to the real numbers ($\mathbb{R}$) \[ f \colon \mathbb{R} \to \mathbb{R} \] given by \[ f(x)=x^2+2x+3. \] The value of $f$ when $x=1$ is $f(1)=1^2+2(1)+3=1+2+3=6$. When $x=2$, we have $f(2)=2^2+2(2)+3=4+4+3=11$. What is the value of $f$ when $x=0$?
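
You can check these values, and answer the question about $x=0$ yourself, with a few lines of Python. A minimal sketch:

 >>> def f(x):
 ...     return x**2 + 2*x + 3      # the quadratic from this example
 ...
 >>> f(1)
 6
 >>> f(2)
 11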

Example: Exponential function

Consider the exponential function with base two: \[ f(x) = 2^x. \] This function is of crucial importance in computer systems. When $x=1$, $f(1)=2^1=2$. When $x$ is 2 we have $f(2)=2^2=4$. The function is therefore described by the following input-output pairs: $(0,1)$, $(1,2)$, $(2,4)$, $(3,8)$, $(4,16)$, $(5,32)$, $(6,64)$, $(7,128)$, $(8,256)$, $(9,512)$, $(10,1024)$, $(11, 2048)$, $(12,4096)$, etc. (RAM memory chips come in powers of two because the memory space is exponential in the number of “address lines” on the chip.) Some important input-output pairs for the exponential function are $(0,1)$, because by definition any number to the power 0 is equal to 1, and $(-1,\frac{1}{2^1}=\frac{1}{2})$, $(-2,\frac{1}{2^2}=\frac{1}{4})$, because a negative exponent tells you to divide by the number that many times instead of multiplying.
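
You can reproduce the list of input-output pairs in one line of Python; the list comprehension below is just a convenient way to apply $2^x$ to the inputs $0$ through $12$:

 >>> [2**x for x in range(0, 13)]      # the outputs for inputs 0, 1, ..., 12
 [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]
 >>> 2**-1, 2**-2                      # negative exponents mean "divide by 2"
 (0.5, 0.25)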

Function inverse

A function maps inputs $x$ to outputs $y$, whereas the inverse function maps the outputs $y$ back to the inputs $x$. Recall that a bijective function is a one-to-one correspondence between the set of inputs and the set of output values. If $f$ is a bijective function, then there exists an inverse function $f^{-1}$, which performs the inverse mapping of $f$. Thus, if you start from some $x$, apply $f$ and then apply $f^{-1}$, you will get back to the original input $x$: \[ x = f^{-1}\!\left( \; f(x) \; \right). \] This is represented graphically in the diagram on the right.

Function composition

The composition of two functions is another function. We can combine two simple functions to build a more complicated function by chaining them together. The resulting function is denoted \[ z = f\!\circ\!g \, (x) \equiv f\!\left( \: g(x) \: \right). \]

The diagram on the left shows a function $g:A\to B$ acting on some input $x$ to produce an intermediary value $y \in B$, which is then input to the function $f:B \to C$ to produce the final output value $z = f(y) = f(g(x))$.

The composition of applying $g$ first followed by $f$ is a function of the form $f\circ g: A \to C$ defined through the equation $f\circ g(x) = f(g(x))$. Note that “first” in the context of function composition means the first to touch the input.
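
Here is what composition looks like in code, using two made-up example functions $g(x)=x+2$ and $f(y)=y^2$ (any other pair would do):

 >>> def g(x):
 ...     return x + 2           # the first function, applied to the input
 ...
 >>> def f(y):
 ...     return y**2            # the second function, applied to g's output
 ...
 >>> def f_after_g(x):
 ...     return f(g(x))         # the composition "f after g": g acts first, then f
 ...
 >>> f_after_g(3)               # g(3) = 5, then f(5) = 25
 25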

Discussion

In the next sections, we will look into the different functions that you will be dealing with. What we present here is far from an exhaustive list, but if you get a good handle on these functions, you will be able to solve any problem a teacher can throw at you.

Links

[ Tank game where you specify the function of the projectile trajectory ]
http://www.graphwar.com/play.html

[ Gallery of function graphs ]
http://mpmath.googlecode.com/svn/gallery/gallery.html

Functions and their inverses

As we saw in the section on solving equations, the ability to “undo” functions is a key skill to have when solving equations.

Example

Suppose you have to solve for $x$ in the equation \[ f(x) = c. \] where $f$ is some function and $c$ is some constant. Our goal is to isolate $x$ on one side of the equation but there is the function $f$ standing in our way.

The way to get rid of $f$ is to apply the inverse function (denoted $f^{-1}$) which will “undo” the effects of $f$. We find that: \[ f^{-1}\!\left( f(x) \right) = x = f^{-1}\left( c \right). \] By definition the inverse function $f^{-1}$ does the opposite of what the function $f$ does so together they cancel each other out. We have $f^{-1}(f(x))=x$ for any number $x$.

Provided everything is kosher (the function $f^{-1}$ must be defined for the input $c$), the manipulation we made above was valid and we have obtained the answer $x=f^{-1}( c)$.


Note the new notation for denoting the function inverse $f^{-1}$ that we introduced in the above example. This notation is borrowed from the notion of the “inverse number”. Multiplication by the number $d^{-1}$ is the inverse operation of multiplication by the number $d$: $d^{-1}dx=1x=x$. In the case of functions, however, the negative one exponent does not mean the reciprocal $\frac{1}{f(x)}=(f(x))^{-1}$ but the function inverse, i.e., the number $f^{-1}(y)$ is equal to the number $x$ such that $f(x)=y$.

You have to be careful because sometimes applying the inverse leads to multiple solutions. For example, the function $f(x)=x^2$ maps two input values ($x$ and $-x$) to the same output value $x^2=f(x)=f(-x)$. The inverse function of $f(x)=x^2$ is $f^{-1}(x)=\sqrt{x}$, but both $x=+\sqrt{c}$ and $x=-\sqrt{c}$ would be solutions to the equation $x^2=c$. A shorthand notation to indicate both solutions of this equation is $x=\pm \sqrt{c}$.

Formulas

Here is a list of common functions and their inverses:

\[ \begin{align*} \textrm{function } f(x) & \ \Leftrightarrow \ \ \textrm{inverse } f^{-1}(x) \nl x+2 & \ \Leftrightarrow \ \ x-2 \nl 2x & \ \Leftrightarrow \ \ \frac{1}{2}x \nl -x & \ \Leftrightarrow \ \ -x \nl x^2 & \ \Leftrightarrow \ \ \pm\sqrt{x} \nl 2^x & \ \Leftrightarrow \ \ \log_{2}(x) \nl 3x+5 & \ \Leftrightarrow \ \ \frac{1}{3}(x-5) \nl a^x & \ \Leftrightarrow \ \ \log_a(x) \nl \exp(x)=e^x & \ \Leftrightarrow \ \ \ln(x)=\log_e(x) \nl \sin(x) & \ \Leftrightarrow \ \ \arcsin(x)=\sin^{-1}(x) \nl \cos(x) & \ \Leftrightarrow \ \ \arccos(x)=\cos^{-1}(x) \end{align*} \]

The function-inverse relationship is symmetric. This means that if you see a function on one side of the above table (no matter which), then its inverse is on the opposite side.
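
You can see the undo behaviour numerically using the logarithm and square-root functions from Python's math module (just an illustration, not a proof):

 >>> from math import log2, sqrt
 >>> log2(2**10)       # the logarithm base 2 undoes the exponential base 2
 10.0
 >>> sqrt(7**2)        # the square root undoes the square (for positive inputs)
 7.0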

Example

Let's say your teacher doesn't like you and right away on the first day of classes, he gives you a serious equation and wants you to find $x$: \[ \log_5\left(3 + \sqrt{6\sqrt{x}-7} \right) = 34+\sin(5.5)-\Psi(1). \] Do you see now what I meant when I said that the teacher doesn't like you?

First note that it doesn't matter what $\Psi$ is, since $x$ is on the other side of the equation. We can just keep copying $\Psi(1)$ from line to line and throw the ball back to the teacher in the end: “My answer is in terms of your variables dude. You have to figure out what the hell $\Psi$ is since you brought it up in the first place.” The same goes with $\sin(5.5)$. If you don't have a calculator, don't worry about it. We will just keep the expression $\sin(5.5)$ instead of trying to find its numerical value. In general, you should try to work with variables as much as possible and leave the numerical computations for the last step.

OK, enough beating about the bush. Let's just find $x$ and get it over with! On the right side of the equation, we have the sum of a bunch of terms with no $x$ in them, so we will just leave them as they are. On the left-hand side, the outermost function is a logarithm base $5$. Cool. No problem. Looking in the table of inverse functions we find that the exponential function is the inverse of the logarithm: $a^x \Leftrightarrow \log_a(x)$. To get rid of the $\log_5$ we must apply the exponential function base five to both sides: \[ 5^{ \log_5\left(3 + \sqrt{6\sqrt{x}-7} \right) } = 5^{ 34+\sin(5.5)-\Psi(1) }, \] which simplifies to: \[ 3 + \sqrt{6\sqrt{x}-7} = 5^{ 34+\sin(5.5)-\Psi(1) }, \] since $5^x$ cancels $\log_5 x$.

From here on it is going to be like if Bruce Lee walked into a place with lots of bad guys. Addition of $3$ is undone by subtracting $3$ on both sides: \[ \sqrt{6\sqrt{x}-7} = 5^{ 34+\sin(5.5)-\Psi(1) } - 3. \] To undo a square root you take the square \[ 6\sqrt{x}-7 = \left(5^{ 34+\sin(5.5)-\Psi(1) } - 3\right)^2. \] Add $7$ to both sides \[ 6\sqrt{x} = \left(5^{ 34+\sin(5.5)-\Psi(1) } - 3\right)^2+7. \] Divide by $6$: \[ \sqrt{x} = \frac{1}{6}\left(\left(5^{ 34+\sin(5.5)-\Psi(1) } - 3\right)^2+7\right), \] and then we square again to get the final answer: \[ \begin{align*} x &= \left[\frac{1}{6}\left(\left(5^{ 34+\sin(5.5)-\Psi(1) } - 3\right)^2+7\right) \right]^2. \end{align*} \]

Did you see what I was doing in each step? Next time a function stands in your way, hit it with its inverse, so that it knows not to ever challenge you again.

Discussion

The recipe I have outlined above is not universal. Sometimes $x$ isn't alone on one side. Sometimes $x$ appears in several places in the same equation, so you can't just work your way towards it as shown above. You need other techniques for solving equations like that.

The bad news is that there is no general formula for solving complicated equations. The good news is that the above technique of “digging towards $x$” is sufficient for 80% of what you are going to be doing. You can get another 15% if you learn how to solve the quadratic equation: \[ ax^2 +bx + c = 0. \]

Solving third order equations $ax^3+bx^2+cx+d=0$ with pen and paper is also possible, but at this point you really might as well start using a computer to solve for the unknown(s).

There are all kinds of other equations which you can learn how to solve: equations with multiple variables, equations with logarithms, equations with exponentials, and equations with trigonometric functions. The principle of digging towards the unknown and applying the function inverse is very important so be sure to practice it.

Basic rules of algebra

It's important for you to know the general rules for manipulating numbers and variables (algebra) so we will do a little refresher on these concepts to make sure you feel comfortable on that front. We will also review some important algebra tricks like factoring and completing the square which are useful when solving equations.

When an expression contains multiple things added together, we call those things terms. Furthermore, terms are usually composed of many things multiplied together. If we can write a number $x$ as $x=abc$, we say that $x$ factors into $a$, $b$ and $c$. We call $a$, $b$ and $c$ the factors of $x$.

Given any four numbers $a,b,c$ and $d$, we can use the following algebra properties:

  1. Associative property: $a+b+c=(a+b)+c=a+(b+c)$ and $abc=(ab)c=a(bc)$.
  2. Commutative property: $a+b=b+a$ and $ab=ba$.
  3. Distributive property: $a(b+c)=ab+ac$.

We use the distributive property every time we expand a bracket. For example, $a(b+c+d)=ab + ac + ad$. The opposite operation of expanding is called factoring, and it consists of taking out the common parts of an expression to the front of a bracket: $ab+ac = a(b+c)$. We will discuss both of these operations in this section and illustrate what they are used for.

Expanding brackets

The distributive property is useful when you are dealing with polynomials: \[ (x+3)(x+2)=x(x+2) + 3(x+2)= x^2 +x2 +3x + 6. \] We can now use the commutative property on the second term $x2=2x$, and then combine the two $x$ terms into a single one to obtain \[ (x+3)(x+2)= x^2 + 5x + 6. \]

The calculation shown above happens so often that it is a good idea to see it in a more abstract form: \[ (x+a)(x+b) = x(x+b) + a(x+b) = x^2 + (a+b)x + ab. \] The product of two linear terms (expressions of the form $x+?$) is equal to a quadratic expression. Furthermore, observe that the middle term on the right-hand side contains the sum of the two constants on the left-hand side, while the third term contains their product.

It is very common for people to get this wrong and write down false equations like $(x+a)(x+b)=x^2+ab$ or $(x+a)(x+b)=x^2+a+b$ or some variation of the above. You will never make such a mistake if you keep in mind the distributive property and expand the expression using a step-by-step approach. As a second example, consider the following slightly more complicated algebraic expression and its expansion: \[ \begin{align*} (x+a)(bx^2+cx+d) &= x(bx^2+cx+d) + a(bx^2+cx+d) \nl &= bx^3+cx^2+dx + abx^2 +acx +ad \nl &= bx^3+ (c+ab)x^2+(d+ac)x +ad. \end{align*} \] Note how we grouped together all the terms which contain $x^2$ in one term and all the terms which contain $x$ in a second term. This is a common pattern when dealing with expressions which contain different powers of $x$.

Example

Suppose we are asked to solve for $t$ in the following equation \[ 7(3 + 4t) = 11(6t - 4). \] The unknown $t$ appears on both sides of the equation so it is not immediately obvious how to proceed.

To solve for $t$ in the above equation, we have to bring all the $t$ terms to one side and all the constant terms to the other side. The first step towards this goal is to expand the two brackets to obtain \[ 21 + 28t = 66t - 44. \] Now we move things around to get all the $t$s on the right-hand side and all the constants on the left-hand side \[ 21 + 44 = 66t - 28t. \] We see that $t$ is contained in both terms on the right-hand side so we can rewrite the equation as \[ 21 + 44 = (66 - 28)t. \] The answer is now obvious $t = \frac{21 + 44}{66 - 28} = \frac{65}{38}$.
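
If you want to double-check the answer, sympy will happily solve the equation for you. A quick sketch:

 >>> from sympy import symbols, solve
 >>> t = symbols('t')
 >>> solve(7*(3 + 4*t) - 11*(6*t - 4), t)      # the equation rewritten in the form ... = 0
 [65/38]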

Factoring

Factoring means taking out some common part of a complicated expression so as to make it more compact. Suppose you are given the expression $6x^2y + 15x$ and you are asked to simplify it by “taking out” common factors. The expression has two terms and when we split each term into its constituent factors we obtain: \[ 6x^2y + 15x = (3)(2)(x)(x)y + (5)(3)x. \] We see that the factors $x$ and $3$ appear in both terms. This means we can factor them out to the front like this: \[ 6x^2y + 15x = 3x(2xy+5). \] The expression on the right is easier to read than the expression on the left since it shows that the $3x$ part is common to both terms.

Here is another example of where factoring can help us simplify an expression: \[ 2x^2y + 2x + 4x = 2x(xy+1+2) = 2x(xy+3). \]

Quadratic factoring

When dealing with a quadratic function, it is often useful to rewrite it as a product of two factors. Suppose you are given the quadratic function $f(x)=x^2-5x+6$ and asked to describe its properties. What are the roots of this function, i.e., for what values of $x$ is this function equal to zero? For which values of $x$ is the function positive and for which values is it negative?

When looking at the expression $f(x)=x^2-5x+6$, the properties of the function are not immediately apparent. However, if we factor the expression $x^2-5x+6$, we will be able to see its properties more clearly. To factor a quadratic expression is to express it as the product of two factors: \[ f(x) = x^2-5x+6 = (x-2)(x-3). \] We can now see immediately that its solutions (roots) are $x_1=2$ and $x_2=3$. You can also see that, for $x>3$, the function is positive since both factors will be positive. For $x<2$ both factors will be negative, but a negative times a negative gives a positive, so the function will be positive overall. For values of $x$ such that $2<x<3$, the first factor will be positive and the second negative, so the overall function will be negative.
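
You can verify a factorization with sympy by expanding the factors back out and by asking for the roots. A minimal sketch:

 >>> from sympy import symbols, expand, solve
 >>> x = symbols('x')
 >>> expand((x - 2)*(x - 3))        # multiply the factors back together
 x**2 - 5*x + 6
 >>> solve(x**2 - 5*x + 6, x)       # the roots match the factors
 [2, 3]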

For some simple quadratics like the above one you can simply guess what the factors will be. For more complicated quadratic expressions, you need to use the quadratic formula. This will be the subject of the next section. For now let us continue with more algebra tricks.

Completing the square

Any quadratic expression $Ax^2+Bx+C$ can be written in the form $A(x-h)^2+k$. This is because all quadratic functions with the same quadratic coefficient are essentially shifted versions of each other. By completing the square we are making these shifts explicit. The value of $h$ is how much the function is shifted to the right and the value $k$ is the vertical shift.

Let's try to find the values of $A$, $h$ and $k$ for the quadratic expression $x^2+5x+6$, which we expanded earlier: \[ x^2+5x+6 = A(x-h)^2+k = A(x^2-2hx + h^2) + k = Ax^2 - 2Ahx + Ah^2 + k. \]

By focussing on the quadratic terms on both sides of the equation we see that $A=1$, so we have \[ x^2+\underline{5x}+6 = x^2 \underline{-2hx} + h^2 + k. \] Next we look at the terms multiplying $x$ (underlined), and we see that $h=-2.5$, so we obtain \[ x^2+5x+\underline{6} = x^2 - 2(-2.5)x + \underline{(-2.5)^2 + k}. \] Finally, we pick a value of $k$ which would make the constant terms (underlined again) match \[ k = 6 - (-2.5)^2 = 6 - (2.5)^2 = 6 - \left(\frac{5}{2}\right)^2 = 6\times\frac{4}{4} - \frac{25}{4} = \frac{24 - 25}{4} = \frac{-1}{4}. \] This is how we complete the square, to obtain: \[ x^2+5x+6 = (x+2.5)^2 - \frac{1}{4}. \] The right-hand side in the above expression tells us that our function is equivalent to the basic function $x^2$, shifted $2.5$ units to the left, and $\frac{1}{4}$ units downwards. This would be really useful information if you ever had to draw this function, since it is easy to plot the basic graph of $x^2$ and then shift it appropriately.

It is important that you become comfortable with the procedure for completing the square outlined above. It is not very difficult, but it requires you to think carefully about the unknowns $h$ and $k$ and to choose their values appropriately. There is a simple rule you can remember for completing the square in an expression of the form $x^2+bx+c=(x-h)^2+k$: you have to use half of the coefficient of the $x$ term inside the bracket, i.e., $h=-\frac{b}{2}$. You can then work out both sides of the equation and choose $k$ so that the constant terms match. Take out a pen and a piece of paper now and verify that you can correctly complete the square in the following expressions $x^{2} - 6 x + 13=(x-3)^2 + 4$ and $x^{2} + 4 x + 1=(x + 2)^2 -3$.
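
To check your answers, you can expand the completed-square form and confirm that you get back the original expression. A sketch in sympy:

 >>> from sympy import symbols, expand
 >>> x = symbols('x')
 >>> expand((x - 3)**2 + 4)         # should give back x^2 - 6x + 13
 x**2 - 6*x + 13
 >>> expand((x + 2)**2 - 3)         # should give back x^2 + 4x + 1
 x**2 + 4*x + 1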

Solving quadratic equations

What would you do if you were asked to find $x$ in the equation $x^2 = 45x + 23$? This is called a quadratic equation since it contains the unknown variable $x$ squared. The name comes from the Latin quadratus, which means square. Quadratic equations come up very often, so mathematicians came up with a general formula for solving them. We will learn about this formula in this section.

Before we can apply the formula, we need to rewrite the equation in the form \[ ax^2 + bx + c = 0, \] where we moved all the numbers and $x$s to one side and left only $0$ on the other side. This is called the standard form of the quadratic equation. For example, to get the equation $x^2 = 45x + 23$ into the standard form, we can subtract $45x+23$ from both sides of the equation to obtain $x^2 - 45x - 23 = 0$. What are the values of $x$ that satisfy this equation?

Claim

The solutions to the equation \[ ax^2 + bx + c = 0, \] are \[ x_1 = \frac{-b + \sqrt{b^2-4ac} }{2a} \ \ \text{ and } \ \ x_2 = \frac{-b - \sqrt{b^2-4ac} }{2a}. \]

Let us now see how this formula is used to solve the equation $x^2 - 45x - 23 = 0$. Finding the two solutions is a simple mechanical task of identifying $a$, $b$ and $c$ and plugging these numbers into the formula: \[ x_1 = \frac{45 + \sqrt{45^2-4(1)(-23)} }{2} = 45.5054\ldots, \] \[ x_2 = \frac{45 - \sqrt{45^2-4(1)(-23)} }{2} = -0.5054\ldots. \]
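
The same plug-in-the-numbers procedure is easy to carry out in plain Python (rounded to four decimals to match the values above):

 >>> from math import sqrt
 >>> a, b, c = 1, -45, -23
 >>> round((-b + sqrt(b**2 - 4*a*c)) / (2*a), 4)
 45.5054
 >>> round((-b - sqrt(b**2 - 4*a*c)) / (2*a), 4)
 -0.5054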

Proof of claim

This is an important proof. You should know how to derive the quadratic formula in case your younger brother asks you one day to derive the formula from first principles. To derive this formula, we will use the completing-the-square technique which we saw in the previous section. Don't bail out on me now, the proof is only two pages.

Starting from the equation $ax^2 + bx + c = 0$, our first step will be to move $c$ to the other side of the equation \[ ax^2 + bx = -c, \] and then to divide by $a$ on both sides \[ x^2 + \frac{b}{a}x = -\frac{c}{a}. \]

Now we must complete the square on the left-hand side, which is to say we ask the question: what are the values of $h$ and $k$ for this equation to hold \[ (x-h)^2 + k = x^2 + \frac{b}{a}x = -\frac{c}{a}? \] To find the values for $h$ and $k$, we will expand the left-hand side to obtain $(x-h)^2 + k= x^2 -2hx +h^2+k$. We can now identify $h$ by looking at the coefficients in front of $x$ on both sides of the equation. We have $-2h=\frac{b}{a}$ and hence $h=-\frac{b}{2a}$.

So what do we have so far: \[ \left(x + \frac{b}{2a} \right)^2 = \left(x + \frac{b}{2a} \right)\!\!\left(x + \frac{b}{2a} \right) = x^2 + \frac{b}{2a}x + x\frac{b}{2a} + \frac{b^2}{4a^2} = x^2 + \frac{b}{a}x + \frac{b^2}{4a^2}. \] If we want to figure out what $k$ is, we just have to move that last term to the other side: \[ \left(x + \frac{b}{2a} \right)^2 - \frac{b^2}{4a^2} = x^2 + \frac{b}{a}x. \]

We can now continue with the proof where we left off \[ x^2 + \frac{b}{a}x = -\frac{c}{a}. \] We replace the left-hand side by the complete-the-square expression and obtain \[ \left(x + \frac{b}{2a} \right)^2 - \frac{b^2}{4a^2} = -\frac{c}{a}. \] From here on, we can use the standard procedure for solving equations. We put all the constants on the right-hand side \[ \left(x + \frac{b}{2a} \right)^2 = -\frac{c}{a} + \frac{b^2}{4a^2}. \] Next we take the square root of both sides. Since the square function maps both positive and negative numbers to the same value, this step will give us two solutions: \[ x + \frac{b}{2a} = \pm \sqrt{ -\frac{c}{a} + \frac{b^2}{4a^2} }. \] Let's take a moment to cleanup the mess on the right-hand side a bit: \[ \sqrt{ -\frac{c}{a} + \frac{b^2}{4a^2} } = \sqrt{ -\frac{(4a)c}{(4a)a} + \frac{b^2}{4a^2} } = \sqrt{ \frac{- 4ac + b^2}{4a^2} } = \frac{\sqrt{b^2 -4ac} }{ 2a }. \]

Thus we have: \[ x + \frac{b}{2a} = \pm \frac{\sqrt{b^2 -4ac} }{ 2a }, \] which is just one step away from the final answer \[ x = \frac{-b}{2a} \pm \frac{\sqrt{b^2 -4ac} }{ 2a } = \frac{-b \pm \sqrt{b^2 -4ac} }{ 2a }. \] This completes the proof.

Alternative proof of claim

To have a proof we don't necessarily need to show the derivation of the formula as we did. The claim was that $x_1$ and $x_2$ are solutions. To prove the claim we could have simply plugged $x_1$ and $x_2$ into the quadratic equation and verified that we get zero. Verify on your own.

Applications

The Golden Ratio

The golden ratio, usually denoted $\varphi=\frac{1+\sqrt{5}}{2}=1.6180339\ldots$ is a very important proportion in geometry, art, aesthetics, biology and mysticism. It comes about from the solution to the quadratic equation \[ x^2 -x -1 = 0. \]

Using the quadratic formula we get the two solutions: \[ x_1 = \frac{1+\sqrt{5}}{2} = \varphi, \qquad x_2 = \frac{1-\sqrt{5}}{2} = - \frac{1}{\varphi}. \]
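
A quick numerical sanity check in Python (not a proof, just reassurance that $\varphi$ really does satisfy the equation):

 >>> phi = (1 + 5**0.5) / 2              # the numerical value of the golden ratio
 >>> round(phi, 6)
 1.618034
 >>> round(abs(phi**2 - phi - 1), 12)    # check that phi solves x^2 - x - 1 = 0
 0.0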

You can learn more about the various contexts in which the golden ratio appears from the excellent wikipedia article on the subject. We will also see the golden ratio come up again several times in the remainder of the book.

Explanations

Multiple solutions

Oftentimes, we are interested in only one of the two solutions to the quadratic equation. It will usually be obvious from the context of the problem which of the two solutions should be kept and which should be discarded. For example, the time of flight of a ball thrown in the air from a height of $3$ meters with an initial velocity of $12$ meters per second is obtained by solving the quadratic equation $0=(-4.9)t^2+12t+3$. The two solutions of the quadratic equation are $t_1=-0.229$ and $t_2=2.678$. The first answer $t_1$ corresponds to a time in the past so it must be rejected as invalid. The correct answer is $t_2$. The ball will hit the ground after $t=2.678$ seconds.

Relation to factoring

In the previous section we discussed the quadratic factoring operation by which we can rewrite a quadratic function as the product of two terms $f(x)=ax^2+bx+c=a(x-x_1)(x-x_2)$. The two numbers $x_1$ and $x_2$ are called the roots of the function: this is where the function $f(x)$ touches the $x$ axis.

Using the quadratic formula you now have the ability to factor any quadratic expression. Just use the formula to find the two solutions, $x_1$ and $x_2$, and then you can rewrite the expression as $a(x-x_1)(x-x_2)$.

Some quadratic expressions cannot be factored, however. These correspond to quadratic functions whose graphs do not touch the $x$ axis. They have no real solutions (no roots). There is a quick test you can use to check if a quadratic function $f(x)=ax^2+bx+c$ has roots (touches or crosses the $x$ axis) or doesn't have roots (never touches the $x$ axis). If $b^2-4ac>0$ then the function $f$ has two roots. If $b^2-4ac=0$, the function has only one root. This corresponds to the special case when the function touches the $x$ axis at only one point. If $b^2-4ac<0$, the function has no real roots. If you try to use the formula for finding the solutions, you will fail because taking the square root of a negative number is not allowed. Think about it—how could you square a number and obtain a negative number?
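
The discriminant test translates directly into code. The helper function below is a made-up name, written just to illustrate the three cases:

 >>> def num_real_roots(a, b, c):
 ...     D = b**2 - 4*a*c          # the discriminant decides everything
 ...     if D > 0:
 ...         return 2
 ...     elif D == 0:
 ...         return 1
 ...     else:
 ...         return 0
 ...
 >>> num_real_roots(1, -5, 6), num_real_roots(1, 2, 1), num_real_roots(1, 0, 1)
 (2, 1, 0)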

Polynomials

Polynomials are a very simple and useful family of functions. For example, quadratic polynomials of the form $f(x) = ax^2 + bx +c$ often arise in descriptions of physical phenomena.

Definitions

  • $x$: the variable
  • $f(x)$: the polynomial. We sometimes denote polynomials $P(x)$ to distinguish them from a generic function $f(x)$.
  • degree of $f(x)$: the largest power of $x$ that appears in the polynomial
  • roots of $f(x)$: the values of $x$ for which $f(x)=0$

Polynomials

The most general polynomial of the first degree is a line $f(x) = mx + b$, where $m$ and $b$ are arbitrary constants.

The most general polynomial of second degree is $f(x) = a_2 x^2 + a_1 x + a_0$, where again $a_0$, $a_1$ and $a_2$ are arbitrary constants. We call $a_k$ the coefficient of $x^k$ since this is the number that appears in front of it.

By now you should be able to guess that a third degree polynomial will look like $f(x) = a_3 x^3 + a_2 x^2 + a_1 x + a_0$.

In general, a polynomial of degree $n$ has the equation \[ f(x) = a_n x^n + a_{n-1}x^{n-1} + \cdots + a_2 x^2 + a_1 x + a_0, \] or, if you want to use summation notation, we can write it as \[ f(x) = \sum_{k=0}^n a_kx^k, \] where $\Sigma$ (the capital Greek letter sigma) stands for summation.
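
The summation formula translates directly into code. The helper below (poly_eval is a made-up name) evaluates a polynomial given its list of coefficients $[a_0, a_1, a_2, \ldots]$:

 >>> def poly_eval(coeffs, x):
 ...     # compute a_0 + a_1*x + a_2*x^2 + ... for coeffs = [a_0, a_1, a_2, ...]
 ...     return sum(a_k * x**k for k, a_k in enumerate(coeffs))
 ...
 >>> poly_eval([6, -5, 1], 2)      # the polynomial x^2 - 5x + 6 evaluated at x = 2
 0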

Solving polynomial equations

Very often you will have to solve a polynomial equation of the form: \[ A(x) = B(x), \] where $A(x)$ and $B(x)$ are both polynomials. Remember that solving means finding the value of $x$ which makes the equality true.

For example, say the revenue of your company as a function of the number of products sold $x$ is given by $R(x)=2x^2 + 2x$, and the costs you incur to produce $x$ objects are given by $C(x)=x^2+5x+10$. A very natural question to ask is how much product you need to produce to break even, i.e., to make your revenue equal your costs: $R(x)=C(x)$. To find the break-even $x$, you will have to solve the following equation: \[ 2x^2 + 2x = x^2+5x+10. \]

This may seem complicated since there are $x$s all over the place and it is not clear how to find the value of $x$ that makes this equation true. No worries though, we can turn this equation into the “standard form” and then use the quadratic equation. To do this, we will move all the terms to one side until we have just zero on the other side: \[ \begin{align} 2x^2 + 2x \ \ \ -x^2 &= x^2+5x+10 \ \ \ -x^2 \nl x^2 + 2x \ \ \ -5x &= 5x+10 \ \ \ -5x \nl x^2 - 3x \ \ \ -10 &= 10 \ \ \ -10 \nl x^2 - 3x -10 &= 0. \end{align} \]

Remember that if we do the same thing on both sides of the equation, it remains true. Therefore, the values of $x$ that satisfy \[ x^2 - 3x -10 = 0, \] namely $x=-2$ and $x=5$, will also satisfy \[ 2x^2 + 2x = x^2+5x+10, \] which was the original problem that we were trying to solve.
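
As a sanity check, here is the same break-even calculation done with sympy; note that we feed it the difference of the two sides, which is the standard-form polynomial:

 >>> from sympy import symbols, solve
 >>> x = symbols('x')
 >>> solve(2*x**2 + 2*x - (x**2 + 5*x + 10), x)     # R(x) - C(x) = 0
 [-2, 5]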

This “shuffling of terms” approach will work for any polynomial equation $A(x)=B(x)$. We can always rewrite it as some $C(x)=0$, where $C(x)$ is a new polynomial that has as coefficients the difference of the coefficients of $A$ and $B$. Don't worry about which side you move all the coefficients to because $C(x)=0$ and $0=-C(x)$ have exactly the same solutions. Furthermore, the degree of the polynomial $C$ can be no greater than that of $A$ or $B$.

The form $C(x)=0$ is the standard form of a polynomial and, as you will see shortly, there are formulas which you can use to find the solution(s).

Formulas

The formula for solving the polynomial equation $P(x)=0$ depends on the degree of the polynomial in question.

First

For first degree: \[ P_1(x) = mx + b = 0, \] the solution is $x=-b/m$. Just move $b$ to the other side and divide by $m$.

Second

For second degree: \[ P_2(x) = ax^2 + bx + c = 0, \] the solutions are $x_1=\frac{-b + \sqrt{ b^2 -4ac}}{2a}$ and $x_2=\frac{-b - \sqrt{b^2-4ac}}{2a}$.

Note that if $b^2-4ac < 0$, the solutions involve taking the square root of a negative number. In those cases, we say that no real solutions exist.

Third

The solutions to the cubic polynomial equation \[ P_3(x) = x^3 + ax^2 + bx + c = 0, \] are given by \[ x_1 = \sqrt[3]{ q + \sqrt{p} } \ \ + \ \sqrt[3]{ q - \sqrt{p} } \ -\ \frac{a}{3}, \] and \[ x_{2,3} = \left( \frac{ -1 \pm \sqrt{3}i }{2} \right)\sqrt[3]{ q + \sqrt{p} } \ \ + \ \left( \frac{ -1 \mp \sqrt{3}i }{2} \right) \sqrt[3]{ q - \sqrt{p} } \ - \ \frac{a}{3}, \] where $q \equiv \frac{-a^3}{27}+ \frac{ab}{6} - \frac{c}{2}$ and $p \equiv q^2 + \left(\frac{b}{3}-\frac{a^2}{9}\right)^3$. Note the opposite signs in front of the two cube roots: the two complex roots pair the “$+$” choice on one cube root with the “$-$” choice on the other.

Note that, in my entire career as an engineer, physicist and computer scientist, I have never used the cubic equation to solve a problem by hand. In math homework problems and exams you will not be asked to solve equations of higher than second degree, so don't bother memorizing the solutions of the cubic equation. I included the formula here just for completeness.

Higher degrees

There is also a formula for polynomials of degree $4$, but it is complicated. For polynomials of degree $5$ or higher, there does not exist a general formula for the roots in terms of radicals.

Using a computer

When solving real world problems, you will often run into much more complicated equations. For anything more complicated than the quadratic equation, I recommend that you use a computer algebra system like sympy to find the solutions. Go to http://live.sympy.org and type in:

 >>> solve( x**2 - 3*x +2, x)      [ shift + Enter ]
 [1, 2]

Indeed $x^2-3x+2=(x-1)(x-2)$ so $x=1$ and $x=2$ are the two solutions.

Substitution trick

Sometimes you can solve a polynomial of fourth degree by using the quadratic formula. Say you are asked to solve for $x$ in \[ g(x) = x^4 - 3x^2 -10 = 0. \] Imagine this comes up on your exam. Clearly you can't just type it into a computer, since you are not allowed the use of a computer, yet the teacher expects you to solve this. The trick is to substitute $y=x^2$ and rewrite the same equation as: \[ g(y) = y^2 - 3y -10 = 0, \] which you can now solve using the quadratic formula. If you obtain the solutions $y=\alpha$ and $y=\beta$, then the solutions to the original fourth degree polynomial are $x=\pm\sqrt{\alpha}$ and $x=\pm\sqrt{\beta}$, since $y=x^2$.

Of course, I am not on an exam, so I am allowed to use a computer:

 >>> solve(y**2 - 3*y -10, y)
 [-2, 5]
 >>> solve(x**4 - 3*x**2 -10 , x)
 [-sqrt(5), sqrt(5), -sqrt(2)*I, sqrt(2)*I]

Note how a 2nd degree polynomial has two roots and a fourth degree polynomial has four roots, two of which are imaginary, since we had to take the square root of a negative number to obtain them. We write $i=\sqrt{-1}$. If this was asked on an exam though, you should probably just report the two real solutions: $\sqrt{5}$ and $-\sqrt{5}$ and not talk about the imaginary solutions since you are not supposed to know about them yet. If you feel impatient though, and you want to know about the complex numbers right now you can skip ahead to the section on complex numbers.

Logarithms

The word “logarithm” makes most people think of some mythical mathematical beast. Surely logarithms are many-headed, breathe fire and are extremely difficult to understand. Nonsense! Logarithms are simple. It will take you at most a couple of pages to get used to manipulating them, and that is a good thing, because logarithms are used all over the place.

For example, the strength of your sound system is measured in logarithmic units called decibels $[\textrm{dB}]$. This is because your ear is sensitive only to exponential differences in sound intensity. Logarithms allow us to compare very large numbers and very small numbers on the same scale. If we were measuring sound in linear units instead of logarithmic units, then your sound system volume control would have to go from $1$ to $1048576$. That would be weird, no? This is why we use the logarithmic scale for the volume notches. Using a logarithmic scale, we can go from sound intensity level $1$ to sound intensity level $1048576$ in 20 “progressive” steps. Assume each notch doubles the sound intensity instead of increasing it by a fixed amount: the first notch corresponds to $2$, the second notch to $4$ (still probably inaudible), but by the time you get to the sixth notch you are at $2^6=64$ sound intensity (audible music). The tenth notch corresponds to sound intensity $2^{10}=1024$ (medium strength sound) and finally the twentieth notch reaches max power $2^{20}=1048576$ (at this point the neighbours will come knocking to complain).

Definitions

You are probably familiar with these concepts already:

  • $b^x$: the exponential function base $b$
  • $\exp(x)=e^x$: the exponential function base $e$, Euler's number
  • $2^x$: exponential function base $2$
  • $f(x)$: the notion of a function $f:\mathbb{R}\to\mathbb{R}$
  • $f^{-1}(x)$: the inverse function of $f(x)$. It is defined in terms of $f(x)$ such that the following holds: $f^{-1}(f(x))=x$, i.e., if you apply $f$ to some number and get the output $y$, and then you pass $y$ through $f^{-1}$, the output will be $x$ again. The inverse function $f^{-1}$ undoes the effects of the function $f$.

NOINDENT In this section we will play with the following new concepts:

  • $\log_b(x)$: logarithm of $x$ base $b$. This is the inverse function of $b^x$
  • $\ln(x)$: the “natural” logarithm base $e$. This is the inverse of $e^x$
  • $\log_2(x)$: the logarithm base $2$. This is the inverse of $2^x$

I say play, because there is not much new to learn here: logarithms are just a clever way to talk about the size of a number, i.e., how many digits the number has.

Formulas

The main thing to realize is that $\log$s don't really exist on their own. They are defined as the inverses of the corresponding exponential function. The following statements are equivalent: \[ \log_b(x)=m \ \ \ \ \ \Leftrightarrow \ \ \ \ \ b^m=x. \]

For logarithms with base $e$ one writes $\ln(x)$ for “logarithme naturel” because $e$ is the “natural” base. Another special base is $10$ because we use the decimal system for our numbers. $\log_{10}(x)$ tells you roughly the size of the number $x$—how many digits the number has.

Example

When someone working for the system (say someone with a high paying job in the financial sector) boasts about his or her “six-figure” salary, they are really talking about the $\log$ of how much money they make. The “number of figures” $N_S$ in your salary is calculated as one plus the logarithm base ten of your salary $S$. The formula is \[ N_S = 1 + \log_{10}(S). \] So a salary of $S=100\:000$ corresponds to $N_S=1+\log_{10}(100\:000)=1+5=6$ figures. What will be the smallest “seven figure” salary? We have to solve for $S$ given $N_S=7$ in the formula. We get $7 = 1+\log_{10}(S)$, which means that $6=\log_{10}(S)$, and using the inverse relationship between logarithm base ten and exponentiation base ten we find that $S=10^6 = 1\:000\:000$. One million per year. Yes, for this kind of money I see how someone might want to work for the system. But I don't think most system pawns ever make it to the seven figure level. Even at the higher ranks, the salaries are more in the $1+\log_{10}(250\:000) = 1+5.397=6.397$ digits range. There you have it. Some of the smartest people out there selling their brains out to the finance sector for some lousy $0.397$ extra digits. What wankers! And who said you need to have a six digit salary in the first place? Why not make $1+\log_{10}(44\:000)=5.64$ digits as a teacher and do something with your life that actually matters?
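
If you want to check these “number of figures” calculations for yourself, here is a quick sanity check you can run on live.sympy.org (where the sympy functions are preloaded):

 >>> (1 + log(100000, 10)).evalf()
 6.00000000000000
 >>> (1 + log(1000000, 10)).evalf()
 7.00000000000000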

Properties

Let us now discuss two important properties that you will need when dealing with logarithms. Pay attention, because the arithmetic rules for logarithms are very different from the usual rules for numbers. Intuitively, you can think of logarithms as a convenient way of referring to the exponents of numbers. The following properties are the logarithmic analogues of the properties of exponents.

Property 1

The first property states that the sum of two logarithms is equal to the logarithm of the product of the arguments: \[ \log(x)+\log(y)=\log(xy). \] From this property, we can derive two other useful ones: \[ \log(x^k)=k\log(x), \] and \[ \log(x)-\log(y)=\log\left(\frac{x}{y}\right). \]

Proof: For all three equations above we have to show that the expression on the left is equal to the expression on the right. We have only been acquainted with logarithms for a very short time, so we don't know each other that well. In fact, the only thing we know about $\log$s is the inverse relationship with the exponential function. So the only way to prove this property is to use this relationship.

The following statement is true for any base $b$: \[ b^m b^n = b^{m+n}, \] which follows from first principles. Exponentiation means multiplying together the base many times. If you count the total number of $b$s on the left side you will see that there is a total of $m+n$ of them, which is what we have on the right.

If you define some new variables $x$ and $y$ such that $b^m=x$ and $b^n=y$ then the above equation will read \[ xy = b^{m+n}, \] if you take the logarithm of both sides you get \[ \log_b(xy) = \log_b\left( b^{m+n} \right) = m + n = \log_b(x) + \log_b(y). \] In the last step we used the definition of the $\log$ function again which states that $b^m=x \ \ \Leftrightarrow \ \ m=\log_b(x)$ and $b^n=y \ \ \Leftrightarrow \ \ n=\log_b(y)$.

Property 2

We will now discuss the rule for changing from one base to another. Is there a relation between $\log_{10}(S)$ and $\log_2(S)$?

There is. We can express the logarithm in any base $B$ in terms of a ratio of logarithms in another base $b$. The general formula is: \[ \log_{B}(x) = \frac{\log_b(x)}{\log_b(B)}. \]

This means that: \[ \log_{10}(S) =\frac{\log_{10}(S)}{1} =\frac{\log_{10}(S)}{\log_{10}(10)} = \frac{\log_{2}(S)}{\log_{2}(10)}=\frac{\ln(S)}{\ln(10)}. \]

This property is very useful when you want to compute $\log_{7}$, but your calculator only gives you $\log_{10}$. You can simulate $\log_7(x)$ by computing $\log_{10}(x)$ and dividing by $\log_{10}(7)$.
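
For example, suppose you need $\log_7(100)$ but only have base-10 logarithms available (the number $100$ here is just a made-up example). A quick check on live.sympy.org confirms that the base-change trick gives the same answer as asking for the base-7 logarithm directly:

 >>> ( log(100, 10) / log(7, 10) ).evalf()
 2.366589...
 >>> log(100, 7).evalf()
 2.366589...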

Geometry

Triangles

The area of a triangle is equal to $\frac{1}{2}$ times the length of the base times the height: \[ A = \frac{1}{2} a h_a. \] Note that $h_a$ is the height of the triangle relative to the side $a$.

The perimeter of the triangle is: \[ P = a + b + c. \]

Consider now a triangle with internal angles $\alpha$, $\beta$ and $\gamma$. The sum of the inner angles in any triangle is equal to two right angles: $\alpha+\beta+\gamma=180^\circ$.

The sine law is: \[ \frac{a}{\sin(\alpha)}=\frac{b}{\sin(\beta)}=\frac{c}{\sin(\gamma)}, \] where $\alpha$ is the angle opposite to $a$, $\beta$ is the angle opposite to $b$ and $\gamma$ is the angle opposite to $c$.

The cosine rules are: \[ \begin{align} a^2 & =b^2+c^2-2bc\cos(\alpha), \nl b^2 & =a^2+c^2-2ac\cos(\beta), \nl c^2 & =a^2+b^2-2ab\cos(\gamma). \end{align} \]

Sphere

A sphere is described by the equation \[ x^2 + y^2 + z^2 = r^2. \]

Surface area: \[ A = 4\pi r^2. \]

Volume: \[ V = \frac{4}{3}\pi r^3. \]

Cylinder

 A cylinder of radius r and height h.

The surface area of a cylinder consists of the top and bottom circular surfaces plus the area of the side of the cylinder: \[ A = 2 \left( \pi r^2 \right) + (2\pi r) h. \]

The volume is given by the product of the area of the base and the height of the cylinder: \[ V = \left(\pi r^2 \right)h. \]

Example

You open the hood of your car and see 2.0L written on top of the engine. The 2[L] refers to the combined volume swept by the four pistons, which move inside cylindrical chambers. You look in the owner's manual and find out that the diameter of each cylinder (the bore) is 87.5[mm] and the distance each piston travels (the stroke) is 83.1[mm]. Verify that the total displacement of your engine is indeed 1998789[mm$^3$] $\approx 2$[L].
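
If you don't feel like multiplying these numbers by hand, here is the same check done on live.sympy.org (pi is predefined there; the numbers are the bore and stroke from the example):

 >>> ( 4 * pi * (87.5/2)**2 * 83.1 ).evalf()       # 4 cylinders x (bore area) x stroke, in [mm^3]
 1998789.238...
 >>> ( 4 * pi * (87.5/2)**2 * 83.1 / 10**6 ).evalf()     # converted to litres
 1.998789...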

Links

[ A formula for calculating the distance between two points on a sphere ]
http://www.movable-type.co.uk/scripts/latlong.html

Definitions

Calculus is the study of functions $f(x)$ over the real numbers $\mathbb{R}$: \[ f: \mathbb{R} \to \mathbb{R}. \] The function $f$ takes as input some number, usually called $x$ and gives as output another number $f(x)=y$. You are familiar with many functions and have used them in many problems.

In this chapter we will learn about different operations that can be performed on functions. It is worth understanding these operations because of the numerous applications they have.

Differential calculus

Differential calculus is all about derivatives:

  • $f'(x)$: the derivative of $f(x)$ is the rate of change of $f$ at $x$. The derivative is also a function of the form \[ f': \mathbb{R} \to \mathbb{R}. \] The output of $f'(x)$ represents the //slope// of the line tangent to $f$ at the point $(x,f(x))$.

Integral calculus

Integral calculus is all about integration:

  • $\int_a^b f(x)\:dx$: the integral of $f(x)$ from $x=a$ to $x=b$ corresponds to the area under $f(x)$ between $a$ and $b$: \[ A(a,b) = \int_a^b f(x) \: dx. \] The $\int$ sign is a mnemonic for //sum//. The integral is the "sum" of $f(x)$ over that interval.
  • $F(x)=\int f(x)\:dx$: the anti-derivative of the function $f(x)$ contains the information about the area under the curve for //all// limits of integration. The area under $f(x)$ between $a$ and $b$ is computed as the difference between $F(b)$ and $F(a)$: \[ A(a,b) = \int_a^b f(x)\;dx = F(b)-F(a). \]
  

Sequences and series

Functions are usually defined for continuous inputs $x\in \mathbb{R}$, but there are also functions which are defined only for natural numbers $n \in \mathbb{N}$. Sequences are the discrete analogue of functions.

  • $a_n$: sequence of numbers $\{ a_0, a_1, a_2, a_3, a_4, \ldots \}$. You can think about each sequence as a function \[ a: \mathbb{N} \to \mathbb{R}, \] where the input $n$ is an integer (index into the sequence) and the output is $a_n$, which could be any number.

NOINDENT The sum of the entries of a sequence, which is the discrete analogue of the integral, is called a series.

  • $\sum$: sum. The summation sign is the short way to express the sum of several objects: \[ a_3 + a_4 + a_5 + a_6 + a_7 \equiv \sum_{3 \leq i \leq 7} a_i \equiv \sum_{i=3}^{7} a_i. \] Note that summations could go up to infinity.
  • $\sum a_i$: the series corresponds to the running total of a sequence until $n$: \[ S_n = \sum_{i=1}^{n} a_i  = a_1 + a_2 + \cdots + a_{n-1} + a_n. \]
  • $f(x)=\sum_{i=0}^\infty a_i x^i$: a //power series// is a series which contains powers of some variable $x$. Power series give us a way to express any function $f(x)$ as an infinitely long polynomial. For example, the power series of $\sin(x)$ is \[ \sin(x) = x - \frac{x^3}{3!}  + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!}+ \ldots. \]

Don't worry if you don't understand all the notions and the new notation in the above paragraphs. I just wanted to present all the calculus actors in the first scene. We will talk about each of them in more detail in the following sections.
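
If you are curious and want a sneak peek right now, you can ask sympy for the power series of $\sin(x)$ on live.sympy.org; the last argument below is the order at which the series is truncated:

 >>> series( sin(x), x, 0, 10 )
 x - x**3/6 + x**5/120 - x**7/5040 + x**9/362880 + O(x**10)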

Limits

Actually, we have not mentioned the main actor yet: the limit. In calculus, we do a lot of limit arguments in which we take some positive number $\epsilon>0$ and we make it progressively smaller and smaller:

  • $\displaystyle\lim_{\epsilon \to 0}$: the mathematically rigorous way of saying that the number $\epsilon$ becomes smaller and smaller.

We can also take limits to infinity, that is, we imagine some number $N$ and we make that number bigger and bigger:

  • $\displaystyle\lim_{N \to \infty}$: the mathematical way of saying that the number $N$ will get larger and larger.

Indeed, it wouldn't be wrong to say that calculus is the study of the infinitely small and the infinitely many. Working with infinitely small quantities and infinitely large numbers can be tricky business, but it is extremely important that you become comfortable with the concept of a limit, which is the rigorous way of talking about infinity. Before we learn about derivatives, integrals and series, we will spend some time learning about limits.

Limits

To understand the ideas behind derivatives and integrals, you need to understand what a limit is and how to deal with the infinitely small, infinitely large and the infinitely many. In practice, using calculus doesn't actually involve taking limits since we will learn direct formulas and algebraic rules that are more convenient than doing limits. Do not skip this section though just because it is “not on the exam”. If you do so, you will not know what I mean when I write things like $0,\infty$ and $\lim$ in later sections.

Introduction in three acts

Zeno's paradox

The ancient Greek philosopher Zeno once came up with the following argument. Suppose an archer shoots an arrow and sends it flying towards a target. After some time it will have travelled half the distance, and then at some later time it will have travelled half of the remaining distance, and so on, always getting closer to the target. Zeno observed that no matter how little distance remains to the target, there will always be some later instant when the arrow will have travelled half of that distance. Thus, he reasoned, the arrow must keep getting closer and closer to the target, but never reach it.

Zeno, my brothers and sisters, was making some sort of limit argument, but he didn't do it right. We have to commend him for thinking about such things centuries before calculus was invented (17th century), but shouldn't repeat his mistake. We better learn how to take limits, because limits are important. I mean a wrong argument about limits could get you killed for God's sake! Imagine if Zeno tried to verify experimentally his theory about the arrow by placing himself in front of one such arrow!

Two monks

Two young monks were sitting in silence in a Zen garden one autumn afternoon.
“Can something be so small as to become nothing?” asked one of the monks, breaking the silence.
“No,” replied the second monk, “if it is something then it is not nothing.”
“Yes, but what if no matter how close you look you cannot see it, yet you know it is not nothing?”, asked the first monk, desiring to see his reasoning to the end.
The second monk didn't know what to say, but then he found a counterargument. “What if, though I cannot see it with my naked eye, I could see it using a magnifying glass?”.
The first monk was happy to hear this question, because he had already prepared a response for it. “If I know that you will be looking with a magnifying glass, then I will make it so small that you cannot see it with your magnifying glass.”
“What if I use a microscope then?”
“I can make the thing so small that even with a microscope you cannot see it.”
“What about an electron microscope?”
“Even then, I can make it smaller, yet still not zero.” said the first monk victoriously and then proceeded to add “In fact, for any magnifying device you can come up with, you just tell me the resolution and I can make the thing smaller than can be seen”.
They went back to concentrating on their breathing.

Epsilon and delta

The monks had the right reasoning, but didn't have the right language to express what they meant. Zeno had the right language, the wonderful Greek language with letters like $\epsilon$ and $\delta$, but he didn't have the right reasoning. We need to combine aspects of both of the above stories to understand limits.

Let's first analyze Zeno's paradox. The poor brother didn't know about physics and the uniform velocity equation of motion. If an object is moving with constant speed $v$ (we ignore the effects of air friction on the arrow), then its position $x$ as a function of time is given by \[ x(t) = vt+x_i, \] where $x_i$ is the initial location of the object at $t=0$. Suppose that the archer who fired the arrow was at the origin $x_i=0$ and that the target is at $x=L$ metres. The arrow will hit the target exactly at $t=L/v$ seconds. Shlook!

It is true that there are times when the arrow will be $\frac{1}{2}$, $\frac{1}{4}$, $\frac{1}{8}$th, $\frac{1}{16}$th, and so forth of the distance from the target. In fact, there are infinitely many of those fractional time instants before the arrow hits, but that is beside the point. Zeno's misconception is that he thought these infinitely many timestamps couldn't all fit in the timeline, since it is finite. No such problem exists though. Any non-zero interval on the number line contains infinitely many numbers ($\mathbb{Q}$ or $\mathbb{R}$).

Now let's get to the monks' conversation. The first monk was talking about the function $f(x)=\frac{1}{x}$. This function becomes smaller and smaller, but it never actually becomes zero: \[ \frac{1}{x} \neq 0, \textrm{ even for very large values of } x, \] which is what the monk told us.

Remember that the monk also claimed that the function $f(x)$ can be made arbitrarily small. He wants to show that, in the limit of large values of $x$, the function $f(x)$ goes to zero. Written in math this becomes \[ \lim_{x\to \infty}\frac{1}{x}=0. \]

To convince the second monk that he can really make $f(x)$ arbitrarily small, he invents the following game. The second monk announces a precision $\epsilon$ at which he will be convinced. The first monk then has to choose an $S_\epsilon$ such that for all $x > S_\epsilon$ we will have \[ \left| \frac{1}{x} - 0 \right| < \epsilon. \] The above expression indicates that $\frac{1}{x}\approx 0$ at least up to a precision of $\epsilon$.

The second monk will have no choice but to agree that indeed $\frac{1}{x}$ goes to 0 since the argument can be repeated for any required precision $\epsilon >0$. By showing that the function $f(x)$ approaches $0$ arbitrary closely for large values of $x$, we have proven that $\lim_{x\to \infty}f(x)=0$.

If a function $f(x)$ has a limit $L$ as $x$ goes to infinity, then starting from some point $x=S$, $f(x)$ will be at most $\epsilon$ different from $L$. More generally, the function $f(x)$ can converge to any number $L$ as $x$ takes on larger and larger values: \[ \lim_{x \to \infty} f(x) = L. \] The above expression means that, for any precision $\epsilon>0$, there exists a starting point $S_\epsilon$, after which $f(x)$ equals its limit $L$ to within $\epsilon$ precision: \[ \left|f(x) - L\right| <\epsilon, \qquad \forall x \geq S_\epsilon. \]

Example

You are asked to calculate $\lim_{x\to \infty} \frac{2x+1}{x}$, that is you are given the function $f(x)=\frac{2x+1}{x}$ and you have to figure out what the function looks like for very large values of $x$. Note that we can rewrite the function as $\frac{2x+1}{x}=2+\frac{1}{x}$ which will make it easier to see what is going on: \[ \lim_{x\to \infty} \frac{2x+1}{x} = \lim_{x\to \infty}\left( 2 + \frac{1}{x} \right) = 2 + \lim_{x\to \infty}\left( \frac{1}{x} \right) = 2 + 0, \] since $\frac{1}{x}$ tends to zero for large values of $x$.

In a first calculus course you are not required to prove statements like $\lim_{x\to \infty}\frac{1}{x}=0$; you can just take the result as obvious. As the denominator $x$ becomes larger and larger, the fraction $\frac{1}{x}$ becomes smaller and smaller.
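
You can double-check this kind of calculation on live.sympy.org, where the symbol oo stands for infinity:

 >>> limit( (2*x+1)/x, x, oo )
 2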

Types of limits

Limits to infinity

\[ \lim_{x\to \infty} f(x) \] describes what happens to $f(x)$ for very large values of $x$.

Limits to a number

The limit of $f(x)$ approaching $x=a$ from above (from the right) is denoted: \[ \lim_{x\to a^+} f(x) \] Similarly, the expression \[ \lim_{x\to a^-} f(x) \] describes what happens to $f(x)$ as $x$ approaches $a$ from below (from the left), i.e., with values like $x=a-\delta$, with $\delta>0, \delta \to 0$. If both limits from the left and from the right of some number are equal, then we can talk about the limit as $x\to a$ without specifying the direction: \[ \lim_{x\to a} f(x) = \lim_{x\to a^+} f(x) = \lim_{x\to a^-} f(x). \]

Example 2

You are now asked to calculate $\lim_{x\to 5} \frac{2x+1}{x}$. \[ \lim_{x\to 5} \frac{2x+1}{x} = \frac{2(5)+1}{5} = \frac{11}{5}. \]

Example 3

Find $\lim_{x\to 0} \frac{2x+1}{x}$. If we just plug $x=0$ into the fraction, we get a divide-by-zero error $\frac{2(0)+1}{0}$, so a more careful treatment is required.

Consider first the limit from the right $\lim_{x\to 0+} \frac{2x+1}{x}$. We want to approach the value $x=0$ with small positive numbers. The best way to carry out the calculation is to define some small positive number $\delta>0$, to choose $x=\delta$, and to compute the limit: \[ \lim_{\delta\to 0} \frac{2(\delta)+1}{\delta} = 2 + \lim_{\delta\to 0} \frac{1}{\delta} = 2 + \infty = \infty. \] We took it for granted that $\lim_{\delta\to 0} \frac{1}{\delta}=\infty$. Intuitively, we can imagine how we get closer and closer to $x=0$ in the limit. When $\delta=10^{-3}$ the function value will be $\frac{1}{\delta}=10^3$. When $\delta=10^{-6}$, $\frac{1}{\delta}=10^6$. As $\delta \to 0$ the function will blow up—$f(x)$ will go up all the way to infinity.

If we take the limit from the left (small negative values of $x$) we get \[ \lim_{\delta\to 0} f(-\delta) =\frac{2(-\delta)+1}{-\delta}= -\infty. \] Therefore, since $\lim_{x\to 0^+}f(x)$ does not equal $\lim_{x\to 0^-} f(x)$, we say that $\lim_{x\to 0} f(x)$ does not exist.
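
Sympy agrees with this analysis; the keyword argument dir chooses the direction from which we approach $x=0$:

 >>> limit( (2*x+1)/x, x, 0, dir='+')
 oo
 >>> limit( (2*x+1)/x, x, 0, dir='-')
 -oo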

Continuity

A function $f(x)$ is continuous at $a$ if the limit of $f$ as $x\to a$ converges to $f(a)$: \[ \lim_{x \to a} f(x) = f(a). \]

Most functions we will study in calculus are continuous, but not all functions are. For example, functions which make sudden jumps are not continuous. Another example is the function $f(x)=\frac{2x+1}{x}$, which is discontinuous at $x=0$ (because the limit $\lim_{x \to 0} f(x)$ doesn't exist and $f(0)$ is not defined). Note that $f(x)$ is continuous everywhere else on the real line.

Formulas

We now switch gears into reference mode, as I will state a whole bunch of known formulas for limits of various kinds of functions. You are not meant to know why these limit formulas are true, but simply to understand what they mean.

The following statements tell you about the relative sizes of functions. If the limit of the ratio of two functions is equal to $1$, then these functions must behave similarly in the limit. If the limit of the ratio goes to zero, then one function must be much larger than the other in the limit.

Limits of trigonometric functions: \[ \lim_{x\rightarrow0}\frac{\sin(x)}{x}=1,\quad \lim_{x\rightarrow0} \cos(x)=1,\quad \lim_{x\rightarrow 0}\frac{1-\cos x }{x}=0, \quad \lim_{x\rightarrow0}\frac{\tan(x)}{x}=1. \]

The number $e$ is defined as one of the following limits: \[ e \equiv \lim_{n\rightarrow\infty}\left(1+\frac{1}{n}\right)^n = \lim_{\epsilon\to 0 }(1+\epsilon)^{1/\epsilon}. \] The first limit corresponds to a compound interest calculation, with annual interest rate of $100\%$ and compounding performed infinitely often.

For future reference, we state some other limits involving the exponential function: \[ \lim_{x\rightarrow0}\frac{{\rm e}^x-1}{x}=1,\qquad \quad \lim_{n\rightarrow\infty}\left(1+\frac{x}{n}\right)^n={\rm e}^x. \]
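
You can verify both of these limits on live.sympy.org (the symbol n is predefined there, and E is sympy's name for Euler's number):

 >>> limit( (1 + 1/n)**n, n, oo )
 E
 >>> limit( (1 + x/n)**n, n, oo )
 exp(x)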

These are some limits involving logarithms (here $a>0$): \[ \lim_{x\rightarrow 0^+}x^a\ln(x)=0,\qquad \lim_{x\rightarrow\infty}\frac{\ln^p(x)}{x^a}=0, \ \forall p < \infty, \] \[ \lim_{x\rightarrow0}\frac{\ln(1+ax)}{x}=a,\qquad \lim_{x\rightarrow0}\frac{a^x-1}{x}=\ln(a). \]

A polynomial of degree $p$ and the exponential function base $a$ with $a > 1$ both go to infinity as $x$ goes to infinity: \[ \lim_{x\rightarrow\infty} x^p= \infty, \qquad \qquad \lim_{x\rightarrow\infty} a^x= \infty. \] Though both functions go to infinity, the exponential function does so much faster, so their relative ratio goes to zero: \[ \lim_{x\rightarrow\infty}\frac{x^p}{a^x}=0, \qquad \mbox{for all } p \in \mathbb{R}, |a|>1. \] In computer science, people make a big deal of this distinction when comparing the running times of algorithms. We say that an algorithm is efficient if the number of steps it takes is polynomial in the size of the input. If the algorithm takes an exponential number of steps, then for all intents and purposes it is useless, because if you give it a large enough input it will take longer than the age of the universe to finish.

Other limits: \[ \lim_{x\rightarrow0}\frac{\arcsin(x)}{x}=1,\qquad \lim_{x\rightarrow\infty}\sqrt[x]{x}=1. \]

Limit rules

If you are taking the limit of a fraction $\frac{f(x)}{g(x)}$, and you have $\lim_{x\to\infty}f(x)=0$ and $\lim_{x\to\infty}g(x)=\infty$, then we can informally write: \[ \lim_{x\to \infty} \frac{f(x)}{g(x)} = \frac{\lim_{x\to \infty} f(x)}{ \lim_{x\to \infty} g(x)} = \frac{0}{\infty} = 0, \] since both functions are helping to drive the fraction to zero.

Alternatively, if you ever get a fraction of the form $\frac{\infty}{0}$ as a limit, then both functions are helping to make the fraction grow to infinity, so we have $\frac{\infty}{0} = \infty$.

L'Hopital's rule

Sometimes when evaluating limits of fractions $\frac{f(x)}{g(x)}$, you might end up with a fraction like \[ \frac{0}{0}, \qquad \text{or} \qquad \frac{\infty}{\infty}. \] These are called indeterminate forms: is the effect of the numerator stronger, or the effect of the denominator stronger?

One way to find out, is to compare the ratio of their derivatives. This is called L'Hopital's rule: \[ \lim_{x\rightarrow a}\frac{f(x)}{g(x)} \ \ \ \overset{\textrm{H.R.}}{=} \ \ \ \lim_{x\rightarrow a}\frac{f'(x)}{g'(x)}. \]
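
As a sanity check, here is the classic $\frac{0}{0}$ example $\lim_{x\to 0}\frac{\sin(x)}{x}$ computed on live.sympy.org, once directly and once as the ratio of the derivatives, the way L'Hopital's rule suggests:

 >>> limit( sin(x)/x, x, 0 )
 1
 >>> limit( diff(sin(x), x) / diff(x, x), x, 0 )
 1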

Derivatives

The derivative of a function $f(x)$ is another function, which we will call $f'(x)$, that tells you the slope of $f(x)$. For example, the constant function $f(x)=c$ has slope $f'(x)=0$, since a constant function is flat. What is the derivative of a line $f(x)=mx+b$? The derivative is the slope, right? So we must have $f'(x)=m$. What about more complicated functions?

Definition

The derivative of a function is defined as: \[ f'(x) \equiv \lim_{ \epsilon \rightarrow 0}\frac{f(x+\epsilon)-f(x)}{\epsilon}. \] You can think of $\epsilon$ as a really small number. I mean really small. The above formula is nothing more than the rise-over-run rule for calculating the slope of a line, \[ \frac{ rise } { run } = \frac{ \Delta y } { \Delta x } = \frac{y_f - y_i}{x_f - x_i} = \frac{f(x+\epsilon)\ - \ f(x)}{x + \epsilon \ -\ x}, \] but by taking $\epsilon$ to be really small, we will get the slope at the point $x$.

Derivatives occur so often in math that people have come up with many different notations for them. Don't be fooled by that. All of them mean the same thing $Df(x) = f'(x)=\frac{df}{dx}=\dot{f}=\nabla f$.

Applications

Knowing how to take derivatives is very useful in life. Given some phenomenon described by $f(x)$ you can say how it changes over time. Many times we don't actually care about the value of $f'(x)$, just its sign. If the derivative is positive $f'(x) > 0$, then the function is increasing. If $f'(x) < 0$ then the function is decreasing.

When the function is flat at a certain $x$, then $f'(x)=0$. The points where $f'(x)=0$ (the roots of $f'(x)$) are very important for finding the maximum and minimum values of $f(x)$. Recall how we calculated the maximum height $h$ that a projectile reaches: we first found the time $t_{top}$ when its velocity in the $y$ direction was zero, $y^\prime(t_{top})=v(t_{top})=0$, and then substituted this time into $y(t)$ to obtain $h=\max\{ y(t) \} =y(t_{top})$.

Example

Now let's take the derivative of $f(x)=2x^2 + 3$ to see how that complicated-looking formula works: \[ f'(x)=\lim_{\epsilon \rightarrow 0} \frac{f(x+\epsilon)-f(x)}{\epsilon} = \lim_{\epsilon \rightarrow 0} \frac{2(x+\epsilon)^2+3 \ \ - \ \ (2x^2 + 3)}{\epsilon}. \] Let's simplify the right-hand side a bit (the $3$s cancel): \[ \frac{2x^2+ 4x\epsilon +2\epsilon^2 - 2x^2}{\epsilon} = \frac{4x\epsilon +2\epsilon^2}{\epsilon}= \frac{4x\epsilon}{\epsilon} + \frac{2\epsilon^2}{\epsilon}. \] Now when we take the limit, the second term disappears: \[ f'(x) = \lim_{\epsilon \rightarrow 0} \left( \frac{4x\epsilon}{\epsilon} + \frac{2\epsilon^2}{\epsilon} \right) = 4x + 0. \] Congratulations, you have just taken your first derivative! The calculation was not that complicated, but it was pretty long and tedious. The good news is that you need to calculate the derivative from first principles only once. Once you find the derivative formula for a particular function, you can use it every time you see a function of that form.
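
You can make sympy take the same limit for you on live.sympy.org. The symbol eps below is one we define ourselves to play the role of $\epsilon$:

 >>> eps = symbols('eps')
 >>> f = 2*x**2 + 3
 >>> limit( (f.subs(x, x + eps) - f)/eps, eps, 0 )
 4*x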

A derivative formula

\[ f(x) = x^n \qquad \Rightarrow \qquad f'(x) = n x^{n-1}. \]

Example

Use the above formula to find the derivatives of the following three functions: \[ f(x) = x^{10}, \quad g(x) = \sqrt{x^3}, \qquad h(x) = \frac{1}{x^3}. \] In the first case, we use the formula directly to find the derivative $f'(x)=10x^9$. In the second case, we first use the fact that square root is equivalent to an exponent of $\frac{1}{2}$ to rewrite the function as $g(x)=x^{\frac{3}{2} }$, then using the formula we find that $g'(x)=\frac{3}{2}x^{\frac{1}{2} } =\frac{3}{2}\sqrt{x}$. We can also rewrite the third function as $h(x)=x^{-3}$ and then compute the derivative $h'(x)=-3x^{-4}=\frac{-3}{x^4}$ using the formula.
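
Sympy knows the power rule too. Below I write the square root as the power $x^{3/2}$, using Rational to keep the exponent exact; depending on your sympy version the results may be printed in a slightly different but equivalent form:

 >>> diff( x**10, x )
 10*x**9
 >>> diff( x**Rational(3,2), x )
 3*sqrt(x)/2
 >>> diff( x**(-3), x )
 -3/x**4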

Discussion

In the next section we will develop derivative formulas for other functions.

Formulas to memorize

\[ \begin{align*} F(x) & \ - \textrm{ diff. } \to \quad F'(x) \nl \int f(x)\;dx & \ \ \leftarrow \textrm{ int. } - \quad f(x) \nl a &\qquad\qquad\qquad 0 \nl x &\qquad\qquad\qquad 1 \nl af(x) &\qquad\qquad\qquad af'(x) \nl f(x)+g(x) &\qquad\qquad\qquad f'(x)+g'(x) \nl x^n &\qquad\qquad\qquad nx^{n-1} \nl 1/x=x^{-1} &\qquad\qquad\qquad -x^{-2} \nl \sqrt{x}=x^{\frac{1}{2}} &\qquad\qquad\qquad \frac{1}{2}x^{-\frac{1}{2}} \nl {\rm e}^x &\qquad\qquad\qquad {\rm e}^x \nl a^x &\qquad\qquad\qquad a^x\ln(a) \nl \ln(x) &\qquad\qquad\qquad 1/x \nl \log_a(x) &\qquad\qquad\qquad (x\ln(a))^{-1} \nl \sin(x) &\qquad\qquad\qquad \cos(x) \nl \cos(x) &\qquad\qquad\qquad -\sin(x) \nl \tan(x) &\qquad\qquad\qquad \sec^2(x)\equiv\cos^{-2}(x) \nl \csc(x) \equiv \frac{1}{\sin(x)} &\qquad\qquad\qquad -\sin^{-2}(x)\cos(x) \nl \sec(x) \equiv \frac{1}{\cos(x)} &\qquad\qquad\qquad \tan(x)\sec(x) \nl \cot(x) \equiv \frac{1}{\tan(x)} &\qquad\qquad\qquad -\csc^2(x) \nl \sinh(x) &\qquad\qquad\qquad \cosh(x) \nl \cosh(x) &\qquad\qquad\qquad \sinh(x) \nl \sin^{-1}(x) &\qquad\qquad\qquad \frac{1}{\sqrt{1-x^2}} \nl \cos^{-1}(x) &\qquad\qquad\qquad \frac{-1}{\sqrt{1-x^2}} \nl \tan^{-1}(x) &\qquad\qquad\qquad \frac{1}{1+x^2} \end{align*} \]

Fundamental theorem of calculus

Though it may not be apparent at first, the study of derivatives (Calculus I) and integrals (Calculus II) are intimately related. Differentiation and integration are inverse operations.

You have previously studied the inverse relationship for functions. Recall that for any bijective function $f$ (a one-to-one relationship) there exists an inverse function $f^{-1}$ which undoes the effects of $f$: \[ (f^{-1}\!\circ f) (x) \equiv f^{-1}(f(x)) = x \] and \[ (f \circ f^{-1}) (y) \equiv f(f^{-1}(y)) = y. \] The circle $\circ$ stands for composition of functions, i.e., first you apply one function and then you apply the second function. When you apply a function followed by its inverse to some input, you get back the original input.

The integral is the “inverse operation” to the derivative. If you perform the integral operation followed by the derivative operation on some function, you will get back the same function. This is stated more formally as the Fundamental Theorem of Calculus.

Statement

Let $f(x)$ be a continuous function and let $F(x)$ be its antiderivative on the interval $[a,b]$: \[ F(x) = \int_a^x f(t) \; dt, \] then, the derivative of $F(x)$ is equal to $f(x)$: \[ F'(x) = f(x), \] for any $x \in (a,b)$.

Thus, we see that differentiation is the inverse operation of integration. We obtained $F(x)$ by integrating $f(x)$. If we then take the derivative of $F(x)$ we get back to $f(x)$. It works the other way too. If you integrate a function and then take its derivative, you get back to the original function. Differential calculus and integral calculus are two sides of the same coin. If you understand this fact, then you understand something very deep about calculus.

Note that $F(x)$ is not a unique anti-derivative. We can add an arbitrary constant $C$ to $F(x)$ and it will still satisfy the above conditions since the derivative of a constant is zero.

Formulas

If you are given some function $f(x)$, you take its integral and then take the derivative of the result, you will get back the same function: \[ \left(\frac{d}{dx} \circ \int dx \right) f(x) \equiv \frac{d}{dx} \int_a^x f(t) dt = f(x). \] Alternately, you can first take the derivative, and then take the integral, and you will get back the function (up to a constant): \[ \left( \int dx \circ \frac{d}{dx}\right) f(x) \equiv \int_a^x f'(t) dt = f(x) - f(a). \]

Note that we had to use a dummy variable $t$ inside the integral since $x$ is used in the limit. Indeed, all integrals are functions of their limits and the inner variable is not important: we could write $\int_a^x f(y)\;dy$ or $\int_a^x f(z)\;dz$ or even $\int_a^x f(\xi)\;d\xi$ and the answer for all of these will be $F(x)-F(a)$.
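
Here is a small illustration of the theorem on live.sympy.org, using $f(t)=\cos(t)$ as an example (the symbols x and t are predefined there):

 >>> F = integrate( cos(t), (t, 0, x) )
 >>> F
 sin(x)
 >>> diff( F, x )
 cos(x)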

Discussion

As a consequence of the Fundamental theorem, you can reuse all your knowledge of differential calculus to solve integrals.

Example: Reverse engineering

Suppose you are asked to find this integral: \[ \int x^2 dx. \] Using the Fundamental theorem, we can rephrase this question as the search for some function $F(x)$ such that \[ F'(x) = x^2. \] Since you remember your derivative formulas well, you will guess right away that $F(x)$ must contain an $x^3$ term. This is because you get back a quadratic term when you take the derivative of a cubic term. So we must have $F(x)=cx^3$, for some constant $c$, and we must pick the constant that makes this work out: \[ F'(x) = 3cx^2 = x^2, \] therefore $c=\frac{1}{3}$ and the integral is: \[ \int x^2 dx = \frac{1}{3}x^3 + C. \] Did you see what just happened? We were able to take an integral using only derivative formulas and “reverse engineering”. You can check that, indeed, $\frac{d}{dx}\left[\frac{1}{3}x^3\right] = x^2$.

You can also use the Fundamental theorem to check your answers.

Example: Integral verification

Suppose a friend tells you that \[ \int \ln(x) dx = x\ln(x) - x + C, \] but he is a shady character and you don't trust him. How can you check his answer? If you had a smartphone handy, you can check on live.sympy.org, but what if you just have pen and paper? If $x\ln(x) - x$ is really the antiderivative of $\ln(x)$, then by the Fundamental theorem of calculus, if we take the derivative we should get back $\ln(x)$. Let's check: \[ \frac{d}{dx}\!\left[ x\ln(x) - x \right] = \underbrace{\frac{d}{dx}\!\left[x\right]\ln(x)+ x \left[\frac{d}{dx} \ln(x) \right]}_{\text{product rule} } - \frac{d}{dx}\left[ x \right] = 1\ln(x) + x\frac{1}{x} - 1 = \ln(x). \] OK, so your friend is correct.
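
The same check takes one line on live.sympy.org. Note that sympy writes log for the natural logarithm:

 >>> diff( x*ln(x) - x, x )
 log(x)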

Optimization: calculus' killer app

The reason why you need to learn about derivatives is that this skill will allow you to optimize any function. Suppose you have control over the input of the function $f(x)$ and you want to pick the best value of $x$. Best usually means maximum (if the function measures something good like profits) or minimum (if the function describes something bad like costs).

Example

The drug boss for the whole of the lower Chicago area has recently had a lot of problems with the police intercepting his people on the street. It is clear that the more drugs he sells, the more money he will make, but if he starts to sell too much, the police arrests become more frequent and he loses money.

Fed up with this situation, he decides he needs to find the optimal amount of drugs to put out on the streets: as much as possible, but not so much that the police raids kick in. So one day he tells his brothers and sisters in crime to leave the room and picks up a pencil and a piece of paper to do some calculus.

If $x$ is the amount of drugs he puts out on the street every day, then the amount of money he makes is given by the function: \[ f(x) = 3000x e^{-0.25x}, \] where the linear part $3000x$ represents his profits if there were no police, and the $e^{-0.25x}$ represents the effects of the police stepping up their actions when more drugs are pumped onto the street.

The graph of the drug profits as a function of amount of drugs sold.

Looking at the function he asks “What is the value of $x$ which will give me the most profits from my criminal dealings?” Stated mathematically, he is asking for \[ \mathop{\text{argmax}}_x \ 3000x e^{-0.25x} \ = \ ?, \] which is read “find the value of the argument $x$ that gives the maximum value of $f(x)$.”

He remembers the steps required to find the maximum of a function from a conversation with a crooked stock trader he met in prison. First he must take the derivative of the function. Because the function is a product of two functions, he has to use the product rule $(fg)' = f'g+fg'$. When he takes the derivative of $f(x)$ he gets: \[ f'(x) = 3000e^{-0.25x} + 3000x(-0.25)e^{-0.25x}. \]

Whenever $f'(x)=0$ this means the function $f(x)$ has zero slope. A maximum is just the kind of place where there is zero slope: think of the peak of a mountain that has steep slopes to the left and to right, but right at the peak it is momentarily horizontal.

So when is the derivative zero? \[ f'(x) = 3000e^{-0.25x} + 3000x(-0.25)e^{-0.25x} = 0. \] We can factor out the $3000$ and the exponential function to get \[ 3000e^{-0.25x}( 1 -0.25x) = 0. \] Now $3000\neq0$ and the exponential function $e^{-0.25x}$ is never equal to zero either, so it must be the term in the bracket which is equal to zero: \[ (1 -0.25x) = 0, \] or $x=4$. The slope of $f(x)$ is equal to zero when $x=4$. This corresponds to the peak of the curve.

Right then and there the crime boss called his posse back into the room and proudly announced that from now on his organization will put out exactly four kilograms of drugs on the street per day.
“Boss, how much will we make per day if we sell four kilograms?”, asks one of the gangsters in sweatpants.
“We will make the maximum possible!”, replies the boss.
“Yes I know Boss, but how much money is the maximum?”
The dude in sweatpants is asking a good question. It is one thing to know where the maximum occurs and it is another to know the value of the function at this point. He is asking the following mathematical question: \[ \max_x \ 3000x e^{-0.25x} \ = \ ?. \] Since we already know the value $x^*=4$ where the maximum occurs, we simply have to plug it into the function $f(x)$ to get: \[ \max_x f(x) = f(4) = 3000(4)e^{-0.25(4)} = \frac{12000}{e} \approx 4414.55. \] After that conversation, everyone, including the boss, started to question their choice of occupation in life. Is crime really worth it when you do the numbers?
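
If the boss had a laptop in that room, he could have let sympy do the calculus for him. Here is a sketch of the same calculation on live.sympy.org; I write the exponent as -x/4 instead of -0.25x so sympy keeps the numbers exact, and the last output is truncated:

 >>> f = 3000*x*exp(-x/4)
 >>> solve( diff(f, x), x )
 [4]
 >>> f.subs(x, 4).evalf()
 4414.553...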

As you may know, the system is obsessed with this whole optimization thing. Optimize to make more profits, optimize to minimize costs, optimize stealing of natural resources from Third World countries, optimize anything that moves basically. Therefore, the system wants you, the young and powerful generation of the future, to learn this important skill and become faithful employees in the corporations. They want you to know so that you can help them optimize things, so that the whole enterprise will continue to run smoothly.

Mathematics makes no value judgments about what should and should not be optimized; this part is up to you. If, like me, you don't want to use optimization for system shit, you can use calculus for science. It doesn't matter whether it will be physics or medicine or building your own business, it is all good. Just stay away from the system. Please do this for me.

Riemann sum

We defined the integral operation $\int f(x)\;dx$ as the inverse operation of $\frac{d}{dx}$, but it is important to know how to think of the integral operation on its own. No course on calculus would be complete without a telling of the classical “rectangles story” of integral calculus.

Definitions

  • $x$: $\in \mathbb{R}$, the argument of the function.
  • $f(x)$: a function $f \colon \mathbb{R} \to \mathbb{R}$.
  • $x_i$: where the sum starts, i.e., some given point on the $x$ axis.
  • $x_f$: where the sum stops.
  • $A(x_i,x_f)$: Exact value of the area under the curve $f(x)$ from $x=x_i$ to $x=x_f$.
  • $S_n(x_i,x_f)$: An approximation to the area $A$ in terms of $n$ rectangles.
  • $s_k$: Area of the $k$-th rectangle when counting from the left.

In the picture on the right, we are approximating the function $f(x)=x^3-5x^2+x+10$ between $x_i=-1$ and $x_f=4$ using $n=12$ rectangles. The sum of the areas of the 12 rectangles is what we call $S_{12}(-1,4)$. We say that $S_{12}(-1,4) \approx A(-1,4)$.

Formulas

The main formula you need to know is that the combined area approximation is given by the sum of the areas of the little rectangles: \[ S_n = \sum_{k=1}^{n} s_k. \]

Each of the little rectangles has an area $s_k$ given by its height multiplied by its width. The height of each rectangle will vary, but the width is constant. Why constant? Riemann figured that having each rectangle with a constant width $\Delta x$ would make it very easy to calculate the approximation. The total length of the interval from $x_i$ to $x_f$ is $(x_f-x_i)$. If we divide this length into $n$ equally spaced segments, each segment will have width $\Delta x$ given by: \[ \Delta x = \frac{x_f - x_i}{n}. \]

OK, we have the formula for the width figured out, let's see what the height will be for the $k$-th rectangle, where $k$ is our counter from left to right in the sequence of rectangles. The height of the function varies as we move along the $x$ axis. For the rectangles, we pick isolated “samples” of $f(x)$ for the following values \[ x_k = x_i + k\Delta x, \textrm{ for } k \in \{ 1, 2, 3, \ldots, n \}, \] all of them equally spaced $\Delta x$ apart.

The area of each rectangle is height times width: \[ s_k = f(x_i + k\Delta x)\Delta x. \]

Now, my dear students, I want you to stare at the above equation and do some simple calculations to check that you understand. There is no point in continuing if you are just taking my word for it. Verify that when $k=1$, the formula gives the area of the first little rectangle. Verify also that when $k=n$, the formula for $x_n$ gives the right value ($x_f$).

Ok let's put our formula for $s_k$ in the sum where it belongs. The Riemann sum approximation using $n$ rectangles is given by \[ S_n = \sum_{k=1}^{n} f(x_i + k\Delta x)\Delta x, \] where $\Delta x =\frac{|x_f - x_i|}{n}$.

Let us get back to the picture where we try to approximate the area under the curve $f(x)=x^3-5x^2+x+10$ by using 12 pieces.

For this scenario, the value we get for the 12-rectangle approximation to the area under the curve is \[ S_{12} = \sum_{k=1}^{12} f(x_i + k\Delta x)\Delta x = 11.802662. \] You shouldn't just trust me though; always check for yourself using live.sympy.org by typing in the following expressions:

 >>> n=12.0; xk = -1 + k*5/n; sk = (xk**3-5*xk**2+xk+10)*(5/n);
 >>> summation( sk, (k,1,n) )
      11.802662...

More is better

Who cares though? This is such a crappy approximation! You can clearly see that some rectangles lie outside of the curve (overestimates), and some are too far inside (underestimates). You might be wondering why I wasted so much of your time to achieve such a lousy approximation. We have not been wasting our time. You see, the Riemann sum formula $S_n$ gets better and better as you cut the region into smaller and smaller rectangles.


With $n=25$, we get a more fine grained approximation in which the sum of the rectangles is given by: \[ S_{25} = \sum_{k=1}^{25} f(x_i + k\Delta x)\Delta x = 12.4. \]

Riemann sum with $n=50$ rectangles.

For $n=50$ we get: \[ S_{50} = 12.6625. \]

Riemann sum with $n=100$ rectangles.

For $n=100$ the sum of the rectangle areas is starting to look pretttttty much like the function. The calculation gives us $S_{100} = 12.790625$.

For $n=1000$ we get $S_{1000} = 12.9041562$ which is very close to the actual value of the area under the curve: \[ A(-1,4) = 12.91666\ldots \]

You see, in the long run, when $n$ gets really large, the rectangle approximation (Riemann sum) can be made arbitrarily good. Imagine you cut the region into $n=10000$ rectangles; wouldn't $S_{10000}(-1,4)$ be a pretty accurate approximation of the actual area $A(-1,4)$?

Integral

The fact that you can approximate the area under the curve with a bunch of rectangles is what integral calculus is all about. Instead of mucking about with bigger and bigger values of $n$, mathematicians go right away for the kill and make $n$ go to infinity.

In the limit of $n \to \infty$, you can get arbitrarily close approximations to the area under the curve. All this time, that which we were calling $A(-1,4)$ was actually the “integral” of $f(x)$ between $x=-1$ and $x=4$, or written mathematically: \[ A(-1,4) \equiv \int_{-1}^4 f(x)\;dx \equiv \lim_{n \to \infty} S_{n} = \lim_{n \to \infty} \sum_{k=1}^{n} f(x_i + k\Delta x)\Delta x. \]

While it is not computationally practical to make $n \to \infty$, we can convince ourselves that the approximation becomes better and better as $n$ becomes larger. For example the approximation using $n=1$M rectangles is accurate up to the fourth decimal place as can be verified using the following commands on live.sympy.org:

 >>> n=1000000.0; xk = -1 + k*5/n; sk = (xk**3-5*xk**2+xk+10)*(5/n);
 >>> summation( sk, (k,1,n) )
      12.9166541666563
 >>> integrate( x**3-5*x**2+x+10, (x,-1,4) ).evalf()
      12.9166666666667

In practice, when we want to compute the area under the curve, we don't use Riemann sums. There are formulas for directly calculating the integrals of functions. In fact, you already know the integration formulas: they are simply the derivative formulas used in the opposite direction. In the next section we will discuss the derivative-integral inverse relationship in more details.


Techniques of integration

The operation of “taking the integral” of some function is usually much more complicated than that of taking the derivative. In fact, you can take the derivative of any function – no matter how complex – simply by using the product rule, the chain rule and the derivative formulas. The same is not true for integrals.

There are plenty of integrals for which there is no closed-form solution, which means that the anti-derivative cannot be expressed in terms of the simple functions we know. There simply doesn't exist a procedure to follow such that you input a function and you "turn the crank" until the integral comes out. Integration is a bit of an art.

What can we integrate then, and how? Back in the day, scientists used to collect big tables with integral formulas for various complicated functions. You can look up the answer to many integrals in such tables.

There are also some integration techniques which can help you make complicated integrals simpler. Think of the techniques below as adapters you can use when the function you are trying to integrate doesn't appear in your table of integrals, but a similar one does.

The intended audience for this chapter is Calculus II students. These are exactly the kinds of skills which you will be asked to show on the final. Instead of using the table of integrals to look up some complicated integral, you have to know how to make your own table.

For people interested in learning physics, I will honestly tell you that if you skip this section you won't miss much. You should just read the section on substitution which is the important one, but don't bother reading the details of all the recipes for integrating things. For most intents and purposes, once you understand what an integral is, you can use a computer to calculate it. A good tool for this is the computer algebra system at live.sympy.org.

 >>> integrate( sin(x) )
      -cos(x)
 
 >>> integrate( x**2*exp(x) )
      x**2*exp(x) - 2*x*exp(x) + 2*exp(x)

You can use sympy for all your integration needs.

For those of you reading this book for general culture, who want to understand what calculus is without having to write a final exam on it, consider the next couple of pages as an ethnographic overview of the academic realities in which bright first-year students are forced to integrate things they don't want to integrate, and this for many long hours. Just picture some unlucky science student locked up in her room doing calculus, with hundreds of dangling integrals grabbing at her with their hooks, keeping her away from her friends.

Actually, it is not that bad. There are, like, four tricks to learn and if you practice you can learn all of them in a week or so. Mastering these four tricks is essentially the entire Calculus II class. If you understand the material in this section, you will be done with integral calculus and you will have two months to chill.

Substitution

Say you are integrating some complicated function which contains a square root $\sqrt{x}$. You are wondering how to go about computing this integral: \[ \int \frac{1}{x - \sqrt{x}} \; dx \ = \ ? \]

Sometimes you can simplify the integral by substituting a new variable in the expression. Let $u=\sqrt{x}$. Substitution is like search-and-replace in a word processor. Every time you see the expression $\sqrt{x}$, you have to replace it with $u$: \[ \int \frac{1}{x - \sqrt{x}} \; dx = \int \frac{1}{u^2 - u} \; dx. \] Note that we also replaced $x=(\sqrt{x})^2$ with $u^2$.

We are not done yet. When you change from the $x$ variable to the $u$ variable, you have to be thorough. You have to change the $dx$ to a $du$ also. Can we just replace $dx$ with $du$? Unfortunately no, otherwise it would be like saying that the “short step” $du$ is equal in length to the “short step” $dx$, which is only true for the trivial substitution $u=x$.

To find the relation between the infinitesimals we take the derivative: \[ u(x) = \sqrt{x} \quad \Rightarrow \quad u'(x) = \frac{du}{dx} = \frac{1}{2\sqrt{x}}. \] For the next step, I need you to stop thinking about the expression $\frac{du}{dx}$ as a whole, and instead think about it as a rise-over-run fraction which can be split. Let's take the run $dx$ to the other side of the equation: \[ du = \frac{1}{2\sqrt{x}} \; dx, \] and to isolate $dx$, we multiply both sides by $2\sqrt{x}$: \[ dx = 2\sqrt{x} \; du = 2u \; du, \] where in the last step we used the fact that $u=\sqrt{x}$ again.

Now we have an expression for $dx$ entirely in terms of $u$'s. Let's see what that gives: \[ \int \frac{1}{x - \sqrt{x}} \; dx = \int \frac{1}{u^2 - u} 2u \; du = \int \frac{2}{u - 1} \; du. \]

We can now recognize the general form $\frac{1}{x}$, whose integral is $\ln(x)$, but we have to account for the $-1$ shift inside the function. The integral is therefore: \[ \int \frac{1}{x - \sqrt{x}} \; dx = \int \frac{2}{u - 1} \; du = 2\ln(u-1) + C = 2\ln(\sqrt{x}-1) + C. \] Note that in the last step we changed back to the $x$ variable to give the final answer. The variable $u$ exists only in our calculation: we invented it out of thin air when we said “Let $u=\sqrt{x}$” in the beginning. It is only natural to convert back to the original variable $x$ in the last step.

Notice what happened thanks to the substitution? The integral got simpler, since we got rid of the square roots. On the outside, an extra $2u$ appeared, and the $u$ part cancels with one of the factors in the denominator $u^2-u=u(u-1)$, making things even simpler. In practice, substituting inside $f$ is the easy part. The hard part is making sure that our choice of substitution leads to a replacement for $dx$ which helps to make the integral simpler.

For definite integrals, i.e., integrals that have explicit limits, there is an extra step that we need to take when changing variables: we have to change the $x$ limits of integration to $u$ limits. In our expression, when changing to the $u$ variable, we would have to write: \[ \int_a^b \frac{1}{x - \sqrt{x}} \; dx = \int_{u(a)}^{u(b)} \frac{2}{u - 1} \; du. \] If the integral had asked for the integral between $x_i=4$ and $x_f=9$, then the new limits will be $u_i=\sqrt{4}=2$ and $u_f=\sqrt{9}=3$, so we will have: \[ \int_4^9 \frac{1}{x - \sqrt{x}} \; dx = \int_{2}^{3} \frac{2}{u - 1} \; du = 2\ln(u-1)\bigg|_2^3 = 2(\ln(2) - \ln(1)) = 2\ln(2). \]

OK, so let's recap. Substitution involves three steps:

  1. Replace all occurrences of $u(x)$ with $u$.
  2. Replace $dx$ with $\frac{1}{u'(x)}du$.
  3. If there are limits, replace the $x$ limits with $u$ limits.

If the resulting integral is simpler to solve then good for you!
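
As usual, you can check the outcome of a substitution on live.sympy.org. Below I evaluate the antiderivative we found at the two limits from the example, and (assuming your sympy version can do this integral directly) compare with the numerical value of the definite integral:

 >>> F = 2*ln(sqrt(x) - 1)
 >>> F.subs(x, 9) - F.subs(x, 4)
 2*log(2)
 >>> integrate( 1/(x - sqrt(x)), (x, 4, 9) ).evalf()
 1.3862943...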

Example

We are asked to find $\int \tan(x)\; dx$. We know that $\tan(x)=\frac{\sin(x)}{\cos(x)}$, so we can use the substitution $u=\cos(x)$, $du=-\sin(x)dx$ as follows: \[ \begin{eqnarray} \int \tan(x)dx &=& \int \frac{\sin(x)}{\cos(x)} dx \nl &=& \int \frac{-1}{u} du \nl &=& -\ln |u| + C \nl &=& -\ln |\cos(x) | + C. \end{eqnarray} \]
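
For the record, sympy finds the same antiderivative, minus the integration constant and the absolute value (which only matters for negative arguments); your version may print an equivalent form:

 >>> integrate( tan(x), x )
 -log(cos(x))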

Integrals of trig functions

Because $\sin$, $\cos$, $\tan$ and the other trig functions are related, we can often express one function in terms of another in order to simplify integrals.

Recall the trigonometric identity: \[ \cos^2(x) + \sin^2(x) = 1, \] which is the statement of Pythagoras' theorem.

If we choose to make the substitution $u=\sin(x)$, then we can replace all kinds of trigonometric terms with the new variable $u$: \[ \begin{align*} \sin^2(x) &= u^2, \nl \cos^2(x) &= 1 - \sin^2(x) = 1 - u^2, \nl \tan^2(x) &= \frac{\sin^2(x)}{\cos^2(x)} = \frac{u^2}{1-u^2}. \end{align*} \]

Of course the change of variable $u=\sin(x)$ means that you have to change the $du=u'(x) dx= \cos(x) dx$ so there better be something to cancel this $\cos(x)$ term in the integral.

Let me show you one example where things work out perfectly. Suppose $m$ is some arbitrary number, and you have to integrate: \[ \int \left(\sin(x)\right)^{m}\cos^{3}(x) \; dx \equiv \int \sin^{m}(x)\cos^{3}(x) \; dx. \] This integral contains $m$ powers of the $\sin$ function and three powers of the $\cos$ function. Let us split the $\cos$ term into two parts: \[ \int \sin^{m}(x)\cos^{3}(x) \; dx = \int \sin^{m}(x) \cos^{2}(x) \cos(x) \; dx. \]

Making the change of variable $u=\sin(x)$, $du=\cos(x)dx$ means that we can replace $\sin^m(x)$ by $u^m$, and $\cos^2(x)=1-u^2$ in the above expression to get: \[ \int \sin^{m}(x) \cos^{2}(x) \cos(x) \; dx = \int u^{m} \left(1-u^2\right) \cos(x) \; dx. \]

Conveniently we happen to have $du= \cos(x)dx$, so the complete change of variable step is: \[ \begin{align*} \int \sin^{m}(x) \cos^{2}(x) \cos(x) \; dx & = \int u^{m} \left(1-u^2\right) \; du. \end{align*} \] This is what I meant earlier about “having an extra $\cos(x)$”: it pairs up with the $\cos(x)$ that appears in the $dx \to du$ change.

What is the answer then? It is a simple integral of a polynomial: \[ \begin{align*} \int u^{m} \left(1-u^2\right) \; du & = \int \left( u^{m} - u^{m+2} \right) \; du \nl & = \frac{1}{m+1}u^{m+1} - \frac{1}{m+3}u^{m+3} \nl & = \frac{1}{m+1}\sin^{m+1}(x) - \frac{1}{m+3}\sin^{m+3}(x). \end{align*} \]

You might be wondering how useful this substitution technique really is. I mean, how often do you have to integrate such a particular combination of $\sin$ and $\cos$ powers that the substitution works out perfectly? You would be surprised! Sine and cosine functions are used a lot in this thing called the Fourier transform, which is a way of expressing a sound wave $f(t)$ in terms of the frequencies it contains. Also, on exams, they love to test these kinds of things. Teachers often want to check whether you can do integrals and substitutions, and whether you remember all the trigonometric identities which you are supposed to have learned in high school.

What other trigonometric functions should you know how to integrate? On an exam you should try any possible substitution you can think of, combined with any trigonometric identity that seems to simplify things. Some common ones are described below.

Cos

Just as we can substitute $\sin$, we can also substitute $u=\cos(x)$ and use $\sin^2(x)=1-u^2$. Again, this substitution only makes sense if you have a $\sin$ left over somewhere in the integral to cancel with the $du = -\sin(x)dx$.

Tan and sec

We can get some more mileage out of $\cos^2(x) + \sin^2(x) = 1$. If we divide both sides by $\cos^2(x)$ we get: \[ 1 + \tan^2(x) = \sec^2(x) \equiv \frac{1}{\cos^2(x)}, \] which is useful because $u=\tan(x)$ gives $du=\sec^2(x)dx$ so you can often “kill off” even powers of $\sec^2(x)$ in integrals of the form \[ \int\tan^m(x)\sec^n(x)\,dx. \]

Even powers of sin and cos

There are other trigonometric identities called half-angle and double-angle formulas which give you formulas like: \[ \sin^2(x)=\frac{1}{2}(1-\cos(2x)), \qquad \cos^2(x)=\frac{1}{2}(1+\cos(2x)). \]

These are useful if you have to integrate even powers of $\sin$ and $\cos$.

Example

Let's see how we would find $I=\int\sin^2(x)\cos^4(x)\,dx$: \[ \begin{eqnarray} I &=& \int\sin^2(x)\cos^4(x)\;dx \nl &=& \int \left( {1 \over 2}(1 - \cos(2x)) \right) \left( {1 \over 2}(1 + \cos(2x)) \right)^2 \;dx, \nl &=& \frac{1}{8} \int \left( 1 - \cos^2(2x) + \cos(2x)- \cos^3(2x) \right) \;dx. \nl & = & \frac{1}{8} \int \left( 1 - \cos^2(2x) + \cos(2x) -\cos^2(2x) \cos(2x) \right)\; dx \nl & = & \frac{1}{8} \int \left( 1 - \frac{1}{2} (1 + \cos(4x)) + \underline{\cos(2x)} - (\underline{1}-\sin^2(2x))\underline{\cos(2x)} \right) \; dx \nl & = & \frac{1}{8} \int \left( \frac{1}{2} - \frac{1}{2} \cos(4x) + \underbrace{\sin^2(2x)}_{u^2}\cos(2x) \right) \;dx \nl & = & \frac{1}{8} \left( \frac{x}{2} - \frac{\sin(4x)}{8} + \frac{\sin^3(2x)}{6} \right) \nl &=& \frac{x}{16}-\frac{\sin(4x)}{64} + \frac{\sin^3(2x)}{48}+C. \end{eqnarray} \]

There is no limit to the number of combinations of simplification steps you can try. On a homework question or an exam, the teacher will ask for something simple. You just have to find the right substitution.

Sneaky example

Sometimes, the substitution is not obvious at all, as in the case of $\int \sec(x)dx$. To find this integral you need to know the following trick: multiply and divide by $\tan(x) +\sec(x)$.

What we get is \[ \begin{eqnarray} \int \sec(x) \, dx &=& \int \sec(x)\ 1 \, dx \nl &=& \int \sec(x)\frac{\tan(x) +\sec(x)}{\tan(x) +\sec(x)} \; dx \nl &=& \int \frac{\sec^2(x) + \sec(x) \tan(x)}{\tan(x) +\sec(x)} \; dx\nl &=& \int \frac{1}{u} du \nl &=& \ln |u| + C \nl &=& \ln |\tan(x) + \sec(x) | + C, \end{eqnarray} \] where in the fourth line we used the substitution $u=\tan(x)+\sec(x)$ and $du = (\sec^2(x) + \tan(x)\sec(x))dx$.

I highly recommend you view and practice all the examples you can get your hands on. Don't bother memorizing any recipes though, you will do just as well with trial and error.

Trig substitution

Often times when doing integrals for physics we get terms of the form $\sqrt{a^2-x^2}$, $\sqrt{a^2+x^2}$ or $\sqrt{x^2-a^2}$ which are not easy to handle. In each of the above three cases, we can do a trig substitution, in which we substitute $x$ with one of the trigonometric functions $a\sin(\theta)$, $a\tan(\theta)$ or $a\sec(\theta)$, and the resulting integral becomes much simpler.

Sine substitution

Consider an integral which contains an expression of the form $\sqrt{a^2-x^2}$. If we use the substitution $x=a\sin \theta$, the complicated square-root expression will get simpler: \[ \sqrt{a^2-x^2} = \sqrt{a^2-a^2\sin^2\theta} = a\sqrt{1-\sin^2\theta} = a\cos\theta, \] because we have $\cos^2\theta = 1 - \sin^2\theta$. The transformed integral now involves a trigonometric function which we know how to integrate.

Once we find the integral in terms of $\theta$, we have to convert the various $\theta$ expressions in the answer back to the original variables $x$ and $a$. This is where the sine substitution triangle comes in: since $\sin\theta=\frac{x}{a}$, we draw a right-angle triangle with hypotenuse $a$ and the side opposite to $\theta$ equal to $x$, so the adjacent side has length $\sqrt{a^2-x^2}$. Reading the ratios off this triangle gives: \[ \sin\theta = \frac{x}{a}, \ \ \cos\theta = \frac{\sqrt{a^2-x^2}}{a}, \ \ \tan\theta = \frac{x}{\sqrt{a^2-x^2}}, \ \ \] \[ \csc\theta = \frac{a}{x}, \ \ \sec\theta = \frac{a}{\sqrt{a^2-x^2}}, \ \ \cot\theta = \frac{\sqrt{a^2-x^2}}{x}. \ \ \]

Example 1

Suppose you are asked to calculate $\int \sqrt{1-x^2}\; dx$.

We will approach the problem by making the substitution \[ x=\sin \theta, \qquad dx=\cos \theta \; d\theta, \] which is the simplest case of the sine substitution with $a=1$.

We proceed as follows, using the triangle for the inverse substitution to convert back from $\theta$ to $x$ in the last step: \[ \begin{eqnarray} \int \sqrt{1-x^2} \; dx & = & \int \sqrt{1-\sin^2 \theta} \cos \theta \; d\theta \nl & = & \int \cos^2 \theta \; d\theta \nl & = & \frac{1}{2} \int \left[ 1+ \cos 2\theta \right] \; d\theta \nl & = & \frac{1}{2}\theta +\frac{1}{4}\sin2\theta \nl & = & \frac{1}{2}\theta +\frac{1}{2}\sin\theta\cos\theta \nl & = & \frac{1}{2}\sin^{-1}\!\left(x \right) +\frac{1}{2}\frac{x}{1}\frac{\sqrt{1-x^2}}{1}. \end{eqnarray} \]

Note how in the last step we used the triangle diagram to “read off” the values of $\theta$, $\sin\theta$ and $\cos\theta$ from the triangle. The substitution $x = \sin\theta$ means the hypotenuse in the diagram should be of length 1, and the opposite side is of length $x$.
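For the skeptical reader, here is a check of the final answer by differentiation, done in sympy (assuming it is available):

  >>> from sympy import symbols, sqrt, asin, diff, simplify
  >>> x = symbols('x')
  >>> F = asin(x)/2 + x*sqrt(1-x**2)/2
  >>> simplify( diff(F,x) - sqrt(1-x**2) )
        0

The derivative of our answer minus the original integrand simplifies to zero, which is what we wanted.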

Example 2

We want to compute $\int \sqrt{ \frac{a+x}{a-x}} \; dx$. We can rewrite this fraction as follows: \[ \sqrt{\frac{a+x}{a-x}} = \sqrt{\frac{a+x}{a-x} \frac{1}{1}} = \sqrt{\frac{a+x}{a-x} \frac{a+x}{a+x}} =\frac{a+x}{\sqrt{a^2-x^2}}. \]

Next we make the substitution \[ x=a \sin \theta, \qquad dx=a\cos \theta \; d\theta, \] and compute, using the sine substitution triangle to convert back from $\theta$ to $x$ in the last steps:

\[ \begin{eqnarray} \int \frac{a+x}{\sqrt{a^2-x^2}} dx & = & \int \frac{a+a\sin \theta}{a\cos \theta} a \cos \theta \, d\theta \nl & = & a \int \left[ 1+ \sin \theta \right] d\theta \nl & = & a \left[ \theta - \cos \theta \right] \nl & = & a\sin^{-1}\left(\frac{x}{a}\right) - a\frac{\sqrt{a^2-x^2}}{a} \nl & = & a\sin^{-1}\left(\frac{x}{a}\right) - \sqrt{a^2-x^2}. \end{eqnarray} \]

Tan substitution

When an integral contains $\sqrt{a^2+x^2}$, we use the substitution: \[ x = a \tan \theta, \qquad dx = a \sec^2 \theta d\theta. \]

Because of the identity $1+\tan^2\theta=\sec^2\theta$, the square root expression will simplify drastically: \[ \sqrt{a^2+x^2} = \sqrt{a^2+a^2 \tan^2 \theta} = a\sqrt{1+\tan^2 \theta} = a \sec \theta. \] Simplification is a good thing. You are much more likely to be able to find the integral in terms of $\theta$, using trig identities, than in terms of $\sqrt{a^2+x^2}$.

Once you calculate the integral in terms of $\theta$, you will want to convert the answer back into $x$ coordinates. To do this, you need to use a triangle labeled according to our substitution: \[ \tan\theta = \frac{x}{a} = \frac{\text{opp}}{\text{adj}}. \] The equivalent of $\sin\theta$ in terms of $x$ is going to be $\sin\theta \equiv \frac{\text{opp}}{\text{hyp}} = \frac{x}{\sqrt{a^2+x^2}}$. Similarly, the other trigonometric functions are defined as various ratios of $a$, $x$ and $\sqrt{a^2+x^2}$.

Example

Calculate $\int\frac{1}{x^2+1}\,dx$.

The denominator of this function is equal to $\left(\sqrt{1+x^2}\right)^2$. This suggests that we try to substitute $\displaystyle x=\tan \theta\,$ and use the identity $\displaystyle 1 + \tan^2 \theta =\sec^2 \theta\,$. With this substitution, we obtain that $\displaystyle dx= \sec^2 \theta\, d\theta$ and thus: \[ \begin{align} \int\frac{1}{x^2+1}\,dx & =\int\frac{1}{\tan^2 \theta+1} \sec^2 \theta\,d\theta \nl & =\int\frac{1}{\sec^2 \theta} \sec^2 \theta\,d\theta \nl & =\int 1\;d\theta \nl &=\theta \nl &=\tan^{-1}(x) + C. \end{align} \]
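This is one of the integrals a computer algebra system knows by heart, so it is easy to double-check. A quick sympy sketch (assuming sympy is available):

  >>> from sympy import symbols, integrate
  >>> x = symbols('x')
  >>> integrate( 1/(x**2 + 1), x )
        atan(x)

Sympy calls the inverse tangent atan, and as usual it leaves out the integration constant.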

Obfuscated example

What if we don't have $x^2 + 1$ in the denominator (a second degree polynomial with a missing linear term), but a full second degree polynomial like: \[ \frac{1}{y^2 - 6y + 10}. \] How would you integrate something like this? If there were no $-6y$ term, you would be able to use the tan substitution as above – or perhaps you could look up the formula $\int \frac{1}{x^2+1}dx = \tan^{-1}(x)$ in the table of integrals. But there is no formula for \[ \int \frac{1}{y^2 - 6y + 10} \; dy \] in the table, so how should you proceed?

We will use the good old substitution technique $u=\ldots$ and a high-school algebra trick called “completing the square” in order to rewrite the denominator of the fraction inside the integral in the form $(y-h)^2 + k$, i.e., with no linear term.

The first step is to find “by inspection” the values of $h$ and $k$: \[ \frac{1}{y^2 - 6y + 10} = \frac{1}{(y-h)^2+k} = \frac{1}{(y-3)^2+1}. \] The “square completed” quadratic expression has no linear term, which is what we wanted. We can now use the substitution $x=y-3$ and $dx=dy$ to obtain an integral which we know how to solve: \[ \!\int \!\! \frac{1}{y^2 - 6y + 10}\; dy \!= \!\int \!\! \frac{1}{(y-3)^2+1}\; dy \!= \!\int \!\!\frac{1}{x^2+1}\; dx = \tan^{-1}(x) + C = \tan^{-1}(y-3) + C.\]
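You can also check the whole complete-the-square manoeuvre with sympy (assuming it is available), which handles the substitution internally:

  >>> from sympy import symbols, integrate
  >>> y = symbols('y')
  >>> integrate( 1/(y**2 - 6*y + 10), y )
        atan(y - 3)

The printed form may differ slightly between versions, but it will be equivalent to $\tan^{-1}(y-3)$ plus a constant.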

Sec substitution

In the last two sections we learned how to deal with $\sqrt{a^2-x^2}$, $\sqrt{x^2+a^2}$ and so only the last option remains: $\sqrt{x^2-a^2}$.

Recall the trigonometric identity $1+\tan^2\theta=\sec^2\theta$, or rewritten differently we get \[ \sec^2\theta - 1 = \tan^2\theta. \]

The appropriate substitution for terms like $\sqrt{x^2-a^2}$, is the following: \[ x = a \sec \theta, \qquad dx = a \tan \theta \sec \theta \; d\theta. \]

The substitution method and procedure is the same as in both previous cases, so we will not get into the details. We label the sides of the triangle in the appropriate fashion, namely: \[ \sec\theta = \frac{x}{a} = \frac{\text{hyp}}{\text{adj}}, \] and use this triangle when we are converting back from $\theta$ to $x$ in the final steps.

Interlude

By now, things are starting to get pretty tight for your Calculus teacher. You are starting to know how to “handle” any kind of integral he can throw at you: polynomials, fractions with $x^2$ plus or minus $a^2$ and square roots. He can't even use the dirty old trigonometric tricks, with the $\sin$, the $\cos$ and the $\tan$ since you know that too. What options are there left for him to come up with an integral that you wouldn't know how to solve?

OK, I am exaggerating, but you should at least feel, by now, that you know how to do some integrals that you didn't know before. Just remember to come back to this section when you are hit with some complicated integral. When this happens, check to see which of the examples in this section looks the most similar and use the same approach. Don't bother memorizing the steps in each problem. The substitution $u=\ldots$ may be different from any problem that you have seen so far. You should think of “integration techniques” like general recipe ideas which you must adapt depending on the ingredients that you have to work with.

The most important integration technique is substitution. Recall the steps involved: (1) the change of variable $u=\ldots$, (2) the associated $dx$ to $du$ change and (3) the change in the limits of integration required for definite integrals. With medium to advanced substitution skills you will get at least an 80% on your Calculus II final.

Where is the remaining 20% of the exam going to come from? There are two more recipes to go. I know all these tricks that I have been throwing at you during the last ten pages may seem arduous and difficult to understand, but this is what you got yourself into when you signed up for the course “Integral Calculus”: there are integrals and you calculate them.

The good news is that we are almost done. There is just one more “trick” to go, and finally I will tell you about “integration by parts”, which is kind of the analogue of the product rule for derivatives $(fg)'=f'g + fg'$.

Partial fractions

Suppose you have to integrate a rational function $\frac{P(x)}{Q(x)}$, where $P$ and $Q$ are polynomials.

For example, you could be asked to integrate \[ \frac{P(x)}{Q(x)} = \frac{Dx+E}{Fx^2 + G x + H}, \] where $D$, $E$, $F$, $G$ and $H$ are arbitrary constants. To get even more specific, let's say you are asked to calculate: \[ \int {3x+ 1 \over x^2+x} \; dx. \]

By magical powers, I can transform the function in this integral into two partial fractions as follows: \[ \int {3x+ 1 \over x^2+x} \; dx = \int \left( \frac{1}{x} + \frac{2}{x+1} \right) \; dx = \int \frac{1}{x} \; dx \ + \ \int \frac{2}{x+1} \; dx, \] in which both terms will give something $\ln$-like when integrated (since $\frac{d}{dx}\ln(x)=\frac{1}{x}$). The final answer is: \[ \int {3x+ 1 \over x^2+x} \; dx = \ln \left| x \right| + 2 \ln \left| x+1 \right| + C. \]

How did I split the problem into partial fractions? Is it really magic or is there a method? There is a little bit of both. The method part is that I assumed that there exist constants $A$ and $B$ such that \[ {3x+ 1 \over x^2+x}={3x+ 1 \over x(x+1)}= {A \over x}+ {B \over x+1}, \] and then I solved the above equation for $A$ and $B$, by computing the sum of the two fractions: \[ {3x+1 \over x(x+1)} = {{A(x+1) + Bx} \over {x(x+1)}}. \]

The magic part is the fact that you can solve for two unknowns in one equation. The relevant part of the equation is just the numerator, because both sides have the same denominator. To find $A$ and $B$ we have to solve \[ 3x+1 = (3)x + (1)1 = A(x+1)+Bx = (A+B)x + (A)1. \] To solve this, group the terms by powers of $x$ and read off the values of the unknowns. The coefficient of the constant term on the left-hand side is $1$, and on the right-hand side it is $A$, so $A=1$. Similarly, matching the coefficients of $x$ gives $A+B=3$, and since we already found $A=1$, we deduce that $B=2$.

Another way of looking at this, is that the equation \[ 3x+1 = A(x+1)+Bx \] must hold for all values of the variable $x$. If we put in $x=0$ we get $1 = A$ and putting $x=-1$ gives $-2=-B$ so $B=2$.

The above problem highlights the power of the partial fractions method for attacking integrals of polynomial fractions $\frac{P(x)}{Q(x)}$. Most of the work goes into some high-school math (factoring and finding unknowns) and then you do some simple calculus steps once you have split the problem into partial fractions. Some people call this method separation of quotients, but whatever you call it, it is clear that having a way to split a fraction into multiple parts is a good thing: \[ \frac{3x+ 1}{x^2+x} = \frac{A}{x} + \frac{B}{x+1}. \]
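By the way, the splitting-into-partial-fractions step is pure algebra, so a computer can do it for you. In sympy (assuming it is available) the function that does this is called apart:

  >>> from sympy import symbols, apart
  >>> x = symbols('x')
  >>> apart( (3*x + 1)/(x**2 + x), x )
        2/(x + 1) + 1/x

The order of the terms in the printout may vary, but this is the same decomposition we found by hand. It is still worth learning the procedure below, because on the exam you won't have a computer.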

How many parts are there going to be for a fraction $\frac{P(x)}{Q(x)}$? What will each part look like? The answer is that there will be as many unknown constants as the degree of the polynomial $Q(x)$ in the denominator of the fraction, and each part will be built from one of the factors of $Q(x)$.

Here is the general procedure:

  1. Split the denominator $Q(x)$ into a product of factors (factorize),
  and for each factor assume an appropriate partial fraction term
  on the right.
  You will get three types of fractions:
  * Simple factors like $(x-\alpha)^1$. For each of these
    you should //assume// a partial fraction of the form:
    \[
     \frac{A}{x-\alpha},
    \]
    as in the above example.
  * Repeated factors like $(x-\beta)^n$, for which we have to
    assume $n$ different terms on the right-hand side:
    \[
     \frac{B}{x-\beta} + \frac{C}{(x-\beta)^2} + \cdots + \frac{F}{(x-\beta)^n}.
    \]
  * If the denominator contains a portion $ax^2+bx+c$ that cannot be factored, like 
    $x^2+1$ for example, we have to keep it whole
    and assume that a term of the form
    \[
     \frac{Gx + H}{ax^2+bx+c}
    \]
    exists on the right-hand side. A polynomial $ax^2+bx+c$ cannot be factored
    if $b^2 < 4ac$, which means it has no real roots $r_1$, $r_2$
    such that $ax^2+bx+c=a(x-r_1)(x-r_2)$. 
  2. Add together all the parts on the right-hand side by
  cross multiplying them to set all the fractions to a
  common denominator. If you followed the steps 
  correctly in Step 1, the //least common denominator// (LCD) will turn 
  out to be $Q(x)$,
  so both sides will have the same denominator.
  Solve for the unknown coefficients $A, B, C, \ldots$
  in the numerators: find the coefficient 
  of each power of $x$ on the right-hand side and set it
  equal to the corresponding coefficient in the numerator $P(x)$ on the left-hand side.
  3. Use the appropriate integral formula for each kind of term:
  * For simple factors we have 
    \[
     \int \frac{A}{x-\alpha} \; dx= A \ln|x-\alpha| + C.
    \]
  * For higher powers in the denominator we have
    \[
     \int \frac{B}{(x-\beta)^m} \; dx= \frac{B}{(1-m)(x-\beta)^{m-1}} + C.
    \]
  * For the quadratic denominator terms with "matching" numerator
    terms we can use:
    \[
     \int \frac{2ax+b}{ax^2+bx+c} \; dx= \ln|ax^2+bx+c| + C.
    \]
    For quadratic terms with just a constant on top we use
    a two-step substitution process.
    First we change to a complete-the-square variable $y=x-h$:
    \[
     \int \frac{1}{ax^2+bx+c} \; dx
     =
     \int \frac{1/a}{(x-h)^2+k} \; dx
     =
     \frac{1}{a}\int \frac{1}{y^2+k} \; dy,
    \]
    and then we use the trig substitution $y = \sqrt{k}\tan\theta$ to get
    \[
     \frac{1}{a} \int \frac{1}{y^2+k} \; dy = 
     \frac{1}{a\sqrt{k}}\tan^{-1}\!\!\left(\frac{y}{\sqrt{k}} \right) =
     \frac{1}{a\sqrt{k}}\tan^{-1}\!\!\left(\frac{x-h}{\sqrt{k}} \right) + C.
    \]

Example

Find $\int {1 \over (x+1)(x+2)^2}\,dx$.

Here $P(x)=1$ and $Q(x)=(x+1)(x+2)^2$. If I wanted to be sneaky, I could have asked for $\int {1 \over x^3+5x^2+8x+4}dx$, instead – which is actually the same question, but you have to do the factoring yourself.

According to the recipe outlined above, we have to look for a split fraction of the form: \[ \frac{1}{(x+1)(x+2)^2}=\frac{A}{x+1}+\frac{B}{x+2}+\frac{C}{(x+2)^2}. \] To make the equation more explicit, let us add the fractions on the right. We set all of them over the least common denominator and add them up: \[ \begin{align} \frac{1}{(x+1)(x+2)^2} & =\frac{A}{x+1}+\frac{B}{x+2}+\frac{C}{(x+2)^2} \nl &= \frac{A(x+2)^2}{(x+1)(x+2)^2}+\frac{B(x+1)(x+2)}{(x+1)(x+2)^2}+\frac{C(x+1)}{(x+1)(x+2)^2} \nl & = \frac{A(x+2)^2+B(x+1)(x+2)+C(x+1)}{(x+1)(x+2)^2}. \end{align} \]

The denominators are the same on both sides in the above equation, so we can focus our attention on the numerator: \[ A(x+2)^2+B(x+1)(x+2)+C(x+1) = 1. \] We choose three different values of $x$ in order to find the values of $A$, $B$ and $C$: \[ \begin{matrix} x=0 & 1= 2^2A +2B+C \nl x=-1 & 1=A \nl x=-2 & 1= -C \end{matrix} \] so $A=1$, $B=-1$, $C=-1$, and thus \[ \frac{1}{(x+1)(x+2)^2}=\frac{1}{x+1}-\frac{1}{x+2}-\frac{1}{(x+2)^2}. \]

We can now calculate the integral by integrating each of the terms: \[ \int \frac{1}{(x+1)(x+2)^2} dx= \ln|x+1| - \ln|x+2| + \frac{1}{x+2} +C. \]
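As a way to check our work, here is the partial fraction split confirmed by the computer (a sympy sketch, assuming sympy is available):

  >>> from sympy import symbols, apart
  >>> x = symbols('x')
  >>> apart( 1/((x + 1)*(x + 2)**2), x ) == 1/(x + 1) - 1/(x + 2) - 1/(x + 2)**2
        True

So the decomposition we found by hand is the same one sympy produces.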

Integration by parts

Suppose you have to integrate the product of two functions. If one of the functions happens to look like the derivative of a function that you recognize, then you can do the following trick: \[ \int f(x) g'(x) \; dx \ \ = \ \ f(x) g(x) \ \ \ \ - \int f'(x)g(x) \; dx. \]

This means that you can shift the work to evaluating a different integral where one function is replaced by its derivative and another is replaced by its integral.

Derivatives tend to simplify functions whereas integrals make functions more complicated, so such shifting of work can be quite beneficial: you will save yourself some work on integrating the $f$ part, but you will do more work on the $g$ part.

It is easier to remember the integration by parts formula in the shorthand notation: \[ \int u\; dv = uv - \int v\; du. \] In fact, you can think of integration by parts as a form of “double substitution”, where you replace $u$ and $dv$ at the same time. To be sure of what is going on, I recommend you always make a little table like this: \[ \begin{align} u &= & \qquad dv &= \nl du &= & \qquad v &= \end{align} \] and fill in the blanks. The first row consists of the two parts that you see in your original problem. Then you differentiate in the left column, and integrate in the right column. If you do this, using the integration by parts formula will be really easy since you have all your expressions ready.

For definite integrals the integration by parts rule needs to take into account the evaluation at the limits: \[ \int_a^b u\; dv = \left(uv\right)\Big|_a^b \ \ - \ \ \int_a^b v \; du, \] which tells us to evaluate the difference of the value of $uv$ at the two endpoints and then subtract the switched integral with the same endpoints.

Example 1

Find $\int x e^x \, dx$. We identify the good candidates for $u$ and $dv$ in the original expression, and perform all the work necessary for the substitution: \[ \begin{align} u &=x & \qquad dv &= e^x \; dx, \nl du &=dx & \qquad v &= e^x. \end{align} \] Next we apply the integration by parts formula \[ \int u\; dv = uv - \int v\; du, \] to get the following: \[ \begin{align} \int xe^x \, dx &= x e^x - \int e^x \; dx \nl &= x e^x - e^x + C. \end{align} \]
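A quick way to check an integration by parts result is to hand the integral to sympy (assuming it is available):

  >>> from sympy import symbols, exp, integrate
  >>> x = symbols('x')
  >>> integrate( x*exp(x), x )
        (x - 1)*exp(x)

Depending on the version, the answer may be printed as $(x-1)e^x$ or as $xe^x - e^x$; either way it agrees with what we found, up to the constant $C$.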

Example 2

Find $\int x \sin x \; dx$. We choose $u=x$ and $dv=\sin x dx$. With these choices, we have $du=dx$ and $v=-\cos x$, and integrating by parts we get: \[ \begin{align} \int x \sin x \, dx &= -x \cos x - \int \left(-\cos x\right) \; dx \nl &= -x \cos x + \int \cos x \; dx \nl &= -x \cos x + \sin x + C. \end{align} \]

Example 3

Often times, you have to integrate by parts multiple times. To calculate $\int x^2 e^x \, dx$, we start by choosing: \[ \begin{align} u &=x^2 & \qquad dv &= e^x \; dx \nl du &= 2x \; dx & \qquad v &= e^x, \end{align} \] which gives the following after integration by parts: \[ \int x^2 e^x \; dx = x^2 e^x \ - \ 2 \int x e^x \; dx. \] We apply integration by parts again on the remaining integral this time using $u=x$ and $dv=e^x\; dx$, which gives $du = dx$ and $v=e^x$.

\[ \begin{align} \int x^2 e^x \; dx &= x^2 e^x - 2 \int x e^x \; dx \nl &= x^2 e^x - 2\left(x e^x - \int e^x \; dx \right) \nl &= x^2 e^x - 2x e^x + 2e^x + C. \end{align} \]

By now I hope you are starting to see that this integration by parts thing is good. If you always write down the substitutions clearly (who is who in $\int u dv$), and use the formula correctly ($=uv-\int v du$) you can do damage to any integral. Sometimes the choice of $u$ and $dv$ you make might not be good: if the integral $\int v du$ is not simpler than the original $\int u dv$ then what is the point of integrating by parts?

Sometimes, however, you can get into a weird self-referential loop when doing integration by parts. After a couple of integration-by-parts steps you might end up back with an integral you started with! The way out of this loop is best shown by example.

Example 4

Evaluate the integral $ \int \sin(x) e^x\; dx$. First we let $u = \sin(x) $ and $dv=e^x \; dx$, which gives $du=\cos(x)dx$ and $v=e^x$. Using integration by parts gives \[ \int \sin(x) e^x\, dx = e^x\sin(x)- \int \cos(x)e^x\, dx. \]

We integrate by parts again. This time we set $u = \cos(x)$, $dv=e^x dx$ and $du=-\sin(x)dx$, $v=e^x$. We obtain \[ \underbrace{ \int \sin(x) e^x\, dx}_I \ = \ e^x\sin(x) - e^x\cos(x)\ \ -\ \ \underbrace{\int e^x \sin(x)\, dx}_I. \] Do you see the Ouroboros? We could continue integrating by parts indefinitely like that.

Let us define clearly what we are doing here. The question asked us to find $I$ where \[ I = \int \sin(x) e^x\, dx, \] and after doing two integration by parts steps we obtain the following equation: \[ I = e^x\sin(x) - e^x\cos(x) - I. \] OK, good. Now just move all the I's to one side: \[ 2I = e^x\sin(x) - e^x\cos(x), \] or finally \[ \int \sin(x) e^x\, dx = I = \frac{1}{2} e^x\left(\sin(x) - \cos(x) \right) +C. \]
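If you want to double-check the self-referential calculation, sympy (assuming it is available) arrives at the same answer:

  >>> from sympy import symbols, sin, exp, integrate
  >>> x = symbols('x')
  >>> integrate( sin(x)*exp(x), x )
        exp(x)*sin(x)/2 - exp(x)*cos(x)/2

which is the same as $\frac{1}{2} e^x\left(\sin(x) - \cos(x) \right)$, modulo the constant $C$.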

Derivation of the Integration by parts formula

Remember the product rule for derivatives? \[ \frac{d}{dx}(f(x)g(x)) = \frac{df}{dx}g(x) + f(x)\frac{dg}{dx}. \] We can rewrite this as: \[ f(x)\frac{dg}{dx} = \frac{d}{dx}(f(x)g(x)) \ -\ \frac{df}{dx}g(x) . \] Now we take the integral of both sides \[ \int f(x)\frac{dg}{dx} \; dx \ = \ \int \frac{d}{dx}(f(x)g(x)) \; dx \ - \ \int \frac{df}{dx}g(x) \; dx. \]

At this point, you need to recall the Fundamental Theorem of Calculus, which says that taking the derivative and taking an integral are inverse operations \[ \int \frac{d}{dx} h(x) \; dx = h(x). \] We use this to simplify the product rule equation as follows: \[ \int f(x)\frac{dg}{dx} \; dx \ = \ f(x)g(x) \ \ - \ \ \int \frac{df}{dx}g(x) \; dx. \]

Outro

We are done. Now you know all the integration techniques. I know it took a while, but we had to go through a lot of tricks. In any case, I must say I am glad to be done writing this section. My job of teaching you is done. Now your job begins. Do all the examples you can find. Do all the exercises. Practice the tricks.

Here is a suggestion for you. Make your own formula-sheet-slash-trophy-case where you record any complex integral that you have personally calculated from first principles in homework assignments. If by the end of the class your trophy case has 50 integrals which you calculated yourself, then you will get $100\%$ on your final. Another thing to try is to go over the integral formulas in the back of the book and see how many of them you can derive.

Links

[ More examples of integration techniques ]
http://en.wikibooks.org/wiki/Calculus/Integration_techniques/

Series

Can you compute $\ln(2)$ using only a basic calculator with four operations: [+], [-], [$\times$], [$\div$]? I can tell you one way. Simply compute the following sum: \[ 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \frac{1}{7} + \ldots. \] We can compute the above sum for large values of $n$ using live.sympy.org:

  >>> def axn_ln2(n): return 1.0*(-1)**(n+1)/n
  >>> sum([ axn_ln2(n)  for n in range(1,100) ])
        0.69(817217931)
  >>> sum([ axn_ln2(n)  for n in range(1,1000) ])
        0.693(64743056)
  >>> sum([ axn_ln2(n)  for n in range(1,1000000) ])
        0.693147(68056)
  >>> ln(2).evalf()
        0.693147180559945

As you can see, the more terms you add in this series, the more accurate the series approximation of $\ln(2)$ becomes. A lot of practical mathematical computations are done in this iterative fashion. The notion of series is a powerful way to calculate quantities to arbitrary precision by summing together more and more terms.

Definitions

  * $\mathbb{N}$: the natural numbers $\{0, 1, 2, 3, 4, 5, 6, \ldots \}$.
  * $\mathbb{N}^*=\mathbb{N} \setminus \{0\}$: the natural numbers without zero, $\{1, 2, 3, 4, 5, 6, \ldots \}$.
  * $a_n$: a sequence of numbers $a_0, a_1, a_2, a_3, a_4, \ldots$.
  * $\sum$: the summation sign. It means to take the sum of several objects
    put together, and it is the short way to express
    certain long expressions:
    \[
      a_3 + a_4 + a_5 + a_6 + a_7 = \sum_{3 \leq i \leq 7} a_i = \sum_{i=3}^7 a_i.
    \]
  * $\sum a_i$: a series. The running total of a sequence up to the $n$th term:
    \[
       S_n = \sum_{i=1}^n a_i  = a_1 + a_2 + \ldots + a_{n-1} + a_n.
    \]
    Most often, we take the sum of all the terms in the sequence:
    \[
       S_\infty = \sum_{i=1}^\infty a_i = a_1 + a_2 + a_{3} + a_4 + \ldots.
    \]
  * $n!$: the //factorial// function: $n!=n(n-1)(n-2)\cdots 3\cdot2\cdot1$.
  * $f(x)=\sum_{n=0}^\infty a_n x^n$: //Taylor series// approximation
    of the function $f(x)$. It has the form of an infinitely long polynomial
    $a_0 + a_1x + a_2x^2 + a_3x^3 + \ldots$ where the coefficients $a_n$ are
    chosen so as to encode the properties of the function $f(x)$.

Exact sums

There exist formulas for calculating the exact sum of certain series. Sometimes even infinite series can be calculated exactly.

The sum of the geometric series of length $n$ is: \[ \sum_{k=0}^n r^k = 1 + r + r^2 + \cdots + r^n =\frac{1-r^{n+1}}{1-r}. \]

If $|r|<1$, we can take the limit as $n\to \infty$ in the above expression to obtain: \[ \sum_{k=0}^\infty r^k=\frac{1}{1-r}. \]

Example

Consider the geometric series with $r=\frac{1}{2}$. If we apply the above formula we obtain \[ \sum_{k=0}^\infty \left(\frac{1}{2}\right)^k=\frac{1}{1-\frac{1}{2}} = 2. \]
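If you don't quite believe the formula, you can check it numerically with plain Python: the partial sums get closer and closer to 2 as you include more terms. (This is just a sanity check; any Python interpreter will do.)

  >>> sum([ (0.5)**k  for k in range(0,10) ])
        1.998046875
  >>> sum([ (0.5)**k  for k in range(0,20) ])
        1.9999980926513672
  >>> (1 - 0.5**20)/(1 - 0.5)         # the exact formula with n=19
        1.9999980926513672

Each partial sum falls short of 2 by exactly $\left(\frac{1}{2}\right)^n$, so the limit of the partial sums is 2.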

You can also visualize this infinite summation graphically. Imagine you start with a piece of paper of size one-by-one, then add next to it a second piece of paper with half the area of the first, then a third piece with half the area of the second, and so on. The total area that this sequence of pieces of paper will occupy is $1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots = 2$.

The geometric progression visualized for the case when $r$ is equal to one half.

The sum of the first $N+1$ terms in arithmetic progression is given by: \[ \sum_{n=0}^N (a_0+nd)= a_0(N+1)+\frac{N(N+1)}{2}d. \]

We also have the following closed-form expressions involving the first $N$ integers and their squares: \[ \sum_{k=1}^N k = \frac{N(N+1)}{2}, \qquad \quad \sum_{k=1}^N k^2=\frac{N(N+1)(2N+1)}{6}. \]

Other series which have exact formulas for their sum are the $p$-series with even values of $p$: \[ \sum_{n=1}^\infty\frac{1}{n^2}=\frac{\pi^2}{6}, \quad \sum_{n=1}^\infty\frac{1}{n^4}=\frac{\pi^4}{90}, \quad \sum_{n=1}^\infty\frac{1}{n^6}=\frac{\pi^6}{945}. \] These sums were first computed by Euler.

Other closed form sums: \[ \sum_{n=1}^\infty\frac{(-1)^{n+1}}{n^2}=\frac{\pi^2}{12}, \qquad \quad \sum_{n=1}^\infty\frac{(-1)^{n+1}}{n}=\ln(2), \] \[ \sum_{n=1}^\infty\frac{1}{4n^2-1}=\frac{1}{2}, \] \[ \sum_{n=1}^\infty\frac{1}{(2n-1)^2}=\frac{\pi^2}{8}, \quad \sum_{n=1}^\infty\frac{(-1)^{n+1}}{(2n-1)^3}=\frac{\pi^3}{32}, \quad \sum_{n=1}^\infty\frac{1}{(2n-1)^4}=\frac{\pi^4}{96}. \]
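Sympy knows how to evaluate many of these sums symbolically, which is handy for checking a formula you half-remember (a sketch, assuming sympy is available):

  >>> from sympy import Sum, symbols, oo
  >>> n = symbols('n')
  >>> Sum( 1/n**2, (n, 1, oo) ).doit()
        pi**2/6
  >>> Sum( 1/n**4, (n, 1, oo) ).doit()
        pi**4/90

Not every series has a nice closed form though. For most series the best you can do is determine whether the sum converges at all, which is the subject of the next section.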

Convergence and divergence of series

Even when we cannot compute an exact expression for the sum of a series, it is very important to distinguish series that converge from series that do not converge. A great deal of what you need to know about series concerns the different tests you can perform on a series in order to check whether it converges or diverges.

Note that convergence of a series is not the same as convergence of the underlying sequence $a_i$. Consider the sequence of partial sums $S_n = \sum_{i=0}^n a_i$: \[ S_0, S_1, S_2, S_3, \ldots , \] where each of these corresponds to \[ a_0, \ \ a_0 + a_1, \ \ a_0 + a_1 + a_2, \ \ a_0 + a_1 + a_2 + a_3, \ldots. \]

We say that the series $\sum a_i$ converges if the sequence of partial sums $S_n$ converges to some limit $L$: \[ \lim_{n \to \infty} S_n = L. \]

As with all limits, the above statement means that for any precision $\epsilon>0$, there exists an appropriate number of terms to take in the series $N_\epsilon$, such that \[ |S_n - L | < \epsilon,\qquad \text{ for all } n \geq N_\epsilon. \]

Sequence convergence test

The only way the partial sums can converge is if the entries of the sequence $a_n$ tend to zero for large $n$. This observation gives us a simple series divergence test. If $\lim\limits_{n\rightarrow\infty}a_n\neq0$ then $\sum\limits_n a_n$ diverges. How could infinitely many quantities that do not shrink to zero add up to a finite number?

Absolute convergence

If $\sum\limits_n|a_n|$ converges, then $\sum\limits_n a_n$ also converges. The opposite is not necessarily true, since the convergence of $\sum\limits_n a_n$ might be due to some negative terms cancelling with the positive ones.

A sequence $a_n$ for which $\sum_n |a_n|$ converges is called absolutely convergent. A sequence $b_n$ for which $\sum_n b_n$ converges, but $\sum_n |b_n|$ diverges is called conditionally convergent.

Decreasing alternating sequences

An alternating series whose terms decrease in absolute value and tend to zero converges.

p-series

The series $\displaystyle\sum_{n=1}^\infty \frac{1}{n^p}$ converges if $p>1$ and diverges if $p\leq1$.

Limit comparison test

Suppose the terms $a_n$ and $b_n$ are positive and $\displaystyle\lim_{n\rightarrow\infty}\frac{a_n}{b_n}=p$. Then the following is true:

  • if $p>0$ then $\sum\limits_{n}a_n$ and $\sum\limits_{n}b_n$ either both converge or both diverge.
  • if $p=0$: if $\sum\limits_{n}b_n$ converges, then $\sum\limits_{n}a_n$ also converges.

n-th root test

If $L$ is defined by $\displaystyle L=\lim_{n\rightarrow\infty}\sqrt[n]{|a_n|}$ then $\sum\limits_{n}a_n$ diverges if $L>1$ and converges if $L<1$. If $L=1$ the test is inconclusive.

Ratio test

If $L$ is defined by $\displaystyle L=\lim_{n\rightarrow\infty}\left|\frac{a_{n+1}}{a_n}\right|$, then $\sum\limits_{n}a_n$ diverges if $L>1$ and converges if $L<1$. If $L=1$ the test is inconclusive.

Radius of convergence for power series

In a power series $a_n=c_nx^n$, the $n$th term is multiplied by the $n$th power of $x$. For such series, the convergence or divergence of the series depends on the choice of the variable $x$.

The radius of convergence $\rho$ of the power series $\sum\limits_n c_nx^n$ is given by: $\displaystyle\frac{1}{\rho}=\lim_{n\rightarrow\infty}\sqrt[n]{|c_n|}= \lim_{n\rightarrow\infty}\left|\frac{c_{n+1}}{c_n}\right|$. For all $-\rho < x < \rho$ the series converges.

Integral test

If $f(x)$ is positive and decreasing and $\int_a^{\infty}f(x)dx<\infty$, then $\sum\limits_n f(n)$ converges.

Taylor series

The Taylor series approximation to the function $\sin(x)$ to the 9th power of $x$ is given by \[ \sin(x) \approx x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!}. \] If we want to get rid of the approximate sign, we have to take infinitely many terms in the series: \[ \sin(x) = \sum_{n=0}^\infty \frac{(-1)^nx^{2n+1}}{(2n+1)!} = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!} - \frac{x^{11}}{11!} + \ldots . \]

This kind of formula is known as a Taylor series approximation. The Taylor series of a function $f(x)$ around the point $a$ is given by: \[ \begin{align*} f(x) & =f(a)+f'(a)(x-a)+\frac{f^{\prime\prime}(a)}{2!}(x-a)^2+\frac{f^{\prime\prime\prime}(a)}{3!}(x-a)^3+\cdots \nl & = \sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!}(x-a)^n. \end{align*} \]

The Maclaurin series of $f(x)$ is the Taylor series expanded at $a=0$: \[ \begin{align*} f(x) & =f(0)+f'(0)x+\frac{f^{\prime\prime}(0)}{2!}x^2+\frac{f^{\prime\prime\prime}(0)}{3!}x^3 + \ldots \nl & = \sum_{n=0}^\infty \frac{f^{(n)}(0)}{n!}x^n . \end{align*} \]

Taylor series of some common functions: \[ \begin{align*} \cos(x) &= 1 - \frac{x^2}{2} + \frac{x^4}{4!} - \frac{x^6}{6!} + \frac{x^8}{8!} + \ldots \nl e^x &= 1 + x + \frac{x^2}{2} + \frac{x^3}{3!} + \frac{x^4}{4!} + \frac{x^5}{5!} + \ldots \nl \ln(x+1) &= x - \frac{x^2}2 + \frac{x^3}{3} - \frac{x^4}{4} + \frac{x^5}{5} - \frac{x^6}{6} + \ldots \nl \cosh(x) &= 1 + \frac{x^2}{2} + \frac{x^4}{4!} + \frac{x^6}{6!} + \frac{x^8}{8!} + \frac{x^{10} }{10!} + \ldots \nl \sinh(x) &= x + \frac{x^3}{3!} + \frac{x^5}{5!} + \frac{x^7}{7!} + \frac{x^9}{9!} + \frac{x^{11} }{11!} + \ldots \end{align*} \] Note the similarity between the Taylor series of $\sin$ and $\sinh$, and between those of $\cos$ and $\cosh$. The formulas are the same, but the hyperbolic versions do not alternate in sign.
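If you ever forget one of these expansions, sympy can regenerate it for you with the series function (assuming sympy is available):

  >>> from sympy import symbols, sin, exp, series
  >>> x = symbols('x')
  >>> series( sin(x), x, 0, 8 )      # expand sin(x) around x=0, up to x^8
        x - x**3/6 + x**5/120 - x**7/5040 + O(x**8)
  >>> series( exp(x), x, 0, 4 )
        1 + x + x**2/2 + x**3/6 + O(x**4)

Note that the denominators are printed in evaluated form: $3!=6$, $5!=120$, $7!=5040$.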

Explanations

Taylor series

The name Maclaurin series refers to a Taylor series expanded at $a=0$, though the two names are often used interchangeably. Another synonym for the same concept is a power series. Indeed, we are talking about a polynomial approximation with coefficients $a_n=\frac{f^{(n)}(0)}{n!}$ in front of the different powers of $x$.

If you remember your derivative rules correctly, you can calculate the Maclaurin series of any function simply by writing down a power series $a_0 + a_1x + a_2x^2 + \ldots$, taking as the coefficient $a_n$ the value of the $n$th derivative of the function at $x=0$ divided by $n!$. The more terms in the series you compute, the more accurate your approximation is going to get.

The zeroth order approximation to a function is \[ f(x) \approx f(0). \] It is not very accurate in general, but at least it is correct at $x=0$.

The best linear approximation to $f(x)$ is its tangent $T(x)$, which is a line that passes through the point $(0, f(0))$ and has slope equal to $f'(0)$. Indeed, this is exactly what the first order Taylor series formula tells us to compute. The coefficient in front of $x$ in the Taylor series is obtained by first calculating $f'(x)$ and then evaluating it at $x=0$: \[ f(x) \approx f(0) + f'(0)x = T(x). \]

To find the best quadratic approximation to $f(x)$, we find the second derivative $f^{\prime\prime}(x)$. The coefficient in front of the $x^2$ term will be $f^{\prime\prime}(0)$ divided by $2!=2$: \[ f(x) \approx f(0) + f'(0)x + \frac{f^{\prime\prime}(0)}{2!}x^2. \]

If we continue like this we will get the whole Taylor series of the function $f(x)$. At step $n$, the coefficient will be proportional to the $n$th derivative of $f(x)$, and the resulting $n$th degree approximation is going to imitate the function's behaviour up to the $n$th derivative.

Proof of the sum of the geometric series

We are looking for the sum $S$ given by: \[ S = \sum_{k=0}^n r^k = 1 + r + r^2 + r^3 + \cdots + r^n. \] Observe that there is a self-similar pattern in the expanded summation $S$, where each term to the right has an additional power of $r$. The effect of multiplying by $r$ is therefore to “shift” all the terms of the series: \[ rS = r\sum_{k=0}^n r^k = r + r^2 + r^3 + \cdots + r^n + r^{n+1}, \] and we can further add one to both sides to obtain \[ 1 + rS = \underbrace{1 + r + r^2 + r^3 + \cdots + r^n}_S + r^{n+1} = S + r^{n+1}. \] Note how the sum $S$ appears as the first part of the expression on the right-hand side. The resulting equation is quite simple: $1 + rS = S + r^{n+1}$. Since we wanted to find $S$, we just isolate all the $S$ terms to one side: \[ 1 - r^{n+1} = S - rS = S(1-r), \] and then solve for $S$ to obtain $S=\frac{1-r^{n+1}}{1-r}$. Neat, no? This is what math is all about: when you see some structure, you can exploit it to solve complicated things in just a few lines.

Examples

An infinite series

Compute the sum of the infinite series \[ \sum_{n=0}^\infty \frac{1}{N+1} \left( \frac{ N }{ N +1 } \right)^n. \] This may appear complicated, but only until you recognize that this is a type of geometric series $\sum ar^n$, where $a=\frac{1}{N+1}$ and $r=\frac{N}{N+1}$: \[ \sum_{n=0}^\infty \frac{1}{N+1} \left( \frac{ N }{ N +1 } \right)^n = \sum_{n=0}^\infty a r^n = \frac{a}{1-r} = \frac{1}{N+1}\frac{1}{1-\frac{N}{N+1}} = 1. \]

Calculator

How does a calculator compute $\sin(40^\circ)=0.6427876097$ to ten decimal places? Clearly it must be something simple with addition and multiplication, since even the cheapest scientific calculators can calculate that number for you.

The trick is to use the Taylor series approximation of $\sin(x)$: \[ \sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!} + \ldots = \sum_{n=0}^\infty \frac{(-1)^nx^{2n+1}}{(2n+1)!}. \]

To calculate the sine of 40 degrees we just compute the sum of the series on the right with $x$ replaced by 40 degrees (expressed in radians). In theory, we need to sum infinitely many terms to satisfy the equality, but in practice your calculator will only have to sum the first seven terms in the series in order to get an accuracy of 10 digits after the decimal point. In other words, the series converges very quickly.

Let me show you how this is done in Python. First we define the function for the $n^{\text{th}}$ term: \[ a_n(x) = \frac{(-1)^nx^{2n+1}}{(2n+1)!} \]

  >>> def axn_sin(x,n): return (-1.0)**n * x**(2*n+1) / factorial(2*n+1)

Next we convert $40^\circ$ to radians:

 >>> forti = (40*pi/180).evalf()   # 40 degrees in radians
 >>> forti
      0.698131700797732

These are the first 10 terms in the series:

 >>> [ (n, axn_sin( forti ,n)) for n in range(0,10) ]
 [(0, 0.69813170079773179),      # the values of a_n for Taylor(sin(40)) 
  (1, -0.056710153964883062),
  (2, 0.0013819920621191727),
  (3, -1.6037289757274478e-05),
  (4, 1.0856084058295026e-07),
  (5, -4.8101124579279279e-10),
  (6, 1.5028144059670851e-12),
  (7, -3.4878738801065803e-15),
  (8, 6.2498067170560129e-18),
  (9, -8.9066666494280343e-21)]

To compute $\sin(40^\circ)$ we sum together all the terms:

 >>> sum( [ axn_sin( forti ,n) for n in range(0,10) ] )
      0.642787609686539    	   # the Taylor approximation value
  
 >>> sin(forti).evalf()
      0.642787609686539   	   # the true value of sin(40)

Discussion

You can think of the Taylor series as “similarity coefficients” between $f(x)$ and the different powers of $x$. By choosing the coefficients as $a_n = \frac{f^{(n)}}{n!}$ evaluated at the expansion point, we guarantee that the Taylor series approximation and the real function $f(x)$ will have identical derivatives there. For a Maclaurin series the similarity between $f(x)$ and its power series representation is measured at the origin where $x=0$, so the coefficients are chosen as $a_n = \frac{f^{(n)}(0)}{n!}$. The more general Taylor series allows us to build an approximation to $f(x)$ around any point $x_o$, so the similarity coefficients are calculated to match the derivatives at that point: $a_n = \frac{f^{(n)}(x_o)}{n!}$.

Another way of looking at the Taylor series is to imagine that it is a kind of X-ray picture for each function $f(x)$. The zeroth coefficient $a_0$ in the power series tells you how much of the constant function there is in $f(x)$. The first coefficient, $a_1$, tells you how much of the linear function $x$ there is in $f$, the coefficient $a_2$ tells you about the $x^2$ contents of $f$, and so on and so forth.

Now get ready for some crazy shit. Using your new found X-ray vision for functions, I want you to go and take a careful look at the power series for $\sin(x)$, $\cos(x)$ and $e^x$. As you will observe, it is as if $e^x$ contains both $\sin(x)$ and $\cos(x)$, except for the alternating negative signs. How about that? This is a sign that these three functions are somehow related in a deeper mathematical sense: recall Euler's formula.

Exercises

Derivative of a series

Show that \[ \sum_{n=0}^\infty \frac{1}{N+1} \left( \frac{ N }{ N +1 } \right)^n n = N. \] Hint: take the derivative with respect to $r$ on both sides of the formula for the geometric series.

Basis

One of the most important concepts in the study of vectors is the concept of a basis. In the English language, the word basis carries the meaning of criterion. Thus, the sentence “The students were selected on the basis of their results in the MEQ exams” means that the numerical results of some stupid test were used in order to classify the worth of the candidates. Sadly, this type of thing happens a lot, and people often disregard the complex characteristics of a person and focus on a single criterion. The meaning of basis in mathematics is more holistic. A basis is a set of criteria that collectively capture all the information about an object.

Let's start with a simple example. If one looks at the HTML code behind the average web-page there will certainly be at least one mention of a colour like background-color:#336699; which should be read as a triplet of values $(33,66,99)$, each one describing how much red, green and blue is needed to create the given colour. The triple $(33,66,99)$ describes the colour “hotmail blue.” This convention for colour representation is called the RGB scale, or what I would like to call the RGB basis. A basis is a set of elements which can be used together to express something more complicated. In our case we have the R, G and B elements, which are pure colours, and when mixed appropriately they can create any colour. Schematically we can write this as: \[ {\rm RGB\_color}(33,66,99)=33{\mathbf R}+66{\mathbf G}+99{\mathbf B}, \] where we are using the coefficients to determine the strength of each colour component. To create the colour, we combine its components and the $+$ operation symbolizes the mixing of the colours. The reason why we are going into such detail is to illustrate that the coefficients by themselves do not mean much. In fact they do not mean anything unless we know the basis that is being used.

Another colour scheme that is commonly used is the cyan, magenta and yellow (CMY) colour basis. We would get a completely different colour if we were to interpret the same triplet of coordinates $(33,66,99)$ with respect to the CMY basis. To express the “hotmail blue” colour in the CMY basis you would need the following coefficients: \[ {\rm Hotmail Blue} = (33,66,99)_{RGB} = (222,189,156)_{CMY}. \]

A basis is a mapping which converts mathematical objects like the triple $(a,b,c)$ into real world ideas like colours. If there is ever an ambiguity about which basis is being used for a given vector, we can indicate the basis as a subscript after the bracket as we did above.

The ijk Basis

Look at the bottom left corner of the room you are in. Let's call “the $x$ axis” the edge between the wall that is to your left and the floor. The right wall and the floor meet at the $y$ axis. Finally, the vertical line where the two walls meet will be called the $z$ axis. This is a right-handed $xyz$ coordinate system. It is used by everyone in math and physics. It has three very nice axes. They are nice because they are orthogonal (perpendicular, i.e., at 90$^\circ$ with each other), and orthogonal is good for your life. We will see why that is shortly.

Now take an object of fixed definite length, say the size of your foot. We will call this the unit length. Measure a unit length along the $x$ axis. This is the $\hat{\imath}$ vector. Repeat the same procedure with the $y$ axis and you will have the $\hat{\jmath}$ vector. Using these two vectors and the property of addition, we can build new vectors. For example, I can describe a vector pointing at 45$^\circ$ with both the $x$ axis and the $y$ axis by the following expression: \[ \vec{v}=1\:\hat{\imath}+ 1\:\hat{\jmath}, \] which means measure one step out on the $x$ axis, one step out on the $y$ axis. Using our two basis vectors we can express any vector in the plane of the floor by a linear combination like \[ \vec{v}_{\mathrm{point\ on\ the\ floor}}=a\:\hat{\imath}+b\:\hat{\jmath}. \] The precise mathematical statement that describes this situation is that the basis formed by the pair $\hat{\imath}$,$\hat{\jmath}$ spans the two-dimensional space of the floor. We can extend this idea to three dimensions by specifying the coordinates of any point in the room as a weighted sum of the three basis vectors: \[ \vec{v}_{\mathrm{point\ in\ the\ room}}=a\:\hat{\imath}+b\:\hat{\jmath}+c\:\hat{k}, \] where $\hat{k}$ is the unit length vector along the $z$ axis.

Choice of basis

In the case where it is clear which coordinate system we are using in a particular situation, we can take the liberty to omit the explicit mention of the basis vectors and simply write $(a,b,c)$ as an ordered triplet which contains only the coefficients. When there is more than one basis in some context (like in problems where you have to change basis), then for every tuple of numbers we should be explicit about which basis it refers to. We can do this by putting a subscript after the tuple. For example, the vector $\vec{v}=a\:\hat{\imath} + b\:\hat{\jmath}+c\:\hat{k}$ in the standard basis is referred to as $(a,b,c)_{\hat{\imath}\hat{\jmath}\hat{k}}$.

Discussion

It is hard to over-emphasize the importance of the notion of a basis. Every time you solve a problem with vectors, you need to be consistent in your choice of basis, because all the numbers and variables in your equations will depend on it. The basis is the bridge between real world vector quantities and their mathematical representation in terms of components.

Eigenvalues and eigenvectors

The set of eigenvectors of a matrix is a special set of input vectors for which the action of the matrix is described as a scaling. Decomposing a matrix in terms of its eigenvalues and its eigenvectors gives valuable insights into the properties of the matrix.

Certain matrix calculations like computing the power of the matrix become much easier when we use the eigendecomposition of the matrix. For example, suppose you are given a square matrix $A$ and you want to compute $A^5$. To make this example more concrete, let's use the matrix \[ A = \begin{bmatrix} 1 & 1 \nl 1 & 0 \end{bmatrix}. \]

We want to compute \[ A^5 = \begin{bmatrix} 1 & 1 \nl 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 \nl 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 \nl 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 \nl 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 \nl 1 & 0 \end{bmatrix}. \] That is a lot of matrix multiplications. You'll have to multiply and add entries for a while! Imagine how many times you would have to multiply the matrix if I had asked for $A^{55}$ instead!

Let's be smart about this. Every matrix corresponds to some linear operation. This means that it is a legitimate question to ask “what does the matrix $A$ do?” and once we figure out what it does, we can compute $A^{55}$ by simply doing what $A$ does $55$ times.

The best way to see what a matrix does is to look inside of it and see what it is made of. What is its natural basis (its own basis), and what are its natural values (its own values)?

Deep down inside, the matrix $A$ is really a product of three matrices: \[ \begin{bmatrix} 1 & 1 \nl 1 & 0 \end{bmatrix} = \underbrace{\begin{bmatrix} 0.850.. & -0.525.. \nl 0.525.. & 0.850.. \end{bmatrix} }_Q \ \underbrace{\! \begin{bmatrix} 1.618.. & 0 \nl 0 &-0.618.. \end{bmatrix} }_{\Lambda} \underbrace{ \begin{bmatrix} 0.850.. & 0.525.. \nl -0.525.. & 0.850.. \end{bmatrix} }_{Q^{-1}}. \] \[ A = Q\Lambda Q^{-1} \] I am serious. You can multiply these three matrices together and you will get $A$. Notice that the “middle matrix” $\Lambda$ (the capital Greek letter lambda) has entries only on the diagonal, and that it is sandwiched between the matrix $Q$ on the left and $Q^{-1}$ (the inverse of $Q$) on the right. This way of writing $A$ will allow us to compute $A^5$ in a civilized manner: \[ \begin{eqnarray} A^5 & = & A A A A A \nl & = & Q\Lambda \underbrace{Q^{-1}Q}_{I}\Lambda \underbrace{Q^{-1}Q}_{I}\Lambda \underbrace{Q^{-1}Q}_{I}\Lambda \underbrace{Q^{-1}Q}_{I}\Lambda Q^{-1} \nl & = & Q\Lambda I \Lambda I \Lambda I \Lambda I \Lambda Q^{-1} \nl & = & Q\Lambda \Lambda \Lambda \Lambda \Lambda Q^{-1} \nl & = & Q\Lambda^5 Q^{-1}. \end{eqnarray} \]

Since the matrix $\Lambda$ is diagonal, it is really easy to compute its fifth power $\Lambda^5$: \[ \begin{bmatrix} 1.618.. & 0 \nl 0 &-0.618.. \end{bmatrix}^5 = \begin{bmatrix} (1.618..)^5 & 0 \nl 0 &(-0.618..)^5 \end{bmatrix} = \begin{bmatrix} 11.090.. & 0 \nl 0 &-0.090.. \end{bmatrix}\!. \]

Thus we have \[ \begin{bmatrix} 1 & 1 \nl 1 & 0 \end{bmatrix}^5 \! = \underbrace{\begin{bmatrix} 0.850..\! & -0.525.. \nl 0.525..\! & 0.850.. \end{bmatrix} }_Q \! \begin{bmatrix} 11.090.. \! & 0 \nl 0 \! &-0.090.. \end{bmatrix} \! \underbrace{ \begin{bmatrix} 0.850.. & 0.525.. \nl -0.525.. & 0.850.. \end{bmatrix} }_{Q^{-1}}\!. \] We still have to multiply these three matrices together, but we have brought the work down from four matrix multiplications to just two.

The answer is \[ A^5 = Q\Lambda^5 Q^{-1} = \begin{bmatrix} 8 & 5 \nl 5 & 3 \end{bmatrix}. \]

Using the same technique, we can just as easily compute $A^{55}$: \[ A^{55} = Q\Lambda^{55} Q^{-1} = \begin{bmatrix} 225851433717 & 139583862445 \nl 139583862445 & 86267571272 \end{bmatrix}. \]

We could even compute $A^{5555}$ if we wanted to, but you get the point. If you look at $A$ in the right basis, repeated multiplication only involves computing the powers of its eigenvalues.
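If you want to play with this example yourself, here is a quick sketch using sympy's Matrix class (assuming sympy is available; the exact formatting and ordering of the printouts may differ between versions):

  >>> from sympy import Matrix
  >>> A = Matrix([[1, 1], [1, 0]])
  >>> A**5                       # brute-force matrix power
        Matrix([
        [8, 5],
        [5, 3]])
  >>> A.eigenvals()              # eigenvalues with their multiplicities
        {1/2 - sqrt(5)/2: 1, 1/2 + sqrt(5)/2: 1}

The exact values $\frac{1}{2}\pm\frac{\sqrt{5}}{2}$ are the $1.618\ldots$ and $-0.618\ldots$ that appear on the diagonal of $\Lambda$ above.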

Definitions

  * $A$: an $n\times n$ square matrix.
    When necessary, we will denote the individual entries of $A$ as $a_{ij}$.
  * $\textrm{eig}(A)\equiv(\lambda_1, \lambda_2, \ldots, \lambda_n )$:
    the list of eigenvalues of $A$. Eigenvalues are usually denoted with the Greek letter lambda.
    Note that some eigenvalues could be repeated.
  * $p(\lambda)=\det(A - \lambda I)$: 
    the //characteristic polynomial// for the matrix $A$. The eigenvalues are the roots of this polynomial.
  * $\{ \vec{e}_{\lambda_1}, \vec{e}_{\lambda_2}, \ldots, \vec{e}_{\lambda_n} \}$: 
    the set of //eigenvectors// of $A$. Each eigenvector is associated with a corresponding eigenvalue.
  * $\Lambda  \equiv {\rm diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$: 
    the diagonal version of $A$. The matrix $\Lambda$ contains the eigenvalues of $A$ on the diagonal:
    \[
     \Lambda = 
     \begin{bmatrix}
     \lambda_1	&  \cdots  &  0 \nl
     \vdots 	&  \ddots  &  0  \nl
     0  	&   0      &  \lambda_n
     \end{bmatrix}.
    \]
    The matrix $\Lambda$ corresponds to the matrix representation of $A$ with respect to its eigenbasis.
  * $Q$: a matrix whose columns are the eigenvectors of $A$:
    \[
     Q 
     \equiv
     \begin{bmatrix}
     |  &  & | \nl
     \vec{e}_{\lambda_1}  &  \cdots &  \vec{e}_{\lambda_n} \nl
     |  &  & | 
     \end{bmatrix}
      =  \ 
     _{B_s}\![I]_{B_\lambda}.
    \]
    The matrix $Q$ corresponds to the //change of basis matrix// 
    from the eigenbasis $B_\lambda = \{ \vec{e}_{\lambda_1}, \vec{e}_{\lambda_2}, \vec{e}_{\lambda_3}, \ldots \}$
    to the standard basis $B_s = \{\hat{\imath}, \hat{\jmath}, \hat{k}, \ldots \}$.
  * $A=Q\Lambda Q^{-1}$: the //eigendecomposition// of the matrix $A$.
  * $\Lambda = Q^{-1}AQ$: the //diagonalization// of the matrix $A$.


Eigenvalues

The eigenvalue equation is \[ A\vec{e}_\lambda =\lambda\vec{e}_\lambda, \] where $\lambda$ is an eigenvalue and $\vec{e}_\lambda$ is an eigenvector of the matrix $A$. If we multiply $A$ by an eigenvector $\vec{e}_\lambda$, we get back the same vector scaled by the constant $\lambda$.

To find the eigenvalues of a matrix we start from the eigenvalue equation $A\vec{e}_\lambda =\lambda\vec{e}_\lambda$, insert the identity matrix $I$, and rewrite it as a null-space problem: \[ A\vec{e}_\lambda =\lambda I\vec{e}_\lambda \qquad \Rightarrow \qquad \left(A - \lambda I\right)\vec{e}_\lambda = \vec{0}. \] This equation will have a solution whenever $|A - \lambda I|=0$. The eigenvalues of $A \in \mathbb{R}^{n \times n}$, denoted $(\lambda_1, \lambda_2, \ldots, \lambda_n )$, are the roots of the characteristic polynomial: \[ p(\lambda)=\det(A - \lambda I) \equiv |A-\lambda I|=0. \] When we calculate this determinant, we'll obtain an expression involving the coefficients $a_{ij}$ and the variable $\lambda$. If $A$ is an $n \times n $ matrix, the characteristic polynomial is of degree $n$ in the variable $\lambda$.

We denote the list of eigenvalues as $\textrm{eig}(A)=( \lambda_1, \lambda_2, \ldots, \lambda_n )$. If a $\lambda_i$ is a repeated root of the characteristic polynomial $p(\lambda)$, we say that it is a degenerate eigenvalue. For example the identity matrix $I \in \mathbb{R}^{2\times 2}$ has the characteristic polynomial $p_I(\lambda)=(\lambda-1)^2$ which has a repeated root at $\lambda=1$. We say the eigenvalue $\lambda=1$ has algebraic multiplicity $2$. It is important to keep track of degenerate eigenvalues, so we'll specify the multiplicity of an eigenvalue by repeatedly including it in the list of eigenvalues $\textrm{eig}(I)=(\lambda_1, \lambda_2) = (1,1)$.

Eigenvectors

The eigenvectors associated with eigenvalue $\lambda_i$ of matrix $A$ are the vectors in the null space of the matrix $(A-\lambda_i I )$.

To find the eigenvectors associated with the eigenvalue $\lambda_i$, you have to solve for the components $e_{\lambda,x}$ and $e_{\lambda,y}$ of the vector $\vec{e}_\lambda=(e_{\lambda,x},e_{\lambda,y})$ that satisfies the equation: \[ A\vec{e}_\lambda =\lambda\vec{e}_\lambda, \] or equivalently \[ (A-\lambda I ) \vec{e}_\lambda = 0\qquad \Rightarrow \qquad \begin{bmatrix} a_{11}-\lambda & a_{12} \nl a_{21} & a_{22}-\lambda \end{bmatrix} \begin{bmatrix} e_{\lambda,x} \nl e_{\lambda,y} \end{bmatrix} = \begin{bmatrix} 0 \nl 0 \end{bmatrix}. \]

If $\lambda_i$ is a repeated root (degenerate eigenvalue), the null space $(A-\lambda_i I )$ could contain multiple eigenvectors. The dimension of the null space of $(A-\lambda_i I )$ is called the geometric multiplicity of the eigenvalue $\lambda_i$.

Eigendecomposition

If an $n \times n$ matrix $A$ is diagonalizable, this means that we can find $n$ linearly independent eigenvectors for that matrix. The eigenvectors that come from different eigenspaces are guaranteed to be linearly independent (see exercises). We can also pick a set of linearly independent vectors within each of the degenerate eigenspaces. Combining the eigenvectors from all the eigenspaces we get a set of $n$ linearly independent eigenvectors, which form a basis for $\mathbb{R}^n$. We call this the eigenbasis.

Let's put the $n$ eigenvectors next to each other as the columns of a matrix: \[ Q = \begin{bmatrix} | & & | \nl \vec{e}_{\lambda_1} & \cdots & \vec{e}_{\lambda_n} \nl | & & | \end{bmatrix}. \]

We can decompose $A$ into its eigenvalues and its eigenvectors: \[ A = Q \Lambda Q^{-1} = \begin{bmatrix} | & & | \nl \vec{e}_{\lambda_1} & \cdots & \vec{e}_{\lambda_n} \nl | & & | \end{bmatrix} \begin{bmatrix} \lambda_1 & \cdots & 0 \nl \vdots & \ddots & 0 \nl 0 & 0 & \lambda_n \end{bmatrix} \begin{bmatrix} \ \nl \ \ \ \ \ \ Q^{-1} \ \ \ \ \ \ \nl \ \end{bmatrix}. \] The matrix $\Lambda$ is a diagonal matrix of eigenvalues and the matrix $Q$ is the “change of basis” matrix which contains the corresponding eigenvectors as columns.

Note that only the direction of each eigenvector is important, not its length. Indeed, if $\vec{e}_\lambda$ is an eigenvector (with eigenvalue $\lambda$), then so is $\alpha \vec{e}_\lambda$ for any nonzero $\alpha \in \mathbb{R}$. Thus we are free to use any nonzero multiple of the vectors $\vec{e}_{\lambda_i}$ as the columns of the matrix $Q$.

Example

Find the eigenvalues, the eigenvectors and the diagonalization of the matrix: \[ A=\begin{bmatrix} 1 & 2 & 0 \nl 0 & 3 & 0 \nl 2 & -4 & 2 \end{bmatrix}. \]

The eigenvalues of the matrix are (in decreasing order) \[ \lambda_1 = 3, \quad \lambda_2 = 2, \quad \lambda_3= 1. \] When an $n \times n$ matrix has $n$ distinct eigenvalues, it is diagonalizable since it will have $n$ linearly independent eigenvectors. Since the matrix $A$ has $3$ different eigenvalues, it is diagonalizable.

The eigenvalues of $A$ are the values that will appear in the diagonal of $\Lambda$, so by finding the eigenvalues of $A$ we already know its diagonalization. We could stop here, but instead, let's continue and find the eigenvectors of $A$.

The eigenvectors of $A$ are found by solving for the null space of the matrices $(A-3I)$, $(A-2I)$, and $(A-I)$ respectively: \[ \vec{e}_{\lambda_1} = \begin{bmatrix} -1 \nl -1 \nl 2 \end{bmatrix}, \quad \vec{e}_{\lambda_2} = \begin{bmatrix} 0 \nl 0 \nl 1 \end{bmatrix}, \quad \vec{e}_{\lambda_3} = \begin{bmatrix} -1 \nl 0 \nl 2 \end{bmatrix}. \] Check that $A \vec{e}_{\lambda_k} = \lambda_k \vec{e}_{\lambda_k}$ for each of the above vectors. Let $Q$ be the matrix with these eigenvectors as its columns: \[ Q= \begin{bmatrix} -1 & 0 & -1 \nl -1 & 0 & 0 \nl 2 & 1 & 2 \end{bmatrix}, \qquad \textrm{and} \qquad Q^{-1} = \begin{bmatrix} 0 & -1 & 0 \nl 2 & 0 & 1 \nl -1 & 1 & 0 \end{bmatrix}. \] These matrices form the eigendecomposition of the matrix $A$: \[ A = Q\Lambda Q^{-1} = \begin{bmatrix} 1 & 2 & 0 \nl 0 & 3 & 0 \nl 2 & -4 & 2 \end{bmatrix} = \begin{bmatrix} -1 & 0 & -1 \nl -1 & 0 & 0 \nl 2 & 1 & 2 \end{bmatrix} \!\! \begin{bmatrix} 3 & 0 & 0 \nl 0 & 2 & 0 \nl 0 & 0 & 1\end{bmatrix} \!\! \begin{bmatrix} 0 & -1 & 0 \nl 2 & 0 & 1 \nl -1 & 1 & 0 \end{bmatrix}\!. \]

To find the diagonalization of $A$, we must move $Q$ and $Q^{-1}$ to the other side of the equation. More specifically, we multiply the equation $A=Q\Lambda Q^{-1}$ by $Q^{-1}$ on the left and by $Q$ on the right to obtain the diagonal matrix: \[ \Lambda = Q^{-1}AQ = \begin{bmatrix} 0 & -1 & 0 \nl 2 & 0 & 1 \nl -1 & 1 & 0 \end{bmatrix} \!\! \begin{bmatrix} 1 & 2 & 0 \nl 0 & 3 & 0 \nl 2 & -4 & 2 \end{bmatrix} \!\! \begin{bmatrix} -1 & 0 & -1 \nl -1 & 0 & 0 \nl 2 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 3 & 0 & 0 \nl 0 & 2 & 0 \nl 0 & 0 & 1\end{bmatrix}\!. \]
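As a sanity check, the eigendecomposition we computed can also be verified numerically. The sketch below assumes Python with numpy (an illustration, not part of the text) and checks both the eigenvalue equation and the product $Q\Lambda Q^{-1}$:

<code python>
import numpy as np

# The matrix from the example and its eigenvectors (as the columns of Q).
A = np.array([[1, 2, 0],
              [0, 3, 0],
              [2, -4, 2]], dtype=float)
Q = np.array([[-1, 0, -1],
              [-1, 0,  0],
              [ 2, 1,  2]], dtype=float)
Lam = np.diag([3.0, 2.0, 1.0])

# Check the eigenvalue equation A e_k = lambda_k e_k for each column of Q ...
for k in range(3):
    assert np.allclose(A @ Q[:, k], Lam[k, k] * Q[:, k])

# ... and check the full eigendecomposition A = Q Lambda Q^{-1}.
assert np.allclose(Q @ Lam @ np.linalg.inv(Q), A)
print("eigendecomposition verified")
</code>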

Explanations

Eigenspaces

Recall the definition of the null space of a matrix $M$: \[ \mathcal{N}(M) \equiv \{ \vec{v} \in \mathbb{R}^n \ | \ M\vec{v} = \vec{0} \}. \] The dimension of the null space is the number of linearly independent vectors you can find in the null space. If the vectors that $M$ sends to the zero vector are precisely the linear combinations of two linearly independent vectors $\vec{v}$ and $\vec{w}$: \[ M\vec{v} = \vec{0}, \qquad M\vec{w} = \vec{0}, \] then the null space is two-dimensional. We can always choose the vectors $\vec{v}$ and $\vec{w}$ to be orthogonal ($\vec{v}\cdot\vec{w}=0$) and thus obtain an orthogonal basis for the null space.

Each eigenvalue $\lambda_i$ has an eigenspace associated with it. The eigenspace is the null space of the matrix $(A-\lambda_i I)$: \[ E_{\lambda_i} \equiv \mathcal{N}\left( A-\lambda_i I \right) = \{ \vec{v} \in \mathbb{R}^n \ | \ \left( A-\lambda_i I \right)\vec{v} = 0 \}. \] For degenerate eigenvalues (repeated roots of the characteristic polynomial) the null space of $\left( A-\lambda_i I \right)$ could contain multiple eigenvectors.

Change of basis

The matrix $Q$ can be interpreted as a change of basis matrix. Given a vector written in terms of the eigenbasis $[\vec{v}]_{B_{\lambda}}=(v^\prime_1,v^\prime_2,v^\prime_3)_{B_{\lambda}} = v^\prime_1\vec{e}_{\lambda_1}+ v^\prime_2\vec{e}_{\lambda_2}+v^\prime_3\vec{e}_{\lambda_3}$, we can use the matrix $Q$ to convert it to the standard basis $[\vec{v}]_{B_{s}} = (v_1, v_2,v_3) = v_1\hat{\imath} + v_2\hat{\jmath}+v_3\hat{k}$ as follows: \[ [\vec{v}]_{B_{s}} = \ Q [\vec{v}]_{B_{\lambda}} = \ _{B_{s}\!}[\mathbb{1}]_{B_{\lambda}} [\vec{v}]_{B_{\lambda}}. \]

The change of basis in the other direction is given by the inverse matrix: \[ [\vec{v}]_{B_{\lambda}} = \ Q^{-1} [\vec{v}]_{B_{s}} = _{B_{\lambda}\!}\left[\mathbb{1}\right]_{B_{s}} [\vec{v}]_{B_{s}}. \]

Interpretations

The eigendecomposition $A = Q \Lambda Q^{-1}$ allows us to interpret the action of $A$ on an arbitrary input vector $\vec{v}$ as the following three steps: \[ [\vec{w}]_{B_{s}} = \ _{B_{s}\!}[A]_{B_{s}} [\vec{v}]_{B_{s}} = Q\Lambda Q^{-1} [\vec{v}]_{B_{s}} = \ \underbrace{\!\!\ _{B_{s}\!}[\mathbb{1}]_{B_{\lambda}} \ \underbrace{\!\!\ _{B_{\lambda}\!}[\Lambda]_{B_{\lambda}} \underbrace{\ _{B_{\lambda}\!}[\mathbb{1}]_{B_{s}} [\vec{v}]_{B_{s}} }_1 }_2 }_3. \]

  1. In the first step we convert the vector $\vec{v}$ from the standard basis to the eigenbasis.
  2. In the second step the action of $A$ on vectors expressed with respect to its eigenbasis corresponds to a multiplication by the diagonal matrix $\Lambda$.
  3. In the third step we convert the output $\vec{w}$ from the eigenbasis back to the standard basis.

Another way of interpreting the above steps is to say that, deep down inside, the matrix $A$ is actually the diagonal matrix $\Lambda$. To see the diagonal form of the matrix, we have to express the input vectors with respect to the eigenbasis: \[ [\vec{w}]_{B_{\lambda}} = \ _{B_{\lambda}\!}[\Lambda]_{B_{\lambda}} [\vec{v}]_{B_{\lambda}}. \]

It is extremely important that you understand the equation $A=Q\Lambda Q^{-1}$ intuitively in terms of the three-step procedure. To help you understand, we'll analyze in detail what happens when we multiply $A$ by one of its eigenvectors. Let's pick $\vec{e}_{\lambda_1}$ and verify the equation $A\vec{e}_{\lambda_1} = Q\Lambda Q^{-1}\vec{e}_{\lambda_1} = \lambda_1\vec{e}_{\lambda_1}$ by following the vector through the three steps: \[ \ _{B_{s}\!}[A]_{B_{s}} [\vec{e}_{\lambda_1}]_{B_{s}} = Q\Lambda Q^{-1} [\vec{e}_{\lambda_1}]_{B_{s}} = \ \underbrace{\!\!\ _{B_{s}\!}[\mathbb{1}]_{B_{\lambda}} \ \underbrace{\!\!\ _{B_{\lambda}\!}[\Lambda]_{B_{\lambda}} \underbrace{\ _{B_{\lambda}\!}[\mathbb{1}]_{B_{s}} [\vec{e}_{\lambda_1}]_{B_{s}} }_{ (1,0,\ldots)^T_{B_\lambda} } }_{ (\lambda_1,0,\ldots)^T_{B_\lambda} } }_{ \lambda_1 [\vec{e}_{\lambda_1}]_{B_{s}} } = \lambda_1 [\vec{e}_{\lambda_1}]_{B_{s}}. \] In the first step, we convert the vector $[\vec{e}_{\lambda_1}]_{B_{s}}$ to the eigenbasis and obtain $(1,0,\ldots,0)^T_{B_\lambda}$. The result of the second step is $(\lambda_1,0,\ldots,0)^T_{B_\lambda}$, because multiplying $\Lambda$ by the vector $(1,0,\ldots,0)^T_{B_\lambda}$ “selects” only the first column of $\Lambda$. In the third step we convert $(\lambda_1,0,\ldots,0)^T_{B_\lambda}=\lambda_1(1,0,\ldots,0)^T_{B_\lambda}$ back to the standard basis to obtain $\lambda_1[\vec{e}_{\lambda_1}]_{B_{s}}$.

Invariant properties of matrices

The determinant and the trace of a matrix are strictly functions of the eigenvalues. The determinant of $A$ is the product of its eigenvalues: \[ \det(A) \equiv |A| =\prod_i \lambda_i = \lambda_1\lambda_2\cdots\lambda_n, \] and the trace is their sum: \[ {\rm Tr}(A)=\sum_i a_{ii}=\sum_i \lambda_i = \lambda_1 + \lambda_2 + \cdots + \lambda_n. \]

Here are the steps we followed to obtain these equations: \[ |A|=|Q\Lambda Q^{-1}| =|Q||\Lambda| |Q^{-1}| =|Q||Q^{-1}||\Lambda| =|Q| \frac{1}{|Q|}|\Lambda| =|\Lambda| =\prod_i \lambda_i, \] \[ {\rm Tr}(A)={\rm Tr}(Q\Lambda Q^{-1}) ={\rm Tr}(\Lambda Q^{-1}Q) ={\rm Tr}(\Lambda)=\sum_i \lambda_i. \]
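These two invariants are easy to confirm numerically. Here is a small check, assuming Python with numpy and a randomly generated matrix (purely illustrative):

<code python>
import numpy as np

# Check det(A) = product of eigenvalues and Tr(A) = sum of eigenvalues
# on a randomly generated matrix (illustrative only).
np.random.seed(0)
A = np.random.rand(4, 4)
lam = np.linalg.eigvals(A)          # eigenvalues (possibly complex)

assert np.isclose(np.linalg.det(A), np.prod(lam).real)
assert np.isclose(np.trace(A), np.sum(lam).real)
print("det = product of eigenvalues; trace = sum of eigenvalues")
</code>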

In fact the above calculations remain valid when the matrix undergoes any similarity transformation. A similarity transformation is essentially a “change of basis”-type of calculation: the matrix $A$ gets multiplied by an invertible matrix $P$ from the left and by the inverse of $P$ on the right: $A \to PA P^{-1}$. Therefore, the determinant and the trace of a matrix are two properties that do not depend on the choice of basis used to represent the matrix! We say the determinant and the trace are invariant properties of the matrix.

Relation to invertibility

Let us briefly revisit three of the equivalent conditions we stated in the invertible matrix theorem. For a matrix $A \in \mathbb{R}^{n \times n}$, the following statements are equivalent:

  1. $A$ is invertible
  2. $|A|\neq 0$
  3. The null space contains only the zero vector: $\mathcal{N}(A)=\{\vec{0}\}$

Using the formula $|A|=\prod_{i=1}^n \lambda_i$, it is easy to see why the last two statements are equivalent. If $|A|\neq 0$, then none of the $\lambda_i$s is zero; otherwise, the product of the eigenvalues would be zero. Since $\lambda=0$ is not an eigenvalue of $A$, there is no nonzero vector $\vec{v}$ such that $A\vec{v} = 0\vec{v}=\vec{0}$. Therefore the only vector in the null space is the zero vector: $\mathcal{N}(A)=\{ \vec{0} \}$.

We can also follow the reasoning in the other direction. If the null space of $A$ contains only the zero vector, then there is no nonzero vector $\vec{v}$ such that $A\vec{v} = \vec{0}$, which means $\lambda=0$ is not an eigenvalue of $A$, and hence the product $\lambda_1\lambda_2\cdots \lambda_n \neq 0$.

However, if there exists a nonzero vector $\vec{v}$ such that $A\vec{v} = \vec{0}$, then $A$ has a nontrivial null space, $\lambda=0$ is an eigenvalue of $A$, and thus $|A|=0$.

Normal matrices

A matrix $A$ is normal if it satisfies the equation $A^TA = A A^T$. All normal matrices are diagonalizable and furthermore the diagonalization matrix $Q$ can be chosen to be an orthogonal matrix $O$.

The eigenvectors corresponding to different eigenvalues of a normal matrix are orthogonal. Furthermore we can always choose the eigenvectors within the same eigenspace to be orthogonal. By collecting the eigenvectors from all of the eigenspaces of the matrix $A \in \mathbb{R}^{n \times n}$, it is possible to obtain a complete basis $\{\vec{e}_1,\vec{e}_2,\ldots, \vec{e}_n\}$ of orthogonal eigenvectors: \[ \vec{e}_{i} \cdot \vec{e}_{j} = \left\{ \begin{array}{ll} \|\vec{e}_i\|^2 & \text{ if } i =j, \nl 0 & \text{ if } i \neq j. \end{array}\right. \] By normalizing each of these vectors we can find a set of eigenvectors $\{\hat{e}_1,\hat{e}_2,\ldots, \hat{e}_n \}$ which is an orthonormal basis for the space $\mathbb{R}^n$: \[ \hat{e}_{i} \cdot \hat{e}_{j} = \left\{ \begin{array}{ll} 1 & \text{ if } i =j, \nl 0 & \text{ if } i \neq j. \end{array}\right. \]

Consider now the matrix $O$ constructed by using these orthonormal vectors as the columns: \[ O= \begin{bmatrix} | & & | \nl \hat{e}_{1} & \cdots & \hat{e}_{n} \nl | & & | \end{bmatrix}. \]

The matrix $O$ is an orthogonal matrix, which means that it satisfies $OO^T=I=O^TO$. In other words, the inverse of $O$ is obtained by taking the transpose $O^T$. To see that this is true consider the following product: \[ O^T O = \begin{bmatrix} - & \hat{e}_{1} & - \nl & \vdots & \nl - & \hat{e}_{n} & - \end{bmatrix} \begin{bmatrix} | & & | \nl \hat{e}_{1} & \cdots & \hat{e}_{n} \nl | & & | \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \nl 0 & \ddots & 0 \nl 0 & 0 & 1 \end{bmatrix} =\mathbb{1}. \] Each of the ones on the diagonal arises from the dot product of a unit-length eigenvector with itself. The off-diagonal entries are zero because the vectors are orthogonal. By definition, the inverse $O^{-1}$ is the matrix which when multiplied by $O$ gives $I$, so we have $O^{-1} = O^T$.

Using the orthogonal matrix $O$ and its inverse $O^T$, we can write the eigendecomposition of a matrix $A$ as follows: \[ A = O \Lambda O^{-1} = O \Lambda O^T = \begin{bmatrix} | & & | \nl \hat{e}_{1} & \cdots & \hat{e}_{n} \nl | & & | \end{bmatrix} \begin{bmatrix} \lambda_1 & \cdots & 0 \nl \vdots & \ddots & 0 \nl 0 & 0 & \lambda_n \end{bmatrix} \begin{bmatrix} - & \hat{e}_{1} & - \nl & \vdots & \nl - & \hat{e}_{n} & - \end{bmatrix}\!. \]

The key advantage of using a diagonalization procedure with an orthogonal matrix $O$ is that computing the inverse is simplified significantly since $O^{-1}=O^T$.
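Here is a short numerical illustration of this fact. It assumes Python with numpy and uses a symmetric matrix, since symmetric matrices are normal:

<code python>
import numpy as np

# A symmetric matrix is normal (A^T A = A A^T), so it can be
# diagonalized by an orthogonal matrix O.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])
assert np.allclose(A.T @ A, A @ A.T)

# np.linalg.eigh returns an orthonormal set of eigenvectors
# for symmetric matrices.
lam, O = np.linalg.eigh(A)

assert np.allclose(O.T @ O, np.eye(3))           # O is orthogonal
assert np.allclose(O @ np.diag(lam) @ O.T, A)    # A = O Lambda O^T
print("A = O diag(lam) O^T, with O^{-1} = O^T")
</code>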

Discussion

Non-diagonalizable matrices

Not all matrices are diagonalizable. For example, the matrix \[ B= \begin{bmatrix} 3 & 1 \nl 0 & 3 \end{bmatrix}, \] has $\lambda = 3$ as a repeated eigenvalue, but the null space of $(B-3\mathbb{1})$ contains only one linearly independent vector, $(1,0)^T$. The matrix $B$ has a single independent eigenvector for the eigenvalue $\lambda=3$. We're one eigenvector short, and it is not possible to obtain a complete basis of eigenvectors. Therefore we cannot build the diagonalizing change of basis matrix $Q$. We say $B$ is not diagonalizable.

Matrix power series

One of the most useful concepts of calculus is the idea that functions can be represented as Taylor series. The Taylor series of the exponential function $f(x) =e^x$ is \[ e^x = \sum_{k=0}^\infty \frac{x^k}{k!} = 1 + x + \frac{x^2}{2} + \frac{x^3}{3!} + \frac{x^4}{4!} + \frac{x^5}{5!} + \ldots. \] Nothing stops us from using the same Taylor series expression to define the exponential function of a matrix: \[ e^A = \sum_{k=0}^\infty \frac{A^k}{k!} = \mathbb{1} + A + \frac{A^2}{2} + \frac{A^3}{3!} + \frac{A^4}{4!} + \frac{A^5}{5!} + \ldots . \] Okay, there is one thing stopping us, and that is having to compute an infinite sum of progressively longer matrix products! But wait, remember how we used the diagonalization of $A=Q\Lambda Q^{-1}$ to easily compute $A^{55}=Q\Lambda^{55} Q^{-1}$? We can use that trick here too and obtain the exponential of a matrix in a much simpler form: \[ \begin{align*} e^A & = \sum_{k=0}^\infty \frac{A^k}{k!} = \sum_{k=0}^\infty \frac{(Q\Lambda Q^{-1})^k}{k!} \nl & = \sum_{k=0}^\infty \frac{Q\:\Lambda^k\:Q^{-1} }{k!} \nl & = Q\left[ \sum_{k=0}^\infty \frac{ \Lambda^k }{k!}\right]Q^{-1} \nl & = Q\left( \mathbb{1} + \Lambda + \frac{\Lambda^2}{2} + \frac{\Lambda^3}{3!} + \frac{\Lambda^4}{4!} + \ldots \right)Q^{-1} \nl & = Qe^\Lambda Q^{-1} = \begin{bmatrix} \ \nl \ \ \ \ \ \ Q \ \ \ \ \ \ \ \nl \ \end{bmatrix} \begin{bmatrix} e^{\lambda_1} & \cdots & 0 \nl \vdots & \ddots & 0 \nl 0 & 0 & e^{\lambda_n} \end{bmatrix} \begin{bmatrix} \ \nl \ \ \ \ \ \ Q^{-1} \ \ \ \ \ \ \nl \ \end{bmatrix}\!. \end{align*} \]

We can use this approach to talk about “matrix functions” of the form \[ F: \mathbb{M}(n,n) \to \mathbb{M}(n,n), \] simply by defining them as Taylor series of matrices. Computing the matrix function $F(M)$ on an input matrix $M=Q\Lambda Q^{-1}$ is equivalent to applying the function $f$ to the eigenvalues of $M$: $F(M)=Q\:f(\Lambda)\:Q^{-1}$.
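To make this concrete, here is a rough numerical sketch, assuming Python with numpy and an arbitrary example matrix. It computes $e^A$ via the eigendecomposition and compares the result with a truncated Taylor series:

<code python>
import numpy as np

# e^A computed via the eigendecomposition A = Q Lambda Q^{-1},
# compared against a truncated Taylor series (a rough numerical check).
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
lam, Q = np.linalg.eig(A)
expA = Q @ np.diag(np.exp(lam)) @ np.linalg.inv(Q)

# Partial sum of the series 1 + A + A^2/2! + A^3/3! + ...
series = np.zeros_like(A)
term = np.eye(2)
for k in range(1, 30):
    series += term          # add A^{k-1}/(k-1)!
    term = term @ A / k     # next term: A^k/k!

assert np.allclose(expA, series)
print(expA)
</code>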

Review

In this section we learned how to decompose matrices in terms of their eigenvalues and eigenvectors. Let's briefly review everything that we discussed. The fundamental equation is $A\vec{e}_{\lambda_i} = \lambda_i\vec{e}_{\lambda_i}$, where the vector $\vec{e}_{\lambda_i}$ is an eigenvector of the matrix $A$ and the number $\lambda_i$ is an eigenvalue of $A$. The word eigen is the German word for self.

The characteristic polynomial comes about from a simple manipulation of the eigenvalue equation: \[ \begin{eqnarray} A\vec{e}_{\lambda_i} & = &\lambda_i\vec{e}_{\lambda_i} \nl A\vec{e}_{\lambda_i} - \lambda_i \vec{e}_{\lambda_i} & = & \vec{0} \nl (A-{\lambda_i} I)\vec{e}_{\lambda_i} & = & \vec{0}. \end{eqnarray} \]

There are two ways to obtain the zero vector on the right-hand side: either $\vec{e}_\lambda$ is the zero vector (which is not allowed for an eigenvector), or $\vec{e}_\lambda$ lies in the null space of $(A-\lambda I)$. The problem of finding the eigenvalues therefore reduces to finding the values of $\lambda$ for which the matrix $(A-\lambda I)$ is not invertible, i.e., it has a nontrivial null space. The easiest way to check whether a matrix is invertible is to compute the determinant, so we solve $|A-\lambda I| = 0$.

There will be multiple eigenvalues and eigenvectors that satisfy this equation, so we keep a whole list of eigenvalues $(\lambda_1, \lambda_2, \ldots, \lambda_n )$ and corresponding eigenvectors $\{ \vec{e}_{\lambda_1}, \vec{e}_{\lambda_2}, \ldots \}$.

Applications

Many scientific applications use the eigendecomposition of a matrix as a building block. We'll mention a few of these applications without going into too much detail:

  • Principal component analysis
  • PageRank
  • Energy calculations in quantum mechanics
  • Information theory

Analyzing a matrix in terms of its eigenvalues and its eigenvectors is a very powerful way to “see inside the matrix” and understand what the matrix does. In the next section we'll analyze several different types of matrices and discuss their properties in terms of their eigenvalues.

Links

[ Good visual examples from wikipedia ]
http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors

Exercises

Q1

Prove that a collection of nonzero eigenvectors corresponding to distinct eigenvalues is linearly independent.

Hint: Proof by contradiction. Assume that we have $n$ distinct eigenvalues $\lambda_i$ and nonzero eigenvectors $\{ \vec{e}_i \}$ which are linearly dependent: $\sum_{i=1}^n \alpha_i \vec{e}_i = \vec{0}$ with some $\alpha_i \neq 0$. Apply the matrix $(A-\lambda_n I )$ to both sides of this equation: $(A-\lambda_n I )\left(\sum_i \alpha_i\vec{e}_i\right) = \sum_i \alpha_i(\lambda_i - \lambda_n)\vec{e}_i = \vec{0}$. The $\vec{e}_n$ term is eliminated, and since the eigenvalues are distinct, we obtain a shorter linear dependence among $\vec{e}_1, \ldots, \vec{e}_{n-1}$. Repeating the argument eventually forces all the coefficients $\alpha_i$ to be zero, which contradicts the assumption.

Q2

Show that an $n \times n$ matrix has at most $n$ distinct eigenvalues.


Linear transformations

In this section we'll study functions that take vectors as inputs and produce vectors as outputs. In order to describe a function $T$ that takes $n$-dimensional vectors as inputs and produces $m$-dimensional vectors as outputs, we will use the notation: \[ T \colon \mathbb{R}^n \to \mathbb{R}^m. \] In particular, we'll restrict our attention to the class of linear transformations, which includes most of the useful transformations from analytic geometry: stretching, projections, reflections, and rotations. Linear transformations are used to describe and model many real-world phenomena in physics, chemistry, biology, and computer science.

Definitions

Linear transformations are mappings between vector inputs and vector outputs:

  • $V =\mathbb{R}^n$: an $n$-dimensional vector space. $V$ is just a nickname we give to $\mathbb{R}^n$, which is the input vector space of $T$.
  • $W = \mathbb{R}^m$: an $m$-dimensional vector space, which is the output space of $T$.
  • ${\rm dim}(U)$: the dimension of the vector space $U$.
  • $T:V \to W$: a linear transformation that takes vectors $\vec{v} \in V$ as inputs and produces outputs $\vec{w} \in W$: $T(\vec{v}) = \vec{w}$.
  • $\textrm{Im}(T)$: the //image space// of the linear transformation $T$, i.e., the set of vectors that $T$ can output for some input $\vec{v}\in V$. The mathematical definition of the image space is \[ \textrm{Im}(T) = \{ \vec{w} \in W \ | \ \vec{w}=T(\vec{v}), \textrm{ for some } \vec{v}\in V \}. \] The image space is the vector equivalent of the //image// of a function of a single variable, $\{ y \in \mathbb{R} \ | \ y=f(x), \textrm{ for some } x \in \mathbb{R} \}$, which you are already familiar with.
  • $\textrm{Null}(T)$: the //null space// of the linear transformation $T$. This is the set of vectors that get mapped to the zero vector by $T$. Mathematically we write \[ \textrm{Null}(T) \equiv \{\vec{v}\in V \ | \ T(\vec{v}) = \vec{0} \}, \] and we have $\textrm{Null}(T) \subseteq V$. The null space is the vector equivalent of the set of //roots// of a function, i.e., the values of $x$ where $f(x)=0$.

If we fix bases for the input and the output spaces, then a linear transformation can be represented as a matrix product:

  • $B_V=\{ \vec{b}_1, \vec{b}_2, \ldots, \vec{b}_n\}$: a basis for the vector space $V$. Any vector $\vec{v} \in V$ can be written as \[ \vec{v} = v_1 \vec{b}_1 + v_2 \vec{b}_2 + \cdots + v_n \vec{b}_n, \] where $v_1,v_2,\ldots,v_n$ are real numbers, which we call the //coordinates of the vector $\vec{v}$ with respect to the basis $B_V$//.
  • $B_W=\{\vec{c}_1, \vec{c}_2, \ldots, \vec{c}_m\}$: a basis for the output vector space $W$.
  • $M_T \in \mathbb{R}^{m\times n}$: a matrix representation of the linear transformation $T$: \[ \vec{w} = T(\vec{v}) \qquad \Leftrightarrow \qquad \vec{w} = M_T \vec{v}. \] Multiplication of the vector $\vec{v}$ by the matrix $M_T$ (from the left) is //equivalent// to applying the linear transformation $T$. Note that the matrix representation $M_T$ is //with respect to// the bases $B_{V}$ and $B_{W}$. If we need to show the choice of input and output bases explicitly, we will write them in subscripts $\;_{B_W}[M_T]_{B_V}$.
  • $\mathcal{C}(M_T)$: the //column space// of a matrix $M_T$ consists of all possible linear combinations of the columns of the matrix $M_T$. Given $M_T$, the representation of some linear transformation $T$, the column space of $M_T$ is equal to the image space of $T$: $\mathcal{C}(M_T) = \textrm{Im}(T)$.
  • $\mathcal{N}(M_T)$: the //null space// of a matrix $M_T$ is the set of vectors that the matrix $M_T$ sends to the zero vector: \[ \mathcal{N}(M_T) \equiv \{ \vec{v} \in V \ | \ M_T\vec{v} = \vec{0} \}. \] The null space of $M_T$ is equal to the null space of $T$: $\mathcal{N}(M_T) = \textrm{Null}(T)$.

Properties of linear transformations

Linearity

The fundamental property of a linear transformation is, you guessed it, its linearity. If $\vec{v}_1$ and $\vec{v}_2$ are two input vectors and $\alpha$ and $\beta$ are two constants, then: \[ T(\alpha\vec{v}_1+\beta\vec{v}_2)= \alpha T(\vec{v}_1)+\beta T(\vec{v}_2). \]

Transformations as black boxes

Suppose someone gives you a black box which implements the transformation $T$. You are not allowed to look inside the box and see how $T$ acts, but you are allowed to probe the transformation by choosing various input vectors and observing what comes out.

Suppose we have a linear transformation $T$ of the form $T \colon \mathbb{R}^n \to \mathbb{R}^m$. It turns out that probing this transformation with $n$ carefully chosen input vectors and observing the outputs is sufficient to characterize it completely!

To see why this is true, consider a basis $\{ \vec{v}_1, \vec{v}_2, \ldots , \vec{v}_n \}$ for the $n$-dimensional input space $V = \mathbb{R}^n$. Any input vector can be written as a linear combination of the basis vectors: \[ \vec{v} = \alpha_1 \vec{v}_1 + \alpha_2 \vec{v}_2 + \cdots + \alpha_n \vec{v}_n. \] In order to characterize $T$, all we have to do is input each of the $n$ basis vectors $\vec{v}_i$ into the black box that implements $T$ and record the output $T(\vec{v}_i)$ that comes out. Using these observations and the linearity of $T$ we can now predict the output of $T$ for arbitrary input vectors: \[ T(\vec{v}) = \alpha_1 T(\vec{v}_1) + \alpha_2 T(\vec{v}_2) + \cdots + \alpha_n T(\vec{v}_n). \]

This black box model can be used in many areas of science, and is perhaps one of the most important ideas in linear algebra. The transformation $T$ could be the description of a chemical process, an electrical circuit or some phenomenon in biology. So long as we know that $T$ is (or can be approximated by) a linear transformation, we can obtain a complete description by probing it with a small number of inputs. This is in contrast to non-linear transformations, which could correspond to arbitrarily complex input-output relationships and would require significantly more probing in order to characterize precisely.

Input and output spaces

We said that the transformation $T$ is a map from $n$-vectors to $m$-vectors: \[ T \colon \mathbb{R}^n \to \mathbb{R}^m. \] Mathematically, we say that the domain of the transformation $T$ is $\mathbb{R}^n$ and the codomain is $\mathbb{R}^m$. The image space $\textrm{Im}(T)$ consists of all the possible outputs that the transformation $T$ can have. In general $\textrm{Im}(T) \subseteq \mathbb{R}^m$. A transformation $T$ for which $\textrm{Im}(T)=\mathbb{R}^m$ is called onto or surjective.

Furthermore, we will identify the null space as the subspace of the domain $\mathbb{R}^n$ that gets mapped to the zero vector by $T$: $\textrm{Null}(T) \equiv \{\vec{v} \in \mathbb{R}^n \ | \ T(\vec{v}) = \vec{0} \}$.

Linear transformations as matrix multiplications

There is an important relationship between linear transformations and matrices. If you fix a basis for the input vector space and a basis for the output vector space, a linear transformation $T(\vec{v})=\vec{w}$ can be represented as matrix multiplication $M_T\vec{v}=\vec{w}$ for some matrix $M_T$.

We have the following equivalence: \[ \vec{w} = T(\vec{v}) \qquad \Leftrightarrow \qquad \vec{w} = M_T \vec{v}. \] Using this equivalence, we can re-interpret several of the facts we know about matrices as properties of linear transformations. The equivalence is useful in the other direction too, since it allows us to use the language of linear transformations to talk about the properties of matrices.

The idea of representing the action of a linear transformation as a matrix product is extremely important since it allows us to transform the abstract description of what the transformation $T$ does into the practical description: “take the input vector $\vec{v}$ and multiply it on the left by a matrix $M_T$.”

We'll now illustrate the “linear transformation $\Leftrightarrow$ matrix” equivalence with an example. Define $T=\Pi_{P_{xy}}$ to be the orthogonal projection onto the $xy$-plane $P_{xy}$. In words, the action of this projection is simply to “kill” the $z$-component of the input vector. The matrix that corresponds to this projection is \[ T(\:(v_x,v_y,v_z)\:) = (v_x,v_y,0) \qquad \Leftrightarrow \qquad M_{T}\vec{v} = \begin{bmatrix} 1 & 0 & 0 \nl 0 & 1 & 0 \nl 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} v_x \nl v_y \nl v_z \end{bmatrix} = \begin{bmatrix} v_x \nl v_y \nl 0 \end{bmatrix}. \]

Finding the matrix

In order to find the matrix representation of a the transformation $T \colon \mathbb{R}^n \to \mathbb{R}^m$, it is sufficient to “probe it” with the $n$ vectors in the standard basis for $\mathbb{R}^n$: \[ \hat{e}_1 \equiv \begin{bmatrix} 1 \nl 0 \nl \vdots \nl 0 \end{bmatrix} \!\!, \ \ \ \hat{e}_2 \equiv \begin{bmatrix} 0 \nl 1 \nl \vdots \nl 0 \end{bmatrix}\!\!, \ \ \ \ \ldots, \ \ \ \hat{e}_n \equiv \begin{bmatrix} 0 \nl \vdots \nl 0 \nl 1 \end{bmatrix}\!\!. \] To obtain $M_T$, we combine the outputs $T(\hat{e}_1)$, $T(\hat{e}_2)$, $\ldots$, $T(\hat{e}_n)$ as the columns of a matrix: \[ M_T = \begin{bmatrix} | & | & \mathbf{ } & | \nl T(\vec{e}_1) & T(\vec{e}_2) & \dots & T(\vec{e}_n) \nl | & | & \mathbf{ } & | \end{bmatrix}. \]

Observe that the matrix constructed in this way has the right dimensions: when multiplied by an $n$-vector on the left it will produce an $m$-vector. We have $M_T \in \mathbb{R}^{m \times n}$, since the outputs of $T$ are $m$-vectors and since we used $n$ “probe” vectors.

In order to help you visualize this new “column thing”, we can analyze the matrix product $M_T \hat{e}_2$. The probe vector $\hat{e}_2\equiv (0,1,0,\ldots,0)^T$ will “select” only the second column from $M_T$ and thus we will obtain the correct output: $M_T \hat{e}_2 = T(\hat{e}_2)$. Similarly, applying $M_T$ to the other basis vectors selects each of the columns of $M_T$.

Any input vector can be written as a linear combination of the standard basis vectors $\vec{v} = v_1 \hat{e}_1 + v_2 \hat{e}_2 + \cdots + v_n\hat{e}_n$. Therefore, by linearity, we can compute the output $T(\vec{v})$: \[ \begin{align*} T(\vec{v}) &= v_1 T(\hat{e}_1) + v_2 T(\hat{e}_2) + \cdots + v_n T(\hat{e}_n) \nl & = v_1\!\begin{bmatrix} | \nl T(\hat{e}_1) \nl | \end{bmatrix} + v_2\!\begin{bmatrix} | \nl T(\hat{e}_2) \nl | \end{bmatrix} + \cdots + v_n\!\begin{bmatrix} | \nl T(\hat{e}_n) \nl | \end{bmatrix} \nl & = \begin{bmatrix} | & | & \mathbf{ } & | \nl T(\vec{e}_1) & T(\vec{e}_2) & \dots & T(\vec{e}_n) \nl | & | & \mathbf{ } & | \end{bmatrix} \begin{bmatrix} | \nl \vec{v} \nl | \end{bmatrix} \nl & = M_T \vec{v}. \end{align*} \]
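The probing procedure is easy to carry out on a computer. The sketch below, assuming Python with numpy (purely illustrative), treats the projection onto the $xy$-plane as a black box and recovers its matrix by feeding it the standard basis vectors:

<code python>
import numpy as np

# Treat the projection onto the xy-plane as a "black box" function
# and recover its matrix by probing it with the standard basis vectors.
def T(v):
    return np.array([v[0], v[1], 0.0])

basis = np.eye(3)                               # rows are e_1, e_2, e_3
M_T = np.column_stack([T(e) for e in basis])    # columns are T(e_i)

v = np.array([4.0, 5.0, 6.0])
assert np.allclose(M_T @ v, T(v))
print(M_T)
</code>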

Input and output spaces

Observe that the outputs of $T$ consist of all possible linear combinations of the columns of the matrix $M_T$. Thus, we can identify the image space of the transformation, $\textrm{Im}(T) = \{ \vec{w} \in W \ | \ \vec{w}=T(\vec{v}), \textrm{ for some } \vec{v}\in V \}$, with the column space $\mathcal{C}(M_T)$ of the matrix $M_T$.

Perhaps not surprisingly, there is also an equivalence between the null space of the transformation $T$ and the null space of the matrix $M_T$: \[ \textrm{Null}(T) \equiv \{\vec{v}\in \mathbb{R}^n | T(\vec{v}) = \vec{0} \} = \mathcal{N}(M_T) \equiv \{\vec{v}\in \mathbb{R}^n | M_T\vec{v} = \vec{0} \}. \]

The null space $\mathcal{N}(M_T)$ of a matrix consists of all vectors that are orthogonal to the rows of the matrix $M_T$. The vectors in the null space of $M_T$ have a zero dot product with each of the rows of $M_T$. This orthogonality can also be phrased in the opposite direction. Any vector in the row space $\mathcal{R}(M_T)$ of the matrix is orthogonal to the null space $\mathcal{N}(M_T)$ of the matrix.

These observations allow us to identify the domain of the transformation $T$ as the orthogonal sum of the null space and the row space of the matrix $M_T$: \[ \mathbb{R}^n = \mathcal{N}(M_T) \oplus \mathcal{R}(M_T). \] This split implies the conservation of dimensions formula \[ {\rm dim}(\mathbb{R}^n) = n = {\rm dim}({\cal N}(M_T))+{\rm dim}({\cal R}(M_T)), \] which says that the dimensions of the null space and the row space of a matrix $M_T$ must add up to the total dimension of the input space.

We can summarize everything we know about the input-output relationship of the transformation $T$ as follows: \[ T \colon \mathcal{R}(M_T) \to \mathcal{C}(M_T), \qquad T \colon \mathcal{N}(M_T) \to \{ \vec{0} \}. \] Input vectors $\vec{v} \in \mathcal{R}(M_T)$ get mapped to output vectors $\vec{w} \in \mathcal{C}(M_T)$. Input vectors $\vec{v} \in \mathcal{N}(M_T)$ get mapped to the zero vector.

Composition

The consecutive application of two linear operations on an input vector $\vec{v}$ corresponds to the following matrix product: \[ S(T(\vec{v})) = M_S M_T \vec{v}. \] Note that the matrix $M_T$ “touches” the vector first, followed by the multiplication with $M_S$.

For such a composition to be well defined, the dimension of the output space of $T$ must be the same as the dimension of the input space of $S$. In terms of the matrices, this corresponds to the condition that the inner dimensions in the matrix product $M_S M_T$ must match.

Choice of basis

In the above, we assumed that the standard bases were used both for the inputs and the outputs of the linear transformation. Thus, the coefficients in the matrix $M_T$ we obtained were with respect to the standard bases.

In particular, we assumed that the outputs of $T$ were given to us as column vectors in terms of the standard basis for $\mathbb{R}^m$. If the outputs were given to us in some other basis $B_W$, then the coefficients of the matrix $M_T$ would be in terms of $B_W$.

A non-standard basis $B_V$ could also be used for the input space $\mathbb{R}^n$, in which case to construct the matrix $M_T$ we would have to “probe” $T$ with each of the vectors $\vec{b}_i \in B_V$. Furthermore, in order to compute $T$ as “the matrix product with the matrix produced by $B_V$-probing,” we would have to express each input vector $\vec{v}$ in terms of its coefficients with respect to $B_V$.

Because of this freedom regarding the choice of which basis to use, it would be wrong to say that a linear transformation is a matrix. Indeed, the same linear transformation $T$ would correspond to different matrices if different bases are used. We say that the linear transformation $T$ corresponds to a matrix $M$ for a given choice of input and output bases. We write $_{B_W}[M_T]_{B_V}$, in order to show the explicit dependence of the coefficients in the matrix $M_T$ on the choice of bases. With the exception of problems which involve the “change of basis,” you can always assume that the standard bases are used.

Invertible transformations

We will now revisit the properties of invertible matrices and connect them with the notion of an invertible transformation. We can think of the multiplication by a matrix $M$ as “doing” something to vectors, and thus the matrix $M^{-1}$ must be doing the opposite thing to put the vector back in its place again: \[ M^{-1} M \vec{v} = \vec{v}. \]

For simple $M$'s you can “see” what $M$ does. For example, the matrix \[ M = \begin{bmatrix}2 & 0 \nl 0 & 1 \end{bmatrix}, \] corresponds to a stretching of space by a factor of 2 in the $x$-direction, while the $y$-direction remains untouched. The inverse transformation corresponds to a shrinkage by a factor of 2 in the $x$-direction: \[ M^{-1} = \begin{bmatrix}\frac{1}{2} & 0 \nl 0 & 1 \end{bmatrix}. \] In general it is hard to see what the matrix $M$ does exactly since it is some arbitrary linear combination of the coefficients of the input vector.

The key thing to remember is that if $M$ is invertible, it is because when you get the output $\vec{w}$ from $\vec{w} = M\vec{v}$, the knowledge of $\vec{w}$ allows you to get back to the original $\vec{v}$ you started from, since $M^{-1}\vec{w} = \vec{v}$.

By the correspondence $\vec{w} = T(\vec{v}) \Leftrightarrow \vec{w} = M_T\vec{v}$, we can identify the class of invertible linear transformations $T$ for which there exists a $T^{-1}$ such that $T^{-1}(T(\vec{v}))=\vec{v}$. This gives us another interpretation for some of the equivalence statements in the invertible matrix theorem:

  1. $T\colon \mathbb{R}^n \to \mathbb{R}^n$ is invertible. $\quad \Leftrightarrow \quad$ $M_T \in \mathbb{R}^{n \times n}$ is invertible.
  2. $T$ is //injective// (one-to-one function). $\quad \Leftrightarrow \quad$ $M_T\vec{v}_1 \neq M_T\vec{v}_2$ for all $\vec{v}_1 \neq \vec{v}_2$.
  3. The linear transformation $T$ is //surjective// (onto). $\quad \Leftrightarrow \quad$ $\mathcal{C}(M_T) = \mathbb{R}^n$.
  4. The linear transformation $T$ is //bijective// (one-to-one correspondence). $\quad \Leftrightarrow \quad$ For each $\vec{w} \in \mathbb{R}^n$, there exists a unique $\vec{v} \in \mathbb{R}^n$ such that $M_T\vec{v} = \vec{w}$.
  5. The null space of $T$ is zero-dimensional, $\textrm{Null}(T) =\{ \vec{0} \}$. $\quad \Leftrightarrow \quad$ $\mathcal{N}(M_T) = \{ \vec{0} \}$.

When $M$ is not invertible, it must send some nonzero vectors to the zero vector: $M\vec{v} = \vec{0}$. When this happens there is no way to get back the $\vec{v}$ you started from, i.e., there is no matrix $M^{-1}$ such that $M^{-1} \vec{0} = \vec{v}$, since $B \vec{0} = \vec{0}$ for all matrices $B$.


Affine transformations

An affine transformation is a function $A:\mathbb{R}^n \to \mathbb{R}^m$ which is the combination of a linear transformation $T$ followed by a translation by a fixed vector $\vec{b}$: \[ \vec{y} = A(\vec{x}) = T(\vec{x}) + \vec{b}. \] By the $T \Leftrightarrow M_T$ equivalence we can write the formula for an affine transformation as \[ \vec{y} = A(\vec{x}) = M_T\vec{x} + \vec{b}, \] where the linear transformation is performed as a matrix product $M_T\vec{x}$ and then we add a vector $\vec{b}$. This is the vector generalization of the affine function equation $y=f(x)=mx+b$.
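For instance, a rotation followed by a translation is an affine transformation. Here is a tiny numerical sketch, assuming Python with numpy; the specific matrix and offset are made up for illustration:

<code python>
import numpy as np

# An affine map A(x) = M_T x + b: a rotation by 90 degrees
# followed by a translation (values chosen only for illustration).
M_T = np.array([[0.0, -1.0],
                [1.0,  0.0]])
b = np.array([1.0, 2.0])

def affine(x):
    return M_T @ x + b

print(affine(np.array([1.0, 0.0])))   # [1. 3.]
</code>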

Discussion

The most general linear transformation

In this section we learned that a linear transformation can be represented as matrix multiplication. Are there other ways to represent linear transformations? To study this question, let's analyze from first principles the most general form that a linear transformation $T\colon \mathbb{R}^n \to\mathbb{R}^m$ can take. We will use $V=\mathbb{R}^3$ and $W=\mathbb{R}^2$ to keep things simple.

Let us first consider the first coefficient $w_1$ of the output vector $\vec{w} = T(\vec{v})$ when the input vector is $\vec{v}$. The fact that $T$ is linear means that $w_1$ can be an arbitrary mixture of the input vector coefficients $v_1,v_2,v_3$: \[ w_1 = \alpha_1 v_1 + \alpha_2 v_2 + \alpha_3 v_3. \] Similarly, the second component must be some other arbitrary linear combination of the input coefficients: $w_2 = \beta_1 v_1 + \beta_2 v_2 + \beta_3 v_3$. Thus, the most general linear transformation $T \colon V \to W$ can be written as: \[ \begin{align*} w_1 &= \alpha_1 v_1 + \alpha_2 v_2 + \alpha_3 v_3, \nl w_2 &= \beta_1 v_1 + \beta_2 v_2 + \beta_3 v_3. \end{align*} \]

This is precisely the kind of expression that can be expressed as a matrix product: \[ T(\vec{v}) = \begin{bmatrix} w_1 \nl w_2 \nl \end{bmatrix} = \begin{bmatrix} \alpha_1 & \alpha_2 & \alpha_3 \nl \beta_1 & \beta_2 & \beta_3 \end{bmatrix} \begin{bmatrix} v_1 \nl v_2 \nl v_3 \nl \end{bmatrix} = M_T \vec{v}. \]

In fact, the matrix product is defined the way it is precisely because it allows us to express linear transformations so easily.

Links

[ Nice visual examples of 2D linear transformations ]
http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors

NOINDENT [ More on null space and range space and dimension counting ]
http://en.wikibooks.org/wiki/Linear_Algebra/Rangespace_and_Nullspace

NOINDENT [ Rotations as three shear operations ]
http://datagenetics.com/blog/august32013/index.html

Lines and planes

We will now learn about points, lines and planes in $\mathbb{R}^3$. The purpose of this section is to help you understand these geometrical objects, both in terms of the equations that describe them and in terms of what they look like.

Concepts

  • $p=(p_x,p_y,p_z)$: a point in $\mathbb{R}^3$.
  • $\vec{v}=(v_x,v_y,v_z)$: a vector in $\mathbb{R}^3$.
  • $\hat{v}=\frac{ \vec{v} }{ \|\vec{v}\| }$: a unit vector in the direction of $\vec{v}$.
  • $\ell: \{ p_o+t\:\vec{v}, t \in \mathbb{R} \}$: the equation of a line with direction vector $\vec{v}$ passing through the point $p_o$.
  • $\ell: \left\{ \frac{x - p_{0x}}{v_x} = \frac{y - p_{0y}}{v_y} = \frac{z - p_{0z}}{v_z} \right\}$: the //symmetric// equation of the line $\ell$.
  • $P: \{ (x,y,z) \in \mathbb{R}^3 \ | \ (x,y,z)=p_o+s\:\vec{v} + t\:\vec{w}, \ s,t \in \mathbb{R} \}$: the //parametric// equation of a plane $P$.
  • $P: \left\{ (x,y,z) \in \mathbb{R}^3 \ | \ \vec{n} \cdot [ (x,y,z) - p_o ] = 0 \right\}$: the //geometric// equation of a plane which contains $p_o$ and has normal vector $\hat{n}$.
  • $P: \left\{ Ax+By+Cz=D \right\}$: the //general// equation of a plane.
  • $d(a,b)$: the shortest //distance// between two objects $a$ and $b$.

Points

We can specify a point in $\mathbb{R}^3$ by its coordinates $p=(p_x,p_y,p_z)$, which is similar to how we specify vectors. In fact the two notions are equivalent: we can either talk about the destination point $p$ or the vector $\vec{p}$ that takes us from the origin to the point $p$. By this equivalence, it makes sense to add vectors and points.

We can also specify a point as the intersection of two lines. For example, in $\mathbb{R}^2$ we can describe $p$ as the intersection of the lines $x + 2y = 5$ and $3x + 9y = 21$. To find the point $p$, we have to solve these two equations simultaneously; in other words, we are looking for the point which lies on both lines. The answer is $p=(1,2)$.

In three dimensions, a point can also be specified as the intersection of three planes. Indeed, this is precisely what is going on when we are solving equations of the form $A\vec{x}=\vec{b}$ with $A \in \mathbb{R}^{3 \times 3}$ and $\vec{b} \in \mathbb{R}^{3}$. We are looking for some $\vec{x}$ that lies in all three planes.

Lines

A line $\ell$ is a one-dimensional space that is infinitely long. There are a number of ways to specify the equation of a line.

The parametric equation of a line is obtained as follows. Given a direction vector $\vec{v}$ and some point $p_o$ on the line, we can define the line as: \[ \ell: \ \{ (x,y,z) \in \mathbb{R}^3 \ | \ (x,y,z)=p_o+t\:\vec{v}, t \in \mathbb{R} \}. \] We say the line is parametrized by the variable $t$. The line consists of all the points $(x,y,z)$ which can be obtained starting from the point $p_o$ and adding any multiple of the direction vector $\vec{v}$.

The symmetric equation is an equivalent way for describing a line that does not require an explicit parametrization. Consider the equation that corresponds to each of the coordinates in the equation of the line: \[ x = p_{0x} + t\:v_x, \quad y = p_{0y} + t\:v_y, \quad z = p_{0z} + t\:v_z. \] When we solve for $t$ in each of these equations and equate the results, we obtain the symmetric equation for a line: \[ \ell: \ \left\{ \ \frac{x - p_{0x}}{v_x} = \frac{y - p_{0y}}{v_y} = \frac{z - p_{0z}}{v_z} \right\}, \] in which the parameter $t$ does not appear at all. The symmetric equation specifies the line as the relationship between the $x$,$y$ and $z$ coordinates that holds for all the points on the line.

You are probably most familiar with this type of equation in the special case $\mathbb{R}^2$ when there is no $z$ variable. For non-vertical lines, we can think of $y$ as being a function of $x$ and write the line in the equivalent form: \[ \frac{x - p_{0x}}{v_x} = \frac{y - p_{0y}}{v_y}, \qquad \Leftrightarrow \qquad y(x) = mx + b, \] where $m=\frac{v_y}{v_x}$ and $b=p_{oy}-\frac{v_y}{v_x}p_{ox}$, assuming $v_x \neq 0$. This makes sense intuitively, since we always thought of the slope $m$ as the “rise over run”, i.e., how much the line goes in the $y$ direction divided by how much it goes in the $x$ direction.

Another way to describe a line is to specify two points that are part of the line. The equation of a line that contains the points $p$ and $q$ can be obtained as follows: \[ \ell: \ \{ \vec{x}=p+t \: (p-q), \ t \in \mathbb{R} \}, \] where $(p-q)$ plays the role of the direction vector $\vec{v}$ of the line. We said any vector could be used in the definition so long as it is in the same direction as the line: $\vec{v}=p-q$ certainly can play that role since $p$ and $q$ are two points on the line.

In three dimensions, the intersection of two planes forms a line. The equation of the line corresponds to the solutions of the equation $A\vec{x}=\vec{b}$ with $A \in \mathbb{R}^{2 \times 3}$ and $\vec{b} \in \mathbb{R}^{2}$.

Planes

A plane $P$ in $\mathbb{R}^3$ is a two-dimensional space with infinite extent. The orientation of the plane is specified by a normal vector $\vec{n}$, which is perpendicular to the plane.

A plane consists of all the points $(x,y,z)$ such that the vector from the point $p_o$ to $(x,y,z)$ is orthogonal to the plane's normal vector $\vec{n}$. The formula in compact notation is \[ P: \ \ \vec{n} \cdot [ (x,y,z) - p_o ] = 0. \] Recall that the dot product of two vectors is zero if and only if these vectors are orthogonal. In the above equation, the expression $[(x,y,z) - p_o]$ forms an arbitrary vector with one endpoint at $p_o$. From all these vectors we select only those that are perpendicular to $\vec{n}$, and thus we obtain all the points of the plane.

If we expand the above formula, we obtain the general equation of the plane: \[ P: \ \ Ax + By + Cz = D, \] where $A = n_x, B=n_y, C=n_z$ and $D = \vec{n} \cdot p_o = n_xp_{0x} + n_yp_{0y} + n_zp_{0z}$.

We can also give a parametric description of a plane $P$, provided we have some point $p_o$ in the plane and two linearly independent vectors $\vec{v}$ and $\vec{w}$ which lie inside the plane: \[ P: \ \{ (x,y,z) \in \mathbb{R}^3 \ | \ (x,y,z)=p_o+s\:\vec{v} + t\:\vec{w}, \ s,t \in \mathbb{R} \}. \] Note that since a plane is two-dimensional, we need two parameters $s$ and $t$ to describe it.

Suppose we're given three points $p$, $q$, and $r$ that lie in the plane. Can you find the equation for this plane in the form $\vec{n} \cdot [ (x,y,z) - p_o ] = 0$? We can use the point $p$ as the point $p_o$, but how do we find the normal vector $\vec{n}$ for that plane? The trick is to use the cross product. First we build two vectors that lie in the plane, $\vec{v} = q-p$ and $\vec{w} = r-p$, and then, to find a vector that is perpendicular to both of them, we compute: \[ \vec{n} = \vec{v} \times \vec{w} = (q - p) \times ( r - p ). \] We can then write down the equation of the plane $\vec{n} \cdot [ (x,y,z) - p ] = 0$ as usual. The key property we used was the fact that the cross product of two vectors results in a vector that is perpendicular to both vectors. The cross product is the perfect tool for finding the normal vector.
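If you want to try this recipe with actual numbers, the following sketch (assuming Python with numpy; the three points are chosen arbitrarily) computes the normal vector and checks that all three points satisfy the resulting plane equation:

<code python>
import numpy as np

# Plane through three points p, q, r (chosen arbitrarily):
# normal n = (q - p) x (r - p), then the plane is n . [(x,y,z) - p] = 0.
p = np.array([1, 0, 0])
q = np.array([0, 1, 0])
r = np.array([0, 0, 1])

n = np.cross(q - p, r - p)
D = np.dot(n, p)
print("normal:", n, "  plane:", f"{n[0]}x + {n[1]}y + {n[2]}z = {D}")

# Sanity check: all three points satisfy n . point = D.
for point in (p, q, r):
    assert np.dot(n, point) == D
</code>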

Distances

The distance between 2 points $p$ and $q$ is equal to the length of the vector that goes from $p$ to $q$: \[ d(p,q)=\| q - p \| = \sqrt{ (q_x-p_x)^2 + (q_y-p_y)^2 + (q_z-p_z)^2}. \]

The distance between the line $\ell: \{ (x,y,z) \in \mathbb{R}^3 \ | \ (x,y,z)=p_o+t\:\vec{v}, t \in \mathbb{R} \}$ and the origin $O=(0,0,0)$ is given by the formula: \[ d(\ell,O) = \left\| p_o - \frac{ p_o \cdot \vec{v} }{ \| \vec{v} \|^2 } \vec{v} \right\|. \]

The interpretation of this formula is as follows. First we identify the vector that goes from the origin to the point $p_o$ on the line. The projection of this vector onto the direction $\vec{v}$ of the line is given by $\frac{ p_o \cdot \vec{v} }{ \| \vec{v} \|^2 } \vec{v}$; this is the part of $p_o$ which is entirely in the direction of $\vec{v}$. The distance $d(\ell,O)$ is the length of what remains after we subtract this projection, i.e., the length of the component of $p_o$ that is perpendicular to the line.

The distance between a plane $P: \ \vec{n} \cdot [ (x,y,z) - p_o ] = 0$ and the origin $O$ is given by: \[ d(P,O)= \frac{| \vec{n}\cdot p_o |}{ \| \vec{n} \| }. \]

The above distance formulas are somewhat complicated expressions which involve a lot of dot products and vector lengths. In order to understand what is going on, we need to learn a bit more about projections, which will help us measure distances between arbitrary points, lines and planes. As you can see from the formulas above, there will be no new math: just vector $+$, $-$, $\|.\|$ and dot products. The new stuff is actually all in picture-proofs (formally called vector diagrams). Projections play a key role in all of this, which is why we will learn about them in great detail in the next section.

Exercises

Find the plane which contains the line of intersection of the two planes $x+2y+z=1$ and $2x-y-z=2$ and is parallel to the line $x=1+2t$, $y=-2+t$, $z=-1-t$.

NOINDENT Sol: Find a direction vector for the line of intersection: $\vec{v}_1 = ( 1, 2,1 ) \times ( 2, -1, -1)$. We also know that the plane is parallel to $\vec{v}_2=(2,1,-1)$, so the plane must be $\textrm{span}\{\vec{v}_1, \vec{v}_2 \} + p_o$. To find a normal vector for the plane we compute $\vec{n} = \vec{v}_1 \times \vec{v}_2$. Then we choose a point that lies on both of the given planes; conveniently, the point $(1,0,0)$ satisfies both equations. So the answer is $\vec{n}\cdot[ (x,y,z) - (1,0,0) ]=0$.
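A quick numerical check of this solution, assuming Python with numpy; the two cross products give $\vec{v}_1=(-1,3,-5)$ and $\vec{n}=(2,-11,-7)$, so the plane works out to $2x-11y-7z=2$:

<code python>
import numpy as np

# Numerical check of the solution sketched above.
v1 = np.cross([1, 2, 1], [2, -1, -1])   # direction of the intersection line
v2 = np.array([2, 1, -1])               # direction of the given line
n = np.cross(v1, v2)                    # normal of the sought plane
p0 = np.array([1, 0, 0])                # a point on both given planes

print("n =", n, "  plane:", f"{n[0]}x + {n[1]}y + {n[2]}z = {np.dot(n, p0)}")
assert np.dot(n, v1) == 0 and np.dot(n, v2) == 0   # plane contains both directions
</code>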

Matrix equations

If $a,b$ and $c$ were three numbers, and I told you to solve for $a$ in the equation \[ ab = c, \] then you would know to tell me that the answer is $a = c/b = c\frac{1}{b}=\frac{1}{b}c$, and that would be the end of it.

Now suppose that $A$, $B$ and $C$ are matrices and you want to solve for $A$ in the matrix equation \[ AB = C. \]

The naive answer $A=C/B$ is not allowed. So far, we have defined a matrix product and a matrix inverse, but not matrix division. Instead of dividing, we multiply by $B^{-1}$, which plays the role of the “divide by $B$” operation since the product of $B$ and $B^{-1}$ gives the identity matrix: \[ BB^{-1} = I, \qquad B^{-1}B = I. \] When applying the inverse matrix $B^{-1}$ to the equation, we must specify whether we are multiplying from the left or from the right, because the matrix product is not commutative. What do you think is the right answer for $A$ in the above equation? Is it $A = CB^{-1}$ or $A = B^{-1}C$?

Matrix equations

To solve a matrix equation we will employ the same technique as we used to solve equations in the first chapter of the book. Recall that doing the same thing to both sides of any equation gives us a new equation that is equally valid as the first. There are only two new things you need to keep in mind for matrix equations:

  • The order in which the matrices are multiplied matters, because the matrix product is not a commutative operation: $AB \neq BA$. This means that the two expressions $ABC$ and $BAC$ are different, despite the fact that they are products of the same matrices.
  • When performing operations on matrix equations, you can act either from the //left// or from the //right// side of the equation.

The best way to get you used to the peculiarities of matrix equations is to look at some examples together. Don't worry: there will be nothing too mathematically demanding. We will just work through, step by step, what is going on in each case.

In each of the following examples, your task is to solve the equation for the unknown matrix by isolating it on one side of the equation. Let us see what is going on.

Matrix times a matrix

Let us continue with the equation we were trying to solve in the introduction: $AB=C$. In order to solve for $A$ in \[ AB = C, \] we can multiply by $B^{-1}$ from the right on both sides of the equation: \[ ABB^{-1} = CB^{-1}. \] This is good stuff because $B$ and $B^{-1}$ cancel out ($BB^{-1}=I$) and give us the answer: \[ A = CB^{-1}. \]

Matrix times a matrix variation

Okay, but what if we were trying to solve for $B$ in $AB=C$? How would we proceed then?

The answer is, again, to do the same thing to both sides of the equation. If we want to cancel $A$, then we have to multiply by $A^{-1}$ from the left: \[ A^{-1}AB = A^{-1}C. \] The result is: \[ B = A^{-1}C. \]

Matrix times a vector

We start with the equation \[ A\vec{x}=\vec{b}, \] which shows some $n\times n$ matrix $A$, and the vectors $\vec{x}$ and $\vec{b}$, which are nothing more than tall and skinny matrices of dimensions $n \times 1$.

Assuming that $A$ is invertible, there is nothing special to do here, and we proceed by multiplying by the inverse $A^{-1}$ on the left of both sides of the equation. We get: \[ A^{-1}A\vec{x} = A^{-1}\vec{b}. \] By definition, $A^{-1}$ times $A$ is equal to the identity $I$, which is a diagonal matrix with ones on the diagonal and zeros everywhere else: \[ I\vec{x} = A^{-1}\vec{b}. \] The product of anything with the identity is the thing itself: \[ \vec{x} = A^{-1}\vec{b}, \] which is our final answer.

Note however that the question “Solve for $\vec{x}$ in $A\vec{x} = \vec{b}$” can sometimes be asked in situations where the matrix $A$ is not invertible. If the system of equations is under-specified ($A$ is wider than it is tall), then there will be a whole subspace of acceptable solutions $\vec{x}$. If the system is over-specified ($A$ is taller than it is wide), then we might be interested in finding the best-fit vector $\vec{x}$ such that $A\vec{x} \approx \vec{b}$. Such approximate solutions are of great practical importance in much of science.
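In practice, both the exact solve and the least-squares fit are one-liners. Here is a hedged sketch, assuming Python with numpy; the matrices and vectors are arbitrary examples:

<code python>
import numpy as np

# Solving A x = b when A is invertible, and a least-squares fit
# when the system is over-specified (more equations than unknowns).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])
x = np.linalg.solve(A, b)              # preferred over computing inv(A) @ b
assert np.allclose(A @ x, b)

A_tall = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])
b_tall = np.array([1.0, 2.0, 4.0])
x_fit, *_ = np.linalg.lstsq(A_tall, b_tall, rcond=None)
print(x, x_fit)                        # x_fit minimizes ||A_tall x - b_tall||
</code>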


This completes our lightning tour of matrix equations. There is nothing really new to learn here; I just had to make you aware of the fact that the order in which you apply matrix operations matters, and remind you of the general principle of “doing the same thing to both sides of the equation.” Acting according to this principle is really important when manipulating matrices.

In the next section we look at matrix equations in more detail as we analyze the properties of matrix multiplication. We will also discuss several algorithms for computing the matrix inverse.

Exercises

Solve for X

Solve for the matrix $X$ the following equations: (1) $XA = B$, (2) $ABCXD = E$, (3) $AC = XDC$. Assume the matrices $A,B,C$ and $D$ are all invertible.

Ans: (1) $X = BA^{-1}$, (2) $X = C^{-1}B^{-1}A^{-1}E D^{-1}$, (3) $X=AD^{-1}$.
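If you want to double-check these answers without grinding through the algebra, here is a small numerical verification, assuming Python with numpy; the random matrices are nudged to be safely invertible:

<code python>
import numpy as np

# Verify answers (1)-(3) numerically with random matrices
# (shifted by 3I so that they are safely invertible).
np.random.seed(1)
A, B, C, D, E = [np.random.rand(3, 3) + 3 * np.eye(3) for _ in range(5)]
inv = np.linalg.inv

X1 = B @ inv(A)
assert np.allclose(X1 @ A, B)                 # (1) XA = B

X2 = inv(C) @ inv(B) @ inv(A) @ E @ inv(D)
assert np.allclose(A @ B @ C @ X2 @ D, E)     # (2) ABCXD = E

X3 = A @ inv(D)
assert np.allclose(A @ C, X3 @ D @ C)         # (3) AC = XDC
print("all three answers check out")
</code>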

Special types of matrices

Mathematicians like to categorize things. There are some types of matrices to which mathematicians give specific names so that they can refer to them quickly without having to explain what they do in words:

 I have this matrix A whose rows are perpendicular vectors and 
 then when you multiply any vector by this matrix it doesn't change 
 the length of the vector but just kind of rotates it and stuff...

It is much simpler just to say:

 Let A be an orthogonal matrix.

Most advanced science textbooks and research papers will use terminology like “diagonal matrix”, “symmetric matrix”, and “orthogonal matrix”, so I want you to become familiar with these concepts.

This section also serves to review and reinforce what we learned about linear transformations. Recall that we can think of the matrix-vector product $A\vec{x}$ as applying a linear transformation $T_A$ to the input vector $\vec{x}$. Therefore, each of the special matrices which we will discuss here also corresponds to a special type of linear transformation. Keep this dual picture in mind, because the same terminology can be used to describe matrices and linear transformations.

Notation

  • $\mathbb{R}^{m \times n}$: the set of $m \times n$ matrices
  • $A,B,O,P,\ldots$: typical variable names for matrices
  • $a_{ij}$: the entry in the $i$th row and $j$th column of the matrix $A$
  • $A^T$: the transpose of the matrix $A$
  • $A^{-1}$: the inverse of the matrix $A$. The inverse obeys $AA^{-1}=A^{-1}A=I$.
  • $\lambda_1, \lambda_2, \ldots$: the eigenvalues of the matrix $A$. For each eigenvalue $\lambda_i$ there is at least one associated eigenvector $\vec{e}_{\lambda_i}$ such that the following equation holds: \[ A\vec{e}_{\lambda_i} = \lambda_i \vec{e}_{\lambda_i}. \] Multiplying the matrix $A$ by one of its eigenvectors $\vec{e}_{\lambda_i}$ is the same as scaling $\vec{e}_{\lambda_i}$ by the number $\lambda_i$.

Diagonal matrices

These are matrices that only have entries on the diagonal and are zero everywhere else. For example: \[ \left(\begin{array}{ccc} a_{11} & 0 & 0 \nl 0 & a_{22}& 0 \nl 0 & 0 & a_{33} \end{array}\right). \] More generally we say that a diagonal matrix $A$ satisfies, \[ a_{ij}=0, \quad \text{if } i\neq j. \]

The eigenvalues of a diagonal matrix are $\lambda_i = a_{ii}$.

Symmetric matrices

A matrix $A$ is symmetric if and only if \[ A^T = A, \qquad a_{ij} = a_{ji}, \quad \text{ for all } i,j. \] All eigenvalues of a symmetric matrix are real numbers, and its eigenvectors can be chosen to be mutually orthogonal. Given any matrix $B\in\mathbb{M}(m,n)$, the product of $B$ with its transpose, $B^TB$, is always a symmetric matrix.

Upper triangular matrices

Upper triangular matrices have zero entries below the main diagonal: \[ \left(\begin{array}{ccc} a_{11} & a_{12}& a_{13} \nl 0 & a_{22}& a_{23} \nl 0 & 0 & a_{33} \end{array}\right), \qquad a_{ij}=0, \quad \text{if } i > j. \]

A lower triangular matrix is one for which all the entries above the diagonal are zeros: $a_{ij}=0, \quad \text{if } i < j$.

Identity matrix

The identity matrix is denoted as $I$ or $I_n \in \mathbb{M}(n,n)$ and plays the role of the number $1$ for matrices: $IA=AI=A$. The identity matrix is diagonal with ones on the diagonal: \[ I_3 = \left(\begin{array}{ccc} 1 & 0 & 0 \nl 0 & 1 & 0 \nl 0 & 0 & 1 \end{array}\right). \]

Any vector $\vec{v} \in \mathbb{R}^3$ is an eigenvector of the identity matrix with eigenvalue $\lambda = 1$.

Orthogonal matrices

A matrix $O \in \mathbb{M}(n,n)$ is orthogonal if it satisfies $OO^T=I=O^TO$. The inverse of an orthogonal matrix $O$ is obtained by taking its transpose: $O^{-1} = O^T$.

The best way to think of orthogonal matrices is to think of them as linear transformations $T_O(\vec{v})=\vec{w}$ which preserve the length of vectors. The length of a vector before applying the linear transformation is given by $\|\vec{v}\|=\sqrt{ \vec{v} \cdot \vec{v} }$. The length of a vector after the transformation is \[ \|\vec{w}\| =\sqrt{ \vec{w} \cdot \vec{w} } =\sqrt{ T_O(\vec{v}) \cdot T_O(\vec{v}) } = \sqrt{ (O\vec{v})^T(O\vec{v}) } = \sqrt{ \vec{v}^TO^TO\vec{v} }. \] When $O$ is an orthogonal matrix, we can substitute $O^TO=I$ in the above expression to establish $\|\vec{w}\|=\sqrt{\vec{v}^TI\vec{v}\,}=\|\vec{v}\|$, which shows that orthogonal transformations are length preserving.

The eigenvalues of an orthogonal matrix have unit magnitude, but they can in general be complex numbers $\lambda_i=\exp(i\theta) \in \mathbb{C}$. The determinant of an orthogonal matrix is either one or minus one: $|O|\in\{-1,1\}$.

A good way to think about orthogonal matrices is to imagine that their columns form an orthonormal basis for $\mathbb{R}^n$: \[ \{ \hat{e}_1,\hat{e}_2,\ldots, \hat{e}_n \}, \quad \hat{e}_{i} \cdot \hat{e}_{j} = \left\{ \begin{array}{ll} 1 & \text{ if } i =j, \nl 0 & \text{ if } i \neq j. \end{array}\right. \] The resulting matrix \[ O= \begin{bmatrix} | & & | \nl \hat{e}_{1} & \cdots & \hat{e}_{n} \nl | & & | \end{bmatrix} \] is an orthogonal matrix. You can verify this by showing that $O^TO=I$. We can interpret the matrix $O$ as a change of basis from the standard basis to the $\{ \hat{e}_1,\hat{e}_2,\ldots, \hat{e}_n \}$ basis.

The set of orthogonal matrices contains as special cases the following important classes of matrices: rotation matrices, reflection matrices, and permutation matrices. We'll now discuss each of these in turn.

Rotation matrices

A rotation matrix takes the standard basis $\{ \hat{\imath}, \hat{\jmath}, \hat{k} \}$ to a rotated basis $\{ \hat{e}_1,\hat{e}_2,\hat{e}_3 \}$.

Consider first an example in $\mathbb{R}^2$. The counterclockwise rotation by the angle $\theta$ is given by the matrix \[ R_\theta = \begin{bmatrix} \cos\theta &-\sin\theta \nl \sin\theta &\cos\theta \end{bmatrix}. \] The matrix $R_\theta$ takes $\hat{\imath}=(1,0)$ to $(\cos\theta,\sin\theta)$ and $\hat{\jmath}=(0,1)$ to $(-\sin\theta,\cos\theta)$.

As a second example, consider the rotation by the angle $\theta$ around the $x$-axis in $\mathbb{R}^3$: \[ \begin{bmatrix} 1&0&0\nl 0&\cos\theta&-\sin\theta\nl 0&\sin\theta&\cos\theta \end{bmatrix}. \] Note this is a rotation entirely in the $yz$-plane: the $x$-component of any vector multiplied by this matrix remains unchanged.

The determinant of a rotation matrix is equal to one. The eigenvalues of rotation matrices are complex numbers with magnitude one.
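
As a quick numerical sanity check, here is a short Python sketch using NumPy (the angle $\theta=0.7$ is an arbitrary choice, not part of the text) which verifies that $R_\theta$ is orthogonal, has determinant one, and sends $\hat{\imath}$ to $(\cos\theta,\sin\theta)$:

<code python>
import numpy as np

theta = 0.7   # an arbitrary angle in radians
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(R.T @ R, np.eye(2)))    # True: R is orthogonal
print(np.isclose(np.linalg.det(R), 1.0))  # True: rotations have determinant one
print(R @ np.array([1, 0]))               # [cos(theta), sin(theta)]
</code>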

Reflections

If the determinant of an orthogonal matrix $O$ is equal to negative one, then we say that it is mirrored orthogonal. For example, the reflection through the line with direction vector $(\cos\theta, \sin\theta)$ is given by: \[ R= \begin{bmatrix} \cos(2\theta) &\sin(2\theta)\nl \sin(2\theta) &-\cos(2\theta) \end{bmatrix}. \]

A reflection matrix will always have at least one eigenvalue equal to minus one, which corresponds to the direction perpendicular to the axis of reflection.

Permutation matrices

Another important class of orthogonal matrices is the class of permutation matrices. The action of a permutation matrix is simply to change the order of the coefficients of a vector. For example, the permutation $\hat{e}_1 \to \hat{e}_1$, $\hat{e}_2 \to \hat{e}_3$, $\hat{e}_3 \to \hat{e}_2$ can be represented as the following matrix: \[ M_\pi = \begin{bmatrix} 1 & 0 & 0 \nl 0 & 0 & 1 \nl 0 & 1 & 0 \end{bmatrix}. \] An $n \times n$ permutation matrix contains exactly one $1$ in each row and each column, and zeros everywhere else.

The sign of a permutation corresponds to the determinant $\det(M_\pi)$. We say that a permutation $\pi$ is even if $\det(M_\pi) = +1$ and odd if $\det(M_\pi) = -1$.
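
Here is a small Python/NumPy sketch (the vector $(10,20,30)$ is just an example) showing that $M_\pi$ reorders the coefficients of a vector, is orthogonal, and has determinant $-1$, since a single swap is an odd permutation:

<code python>
import numpy as np

# the permutation e1 -> e1, e2 -> e3, e3 -> e2 as a matrix (the M_pi from above)
M_pi = np.array([[1, 0, 0],
                 [0, 0, 1],
                 [0, 1, 0]])

v = np.array([10, 20, 30])
print(M_pi @ v)                               # [10 30 20]: the coefficients get reordered
print(np.linalg.det(M_pi))                    # -1.0: one swap is an odd permutation
print(np.allclose(M_pi @ M_pi.T, np.eye(3)))  # True: permutation matrices are orthogonal
</code>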

Positive matrices

A matrix $P \in \mathbb{M}(n,n)$ is positive semidefinite if \[ \vec{v}^T P \vec{v} \geq 0, \] for all $\vec{v} \in \mathbb{R}^n$. The eigenvalues of a positive semidefinite matrix are all non-negative $\lambda_i \geq 0$.

If we have $\vec{v}^T P \vec{v} > 0$, for all $\vec{v} \in \mathbb{R}^n$, we say that the matrix is positive definite. These matrices have eigenvalues strictly greater than zero.

Projection matrices

The defining property of a projection matrix is that it can be applied multiple times without changing the result: \[ \Pi = \Pi^2= \Pi^3= \Pi^4= \Pi^5 = \cdots. \]

A projection has two eigenvalues: one and zero. The space $S$ which is left invariant by the projection $\Pi_S$ corresponds to the eigenvalue $\lambda=1$. The space $S^\perp$ of vectors that get completely annihilated by $\Pi_S$ corresponds to the eigenvalue $\lambda=0$, which is also the null space of $\Pi_S$.
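
As an illustration, the following Python/NumPy sketch builds the projection onto the line spanned by a unit vector $\hat{u}$ (the particular choice $\hat{u}=(3,4)/5$ is arbitrary) and checks that $\Pi^2=\Pi$ and that the eigenvalues are one and zero:

<code python>
import numpy as np

u = np.array([[3.0], [4.0]]) / 5.0    # a unit-length column vector (arbitrary choice)
Pi = u @ u.T                          # projection matrix onto span{u}

print(np.allclose(Pi @ Pi, Pi))            # True: projecting twice changes nothing
print(np.round(np.linalg.eigvals(Pi), 6))  # the eigenvalues are 1 and 0 (in some order)
</code>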

Normal matrices

The matrix $A \in \mathbb{M}(n,n)$ is normal if $A^TA=AA^T$. If $A$ is normal, it has the following properties:

  1. The matrix $A$ has a full set of linearly independent eigenvectors.
  2. Eigenvectors corresponding to distinct eigenvalues are orthogonal,
  and eigenvectors from the same eigenspace can be chosen to be mutually orthogonal.
  3. For all vectors $\vec{v}$ and $\vec{w}$ and a normal transformation $A$ we have: 
  \[
   (A\vec{v}) \cdot (A\vec{w}) 
    = (A^TA\vec{v})\cdot \vec{w}
    =(AA^T\vec{v})\cdot \vec{w}.
   \]
  4. $\vec{v}$ is an eigenvector of $A$ if and only if $\vec{v}$ is an eigenvector of $A^T$.

Every normal matrix is diagonalizable. In particular, every symmetric matrix can be diagonalized by an orthogonal matrix $O$: its eigendecomposition can be written as $A = O\Lambda O^T$, where $O$ is orthogonal and $\Lambda$ is a diagonal matrix of eigenvalues. Note that orthogonal ($O^TO=I$) and symmetric ($A^T=A$) matrices are special types of normal matrices since $O^TO=I=OO^T$ and $A^TA=A^2=AA^T$.
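
To see the eigendecomposition in action, here is a Python/NumPy sketch for a small symmetric matrix (chosen arbitrarily); //eigh// returns the eigenvalues together with an orthogonal matrix of eigenvectors:

<code python>
import numpy as np

# a small symmetric (hence normal) matrix, chosen arbitrarily
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

evals, O = np.linalg.eigh(A)   # eigenvalues and an orthogonal matrix of eigenvectors
Lambda = np.diag(evals)

print(np.allclose(O.T @ O, np.eye(2)))   # True: O is orthogonal
print(np.allclose(O @ Lambda @ O.T, A))  # True: A = O Lambda O^T
</code>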

Discussion

In this section we defined several types of matrices and stated their properties. You're now equipped with some very precise terminology for describing the different types of matrices.

More importantly, we discussed the relationships between these types of matrices. TODO: add a mini concept map here to summarize these relationships.

Vector spaces

We will now discuss no vector in particular, but rather the set of all possible vectors. In three dimensions this is the space $(\mathbb{R},\mathbb{R},\mathbb{R}) \equiv \mathbb{R}^3$. We will also discuss vector subspaces of $\mathbb{R}^3$ like lines and planes through the origin.

In this section we develop the vocabulary needed to talk about vector spaces. Using this language will allow us to say some interesting things about matrices. We will formally define the fundamental subspaces for a matrix $A$: the column space $\mathcal{C}(A)$, the row space $\mathcal{R}(A)$, and the null space $\mathcal{N}(A)$.

Definitions

Vector space

A vector space $V \subseteq \mathbb{R}^n$ consists of a set of vectors and all possible linear combinations of these vectors. The notion of all possible linear combinations is very powerful. In particular it has the following two useful properties. We say that vector spaces are closed under addition, which means the sum of any two vectors taken from the vector space is a vector in the vector space. Mathematically, we write: \[ \vec{v}_1+\vec{v}_2 \in V, \qquad \forall \vec{v}_1, \vec{v}_2 \in V. \] A vector space is also closed under scalar multiplication: \[ \alpha \vec{v} \in V, \qquad \forall \alpha \in \mathbb{R},\ \vec{v} \in V. \]

Span

Given a vector $\vec{v}_1$, we can define the following vector space: \[ V_1 = \textrm{span}\{ \vec{v}_1 \} \equiv \{ \vec{v} \in V \ | \vec{v} = \alpha \vec{v}_1 \textrm{ for some } \alpha \in \mathbb{R} \}. \] We say $V_1$ is the space spanned by $\vec{v}_1$ which means that it is the set of all possible multiples of $\vec{v}_1$. The shape of $V_1$ is an infinite line.

Given two vectors $\vec{v}_1$ and $\vec{v}_2$ we can define a vector space: \[ V_{12} = \textrm{span}\{ \vec{v}_1, \vec{v}_2 \} \equiv \{ \vec{v} \in V \ | \vec{v} = \alpha \vec{v}_1 + \beta\vec{v}_2 \textrm{ for some } \alpha,\beta \in \mathbb{R} \}. \] The vector space $V_{12}$ contains all vectors that can be written as a linear combination of $\vec{v}_1$ and $\vec{v}_2$. This is a two-dimensional vector space which has the shape of an infinite plane.

Note that the same space $V_{12}$ can be obtained as the span of different vectors: $V_{12} = \textrm{span}\{ \vec{v}_1, \vec{v}_{2^\prime} \}$, where $\vec{v}_{2^\prime} = \vec{v}_2 + 30\vec{v}_1$. Indeed, $V_{12}$ can be written as the span of any two linearly independent vectors contained in $V_{12}$. This is precisely what is cool about vector spaces: you can talk about the space as a whole without necessarily having to talk about the vectors in it.

As a special case, consider the situation when $\vec{v}_1 = \gamma\vec{v}_2$, for some $\gamma \in \mathbb{R}$. In this case, the vector space $V_{12} = \textrm{span}\{ \vec{v}_1, \vec{v}_2 \}=\textrm{span}\{ \vec{v}_1 \}$ is actually one-dimensional since $\vec{v}_2$ can be written as a multiple of $\vec{v}_1$.

Vector subspaces

A subset $W$ of the vector space $V$ is called a subspace if:

  1. It is closed under addition: $\vec{w}_1 + \vec{w}_2 \in W$, for all $\vec{w}_1,\vec{w}_2 \in W$.
  2. It is closed under scalar multiplication: $\alpha \vec{w} \in W$, for all $\alpha \in \mathbb{R}$ and all $\vec{w} \in W$.

This means that if you take any linear combination of vectors in $W$, the result will also be a vector in $W$. We use the notation $W \subseteq V$ to indicate that $W$ is a subspace of $V$.

An important fact about subspaces is that they always contain the zero vector $\vec{0}$. This is implied by the second property, since any vector becomes the zero vector when multiplied by the scalar $\alpha=0$: $\alpha \vec{w} = \vec{0}$.

Constraints

One way to define a vector subspace $W$ is to start with a larger space $(x,y,z) \in V$ and describe a set of constraints that must be satisfied by all points $(x,y,z)$ in the subspace $W$. For example, the $xy$-plane can be defined as the set of points $(x,y,z) \in \mathbb{R}^3$ that satisfy \[ (0,0,1) \cdot (x,y,z) = 0. \] More formally, we define the $xy$-plane as follows: \[ P_{xy} = \{ (x,y,z) \in \mathbb{R}^3 \ | \ (0,0,1) \cdot (x,y,z) = 0 \}. \] The vector $\hat{k}\equiv(0,0,1)$ is perpendicular to all the vectors that lie in the $xy$-plane so another description for the $xy$-plane is “the set of all vectors perpendicular to the vector $\hat{k}$.” In this definition, the parent space is $V=\mathbb{R}^3$, and the subspace $P_{xy}$ is defined as the set of points that satisfy the constraint $(0,0,1) \cdot (x,y,z) = 0$.

Another way to represent the $xy$-plane would be to describe it as the span of two linearly independent vectors in the plane: \[ P_{xy} = \textrm{span}\{ (1,0,0), (1,1,0) \}, \] which is equivalent to saying: \[ P_{xy} = \{ \vec{v} \in \mathbb{R}^3 \ | \ \vec{v} = \alpha (1,0,0) + \beta(1,1,0), \textrm{ for some } \alpha,\beta \in \mathbb{R} \}. \] This last expression is called an explicit parametrization of the space $P_{xy}$, and $\alpha$ and $\beta$ are the two parameters. Each point in the plane corresponds to a unique pair $(\alpha,\beta)$. The explicit parametrization of an $m$-dimensional vector space requires $m$ parameters.

Matrix subspaces

Consider the following subspaces which are associated with a matrix $M \in \mathbb{R}^{m\times n}$. These are sometimes referred to as the fundamental subspaces of the matrix $M$.

  • The row space $\mathcal{R}(M)$ is the span of the rows of the matrix.
  Note that computing a given linear combination of the rows of a matrix can be
  done by multiplying the matrix //on the left// with an $m$-vector:
  \[
    \mathcal{R}(M) \equiv \{ \vec{v} \in \mathbb{R}^n \ | \ \vec{v} = \vec{w}^T M \textrm{ for some } \vec{w} \in \mathbb{R}^{m} \},
  \]
  where we used the transpose $T$ to make $\vec{w}$ into a row vector.
  • The null space $\mathcal{N}(M)$ of a matrix $M \in \mathbb{R}^{m\times n}$
  consists of all the vectors that the matrix $M$ sends to the zero vector:
  \[
    \mathcal{N}(M) \equiv \{ \vec{v} \in \mathbb{R}^n \ | \ M\vec{v} = \vec{0} \}.
  \]
  The null space is also known as the //kernel// of the matrix.
  • The column space $\mathcal{C}(M)$ is the span of the columns of the matrix.
  The column space consists of all the possible output vectors that the matrix can produce
  when multiplied by a vector on the right:
  \[
    \mathcal{C}(M) \equiv \{ \vec{w} \in \mathbb{R}^m 
    \ | \ 
    \vec{w} = M\vec{v} \textrm{ for some } \vec{v} \in \mathbb{R}^{n} \}.
  \]
  • The left null space $\mathcal{N}(M^T)$, which is the null space of the matrix $M^T$. 
  We say //left// null space 
  because this is the null space of vectors when multiplying the matrix by a vector on the left:
  \[
    \mathcal{N}(M^T) \equiv \{ \vec{w} \in \mathbb{R}^m \ | \ \vec{w}^T M = \vec{0}^T \}.
  \]
  The notation $\mathcal{N}(M^T)$ is suggestive of the fact that we can 
  rewrite the condition $\vec{w}^T M = \vec{0}^T$ as $M^T\vec{w} = \vec{0}$.
  Hence the left null space of $M$ is equivalent to the null space of $M^T$.
  The left null space consists of all the vectors $\vec{w} \in \mathbb{R}^m$ 
  that are orthogonal to the columns of $M$.

The matrix-vector product $M \vec{x}$ can be thought of as the action of a vector function (a linear transformation $T_M:\mathbb{R}^n \to \mathbb{R}^m$) on an input vector $\vec{x}$. The column space $\mathcal{C}(M)$ plays the role of the image of the linear transformation $T_M$, and the null space $\mathcal{N}(M)$ is the set of zeros (roots) of the function $T_M$. The row space $\mathcal{R}(M)$ is the pre-image of the column space $\mathcal{C}(M)$. To every point in $\mathcal{R}(M)$ (input vector) corresponds one point (output vector) in $\mathcal{C}(M)$. This means the column space and the row space must have the same dimension. We call this dimension the rank of the matrix $M$: \[ \textrm{rank}(M) = \dim\left(\mathcal{R}(M) \right) = \dim\left(\mathcal{C}(M) \right). \] The rank is the number of linearly independent rows, which is also equal to the number of independent columns.

We can characterize the domain of $M$ (the space of $n$-vectors) as the orthogonal sum ($\oplus$) of the row space and the null space: \[ \mathbb{R}^n = \mathcal{R}(M) \oplus \mathcal{N}(M). \] Basically a vector either has non-zero product with at least one of the rows of $M$ or it has zero product with all of them. In the latter case, the output will be the zero vector – which means that the input vector was in the null space.

If we think of the dimensions involved in the above equation: \[ \dim(\mathbb{R}^n) = \dim(\mathcal{R}(M)) + \dim( \mathcal{N}(M)), \] we obtain an important fact: \[ n = \textrm{rank}(M) + \dim( \mathcal{N}(M)), \] where $\dim( \mathcal{N}(M))$ is called the nullity of $M$.
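
You can check the rank–nullity relationship numerically. The Python/NumPy sketch below uses a $3\times 4$ matrix whose third row is the sum of the first two (an arbitrary example), so the rank is 2 and the nullity is $4-2=2$:

<code python>
import numpy as np

# a 3x4 matrix whose third row is the sum of the first two (arbitrary example)
M = np.array([[1, 2, 3, 4],
              [0, 1, 1, 1],
              [1, 3, 4, 5]])

n = M.shape[1]                    # dimension of the input space R^n
rank = np.linalg.matrix_rank(M)   # dim C(M) = dim R(M)
print(rank, n - rank)             # 2 2  ->  rank(M) + nullity(M) = 4 = n
</code>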

Linear independence

The set of vectors $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_n \}$ is linearly independent if the only solution to the equation \[ \sum\limits_i\lambda_i\vec{v}_i= \lambda_1\vec{v}_1 + \lambda_2\vec{v}_2 + \cdots + \lambda_n\vec{v}_n = \vec{0} \] is $\lambda_i=0$ for all $i$.

The above condition guarantees that none of the vectors can be written as a linear combination of the other vectors. To understand the importance of the “all zeros” solution, let's consider an example where a non-zero solution exists. Suppose we have a set of three vectors $\{\vec{v}_1, \vec{v}_2, \vec{v}_3 \}$ which satisfy $\lambda_1\vec{v}_1 + \lambda_2\vec{v}_2 + \lambda_3\vec{v}_3 = \vec{0}$ with $\lambda_1=-1$, $\lambda_2=1$, and $\lambda_3=2$. This means that \[ \vec{v}_1 = 1\vec{v}_2 + 2\vec{v}_3, \] which shows that $\vec{v}_1$ can be written as a linear combination of $\vec{v}_2$ and $\vec{v}_3$, hence the vectors are not linearly independent.

Basis

In order to carry out calculations with vectors in a vector space $V$, we need to know a basis $B=\{ \vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n \}$ for that space. A basis for an $n$-dimensional vector space $V$ is a set of $n$ linearly independent vectors in $V$. Intuitively, a basis is a set of vectors that can be used as a coordinate system for a vector space.

A basis $B=\{ \vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n \}$ for the vector space $V$ has the following two properties:

  • **Spanning property**. 
  Any vector $\vec{v} \in V$ can be expressed as a linear combination of the basis elements:
  \[
   \vec{v} = v_1\vec{e}_1 + v_2\vec{e}_2 + \cdots +  v_n\vec{e}_n.
  \]
  This property guarantees that the vectors in the basis $B$ are //sufficient// to represent any vector in $V$.
  • **Linear independence property**. 
  The vectors that form the basis $B = \{ \vec{e}_1,\vec{e}_2, \ldots, \vec{e}_n \}$ are linearly independent.
  The linear independence of the vectors in the basis guarantees that none of the vectors $\vec{e}_i$ is redundant.

If a set of vectors $B=\{ \vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n \}$ satisfies both properties, we say $B$ is a basis for $V$. In other words $B$ can serve as a coordinate system for $V$. Using the basis $B$, we can represent any vector $\vec{v} \in V$ as a unique tuple of coordinates \[ \vec{v} = v_1\vec{e}_1 + v_2\vec{e}_2 + \cdots + v_n\vec{e}_n \qquad \Leftrightarrow \qquad (v_1,v_2, \ldots, v_n)_B. \] The coordinates of $\vec{v}$ are calculated with respect to the basis $B$.

The dimension of a vector space is defined as the number of vectors in a basis for that vector space. A basis for an $n$-dimensional vector space contains exactly $n$ vectors. Any set of fewer than $n$ vectors would not satisfy the spanning property. Any set with more than $n$ vectors from $V$ cannot be linearly independent. To form a basis for a vector space, the set of vectors must be “just right”: it must contain a sufficient number of vectors but not too many, so that the coefficients of each vector will be uniquely determined.

Distilling a basis

A basis for an $n$-dimensional vector space $V$ consists of exactly $n$ vectors. Any set of vectors $\{ \vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n \}$ can serve as a basis as long as they are linearly independent and there are exactly $n$ of them.

Sometimes an $n$-dimensional vector space $V$ will be specified as the span of more than $n$ vectors: \[ V = \textrm{span}\{ \vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m \}, \quad m > n. \] Since there are $m>n$ of the $\vec{v}$-vectors, they are too many to form a basis. We say this set of vectors is over-complete. They cannot all be linearly independent since there can be at most $n$ linearly independent vectors in an $n$-dimensional vector space.

If we want to have a basis for the space $V$, we'll have to reject some of the vectors. Given the set of vectors $\{ \vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m \}$, our task is to distill a set of $n$ linearly independent vectors $\{ \vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n \}$ from them.

We can use the Gauss–Jordan elimination procedure to distill a set of linearly independent vectors. Actually, you know how to do this already! You can write the set of $m$ vectors as the rows of a matrix and then do row operations on this matrix until you find the reduced row echelon form. Since row operations do not change the row space of the matrix, the $n$ non-zero rows of the final RREF of the matrix form a basis for $V$. We will learn more about this procedure in the next section.
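
If you want to try this on a computer, here is a sketch using SymPy (the four row vectors are an arbitrary over-complete set that spans the $xy$-plane, not an example from the text); the non-zero rows of the RREF form the basis:

<code python>
from sympy import Matrix

# four vectors (as rows) that span only a 2-dimensional subspace of R^3
M = Matrix([[1, 0, 0],
            [1, 1, 0],
            [2, 1, 0],
            [3, 2, 0]])

rref, pivot_cols = M.rref()
print(rref)   # the non-zero rows are (1,0,0) and (0,1,0): a basis for the xy-plane
</code>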

Examples

Example 1

Describe the set of vectors which are perpendicular to the vector $(0,0,1)$ in $\mathbb{R}^3$.
Sol: We need to find all the vectors $(x,y,z)$ such that $(x,y,z)\cdot (0,0,1) = 0$. By inspection we see that any choice of the $x$ and $y$ components will work, so we say that the set of vectors perpendicular to $(0,0,1)$ is $\textrm{span}\{ (1,0,0), (0,1,0) \}$.

Vectors

Vectors are mathematical objects that have multiple components. The vector $\vec{v}$ is equivalent to a pair of numbers \[ \vec{v} \equiv (v_x, v_y), \] where $v_x$ is the $x$ component of $\vec{v}$ and $v_y$ is the $y$ component.

Just like numbers, you can add vectors \[ \vec{v}+\vec{w} = (v_x, v_y) + (w_x, w_y) = (v_x+w_x, v_y+w_y), \] subtract them \[ \vec{v}-\vec{w} = (v_x, v_y) - (w_x, w_y) = (v_x-w_x, v_y-w_y), \] and solve all kinds of equations where the unknown variable is a vector.

This might sound like a formidably complicated new development in mathematics, but it is not. Doing arithmetic calculations on vectors is simply doing arithmetic operations on their components.

Thus, if I told you that $\vec{v}=(4,2)$ and $\vec{w}=(3,7)$, then \[ \vec{v}-\vec{w} = (4, 2) - (3, 7) = (1, -5). \]

Vectors are extremely useful in all areas of life. In physics, for example, to describe phenomena in the three-dimensional world we use vectors with three components: $x,y$ and $z$. It is of no use to say that we have a force of 20[N] pushing on a block unless we specify in which direction the force acts. Indeed, both of these vectors have length 20 \[ \vec{F}_1 = (20,0,0), \qquad \vec{F}_2=(0,20,0), \] but one points along the $x$ axis, and the other along the $y$ axis, so they are completely different vectors.

Definitions

  • $\hat{x},\hat{y},\hat{z}$: the usual coordinate system. Every vector is implicitly defined in terms of this coordinate system. When you and I talk about the point $P=(3,4,2)$,
  we are really saying “start from the origin, $(0,0,0)$, move 3 units in the $x$ direction, then move 4 units in the $y$ direction, and finally move 2 units in the $z$ direction.” Obviously it is simpler to just say $(3,4,2)$, but keep in mind that these numbers are relative to the coordinate system $\hat{x}\hat{y}\hat{z}$.
  • $\hat{\imath},\hat{\jmath},\hat{k}$: an alternate way of describing the $xyz$-coordinate system
  in terms of three unit-length vectors:
  \[\hat{\imath} = (1,0,0), \quad \hat{\jmath} = (0,1,0), \quad \hat{k} = (0,0,1).\]
  Any number multiplied by $\hat{\imath}$ corresponds to a vector
  with that number in the first coordinate. For example, $\vec{v}=3\hat{\imath}\equiv(3,0,0)$.
  • $\vec{v}=(v_x,v_y,v_z)=v_x\hat{\imath} + v_y \hat{\jmath}+v_z\hat{k}$:
  a //vector// expressed in terms of components and in terms of $\hat{\imath}$, $\hat{\jmath}$ and $\hat{k}$.

In two dimensions there are two equivalent ways to denote vectors:

  • In component notation $\vec{v} =(v_x, v_y)$,
  which describes the vector as seen from the $x$ axis and the $y$ axis.
  • As a length and direction $\vec{v}=\|\vec{v}\|\angle \theta$, where $\|\vec{v}\|$
  is the length of the vector and $\theta$ is the angle that the vector
  makes with the $x$ axis. 

Vector dimension

The most common types of vectors are $2$-dimensional vectors (like the ones in the Cartesian plane), and $3$-dimensional vectors (directions in 3D space). These kinds of vectors are easier to work with since we can visualize them and draw them in diagrams. Vectors in general can exist in any number of dimensions. An example of an $n$-dimensional vector is \[ \vec{v} = (v_1, v_2, \ldots, v_n) \in \mathbb{R}^n. \]

Vector arithmetic

Addition of vectors is done component wise \[ \vec{v}+\vec{w} = (v_x, v_y) + (w_x, w_y) = (v_x+w_x, v_y+w_y). \] Vector subtraction works the same way: component by component.

The length of a vector is obtained from Pythagoras theorem. Imagine a triangle with one side of length $v_x$ and the other side of length $v_y$. The length of the vector is equal to the length of the hypotenuse: \[ \|\vec{v}\| = \sqrt{ v_x^2 + v_y^2 }. \]

We can also scale a vector by any number $\alpha \in \mathbb{R}$: \[ \alpha \vec{v} = (\alpha v_x, \alpha v_y), \] where we see that each component gets multiplied by the scaling factor $\alpha$. If $\alpha>1$ the vector will get longer, if $0\leq \alpha <1 $ then the vector will shrink. If $\alpha$ is a negative number, then the resulting vector will point in the opposite direction.

A particularly useful scaling is to divide a vector $\vec{v}$ by its length $\|\vec{v}\|$ to obtain a unit length vector that points in the same direction as $\vec{v}$: \[ \hat{v} = \frac{\vec{v}}{ \|\vec{v}\| }. \] Unit-length vectors (denoted with a hat instead of an arrow) are useful when you want to describe a direction in space.

Vector geometry

You can think of vectors as arrows, and of vector addition as the placing of vectors head-to-tail, as shown in the diagram.

The negative of a vector—a vector multiplied by $\alpha=-1$—is a vector of the same length but pointing in the opposite direction. So the graphical subtraction of vectors is also possible.

Length and direction of vectors

We have seen so far how to represent vectors in terms of their components. There is another way of expressing vectors: we can specify their length $\|\vec{v}\|$ and their orientation—the angle they make with the $x$ axis. For example, the vector $(1,1)$ can also be written as $\sqrt{2}\angle45\,^{\circ}$. It is useful to represent vectors in the magnitude and direction notation because their physical size becomes easier to see.

There are formulas for converting between the two notations. To convert the length-and-direction vector $\|\vec{r}\|\angle\theta$ to components $(r_x,r_y)$ use: \[ r_x=\|\vec{r}\|\cos\theta, \qquad\qquad r_y=\|\vec{r}\|\sin\theta. \] To convert from component notation $(r_x,r_y)$ to length-and-direction $\|\vec{r}\|\angle\theta$ use \[ r=\|\vec{r}\|=\sqrt{r_x^2+r_y^2}, \qquad\quad \theta=\tan^{-1}\!\left(\frac{r_y}{r_x}\right). \]

Note that the second part of the equation involves the arctangent (or inverse tan) function which by convention returns values between $-\pi/2$ and $\pi/2$ and must be used carefully for vectors whose direction falls outside of this range.
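
The conversions are easy to carry out in Python with the //math// module; note that //atan2(ry, rx)// takes care of the quadrant issue automatically. The numbers below just reproduce the $\sqrt{2}\angle45^{\circ}$ example from above:

<code python>
from math import sqrt, sin, cos, atan2, degrees, radians

# length-and-direction to components
r, theta = sqrt(2), radians(45)
rx, ry = r*cos(theta), r*sin(theta)
print(rx, ry)                                        # approximately 1.0 1.0

# components back to length-and-direction
print(sqrt(rx**2 + ry**2), degrees(atan2(ry, rx)))   # approximately 1.414 45.0
</code>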

Alternate notation

A vector $\vec{v}=(v_x, v_y, v_z)$ is really a prescription to “go a distance $v_x$ in the $x$-direction, then a distance $v_y$ in the $y$-direction and $v_z$ in the $z$-direction.”

A more explicit notation for denoting vectors is as multiples of the basis vectors $\hat{\imath}, \hat{\jmath}$ and $\hat{k}$, which are unit length vectors pointing in the $x$, $y$ and $z$ direction respectively: \[ \hat{\imath} = (1,0,0), \quad \hat{\jmath} = (0,1,0), \quad \hat{k} = (0,0,1). \]

People who do a lot of numerical calculations with vectors often prefer to use the following alternate notation: \[ v_x \hat{\imath} + v_y\hat{\jmath} + v_z \hat{k} \qquad \Leftrightarrow \qquad \vec{v} \qquad \Leftrightarrow \qquad (v_x, v_y, v_z) . \]

The addition rule looks as follows in the new notation: \[ \underbrace{2\hat{\imath}+ 3\hat{\jmath}}_{\vec{v}} \ \ + \ \ \underbrace{ 5\hat{\imath} - 2\hat{\jmath}}_{\vec{w}} \ = \ \underbrace{ 7\hat{\imath} + 1\hat{\jmath} }_{\vec{v}+\vec{w}}. \] It is the same story repeating: adding $\hat{\imath}$s with $\hat{\imath}$s and $\hat{\jmath}$s with $\hat{\jmath}$s.

Examples

Vector addition example

You are heading to your physics class after a safety meeting with a friend and looking forward to two hours of amazement and absolute awe of the laws of Mother nature. As it turns out, there is no enlightenment to be had that day because there is going to be an in-class midterm. The first question you have to solve involves a block sliding down an incline. You look at it, draw a little diagram and then wonder how the hell you are going to find the net force acting on the block (this is what they are asking you to find). The three forces acting on the block are $\vec{W} = 30 \angle -90^{\circ} $, $\vec{N} = 200 \angle -290^{\circ} $ and $\vec{F}_f = 50 \angle 60^{\circ} $.

You happen to remember the formula: \[ \sum \vec{F} = \vec{F}_{net} = m\vec{a}. \qquad \text{[ Newton's \ 2nd law ]} \]

You get the feeling that this is the answer to all your troubles: the keyword “net force” that appeared in the question also appears in this equation.

The net force is simply the sum of all the forces acting on the block: \[ \vec{F}_{net} = \sum \vec{F} = \vec{W} + \vec{N} + \vec{F}_f. \]

All that separates you from the answer is the addition of these vectors. Vectors, right. Vectors have components, and there is the whole sin-cos thing for decomposing length-and-direction vectors in terms of their components. But can't you just add them together as arrows too? It is just a sum of things, right? Should be simple.

OK, chill. Let's do this one step at a time. The net force must have an $x$-component which, according to the equation, must be equal to the sum of the $x$ components of all the forces: \[ \begin{align*} F_{net,x} & = W_x + N_x + F_{f,x} \nl & = 30\cos(-90^{\circ}) + 200\cos(-290^{\circ})+ 50\cos(60^{\circ}) \nl & = 93.4[\textrm{N}]. \end{align*} \] You find the $y$ component of the net force using the $\sin$ of the angles: \[ \begin{align*} F_{net,y} & = W_y + N_y + F_{f,y} \nl & = 30\sin(-90) + 200\sin(-290)+ 50\sin(60) \nl & = 201.2[\textrm{N}]. \end{align*} \]

Combining the two components of the vector, we get the final answer: \[ \vec{F}_{net} = (F_{net,x},F_{net,y}) =(93.4,201.2) =93.4 \hat{\imath} + 201.2 \hat{\jmath}. \] Bam! Just like that you are done because you overstand them mathematics. Nuh problem. What-a-di next question fi me?
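
If you want to double-check this kind of computation, here is a short Python sketch that adds the three forces from this example component by component:

<code python>
from math import cos, sin, radians

# the three forces from the example, as (magnitude [N], angle [degrees]) pairs
forces = [(30, -90), (200, -290), (50, 60)]   # W, N, F_f

F_net_x = sum(F*cos(radians(angle)) for F, angle in forces)
F_net_y = sum(F*sin(radians(angle)) for F, angle in forces)
print(round(F_net_x, 1), round(F_net_y, 1))   # 93.4 201.2
</code>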

Relative motion example

A boat can reach a top speed of 12 knots in calm seas. Instead of being in a calm sea, however, it is trying to sail up the St-Laurence river. The speed of the current is 5 knots.

If the boat goes directly upstream at full throttle, $12\hat{\imath}$, then the speed of the boat relative to the shore will be \[ 12\hat{\imath} - 5 \hat{\imath} = 7\hat{\imath}, \] since we have to “deduct” the speed of the current from the speed of the boat relative to the water.

A ferry crossing the river has to cancel the current using part of the thrust of the boat. If the boat wants to cross the river perpendicular to the current flow, it can use some of its thrust to counterbalance the current, and the other part to push across. What direction should the boat sail in so that it moves in the across-the-river direction? We are looking for the direction of $\vec{v}$ the boat should take such that, after adding the current component, the boat moves in a straight line between the two banks (the $\hat{\jmath}$ direction).

The geometrical picture is necessary, so draw a river and a triangle in the river with the long side perpendicular to the current flow. Make the short side of length $5$ and the hypotenuse of length $12$. We will take the up-the-river component of the velocity $\vec{v}$ to be equal to $5\hat{\imath}$ so that it cancels exactly the $-5\hat{\imath}$ flow of the river. We label the hypotenuse as $12$, since this is the maximum speed the boat can have relative to the water.

From all of this we can answer the questions like professionals. You want the angle? OK, well we have that $12\sin(\theta)=5$, where $\theta$ is the angle of the boat's course relative to the straight line between the two banks. We can use the inverse-sine function to solve for the angle: \[ \theta = \sin^{-1}\!\left(\frac{5}{12} \right) = 24.62^\circ. \] The across-the-river component of the speed can be calculated from $v_y = 12\cos(\theta)$, or from Pythagoras' theorem if you prefer: $v_y = \sqrt{ \|\vec{v}\|^2 - v_x^2 } = \sqrt{ 12^2 - 5^2 }=10.91$.
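
Here is the same calculation as a few lines of Python, just to confirm the numbers:

<code python>
from math import asin, cos, sqrt, degrees

v_boat, v_current = 12, 5              # speeds in knots
theta = asin(v_current / v_boat)       # heading angle relative to straight across
print(round(degrees(theta), 2))        # 24.62 degrees
print(round(v_boat * cos(theta), 2))   # 10.91 knots across the river
print(round(sqrt(v_boat**2 - v_current**2), 2))   # 10.91 again, via Pythagoras
</code>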

Throughout this section we have used the $x$, $y$ and $z$ axes and described vectors as components along each of these directions. It is very convenient to have perpendicular axes like this, and a set of unit vectors pointing in each of the three directions like the vectors $\{\hat{\imath},\hat{\jmath},\hat{k}\}$.

More generally, we can express vectors in terms of any basis $\{ \hat{e}_1, \hat{e}_2, \hat{e}_3 \}$ for the space of three-dimensional vectors $\mathbb{R}^3$. What is a basis you ask? I am glad you asked, because it is a very important concept.

Introduction

One of the coolest things about understanding math is that you will automatically start to understand the laws of physics too. Indeed, most physics laws are expressed as mathematical equations. If you know how to manipulate equations and you know how to solve for the unknowns in them, then you know half of physics already.

Ever since Newton figured out the whole $F=ma$ thing, people have used mechanics in order to achieve great technological feats like landing spaceships on the Moon and recently even on Mars. You can be part of that too. Learning physics will give you the following superpowers:

  1. The power to predict the future motion of objects using equations.
  It is possible to write down the equation which describes the position of
  an object as a function of time $x(t)$ for most types of motion.
  You can use this equation to predict the motion at all times $t$,
  including the future.
  "Yo G! Where's the particle going to be at when $t=1.3$[s]?",
  you are asked. "It is going to be at $x(1.3)$[m] bro." 
  Simple as that. If you know the equation of motion $x(t)$, 
  which describes the position for //all// times $t$, 
  then you just have to plug $t=1.3$[s] into $x(t)$
  to find where the object will be at that time. 
  2. Special **physics vision** for seeing the world.
  You will start to think in terms of concepts like force, acceleration and velocity
  and use these concepts to precisely describe all aspects of the motion of objects.
  Without physics vision, when you throw a ball in the air you will see it go up, 
  reach the top, then fall down. Not very exciting. 
  Now //with// physics vision, 
  you will see that at $t=0$[s] a ball is thrown into the $+\hat{y}$ direction
  with an initial velocity of $\vec{v}_i=12\hat{y}$[m/s]. The ball reaches a maximum 
  height of $\max\{ y(t)\}= \frac{12^2}{2\times 9.81}=7.3$[m] at $t=12/9.81=1.22$[s], 
  and then falls back down to the ground after a total flight time of 
  $t_{f}=2\sqrt{\frac{2 \times 7.3}{9.81}}=2.44$[s].

Why learn physics?

A lot of knowledge buzz awaits you in learning about the concepts of physics and understanding how the concepts are connected. You will learn how to calculate the motion of objects, how to predict the outcomes of collisions, how to describe oscillations and many other things. Once you develop your physics skills, you will be able to use the equations of physics to derive one number (say the maximum height) from another number (say the initial velocity of the ball). Physics is a bit like playing LEGO with a bunch of cool scientific building blocks.

By learning how to solve equations and how to deal with complicated physics problems, you will develop your analytical skills. Later on, you can apply these skills to other areas of life; even if you do not go on to study science, the expertise you develop in solving physics problems will help you deal with complicated problems in general. Companies like to hire physicists even for positions unrelated to physics: they feel confident that if the candidate has managed to get through a physics degree then they can figure out all the business shit easily.

Intro to science

Perhaps the most important reason to learn physics is that it represents the gold standard for the scientific method. First of all, physics deals only with concrete things which can be measured. There are no feelings and zero subjectivity in physics. Physicists must derive mathematical models which accurately describe and predict the outcomes of experiments. Above all, we can test the validity of the physical models by running experiments and comparing the outcome predicted by the theory with what actually happens in the lab.

The key ingredient in scientific thinking is skepticism. The scientist has to convince his peers that his equation is true without a doubt. The peers shouldn't need to trust the scientist, but instead carry out their own tests to see if the equation accurately predicts what happens in the real world. For example, let's say that I claim that the equation of motion for a ball thrown up in the air with speed $12$[m/s] is given by $y_c(t)=\frac{1}{2}(-9.81)t^2 + 12t+0$. To test whether this equation is true, you can perform the throwing-the-ball-in-the-air experiment, record the maximum height the ball reaches and the total time of flight, and compare them with those predicted by the claimed equation $y_c(t)$. According to the claimed equation, the maximum height occurs at $t=1.22$[s]; substituting this time into the equation of motion gives $\max_t\{ y_c(t)\}=y_{c}(1.22)=7.3$[m]. If this height matches what you measured in the real world, you can maybe start to trust my equation a little bit. You can also check whether the equation $y_c(t)$ correctly predicts the total time of flight, which you measured to be $t=2.44$[s]. To do this you have to check whether $y_c(2.44) = 0$, as it should be when the ball hits the ground. If both predictions of the equation $y_c(t)$ match what happens in the lab, you can start to believe that the claimed equation of motion $y_c(t)$ really is a good model for the real world.
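
Here is the claimed equation $y_c(t)$ as a two-line Python function, so you can check the predictions yourself (the times $1.22$[s] and $2.44$[s] are the ones discussed above):

<code python>
def y_c(t):
    """Claimed equation of motion for a ball thrown up at 12 m/s."""
    return 0.5*(-9.81)*t**2 + 12*t + 0

print(y_c(1.22))   # approximately 7.3 [m]: the predicted maximum height
print(y_c(2.44))   # approximately 0.0 [m]: the ball is back on the ground
</code>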

The scientific method depends on this interplay between experiment and theory. Theoreticians prove theorems and derive physics equations, while experimentalists test the validity of the equations. The equations that accurately predict the laws of nature are kept while inaccurate models are rejected.

Equations of physics

The best of the equations of physics are collected and explained in textbooks. Physics textbooks contain only equations that have been extensively tested and are believed to be true. Good physics textbooks also show how the equations are derived from first principles. This is really important, because it is much easier to remember a few general principles at the foundation of physics rather than a long list of formulas. Understanding trumps memorization any day of the week.

In the next section we learn about the equations $x(t)$, $v(t)$ and $a(t)$ which describe the motion of objects. We will also illustrate how the position equation $x(t)=\frac{1}{2}at^2 + v_it+x_i$ can be derived using simple mathematical methods (calculus). Technically speaking, you are not required to know how to derive the equations of physics—you just have to know how to use them. However, learning a bit of theory is a really good deal: reading a few extra pages of theory will give you a deep understanding of, not one, not two, but eight equations of physics.

Physics fundamentals

We begin with a lightning fast introduction to the basic tools of physics.

Mathematical methods

If you read chapter one of this book, you are now optimally prepared to learn physics. You are not afraid of numbers or simple algebra rules. You know how to solve equations. You are familiar with functions such as the linear function $f(x)=mx+b$ and the quadratic function $f(x)=ax^2+bx+c$. In particular you should know how to solve the quadratic equation. Sometimes there will be two unknowns to solve for in a physics problem, but this is not much harder. If you have two equations that you know to be true, then you can solve two equations simultaneously to find both unknowns.

Vectors

Most of the cool quantities in physics are vectors $\vec{v}=(v_x,v_y)$. Velocity is a vector, forces are vectors, and the electric and magnetic fields are vectors too. Dealing with vectors involves dealing with their components. So saying that $\vec{a}=\vec{b}$ is really saying that the $x$ components of these vectors are equal \[ a_x = b_x, \] and their $y$ components are equal too: \[ a_y = b_y. \] So when I say that $\vec{v}_i = 0\hat{x} + 12\hat{y}$, I am saying that the $x$-component is zero $v_{ix} = 0$ and the $y$-component is twelve $v_{iy}= 12$. However, the teacher won't make physics easy for you on the homework, and definitely not on the exams. He or she won't tell you the vector components, but instead say something like “the initial velocity $\vec{v}_i$ is 12[m/s] and it acts at an angle of 90 degrees with respect to the $x$ axis.” This is the length-and-direction way of talking about vectors. To get the $x$ and $y$ components of the vector $\vec{v}_i$ you have to use cos and sin as follows: \[ v_{ix} = 12 \cos 90=0, \qquad v_{iy} = 12 \sin 90=12. \] If this doesn't seem obvious to you, then you should draw a right-angle triangle and recall the definitions of sin and cos.

We will discuss vectors in more depth in Chapter 3.

Calculus

Yes, calculus. You need to understand calculus in order to understand mechanics properly. The two subjects are meant for each other. This is in fact the whole idea behind this book.

It is possible to teach physics without calculus. For example, a teacher could state the equations of kinematics (the area of physics which deals with the motion of objects) without proof. This “memorize the equations” approach is how physics is usually taught in high school. The equations are true “by revelation”. This is an OK way to learn physics when you are in high school, because the only mathematical technique you know as a kid is how to solve equations. Indeed just knowing how to use the equations of kinematics is quite enough to solve many physics problems.

Later on in this chapter (after learning a bit about calculus), we will revisit the equations of kinematics and see where they actually come from. You are adults now. You can handle the truth. Don't worry though, it won't take more than a couple of pages.

Kinematics

Kinematics (from the Greek word kinema for motion) is the study of trajectories of moving objects. The equations of kinematics can be used to calculate how long a ball thrown upwards will stay in the air, or to calculate the acceleration needed to go from 0 to 100 km/h in 5 seconds. To carry out these calculations we need to know which equation of motion to use and the initial conditions (the initial position $x_i$ and the initial velocity $v_{i}$). Plug the known values into the equations of motion and then you can solve for the desired unknown using one or two simple algebra steps. This entire section boils down to three equations. It is all about the plug-number-into-equation technique.

The purpose of this section is to make sure that you know how to use the equations of motion and understand concepts like velocity and acceleration well. You will also learn how to easily recognize which equation you need to use to solve any given physics problem.

Concepts

The key notions used to describe the motion of an object are:

  • $t$: the time, measured in seconds [s].
  • $x(t)$: the position of an object as a function of time—also known as the equation of motion. The position of an object is measured in metres [m].
  • $v(t)$: the velocity of the object as a function of time. Measured in [m/s].
  • $a(t)$: the acceleration of the object as a function of time. Measured in [m/s$^2$].
  • $x_i=x(0), v_i=v(0)$: the initial (at $t=0$) position and velocity of the object (initial conditions).

Position, velocity and acceleration

The motion of an object is characterized by three functions: the position function $x(t)$, the velocity function $v(t)$ and the acceleration function $a(t)$. The functions $x(t)$, $v(t)$ and $a(t)$ are connected—they all describe different aspects of the same motion.

You are already familiar with these notions from your experience driving a car. The equation of motion $x(t)$ describes the position of the car as a function of time. The velocity describes the change in the position of the car, or mathematically \[ v(t) \equiv \text{rate of change in } x(t). \] If we measure $x$ in metres [m] and time $t$ in seconds [s], then the units of $v(t)$ will be metres per second [m/s]. For example, an object moving at a constant speed of $30$[m/s] will have its position change by $30$[m] each second.

The rate of change of the velocity is called the acceleration: \[ a(t) \equiv \text{rate of change in } v(t). \] Acceleration is measured in metres per second squared [m/s$^2$]. A constant positive acceleration means the velocity of the motion is steadily increasing, like when you press the gas pedal. A constant negative acceleration means the velocity is steadily decreasing, like when you press the brake pedal.

The illustration on the right shows the simultaneous graph of the position, velocity and acceleration of a car during some time interval. In a couple of paragraphs, we will discuss the exact mathematical equations which describe $x(t)$, $v(t)$ and $a(t)$. But before we get to the math, let us visually analyze the motion illustrated on the right.

The car starts off with an initial position $x_i$ and just sits there for some time. The driver then floors the pedal to produce a maximum acceleration for some time, picks up speed and then releases the accelerator, but keeps it pressed enough to maintain a constant speed. Suddenly the driver sees a police vehicle in the distance and slams on the brakes (negative acceleration) and shortly afterwards brings the car to a stop. The driver waits for a few seconds to make sure the cops have passed. The car then accelerates backwards for a bit (reverse gear) and then maintains a constant backwards speed for an extended period of time. Note how “moving backwards” corresponds to negative velocity. In the end the driver slams on the brakes again to bring the car to a stop. The final position is $x_f$.

In the above example, we can observe two distinct types of motion. Motion at a constant velocity (uniform velocity motion, UVM) and motion with constant acceleration (uniform acceleration motion, UAM). Of course, there could be many other types of motion, but for the purpose of this section you are only responsible for these two.

  • UVM: During times when there is no acceleration,
  the car maintains a uniform velocity, that is,
  $v(t)$ will be a constant function.
  Constant velocity means that the position function
  will be a line with a constant slope because, by definition, $v(t)= \text{slope of } x(t)$.
  • UAM: During times when the car experiences a constant acceleration $a(t)=a$,
  the velocity of the car will change at a constant rate.
  The rate of change of the velocity is constant $a=\text{slope of } v(t)$,
  so the velocity function must look like a line with slope $a$.
  The position function $x(t)$ has a curved shape (quadratic) during moments of 
  constant acceleration.

Formulas

There are basically four equations that you need to know for this entire section. Together, these equations fully describe all aspects of any motion with constant acceleration.

Uniform acceleration motion (UAM)

If the object undergoes a constant acceleration $a(t)=a$, like your car if you floor the accelerator, then its motion will be described by the following equations: \[ \begin{align*} x(t) &= \frac{1}{2}at^2 + v_i t + x_i, \nl v(t) &= at + v_i, \nl a(t) &= a, \end{align*} \] where $v_i$ is the initial velocity of the object and $x_i$ is its initial position.

There is also another useful equation to remember: \[ [v(t)]^2 = v_i^2 + 2a[x(t)- x_i], \] which is usually written \[ v_f^2 = v_i^2 + 2a\Delta x, \] where $v_f$ denotes the final velocity and $\Delta x$ denotes the change in the $x$ coordinate.
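
If you like to compute, here is a minimal Python sketch of the UAM equations which also checks the fourth equation $v_f^2 = v_i^2 + 2a\Delta x$ on an arbitrary example (free fall with $v_i=12$[m/s], evaluated at $t=1$[s]):

<code python>
def x(t, a, v_i, x_i):
    """Position under uniform acceleration."""
    return 0.5*a*t**2 + v_i*t + x_i

def v(t, a, v_i):
    """Velocity under uniform acceleration."""
    return a*t + v_i

a, v_i, x_i, t = -9.81, 12.0, 0.0, 1.0   # arbitrary free-fall example
lhs = v(t, a, v_i)**2
rhs = v_i**2 + 2*a*(x(t, a, v_i, x_i) - x_i)
print(abs(lhs - rhs) < 1e-9)             # True: the fourth equation checks out
</code>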

That is it. Memorize these equations, plug-in the right numbers, and you can solve any kinematics problem humanly imaginable. Chapter done.

Uniform velocity motion (UVM)

The special case where there is zero acceleration ($a=0$), is called uniform velocity motion or UVM. The velocity stays uniform (constant) because there is no acceleration. The following three equations describe the motion of the object under uniform velocity: \[ \begin{align} x(t) &= v_it + x_i, \nl v(t) &= v_i, \nl a(t) &= 0. \end{align} \] As you can see, these are really the same equations as in the UAM case above, but because $a=0$, some terms are missing.

Free fall

We say that an object is in free fall if the only force acting on it is the force of gravity. On the surface of the earth, the force of gravity produces a constant acceleration of $a=-9.81$[m/s$^2$]. The negative sign is there because the gravitational acceleration is directed downwards, and we assume that the $y$ axis points upwards. The motion of an object in free fall is described by the UAM equations.

Examples

We will now illustrate how the equations of kinematics are used to solve physics problems.

Moroccan example

Suppose your friend wants to send you a ball wrapped in aluminum foil from his balcony, which is located at a height of $x_i=44.145$[m]. How long does it take for the ball to hit the ground?

We recognize that this is a problem with acceleration, so we start by writing out the general UAM equations: \[ \begin{align*} y(t) &= \frac{1}{2}at^2 + v_i t + y_i, \nl v(t) &= at + v_i. \end{align*} \] To find the answer, we substitute the known values $y(0)=y_i=44.145$[m], $a=-9.81$[m/s$^2$] and $v_i=0$[m/s] (since the ball was released from rest) and solve for $t_{fall}$ in the equation $y(t_{fall}) = 0$, since we are interested in the time when the ball will reach a height of zero. The equation is \[ y(t_{fall}) = 0 = \frac{1}{2}(-9.81)(t_{fall})^2+0(t_{fall}) + 44.145, \] which has solution $t_{fall} = \sqrt{\frac{44.145\times 2}{9.81}}= 3$[s].

0 to 100 in 5 seconds

Suppose you want to be able to go from $0$ to $100$[km/h] in $5$ seconds with your car. How much acceleration does your engine need to produce, assuming it produces a constant amount of acceleration?

We can calculate the necessary $a$ by plugging the required values into the velocity equation for UAM: \[ v(t) = at + v_i. \] Before we get to that, we need to convert the velocity in [km/h] to velocity in [m/s]: $100$[km/h] $=\frac{100 [\textrm{km}]}{1 [\textrm{h}]} \cdot\frac{1000[\textrm{m}]}{1[\textrm{km}]} \cdot\frac{1[\textrm{h}]}{3600[\textrm{s}]}$= 27.8 [m/s]. We fill in the equation with all the desired values $v(5)=27.8$[m/s], $v_i=0$, and $t=5$[s] and solve for $a$: \[ v(5) = 27.8 = a(5) + 0. \] We conclude that your engine has to produce a constant acceleration of $a=5.56$[m/s$^2$] or more.

Moroccan example II

Some time later, your friend wants to send you another aluminum ball from his apartment located on the 14th floor (height of $44.145$[m]). In order to decrease the time of flight, he throws the ball straight down with an initial velocity of $10$[m/s]. How long does it take before the ball hits the ground?

Imagine the building with the $y$ axis measuring distance upwards starting from the ground floor. We know that the balcony is located at a height of $y_i=44.145$[m], and that at $t=0$[s] the ball starts with $v_i=-10$[m/s]. The initial velocity is negative, because it points in the opposite direction to the $y$ axis. We know that there is an acceleration due to gravity of $a_y=-g=-9.81$[m/s$^2$].

We start by writing out the general UAM equation: \[ y(t) = \frac{1}{2}a_yt^2 + v_i t + y_i. \] We want to find the time when the ball will hit the ground, so $y(t)=0$. To find $t$, we plug in all the known values into the general equation: \[ y(t) = 0 = \frac{1}{2}(-9.81)t^2 -10 t + 44.145, \] which is a quadratic equation in $t$. First rewrite the quadratic equation into the standard form: \[ 0 = \underbrace{4.905}_a t^2 + \underbrace{10.0}_b \ t - \underbrace{44.145}_c, \] and then solve using the quadratic formula: \[ t_{fall} = \frac{-b \pm \sqrt{ b^2 - 4ac }}{2a} = \frac{-10 \pm \sqrt{ 100 + 866.1}}{9.81} = 2.15 \text{ [s]}. \] We ignored the negative-time solution because it corresponds to a time in the past. Comparing with the first Moroccan example, we see that the answer makes sense—throwing a ball downwards will make it fall to the ground faster than just dropping it.
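
You can verify the quadratic-formula step with a few lines of Python:

<code python>
from math import sqrt

a, b, c = 4.905, 10.0, -44.145       # coefficients of the standard-form quadratic
disc = b**2 - 4*a*c                  # the discriminant b^2 - 4ac
t_plus = (-b + sqrt(disc)) / (2*a)
t_minus = (-b - sqrt(disc)) / (2*a)
print(round(t_plus, 2), round(t_minus, 2))   # 2.15 -4.19  (keep the positive root)
</code>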

Discussion

Most kinematics problems you will be asked to solve follow the same pattern as the above examples. You will be given some of the initial values and asked to solve for some unknown quantity. It is important to keep in mind the signs of the numbers you plug into the equations. You should always draw the coordinate system and indicate clearly (to yourself) the $x$ axis, which measures displacement. If a velocity or acceleration quantity points in the same direction as the $x$ axis then it is a positive number, while quantities that point in the opposite direction are negative numbers.

All this talk of $v(t)$ being the “rate of change of $x(t)$” is starting to get on my nerves. The expression “rate of change of” is a euphemism for the calculus term derivative. We will now take a short excursion into the land of calculus in order to define some basic concepts (derivatives and integrals) so that we can use this more precise terminology in the remainder of the book.

Force diagrams

Welcome to Force-Accounting 101. In this section we will learn how to identify all the forces acting on an object and use Newton's 2nd law $\sum \vec{F}=\vec{F}_{net} = m\vec{a}$ to predict the resulting acceleration.

Concepts

Newton's second law describes a relationship between these three quantities:

  • $m$: the mass of an object.
  • $\vec{F}_{net}$: the net force on the object.
  • $\vec{a}$: the acceleration of the object.

Forces and accelerations are vectors. To work with vectors, we work with their components:

  • $F_x$: the component of $\vec{F}$ in the $x$ direction.
  • $F_y$: the component of $\vec{F}$ in the $y$ direction.

Vectors are meaningless unless it is clear with respect to which coordinate system they are expressed.

  • $x$ axis: Usually the $x$ axis is horizontal and to the right, however, for problems with inclines,
  it will be more convenient to use an inclined $x$ axis that is parallel to the slope.
  • $y$ axis: The $y$ axis is always perpendicular to the $x$ axis.
  • $\hat{\imath},\hat{\jmath}$: Unit vectors in the $x$ and $y$ directions. Any vector can be written as $\vec{v}=v_x\hat{\imath}+v_y\hat{\jmath}$ or as $\vec{v}=(v_x,v_y)$.

Provided we have a coordinate system, we can write any force vector in three equivalent ways: \[ \vec{F} \equiv F_x\hat{\imath} + F_y\hat{\jmath} \equiv (F_x,F_y) \equiv \|\vec{F}\|\angle \theta. \]

What types of forces are there in force diagrams?

  • $\vec{W}\equiv\vec{F}_{gravity}=m\vec{g}$: The weight. This is the force on an object due to gravity. The gravitational pull $\vec{g}$ always points downwards – towards the centre of the earth. $g=9.81$[N/kg].
  • $\vec{T}$: Tension in a rope. Tension is always pulling away from the object.
  • $\vec{N}$: Normal force – the force between two surfaces.
  • $\|\vec{F}_{fs}\|\leq\mu_s\|\vec{N}\|$: Static force of friction.
  • $\|\vec{F}_{fk}\|=\mu_k\|\vec{N}\|$: Kinetic force of friction.
  • $\vec{F}_{s}=-kx$: The force (pull or push) of a spring that is displaced (stretched or compressed) by $x$ metres.

Formulas

Newton's 2nd law

The sum of the forces acting on an object, divided by the mass, gives you the acceleration of the object: \[ \sum \vec{F} \equiv \vec{F}_{net}= m\vec{a}. \]

Vector components

If a vector $\vec{v}$ makes an angle $\theta$ with the $x$ axis then: \[ v_x = \|\vec{v}\|\cos\theta, \qquad \text{and} \qquad v_y = \|\vec{v}\|\sin\theta. \] The vector $v_x\hat{\imath}$ corresponds to the part of $\vec{v}$ that points in the $x$ direction.

In what follows, you will be asked a countless number of times to \[ \text{Find the component of } \vec{F} \text{ in the ? direction, } \] which is another way of asking you to find the number $F_?$.

The answer is usually equal to the length $\|\vec{F}\|$ multiplied by either $\cos$ or $\sin$, and sometimes by $-1$, all depending on the way the coordinate system is chosen. So don't guess. Look at the coordinate system. If the vector points in the direction where $x$ increases, then its $x$ component should be a positive number. If it points in the opposite direction, then its $x$ component should be negative.

To add forces $\vec{F}_1$ and $\vec{F}_2$ you have to add their components: \[ \vec{F}_1 + \vec{F}_2 = (F_{1x},F_{1y}) + (F_{2x},F_{2y}) = (F_{1x}+F_{2x},F_{1y}+F_{2y}) = \vec{F}_{net}. \] Instead of dealing with vectors in the bracket notation as above, when solving force diagrams it is easier to simply write the $x$ equation on one line, and the $y$ equation on a separate line below it: \[ F_{net,x} = F_{1x}+F_{2x}, \] \[ F_{net,y} = F_{1y}+F_{2y}. \] It is a good idea to always write those two equations together as a block – so it remains clear that you are talking about the same problem, but the first row represents the $x$-dimension and the second row represents the $y$-dimension.

Force check

It is important to account for all the forces acting on an object. Any object with mass on the surface of the earth will feel a downwards gravitational pull of magnitude $F_{g}=W=mg$. Then you have to think about which of the other forces might be present: $\vec{T}$, $\vec{N}$, $\vec{F}_{f}$, $\vec{F}_{s}$. Anytime you see a rope tugging on the object, you know there must be some tension $\vec{T}$, which is a force vector pulling on the object. Anytime you have an object sitting on a surface, the surface will push back with a normal force $\vec{N}$. If the object is sliding on the surface, there will be a force of friction acting against the direction of the motion: \[ F_{fk}=\mu_k\|\vec{N}\|. \] If the object is not moving, then you have to use $\mu_s$ in the friction force equation to get the maximum static friction force that the contact between the object and the ground can support before the object starts to slip: \[ \max\{ F_{fs} \}=\mu_s\|\vec{N}\|. \] If you see a spring that is either stretched or compressed by the object, then you must account for the spring force. The force of a spring is restorative: it always acts against the deformation you are making to the spring. If you stretch the spring by a length $x$, it will try to pull itself back to its normal length with a force of \[ \vec{F}_s = -kx \hat{\imath}. \] The constant of proportionality $k$ is called the spring constant and is measured in [N/m].

Recipe for solving force diagrams

Below we list the steps of the general procedure to follow when solving problems in dynamics.

  1. Draw a force diagram focussed on the object and indicate all the forces acting on it.
  2. Choose a coordinate system, and indicate clearly in the diagram what you will call the positive $x$ direction, and what you will call the positive $y$ direction. All quantities in the subsequent equations will be expressed with respect to this coordinate system.
  3. Write down the following “template”:

\[ \sum F_x = \qquad \qquad \qquad = ma_x, \] \[ \sum F_y = \qquad \qquad \qquad = ma_y. \]

  4. Fill in the template by calculating the $x$ and $y$ components of each force acting on the object: $\vec{W}$, $\vec{N}$, $\vec{T}$, $\vec{F}_{fs}$, $\vec{F}_{fk}$, $\vec{F}_{s}$, as applicable.
  5. Solve the equations for the unknown quantities.

I highly recommend that you perform some consistency checks after Step 4. You should check the signs: if the force in the diagram is acting in the $x$ direction, then its component must be positive. If the force is acting in the direction opposite to the $x$ axis, then its component should be negative. You should also check that whenever $F_x \propto \cos\theta$, then $F_y \propto \sin\theta$. If instead we use the angle $\phi$ defined with respect to the $y$ axis, we would have $F_x \propto \sin\phi$, and $F_y \propto \cos\phi$.

We will now illustrate how to use this recipe through a series of examples.

Examples

Block on a table

You place a block of mass $m$ on the table. Because it has mass, it feels its weight $\vec{W}$ pulling down on it, but the table is not letting it drop to the floor. The table pushes back on the block with a normal force $\vec{N}$.

Steps 1,2: We draw the force diagram and choose a coordinate system:

Simplest possible force diagram.

Step 3: Next, we write down the empty equations template: \[ \begin{align*} \sum F_x &= \qquad \qquad = ma_x, \nl \sum F_y &= \qquad \qquad = ma_y. \end{align*} \]

Step 4: There is not much going on in the $x$ direction: no forces act in the $x$ direction and the block is not moving, so $a_x=0$. In the $y$ direction we have the force of gravity and the normal force exerted by the table: \[ \begin{align*} \sum F_x &= 0 = 0, \nl \sum F_y &= N - mg = 0. \end{align*} \] We set $a_y=0$, because we see that the block is just sitting there on the table without moving. The technical term for situations where $a_x=0$ and $a_y=0$ is static equilibrium. Force diagrams in static equilibrium are easy to solve, because the entire right-hand side is equal to zero, which means the forces on the object must be counter-balancing each other.

Step 5: Suppose the teacher was asking you “What is the magnitude of the normal force?”. You can easily answer this by looking at the second equation: “$N=mg$ bro!”

Moving the fridge

You are trying to push your fridge across the kitchen floor. Because it weighs quite a lot, it is “gripping” the floor quite a bit. If the coefficient of static friction between the metal “feet” of your fridge and the tiles of the floor is $\mu_s$, how much force $\vec{F}_{ext}$ would it take to get the fridge to start moving?

When will the fridge start to slip?

\[ \begin{align*} \sum F_x &= F_{ext} - F_{fs} = 0, \nl \sum F_y &= N - mg = 0. \end{align*} \]

If you push with force $F_{ext}=30$[N], the fridge will push back (via its connection to the floor) with a force $F_{fs}=30$[N]. If you push harder, the fridge will push back harder and it will still not move. Only when you reach the slipping threshold will it move. This means you have to push with force equal to the maximum static friction force $F_{fs}=\mu_s N$, so we have: \[ \begin{align*} \sum F_x &= F_{ext} - \mu_s N = 0, \nl \sum F_y &= N - mg = 0. \end{align*} \]

To solve for $F_{ext}$ you first isolate $N=mg$ in the bottom line, and then substitute the value of $N$ in the top line to get $F_{ext} = \mu_s m g$.
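Here is a quick numerical sketch of this result; the fridge mass and the friction coefficient are made-up values, chosen only to illustrate the formula.

<code python>
g = 9.81            # gravitational acceleration [m/s^2]
m = 120.0           # assumed fridge mass [kg]
mu_s = 0.4          # assumed coefficient of static friction

N = m * g           # from the y equation: N = mg
F_ext = mu_s * N    # force needed to reach the slipping threshold
print(F_ext)        # about 470.9 [N]
</code>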

Friction slowing you down

OK, so you have the fridge moving now and you are moving at a steady pace across the room:

Kinetic friction.

Your equations of motion are going to be: \[ \begin{align*} \sum F_x &= F_{ext} - F_{fk} = ma_x, \nl \sum F_y &= N - mg = 0. \end{align*} \]

In particular, if you want to keep a steady speed ($v=const$) as you move across the room, you will push with just enough force to balance the friction force and keep $a_x=0$.

To find the value of $F_{ext}$ to keep a constant speed we solve: \[ \begin{align*} \sum F_x &= F_{ext} - \mu_k N = 0, \nl \sum F_y &= N - mg = 0. \end{align*} \]

We get a similar expression as above, but with $\mu_k$ instead of $\mu_s$: $F_{ext} = \mu_k m g$. Generally, $\mu_k < \mu_s$ so it takes less force to keep the fridge moving than it took to get it to start moving.

Let us now take a different slant on this whole friction thing.

Incline

At this point, my dear readers, we are getting into the main kind of question that will be, without a doubt, asked in your homework or at the final exam. A block sliding down an incline. What is its acceleration?

Step 1: We draw a diagram which includes the weight $\vec{W}$, the normal force $\vec{N}$ and the friction force $\vec{F}_{fk}$.

Step 2: We pick the coordinate system to be tilted along the incline. This is important because this way the motion is purely in the $x$ direction, while the $y$ direction will be static.

Step 3,4: Let's copy the empty template, and fill in the equations: \[ \begin{align*} \sum F_x &= \|\vec{W}\|\sin\theta - F_{fk} = ma_x, \nl \sum F_y &= N - \|\vec{W}\|\cos\theta \ \ = 0, \end{align*} \] or substituting the values that we know: \[ \begin{align*} \sum F_x &= mg\sin\theta - \mu_kN = ma_x, \nl \sum F_y &= N - mg\cos\theta \ \ \ = 0. \end{align*} \]

Step 5: From the $y$ equation, we obtain $N=mg\cos\theta$ and substituting this into the $x$ equation we get: \[ a_x = \frac{1}{m}\left( mg\sin\theta - \mu_k mg\cos\theta \right) = g\sin\theta - \mu_k g\cos\theta. \]
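Here is a minimal sketch that evaluates this formula numerically. The incline angle and the friction coefficient are assumed values, picked only for illustration.

<code python>
import math

g = 9.81                    # gravitational acceleration [m/s^2]
theta = math.radians(30.0)  # assumed incline angle
mu_k = 0.1                  # assumed coefficient of kinetic friction

# acceleration along the incline: a_x = g*sin(theta) - mu_k*g*cos(theta)
a_x = g * math.sin(theta) - mu_k * g * math.cos(theta)
print(a_x)                  # about 4.06 [m/s^2]
</code>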

Bathroom scale

You have a spring with spring constant $k$ on which you put a block of mass $m$. By what length $\Delta y$ will the spring be compressed?

Step 1,2: We draw a before and after picture, with the $y$ axis placed at the natural length of the spring.

Step 3,4: Filling in the template we get: \[ \begin{align*} \sum F_x &= 0 = 0, \nl \sum F_y &= F_s - mg = 0. \end{align*} \]

Step 5: We know that the force exerted by a spring is proportional to its displacement according to \[ F_s = -k y_B, \] so we can find $y_B = -\frac{mg}{k}$. The length of compression is therefore: \[ |\Delta y| = \frac{mg}{k}. \]

Two blocks

Now for a more involved example with two blocks. One block is sitting on the surface, and another one is falling straight down. The two are connected by a stiff rope. What is the acceleration of the system as a whole?

Steps 1,2: We have two objects, so we have to draw two force diagrams.

Step 3: We also have two sets of equations. One set of equations for the left block, and one for the right block: \[ \begin{align*} & \sum F_{1x} = \qquad\qquad = m_1a_{x_1} & \qquad & \sum F_{2x} = \qquad\quad = m_2a_{x_2} \nl & \sum F_{1y} = \qquad\qquad = m_1a_{y_1} & \qquad & \sum F_{2y} = \qquad\quad = m_2a_{y_2} \end{align*} \]

Steps 4: We fill them in with all the forces drawn in the diagram: \[ \begin{align*} & \sum F_{1x} = -F_{fk} + T_1 = m_1a_{x_1} & \qquad & \sum F_{2x} = 0 =0 \nl & \sum F_{1y} = N_1 - W_1 = 0 & \qquad & \sum F_{2y} = -W_2 + T_2 = m_2a_{y_2} \end{align*} \]

Step 5: What are the connections between the two blocks? Since it is the same rope that connects the two blocks, this means that the tension in the rope is the same on both ends so $T_1=T_2=T$. Also since the rope is of fixed length we have that the $x_1$ and $y_2$ coordinates are related by a constant (though they point in different directions), so it must be that $a_{x_1}= -a_{y_2} = a$.

Rewriting in terms of the new common variables $T$ and $a$ we have: \[ \begin{align*} & \sum F_{1x} = -\mu_kN_1 + T = m_1a & \qquad & \sum F_{2x} = 0 =0 \nl & \sum F_{1y} = N_1 - m_1g = 0 & \qquad & \sum F_{2y} = -m_2g + T = - m_2a \end{align*} \]

We isolate $N_1$ on the bottom left, and isolate $T$ on the bottom right: \[ \begin{align*} & \sum F_{1x} = -\mu_kN_1 + T = m_1a & \qquad & \sum F_{2x} = 0 =0 \nl & N_1 = m_1g & \qquad & T = - m_2a + m_2g \end{align*} \]

Now substitute the values into the top left equation to get \[ \sum F_{1x} = -\mu_k(m_1g) + (- m_2a + m_2g) = m_1 a, \] or moving all the $a$ terms to one side we have \[ -\mu_km_1g + m_2g = m_1 a + m_2 a = (m_1 + m_2) a, \] which makes sense since the “two blocks attached with a rope” is in some sense an object of collective mass $(m_1 + m_2)$ with two external forces on it. From this point of view, the tension $T$ is an internal force of the object and doesn't appear in the external force equation.

The acceleration of the whole two-block system is going to be: \[ a = \frac{m_2g - \mu_km_1g}{m_1+m_2}. \]
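To see some numbers, here is a short sketch that plugs assumed masses and an assumed friction coefficient into this result, and then recovers the tension $T$ from the right-hand block's equation.

<code python>
g = 9.81        # [m/s^2]
m1 = 4.0        # assumed mass of the block on the table [kg]
m2 = 2.0        # assumed mass of the hanging block [kg]
mu_k = 0.3      # assumed coefficient of kinetic friction

a = (m2 * g - mu_k * m1 * g) / (m1 + m2)   # acceleration of the system
T = m2 * g - m2 * a                        # tension, from -m2*g + T = -m2*a
print(a, T)                                # about 1.31 [m/s^2] and 17.0 [N]
</code>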

Two inclines

OK, let's just go crazy now! Let's have two inclines, two blocks, a rope, and friction everywhere. We want to find the acceleration as usual.

Steps 1,2: We draw a force diagram with two different coordinate systems each adapted for the angle of the incline:

Steps 3,4: Fill in all force components, and set $a_{y_1}=0,a_{y_2}=0$: \[ \begin{align*} & \sum F_{1x} = W_1\sin\alpha - F_{1fk} + T_1 = m_1a_{x_1}, \nl & \sum F_{1y} = -W_1\cos\alpha + N_1 \quad \ \ \ = 0, \nl & \qquad \qquad \qquad \qquad \qquad \qquad \sum F_{2x} = W_2\sin\beta - F_{2fk} - T_2 = m_2a_{x_2}, \nl & \qquad \qquad \qquad \qquad \qquad \qquad \sum F_{2y} = -W_2\cos\beta + N_2 \quad \ \ \ =0. \end{align*} \]

Step 5: The links between the two worlds are two: the tension in the rope is the same $T=T_1=T_2$ and also the acceleration since the blocks are moving together $a=a_{x_1}=a_{x_2}$. Rewriting and expanding we have: \[ \begin{align*} & \sum F_{1x} = m_1g\sin\alpha - \mu_k N_1 + T = m_1a, \nl & N_1 = m_1g\cos\alpha, \nl & \qquad \qquad \qquad \qquad \qquad \qquad \sum F_{2x} = m_2g\sin\beta - \mu_k N_2 - T = m_2a, \nl & \qquad \qquad \qquad \qquad \qquad \qquad N_2 = m_2g\cos\beta. \end{align*} \]

Let's substitute the values of $N_1$ and $N_2$ into the $x$ equations: \[ \begin{align} & \sum F_{1x} = m_1g\sin\alpha - \mu_k m_1g\cos\alpha + T = m_1a, \nl & \qquad \qquad \qquad \qquad \qquad \qquad \sum F_{2x} = m_2g\sin\beta - \mu_k m_2g\cos\beta - T = m_2a. \end{align} \]

There are many ways to solve for the two unknowns in this pair of equations. Either (A) we isolate $T$ in one of the equations and substitute the value of $T$ into the second or (B) we isolate $a$ in both equations and set them equal to each other.

We will use approach (A) and isolate $T$ in the bottom equation to get: \[ \begin{align} & m_1g\sin\alpha - \mu_k m_1g\cos\alpha + T = m_1a, \nl & m_2g\sin\beta - \mu_k m_2g\cos\beta - m_2a = T. \end{align} \] We then substitute the expression for $T$ into the top equation to obtain \[ m_1g\sin\alpha - \mu_k m_1g\cos\alpha + ( m_2g\sin\beta - \mu_k m_2g\cos\beta - m_2a) = m_1a, \] which can be rewritten as \[ m_1g\sin\alpha - \mu_k m_1g\cos\alpha + m_2g\sin\beta - \mu_k m_2g\cos\beta = (m_1 + m_2)a. \] Since we know the values of $m_1$, $m_2$, $\mu_k$, $\alpha$ and $\beta$, we can calculate all the quantities on the left-hand side and solve for $a$.
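Here is a minimal numerical sketch of that last step; the masses, angles and friction coefficient are all assumed values used only to show the calculation.

<code python>
import math

g = 9.81                     # [m/s^2]
m1, m2 = 3.0, 5.0            # assumed masses [kg]
alpha = math.radians(20.0)   # assumed angle of the first incline
beta = math.radians(40.0)    # assumed angle of the second incline
mu_k = 0.2                   # assumed coefficient of kinetic friction

# left-hand side of the final equation, then divide by (m1 + m2)
lhs = (m1 * g * math.sin(alpha) - mu_k * m1 * g * math.cos(alpha)
       + m2 * g * math.sin(beta) - mu_k * m2 * g * math.cos(beta))
a = lhs / (m1 + m2)
print(a)                     # about 3.6 [m/s^2] for these assumed values
</code>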

Other types of problems

All the examples shown asked you to find the acceleration, but sometimes you might be told the acceleration and asked to solve for some other unknown in the equations. Regardless of what you have to solve for, you should always start with the diagram and the sum-of-the-forces template. Once you have these equations in front of you, you will be able to reason about the problem more easily.

Experiment

Suspend an object of known mass (say a 100g chocolate bar) on the spring taken out from a retractable pen. Use a ruler to measure by how much the spring stretches in the process. What is the spring constant $k$?

Discussion

In previous sections we discussed the kinematics problem of finding the position of an object $x(t)$ given the knowledge of its acceleration function $a(t)$ and the initial conditions $x_i$ and $v_i$. In this section we studied the dynamics problem, which involved drawing force diagrams and calculating the net force on the object. Understanding these topics means that you fully understand Newton's equation $F=ma$ which is perhaps the most important equation in this book.

We can summarize the entire procedure for predicting the position of an object $x(t)$ from first principles in the following equation: \[ \frac{1}{m} \underbrace{ \left( \sum \vec{F} = \vec{F}_{net} \right) }_{\text{dynamics}} = \underbrace{ a(t) \ \overset{v_i+ \int\!dt }{\longrightarrow} \ v(t) \ \overset{x_i+ \int\!dt }{\longrightarrow} \ x(t) }_{\text{kinematics}}. \] The left-hand side calculates the net force, which is the cause of acceleration. The right-hand side indicates how we can calculate the equation of motion $x(t)$ from the knowledge of the acceleration and the initial conditions. This means that if you know the forces acting on any object (rocks, projectiles, cars, stars, planets, etc.) then you can predict its motion, which is kind of cool.

Energy

Instead of thinking about velocities $v(t)$ and motion trajectories $x(t)$, we can solve physics problems using energy calculations. In this section, we will define precisely the different kinds of energies that exist and then learn the rules of converting one energy into another. The key idea in this section is the principle of total energy conservation, which tells us that, in any physical process, the sum of the initial energies is equal to the sum of the final energies.

Example

Say you drop a ball from a height $h$[m] and you want to predict its speed right before it hits the ground. Using the kinematics approach, you would go for the general equation of motion: \[ v_f^2 = v_i^2 + 2a(y_f-y_i), \] and substitute $y_i=h$, $y_f=0$, $v_i=0$ and $a=-g$ to obtain the answer $v_f = \sqrt{2gh}$ for the final velocity at impact.

Alternatively, you could use an energy calculation. Initially the ball starts from a height $h$, which means it has $U_i=mgh$[J] of potential energy. As the ball falls, the potential energy is converted into kinetic energy. Right before the ball hits the ground, it will have a final kinetic energy equal to the initial potential energy: $K_f=U_i$ [J]. Since the formula for kinetic energy is $K=\frac{1}{2}mv^2$, we have $\frac{1}{2}mv_f^2 = mgh$. After cancelling the mass on both sides of the equation and solving for $v_f$ we obtain $v_f=\sqrt{2gh}$.

Both methods of solving the example problem come to the same conclusion, but the energy reasoning is arguably more intuitive than plugging values into a formula. In science, it is really important to know different ways of arriving at the same answer. Knowing about these alternate routes will allow you to check your answers and to understand concepts better.
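As a quick sanity check, here is a minimal sketch that computes the impact speed both ways; the drop height of 20[m] is a made-up value.

<code python>
import math

g = 9.81     # [m/s^2]
h = 20.0     # assumed drop height [m]

# kinematics route: v_f^2 = v_i^2 + 2a(y_f - y_i), with v_i = 0, a = -g, y_i = h, y_f = 0
v_kinematics = math.sqrt(0.0**2 + 2 * (-g) * (0.0 - h))

# energy route: (1/2) m v_f^2 = m g h  =>  v_f = sqrt(2 g h)
v_energy = math.sqrt(2 * g * h)

print(v_kinematics, v_energy)   # both give about 19.81 [m/s]
</code>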

Concepts

Energy is measured in Joules [J] and it arises in several different contexts:

  • $K =$ kinetic energy. This is the type of energy that objects have by virtue of their motion.
  • $W =$ work. This is the amount of energy that an external force adds or subtracts from a system. Positive work corresponds to energy being added to the system, while negative work corresponds to energy being withdrawn from the system.
  • $U_g =$ gravitational potential energy. This is the energy that an object has by virtue of its position above the ground. We say this energy is potential because it is a form of stored work. The potential energy corresponds to the amount of work that the force of gravity will add to an object when you let the object fall to the ground.
  • $U_s =$ spring potential energy. This is the energy stored in a spring when it is displaced from its relaxed position.
  • There are many other kinds of energy: electrical energy, magnetic energy, sound energy, thermal energy, etc. In this section, however, we limit our focus to the mechanical energy concepts described above.

Formulas

Kinetic energy

An object of mass $m$ moving at velocity $\vec{v}$ has a kinetic energy of \[ K=\frac{1}{2}m\|\vec{v}\|^2 \qquad \text{[J]}. \] Note that the kinetic energy only depends on the speed $\|\vec{v}\|$ of the object and not the direction of motion.

Work

If an external force $\vec{F}$ acts on an object as it moves through a displacement $\vec{d}$, the work done by this force is \[ W=\vec{F}\cdot \vec{d} = \|\vec{F}\| \|\vec{d}\|\cos \theta \qquad \text{[J]}, \] where the second equality follows from the geometrical interpretation of the dot product: $\vec{u}\cdot \vec{v} = \|\vec{u}\| \|\vec{v}\|\cos \theta$, where $\theta$ is the angle between $\vec{u}$ and $\vec{v}$.

If the force $\vec{F}$ acts in the same direction as the displacement $\vec{d}$, then it will do positive work ($\cos(0^\circ)=+1$), which means the force is adding energy to the system. If the force acts in the direction opposite to the displacement, then the work done will be negative ($\cos(180^\circ)=-1$), which means that energy is being withdrawn from the system.

Gravitational potential energy

An object raised to a height $h$ above the ground has a gravitational potential energy given by: \[ U_g(h) = mgh \qquad \text{[J]}, \] where $m$ is the mass of the object and $g=9.81$[m/s$^2$] is the gravitational acceleration on the surface of Earth.

Spring potential energy

The potential energy stored in a spring when it is displaced by $\vec{x}$[m] from its relaxed position is given by \[ U_{s} = \frac{1}{2}k\|\vec{x}\|^2 \qquad \text{[J]}, \] where $k$[N/m] is the spring constant.

Note that it doesn't matter whether the spring is stretched or compressed by a certain length: only the magnitude of the displacement $\|\vec{x}\|$ matters.

Conservation of energy

Consider a system which starts from an initial state (i), undergoes some motion and arrives at a final state (f). The law of conservation of energy states that energy cannot be created or destroyed in any physical process. This means that the initial energy of the system plus the work that was input into the system must equal the final energy of the system plus any work that was output: \[ \sum E_{i} \ \ + W_{in} \ \ \ = \ \ \ \sum E_{f} \ \ + W_{out}. \] The expression $\sum E_{(a)}$ corresponds to the sum of the different types of energy the system has in state (a). If we write down the equation in full we have: \[ K_i + U_{gi} + U_{si} \ \ \ + W_{in} \ \ \ = \ \ \ K_f + U_{gf} + U_{sf} \ \ \ + W_{out}. \] Usually, some of the terms in the above expression can be dropped. For example, we do not need to consider the spring potential energy $U_s$ in physics problems that do not involve springs.

Explanations

Work and energy are measured in Joules [J]. Joules can be expressed in terms of the fundamental units as follows: \[ [\text{J}] = [\text{N}\:\text{m}] = [\text{kg}\:\text{m}^2/\text{s}^{2}]. \] The first equality follows from the definition of work as force times displacement. The second equality comes from the definition of the Newton [N]$=[\text{kg}\:\text{m}/\text{s}^2]$ via $F=ma$.

Kinetic energy

A moving object has energy $K=\frac{1}{2}m\|\vec{v}\|^2$[J], which we call kinetic energy from the Greek word for motion kinema.

Note that velocity $\vec{v}$ and speed $\|\vec{v}\|$ are not the same as energy. Suppose you have two objects of the same mass and one is moving twice as fast as the other. The faster object will have twice the speed, but four times the kinetic energy.

Work

When hiring someone to help you move, you have to pay them for the work they do. Work is the product of how much force is necessary for the move and the distance of the move. The more force, the more work there will be for a fixed displacement. The more displacement (think moving to the South Shore versus moving next door) the more money the movers will ask for.

The amount of work done by a force $\vec{F}$ on an object which moves along some path $p$ is given by: \[ W = \int_p \vec{F}(x) \cdot d\vec{x}, \] where we account for the fact that the magnitude and direction of the force might change throughout the motion.

If the force is constant and the displacement path is a straight line, the formula for work simplifies to: \[ W = \int_0^d \vec{F}\cdot d\vec{x} = \vec{F}\cdot\int_0^d d\vec{x} = \vec{F}\cdot \vec{d} = \|\vec{F}\|\|\vec{d}\|\cos\theta. \] Note the use of the dot product to obtain only the part of $\vec{F}$ that is pushing in the direction of the displacement $\vec{d}$. A force which acts perpendicular to the displacement produces no work, since it neither speeds up nor slows down the motion.
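Here is a tiny sketch of the dot-product formula for work; the force and displacement values are made up. It also shows the claim about perpendicular forces: their dot product with the displacement is zero.

<code python>
import numpy as np

F = np.array([10.0, 0.0])        # assumed constant force [N]
d = np.array([3.0, 4.0])         # assumed straight-line displacement [m]
print(np.dot(F, d))              # W = F . d = 30.0 [J]

F_perp = np.array([0.0, 10.0])   # a force perpendicular to a purely-x displacement
print(np.dot(F_perp, np.array([3.0, 0.0])))   # 0.0 [J]: a perpendicular force does no work
</code>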

Potential energy is stored work

Some kinds of work are just a waste of your time, like working in a bank for example. You work and you get your paycheque, but nothing remains with you at the end of the day. Other kinds of work leave you with some resource at the end of the work day. Maybe you learn something, or you network with a lot of good people.

In physics, we make a similar distinction. Some types of work, like work against friction, are called dissipative since they just waste energy. Other kinds of work are called conservative since the work you do is not lost: it is converted into potential energy.

The gravitational force and the spring force are conservative forces. Any work you do while lifting an object up into the air against the force of gravity is not lost but stored in the height of the object. You can get all the work/energy back if you let go of the object. The energy will come back in the form of kinetic energy since the object will pick up speed during the fall.

The negative of the work done against a conservative force is called potential energy. For any conservative force $\vec{F}_?$, we can define the associated potential energy $U_?$ through the formula: \[ U_?(d) = -W_{done} = - \int_0^d \vec{F}_? \cdot d\vec{x}. \] We will discuss two specific examples of this general formula below: the gravitational and spring potential energies. Being high in the air means you have a lot of potential to fall, and compressing a spring by a certain distance means it has the potential to spring back to its normal position. Let us look now at the exact formulas for these two cases.

Gravitational potential energy

The force of gravity is given by: \[ \vec{F}_g = -mg \hat{\jmath}. \] The direction of the gravitational force is downwards, towards the centre of the Earth.

The gravitational potential energy of lifting an object from a height of $y=0$ to a height of $y=h$ is given by: \[ \begin{align*} U_g(h) &\equiv - W_{done} \nl &= - \!\int_0^h \! \vec{F}_g \cdot d\vec{y} = - \!\int_0^h \!\!(-mg \hat{\jmath})\cdot \hat{\jmath} \; dy = mg \!\int_0^h \!\!\! 1\:dy = mg y\big\vert_{y=0}^{y=h} = mgh. \end{align*} \]

Spring energy

The force of a spring when stretched a distance $\vec{x}$[m] from its natural position is given by: \[ \vec{F}_s(\vec{x}) = - k\vec{x}. \]

The potential energy stored in a spring as it is compressed from $y=0$ to $y=x$[m] is given by: \[ \begin{align*} U_s(x) &= -W_{done} \nl &=-\!\int_0^x \!\vec{F}_{s}(y) \cdot d\vec{y} = \int_0^x \!\! ky dy = k\int_0^x \!\! y dy = k\frac{1}{2}y^2\big\vert_{y=0}^{y=x} = \frac{1}{2}kx^2. \end{align*} \]
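If you want to double-check this integral numerically, here is a minimal sketch that integrates the spring force magnitude $ky$ from $0$ to $x$ with the trapezoid rule and compares the result with $\frac{1}{2}kx^2$. The spring constant and compression are made-up values.

<code python>
import numpy as np

k = 250.0      # assumed spring constant [N/m]
x = 0.1        # assumed compression [m]

# numerically integrate the spring force magnitude k*y from y = 0 to y = x
y = np.linspace(0.0, x, 100_001)
f = k * y
U_numeric = np.sum(0.5 * (f[:-1] + f[1:]) * np.diff(y))   # trapezoid rule
U_formula = 0.5 * k * x**2

print(U_numeric, U_formula)   # both about 1.25 [J]
</code>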

Conservation of energy

Energy cannot be created or destroyed. It can only be transformed from one form to another. If there are no external forces acting on the system, then we have conservation of energy: \[ \sum E_i \ \ = \ \ \sum E_f. \]

If there are external forces like friction that do work on the system, we must take their energy contributions into account as well: \[ \sum E_i \ +\ W_{in} = \sum E_f, \quad \text{or} \quad \sum E_i = \sum E_f \ +\ W_{out}. \]

This is one of the most important equations you will find in this book, because it will allow you to solve very complicated problems simply by accounting for all the different kinds of energy involved in the problem.

Examples

Banker dropped

An investment banker is dropped (from rest) from a 100[m] tall building. What is his speed when he hits the ground?

We start from: \[ \begin{align*} \sum E_i \ \ &= \ \ \sum E_f, \nl K_i + U_i \ \ & = \ \ K_f + U_f, \end{align*} \] and plugging in the numbers we get: \[ 0 + m \times9.81 \times100 = \frac{1}{2}mv^2 + 0. \] After cancelling the mass $m$ from both sides of the equation we are left with \[ 9.81\times 100 = \frac{1}{2}v_f^2. \] Solving for $v_f$ in the above equation, we find that the banker will be going at $v_f =\sqrt{ 2\times 9.81\times 100}=44.2945$[m/s] when he hits the ground. This is like $160$[km/h]. Ouch! That will definitely hurt.

Bullet speedometer

A suspended block serving to measure the speed of a bullet. An incoming bullet at speed $v$ hits a mass $M$ suspended on two strings. Use conservation of momentum and conservation of energy principles to find the speed $v$ of the bullet if the block rises to a height $h$ after it is hit by the bullet.

First we use the conservation of momentum principle to find the (horizontal) speed of the block and mass right after the bullet hits: \[ \vec{p}_{in,m} + \vec{p}_{in,M} = \vec{p}_{out}, \] \[ m v + 0 = (m+M) v_{out}, \] so the velocity of the block with the bullet embedded in it is $v_{out}= \frac{mv}{M+m}$ right after collision.

Next we use the conservation of energy principle to relate the initial kinetic energy of the block-plus-bullet and the height $h$ by which it rises: \[ K_i + U_i = K_f + U_f, \] \[ \frac{1}{2}(M+m)v_{out}^2 + 0 = 0 + (m+M)gh. \] Isolating $v_{out}$ in the above equation and setting it equal to the $v_{out}$ we got from the momentum calculation we get: \[ v_{out} = \frac{mv}{M+m} = \sqrt{2gh} = v_{out}. \] We can use this equation to find the speed of the incoming bullet: \[ v = \frac{M+m}{m}\sqrt{2gh}. \]
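Here is a short sketch of the final formula; the bullet mass, block mass and rise height are assumed values chosen only for illustration.

<code python>
import math

g = 9.81     # [m/s^2]
m = 0.010    # assumed bullet mass [kg]
M = 2.0      # assumed block mass [kg]
h = 0.05     # assumed rise height of the block [m]

v_out = math.sqrt(2 * g * h)     # speed of block + bullet right after impact
v = (M + m) / m * v_out          # incoming bullet speed
print(v)                         # about 199 [m/s] for these assumed values
</code>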

Incline and spring

A block of mass $m$ is released from rest at point (A) on the top of an incline at a coordinate $y=y_i$. It slides down the frictionless incline to the point (B) $y=0$. The coordinate $y=0$ corresponds to the relaxed length of a spring of spring constant $k$. The block then compresses the spring all the way to point (C), corresponding to $y=y_f$, when the block comes to rest again. The angle of the slope is $\theta$.

What is the speed of the block at $y=0$? How far does the spring get compressed $y_f$? Bonus points if you can express your answer for $y_f$ in terms of $\Delta h$, the difference in height between $y_i$ and $y_f$.

We have essentially two problems: the motion from (A) to (B), in which the gravitational potential energy of the block is converted into kinetic energy, and the motion from (B) to (C), in which all the energy of the block gets converted into spring potential energy.

In both cases, there is no friction so we can use the conservation of energy formula: \[ \sum E_i \ \ = \ \ \sum E_f. \]

For the motion from (A) to (B) we have: \[ K_i + U_i = K_f + U_f. \] The block starts from rest so $K_i=0$. The difference in potential energy is equal to $mgh$ and in this case the block is $|y_i|\sin\theta$ [m] higher at (A) than it is at (B), so we can write: \[ 0 + mg|y_i|\sin\theta = \frac{1}{2}mv_B^2 + 0. \] The above formula uses the point (B) at $y=0$ as reference for the gravitational potential energy. The potential at point (A) is $U_i=mgh=mg|y_i-0|\sin\theta$ relative to point (B) since the point (A) is $h=|y_i-0|\sin\theta$ metres higher than the point (B).

Solving for $v_B$ in this equation gives us the answer to the first part of the question: \[ v_{B} = \sqrt{ 2 g|y_i|\sin\theta }. \]

Now for the second part of the motion. The law of conservation of energy dictates that: \[ K_i + U_{gi} + U_{si} = K_f + U_{gf} + U_{sf}, \] where now $i$ refers to the moment (B) and $f$ refers to the moment (C). Initially the spring is uncompressed so $U_{si}=0$, and by the end of the motion the spring is compressed by a total of $\Delta y=|y_f-0|$[m], so its spring potential energy is $U_{sf}=\frac{1}{2}k|y_f|^2$. We choose the height of (C) as the reference potential energy and thus $U_{gf}=0$. Since the difference in gravitational potential energy is $U_{gi} - U_{gf}=mgh=mg|y_f|\sin\theta$, we can fill in the entire energy equation: \[ \frac{1}{2}m v_B^2 + mg|y_f|\sin\theta + 0 = 0 + 0 + \frac{1}{2}k|y_f|^2. \] Since $k$ and $m$ are given and we know $v_B$ from the first part of the question, we can solve for $|y_f|$ (a quadratic equation).

To obtain the answer $|y_f|$ in terms of $\Delta h$ we can use $\sum E_i = \sum E_f$ again, but this time $i$ refers to the moment (A) and $f$ refers to the moment (C). The energy equation becomes $mg\Delta h = \frac{1}{2}k|y_f|^2$, from which we obtain $|y_f|=\sqrt{\frac{ 2 mg\Delta h}{k}}$.

Energy lost to friction

You have a block of mass 50[kg] on an incline. The force of friction between the block and the incline is 30[N]. The block slides for 200[m] down the incline. The incline is at a slope $\theta=30^\circ$, so the total vertical displacement of the block is $200\sin30^\circ=100$[m]. What is its speed as it reaches the bottom of the incline?

This is a problem in which initial energies are converted into final energies and some lost work: \[ \sum E_i = \sum E_f + W_{lost}. \] The term $W_{lost}$ represents the energy lost due to the friction.

Another (better) way of describing the situation is that the block had a negative amount of work done on it: \[ \sum E_i + \underbrace{W_{done}}_{ \textrm{negative} } = \sum E_f. \] The quantity $W_{done}$ is negative because during the entire motion the friction force on the object was acting in the opposite direction to the motion: \[ W_{done} = \vec{F}\cdot \vec{d} = \|\vec{F}_f\|\|\vec{d}\|\cos(180^\circ) = - F_f\|\vec{d}\|, \] where $\vec{d}$ is the $200$[m] of sliding distance during which the friction acts. Since we are told that $F_f = 30$[N], we can calculate $W_{done} = W_{friction} = -30[\text{N}]\times 200[\text{m}] = -6000$[J].

We can now substitute this value into the conservation of energy equation: \[ \begin{align*} K_i + U_i + W_{done} &= K_f + U_f, \nl 0 + mgh + (-F_f|d|) &= \frac{1}{2}mv_f^2 + 0, \end{align*} \] where we have used the formula $mgh= U_i- U_f$ for the difference in gravitational potential energy. Substituting all the values we know we get \[ 0 + 50 \times 9.81 \times 100 - 6000 = \frac{1}{2}(50)v_f^2 + 0, \] which can be solved for $v_f$.
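Here is a minimal sketch that carries out that last step using the numbers from this problem.

<code python>
import math

m = 50.0                       # mass of the block [kg]
g = 9.81                       # [m/s^2]
h = 100.0                      # vertical drop [m]
W_friction = -30.0 * 200.0     # work done by friction: -F_f * d = -6000 [J]

# K_i + U_i + W_done = K_f + U_f  =>  0 + m*g*h - 6000 = (1/2) m v_f^2
v_f = math.sqrt(2 * (m * g * h + W_friction) / m)
print(v_f)                     # about 41.5 [m/s]
</code>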

Discussion

In this section we saw that describing a physical situation in terms of the energies involved is a useful way of thinking. The law of conservation of energy allows us to do simple “energy accounting” and calculate the values of unknown quantities.

Momentum

During a collision between two objects there will be a sudden spike in the contact force between them, which can be difficult to measure and quantify. It is therefore not possible to use Newton's law $F=ma$ to predict the accelerations that occur during collisions. In order to predict the motion of the objects after the collision we must use a momentum calculation. The law of conservation of momentum states that the total amount of momentum before and after the collision is the same. Thus, if we know the momenta of the objects before the collision, it will be possible to calculate their momenta after the collision and from this figure out their subsequent motion.

To illustrate why the notion of momentum is important, consider the following situation. Say you have a 1[g] piece of paper and a 1000[kg] car moving at the same speed 100[km/h]. Which of the two objects would you rather get hit by? Momentum, denoted $\vec{p}$, is the precise physical concept which measures the “amount of moving stuff”. An object of mass $m$ moving with velocity $\vec{v}$ has momentum $\vec{p}\equiv m\vec{v}$. Momentum plays a key role in collisions, so your gut feeling about the piece of paper and the car is correct. The car weighs $1000\times1000=10^{6}$ times more than the piece of paper, so it has $10^6$ times more momentum when moving at the same speed. A collision with the car will “hurt” a million times more than the collision with the piece of paper even though they were moving at the same speed.

In this section we will learn how to use the law of conservation of momentum to predict the outcomes of collisions.

Concepts

  • $m$: the mass of the moving object.
  • $\vec{v}$: the velocity of the moving object.
  • $\vec{p}=m\vec{v}$: the momentum of the moving object.
  • $\sum \vec{p}_{in}$: the sum of the momenta of particles before a collision.
  • $\sum \vec{p}_{out}$: the sum of the momenta after the collision.

Definition

The momentum of a moving object is equal to the velocity of the moving object multiplied by the object's mass: \[ \vec{p} = m\vec{v} \qquad [\text{kg}\:\text{m}/\text{s}]. \] If the velocity of the object is $\vec{v}=20\hat{\imath}=(20,0)$[m/s] and it has a mass of 100[kg] then its momentum is $\vec{p}=2000\hat{\imath}=(2000,0)$[kg$\:$m/s].

Momentum is a vector quantity, so we will often have to convert momenta from the length-and-direction form to the components form: \[ \vec{p}= \|\vec{p}\| \angle \theta = (\|\vec{p}\|\cos\theta, \|\vec{p}\|\sin\theta) = (p_x, p_y). \] The component form makes it easy to add and subtract vectors: $\vec{p}_1 + \vec{p}_2 = (p_{1x}+p_{2x},p_{1y}+p_{2y})$. To express the final answer, we will have to convert from the component form back to the length-and-direction form using: \[ \|\vec{p}\| = \sqrt{ p_x^2 + p_y^2 }, \qquad \theta = \tan^{-1}\!\left( \frac{ p_{y} }{ p_{x} } \right). \]
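Here is a minimal sketch of these conversions; the magnitude and angle are made-up values. Note that the code uses atan2 instead of a plain $\tan^{-1}$, which takes care of the signs of the components automatically.

<code python>
import math

p_mag = 2000.0              # assumed momentum magnitude [kg m/s]
theta = math.radians(30.0)  # assumed direction, measured from the x axis

# length-and-direction form -> component form
px = p_mag * math.cos(theta)
py = p_mag * math.sin(theta)

# component form -> length-and-direction form
p_back = math.hypot(px, py)
theta_back = math.degrees(math.atan2(py, px))
print(px, py, p_back, theta_back)   # recovers 2000.0 and 30.0
</code>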

Conservation of momentum

Newton's first law states that in the absence of acceleration ($\vec{a}=0$), an object will maintain a constant velocity. This is kind of obvious if you know Calculus, since $\vec{a}$ is the derivative of $\vec{v}$. For example, if an object is stationary and there are no forces on it to cause it to accelerate, then it will remain stationary. If an object is moving with velocity $\vec{v}$ and there is no acceleration (or deceleration), then it will keep moving with velocity $\vec{v}$ forever. In the absence of acceleration, objects will conserve their velocity: \[ \vec{v}_{in}= \vec{v}_{out}. \] This is equivalent to saying that objects conserve their momentum (just multiply the velocity by the constant mass of the object).

More generally, if you have a situation involving multiple moving objects, you can say that the “overall momentum”, i.e., the sum of the momenta of all the interacting particles stays constant. This reasoning is particularly useful when analyzing collisions since it allows us to connect the sum of the momenta before the collision and after the collision: \[ \sum \vec{p}_{in} = \sum \vec{p}_{out}. \] Whatever momentum comes into a collision must come out. This equation is known as the law of conservation of momentum.

This conservation law is one of the furthest reaching laws of physics you will learn in Mechanics. We learned about the conservation of momentum in a simple context of two colliding particles, but the law applies much more generally: for multiple particles, for fluids, for fields, and even for collisions involving atomic particles described by quantum mechanics. The quantity of motion (momentum) cannot be created or destroyed, it can only be exchanged between systems.

Examples

Example 1

You throw a piece of rolled-up cardboard of mass $0.4$[g] from your balcony on a rainy day. You throw it horizontally with a speed of 10[m/s]. Shortly after it leaves your hand, it collides with a raindrop of mass $2$[g] falling straight down at a speed of $30$[m/s]. What will be the resulting velocity if the two objects stick together after the collision?

The conservation of momentum equation says that: \[ \vec{p}_{in,1} + \vec{p}_{in,2} = \vec{p}_{out}. \] Plugging in the values we get \[ 0.4\times (10,0) \ \ + \ \ 2\times (0,-30) \ \ = \ \ 2.4 \times \vec{v}_{out}, \] or solving for $\vec{v}_{out}$ we find: \[ \vec{v}_{out} = \ \frac{ 0.4(10,0) - 2 (0,30)} {2.4} = (1.666, - 25.0) = 1.666\hat{\imath} - 25.0\hat{\jmath}. \]

Example 2: Hipsters on bikes

Two hipsters on single-speed bicycles are headed towards the same intersection. Say they are both speeding down Parc street at 50[km/h] and the first hipster is crossing the street at a diagonal of 30 degrees when they collide. I mean, you saw this coming, right? Well the second hipster didn't, because he was busy turning the pedals as fast as he could.

Hipster 1 trying to cross the street gets hit by Hipster 2 coming down the street. Let us assume that the combined weight of the straight-going hipster and his bike is 100[kg], whereas the street-crossing-at-30-degrees hipster has a lighter, more expensive bicycle frame. We put his weight at 90[kg].

(I am going to continue with the story, but I want to point out that we have been given the following information so far: \[ \begin{align*} \vec{p}_{in,1} &= 90\times50 \angle 30=90(50\cos30,50\sin30), \nl \vec{p}_{in,2} &= 100\times50 \angle 0=(5000,0), \end{align*} \] where the $x$ coordinate points down Parc street, and the $y$ coordinate is perpendicular to the street.)

Surprisingly, nobody gets hurt in this collision. They bump shoulder-to-shoulder and the one that was trying to cross the street gets redirected straight down the street, while the one going straight down gets deflected to the side and right onto the bike path. I know what you are thinking: couldn't they get hurt at least a little bit? OK, let's say that the whiplash from their shoulder-to-shoulder collision sends their heads flying towards each other and their glasses get smashed. There you have it.

Suppose the velocity of the first hipster after the collision is 60 [km/h], what is the velocity and the deflected direction of the second hipster? (I have just told you that the outgoing momentum of the first hipster is $\vec{p}_{out,1}=(90\times60,0)$, and asked you to find $\vec{p}_{out,2}$.)

We can solve this problem using the conservation of momentum formula, which tells us that: \[ \vec{p}_{in,1} + \vec{p}_{in,2} = \vec{p}_{out,1} + \vec{p}_{out,2}. \] We know three of the above quantities so we can solve for the one (vector) unknown by isolating it on one side of the equation: \[ \vec{p}_{out,2} = \vec{p}_{in,1} + \vec{p}_{in,2} - \vec{p}_{out,1}, \] \[ \vec{p}_{out,2} = 90(50\cos30,50\sin30)\ +\ (5000,0)\ - \ (90\times60,0). \] The $x$ component of the momentum $\vec{p}_{out,2}$ is: \[ p_{out,2,x} = 90\times50\cos30 + 5000 - 90\times 60 = 3497.11, \] and the $y$ component is $p_{out,2,y} = 90\times 50\sin30 = 2250$.

The magnitude of the momentum of hipster 2 is given by: \[ \|\vec{p}_{out,2}\| = \sqrt{ p_{out,2,x}^2 + p_{out,2,y}^2 } = 4158.39 \quad \textrm{[kg km/h]}. \] Note that the units we are using for the momentum are not the standard choice [kg$\:$m/s]. That is fine. So long as you keep in mind which units you are using, you don't have to always convert to SI units.

The final speed of hipster 2 is $v_{out,2} = 4158.39/100= 41.58$[km/h]. The deflection angle is obtained by \[ \phi_{def} = \tan^{-1}\!\!\left( \frac{ p_{out,2,y} }{ p_{out,2,x} } \right)= 32.76^\circ. \]
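Here is a short sketch that reproduces the numbers of this example on a computer.

<code python>
import math

# incoming and outgoing momenta in [kg km/h], taken from the example above
p_in1 = (90 * 50 * math.cos(math.radians(30)), 90 * 50 * math.sin(math.radians(30)))
p_in2 = (100 * 50.0, 0.0)
p_out1 = (90 * 60.0, 0.0)

# conservation of momentum: p_out2 = p_in1 + p_in2 - p_out1
p_out2 = (p_in1[0] + p_in2[0] - p_out1[0],
          p_in1[1] + p_in2[1] - p_out1[1])

p_mag = math.hypot(*p_out2)                           # about 4158 [kg km/h]
v_out2 = p_mag / 100.0                                # about 41.6 [km/h]
phi = math.degrees(math.atan2(p_out2[1], p_out2[0]))  # about 32.8 degrees
print(p_out2, p_mag, v_out2, phi)
</code>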

Discussion

We defined the concept of momentum in terms of the velocity of the object, but in fact, momentum is a more fundamental concept than velocity. If you go on to take more advanced physics classes, you will learn that the natural variables to describe the state of a particle are their positions and momenta $(\vec{x}, \vec{p})$. You will also learn that the real form of Newton's second law is written in terms of the momentum: \[ \vec{F} = \frac{d \vec{p} }{dt} \quad \text{for } m \text{ constant } \Rightarrow \quad \vec{F}=\frac{d (m\vec{v}) }{dt}=m\frac{d \vec{v} }{dt} =m\vec{a}. \] In most physics problems the mass of objects will stay constant so using $\vec{F}=m\vec{a}$ is perfectly fine.

The law of conservation of momentum follows from Newton's third law: for each force $\vec{F}_{12}$ exerted by Object 1 on Object 2, there exists a counter force $\vec{F}_{21}$ of equal magnitude and opposite direction, which is the force of Object 2 pushing back on Object 1. Earlier I said that it is difficult to quantify the magnitude of the exact forces $\vec{F}_{12}$ and $\vec{F}_{21}$ that occur during a collision. Indeed, the amount of force suddenly shoots up as the two objects collide and then suddenly drops. Complicated as these forces may be, we know that during the entire collision they obey Newton's third law. Assuming there are no other forces acting on the objects we have: \[ \vec{F}_{12} = -\vec{F}_{21} \quad \text{using the above} \Rightarrow \quad \frac{d \vec{p}_1 }{dt} = -\frac{d \vec{p}_2 }{dt}. \] If we now move both terms to the left-hand side we obtain the equation: \[ \frac{d \vec{p}_1 }{dt} + \frac{d \vec{p}_2 }{dt} = \frac{d}{dt}\left( \vec{p}_1 + \vec{p}_2 \right) = 0, \] which implies that the quantity $\vec{p}_1 + \vec{p}_2$ is constant over time.

In this section we saw how to use a momentum calculation to predict the motion of the particles after a collision. In the previous section, we learned about the concept of energy, which is another useful tool for understanding and predicting the motion of objects.

Links

[ Animations of simple collisions between objects. ]
http://en.wikipedia.org/wiki/Conservation_of_linear_momentum

Simple harmonic motion

Vibrations and oscillations are all around us. White light is made up of many oscillations of the electromagnetic field at different frequencies (colors). Sounds are made up of a combination of many air vibrations with different frequencies and strengths. In this section we will learn about simple harmonic motion, which describes the oscillation of a mechanical system at a fixed frequency and with a constant amplitude. By studying oscillations in their simplest form, you will pick up important intuition which you can apply to all other types of oscillations.

Mass attached to a spring. The canonical example of simple harmonic motion is the motion of a mass-spring system illustrated in the figure on the right. The block is free to slide along the horizontal frictionless surface. If the system is disturbed from its equilibrium position, it will start to oscillate back and forth at a certain natural frequency, which depends on the mass of the block and the stiffness of the spring.

In this section we will focus our attention on two mechanical systems: the mass-spring system and the simple pendulum. We will follow the usual approach and describe the positions, velocities, accelerations and energies associated with this type of motion. The notion of simple harmonic motion (SHM) is far more important than just these two systems. The equations and intuition developed for the analysis of the oscillation of these simple mechanical systems can be applied much more generally to sound oscillations, electric current oscillations and even quantum oscillations. Pay attention, that is all I am saying.

Concepts

  • $A$: The amplitude of the movement, how far the object goes back and forth relative to the centre position.
  • $x(t)$[m], $v(t)$[m/s], $a(t)$[m/s$^2$]: The position, velocity and acceleration of the object as functions of time.
  • $T$[s]: The period of the motion, i.e., how long it takes for the motion to repeat.
  • $f$[Hz]: The frequency of the motion.
  • $\omega$[rad/s]: The angular frequency of the simple harmonic motion.
  • $\phi$[rad]: The phase constant. The Greek letter $\phi$ is pronounced “phee”.

Simple harmonic motion

A mass-spring system disturbed from the equilibrium position will oscillate. The position function is described by the cosine function. The figure on the right illustrates a mass-spring system undergoing simple harmonic motion. Observe that the position of the mass as a function of time behaves like the cosine function. From the diagram, we can also identify two important parameters of the motion: the amplitude $A$, which describes the maximum displacement of the mass from the centre position, and the period $T$, which describes how long it takes for the mass to come back to its initial position.

The equation which describes the position of the object as a function of time is the following: \[ x(t)=A\cos(\omega t + \phi). \] The constant $\omega$ (omega) is called the angular frequency of the motion. It is related to the period $T$ by the equation $\omega = \frac{2\pi}{T}$. The additive constant $\phi$ (phee) is called the phase constant or phase shift and its value depends on the initial condition for the motion $x_i\equiv x(0)$.

I don't want you to be scared by the formula for simple harmonic motion. I know there are a lot of Greek letters that appear in it, but it is actually pretty simple. In order to understand the purpose of the three parameters $A$, $\omega$ and $\phi$, we will do a brief review of the properties of the $\cos$ function.

Review of sin and cos functions

A plot of the unscaled and unshifted sin and cos functions. The functions $f(t)=\sin(t)$ and $f(t)=\cos(t)$ are periodic functions which oscillate between $-1$ and $1$ with a period of $2\pi$. Previously we used the functions $\cos$ and $\sin$ in order to find the horizontal and vertical components of vectors, and called the input variable $\theta$ (theta). However, in this section the input variable is the time $t$ measured in seconds. Look carefully at the plot of the function $\cos(t)$. As $t$ goes from $t=0$ to $t=2\pi$, the function $\cos(t)$ completes one full cycle. The period of $\cos(t)$ is $T=2\pi$ because this is how long it takes (in radians) for a point to go around the unit circle.

Time-scaling

To describe periodic motion with a different period, we can still use the $\cos$ function but we must add a multiplier in front of the variable $t$ inside the $\cos$ function. This multiplier is called the angular frequency and is usually denoted $\omega$ (omega). The input-scaled $\cos$ function: \[ f(t) = \cos(\omega t ), \] has a period of $T=\frac{2\pi}{\omega}$.

If you want to have a periodic function with period $T$, you should use the multiplier constant $\omega = \frac{2\pi}{T}$ inside the $\cos$ function. When you vary $t$ from $0$ to $T$, the function $\cos(\omega t )$ will go through one cycle because the quantity $\omega t$ goes from $0$ to $2\pi$. You shouldn't just take my word for this: try this for yourself by building a cos function with a period of 3 units.
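Here is one way to carry out that check on a computer, as a minimal Python sketch.

<code python>
import math

T = 3.0                      # desired period
omega = 2 * math.pi / T      # required angular frequency

def f(t):
    return math.cos(omega * t)

print(f(0.0), f(3.0), f(6.0))   # all (approximately) 1.0: the function repeats every 3 units
print(f(1.5))                   # -1.0: half a cycle later the cos is at its minimum
</code>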

The frequency of periodic motion describes how many times per second the motion repeats. The frequency is equal to the inverse of the period: \[ f=\frac{1}{T}=\frac{\omega}{2\pi} \text{ [Hz].} \] The relation between $f$ (frequency) and $\omega$ (angular frequency) is a factor of $2\pi$. This multiplier is needed since the natural cycle length of the $\cos$ function is $2\pi$ radians.

Output-scaling

If we want to have oscillations that go between $-A$ and $+A$ instead of between $-1$ and $+1$, we can multiply the $\cos$ function by the appropriate amplitude: \[ f(t)=A\cos(\omega t). \] The above function has period $T=\frac{2\pi}{\omega}$ and oscillates between $-A$ and $A$ on the $y$ axis.

Time-shifting

The function $A\cos(\omega t)$ starts from its maximum value at $t=0$. In the case of the mass-spring system, this corresponds to the case when the motion begins with the spring maximally stretched $x_i\equiv x(0)=A$.

In order to describe other starting positions for the motion, it may be necessary to introduce a phase shift inside the $\cos$ function: \[ f(t)=A\cos(\omega t + \phi). \] The constant $\phi$ must be chosen so that at $t=0$, the function $f(t)$ correctly describes the initial position of the system.

For example, if the harmonic motion starts from the centre $x_i \equiv x(0)=0$ and is initially going in the positive direction, then the equation of motion is described by the function $A\sin(\omega t)$. However, since $\sin(\theta)=\cos(\theta - \frac{\pi}{2})$ we can equally well describe the motion in terms of a shifted $\cos$ function: \[ x(t) = A\cos\!\left(\omega t - \frac{\pi}{2}\right) = A\sin(\omega t). \] Note that the function $x(t)$ correctly describes the initial position: $x(0)=0$.

By now, the meaning of all the parameters in the simple harmonic motion equation should be clear to you. The constant in front of the $\cos$ tells us the amplitude $A$ of the motion, the multiplicative constant $\omega$ inside the $\cos$ is related to the period/frequency of the motion $\omega = \frac{2\pi}{T} = 2\pi f$. Finally, the additive constant $\phi$ is chosen depending on the initial conditions.

Mass and spring

OK, enough math. It is time to learn about the first physical system which exhibits simple harmonic motion: the mass-spring system.

An object of mass $m$ is attached to a spring with spring constant $k$. If disturbed from rest, this mass-spring system will undergo simple harmonic motion with angular frequency: \[ \omega = \sqrt{ \frac{k}{m} }. \] A stiff spring attached to a small mass will result in very rapid oscillations. A weak spring or a large mass will result in slow oscillations.

A typical exam question will tell you $k$ and $m$ and ask about the period $T$. If you remember the definition of $T$, you can easily calculate the answer: \[ T = \frac{2\pi}{\omega} = 2\pi \sqrt{ \frac{m}{k} }. \]

Equations of motion

The general equations of motion for the mass-spring system are as follows: \[ \begin{align} x(t) &= A\cos(\omega t + \phi), \nl v(t) &= -A\omega \sin(\omega t + \phi), \nl a(t) &= -A\omega^2\cos(\omega t + \phi). \end{align} \]

The general shape of the function $x(t)$ is $\cos$-like. The angular frequency $\omega$ parameter is governed by the physical properties of the system. The parameters $A$ and $\phi$ describe the specifics of the motion, namely, the size of the oscillation and where it starts from.

The function $v(t)$ is obtained, as usual, by taking the derivative of $x(t)$. The function $a(t)$ is obtained by taking the derivative of $v(t)$, which corresponds to the second derivative of $x(t)$.

Motion parameters

The velocity and the acceleration of the object are also periodic functions.

We can find the maximum values of the velocity and the acceleration by reading off the coefficient in front of the $\sin$ and $\cos$ in the functions $v(t)$ and $a(t)$.

  1. The maximum velocity of the object is \[ v_{max} = A \omega. \]
  2. The maximum acceleration is \[ a_{max} = A \omega^2. \]

The velocity is maximum as the object passes through the centre, while the acceleration is maximum when the spring is maximally stretched (compressed).

You will often be asked to solve for the quantities $v_{max}$ and $a_{max}$ in exercises and exams. This is an easy task if you remember the above formulas and you know the values of the amplitude $A$ and the angular frequency $\omega$.

Energy

The potential energy stored in a spring which is stretched (compressed) by a length $x$ is given by the formula $U_s=\frac{1}{2}k x^2$. Since we know $x(t)$, we can obtain the potential energy of the mass-spring system as a function of time: \[ U_s(t)= \frac{1}{2} kx(t)^2 =\frac{1}{2}kA^2\cos^2(\omega t +\phi). \] The potential energy reaches its maximum value $U_{s,max}=\frac{1}{2}kA^2$ when the spring is fully stretched or fully compressed.

The kinetic energy of the mass as a function of time is given by: \[ K(t)= \frac{1}{2} mv(t)^2 = \frac {1}{2}m\omega^2A^2\sin^2(\omega t +\phi). \] The kinetic energy is maximum when the mass passes through the centre position. The maximum kinetic energy is given by $K_{max} = \frac{1}{2} mv_{max}^2= \frac{1}{2}mA^2\omega^2$.

Conservation of energy

The conservation of energy equation tells us that the total energy of the mass-spring system is conserved. The sum of the potential energy and the kinetic energy at any two instants $t_1$ and $t_2$ is the same: \[ U_{s1} + K_1 = U_{s2} + K_2. \]

It is also useful to calculate the total energy of the system $E_T = U_s(t) + K(t) = \text{const}$. This means that even if $U_s(t)$ and $K(t)$ change over time, the total energy of the system always remains constant.

We can use the identity $\cos^2\theta + \sin^2\theta =1$ to verify that the total energy is indeed a constant and that it is equal $U_{s,max}$ and $K_{max}$: \[ \begin{align} E_{T} &= U_s(t) + K(t) \nl &= \frac{1}{2}kA^2\cos^2(\omega t) + \frac {1}{2}m\omega^2A^2\sin^2(\omega t) \nl &= \frac{1}{2}m\omega^2A^2\cos^2(\omega t ) + \frac {1}{2}m\omega^2A^2\sin^2(\omega t ) \ \ \ (\text{since } k = m\omega^2 )\nl &= \frac{1}{2}m\underbrace{\omega^2A^2}_{v_{max}^2}\underbrace{\left[ \cos^2(\omega t) + \sin^2(\omega t)\right]}_{=1} = \frac{1}{2}mv_{max}^2 = K_{max} \nl & =\frac{1}{2}m(\omega A)^2 = \frac{1}{2}(m \omega^2) A^2 =\frac{1}{2}kA^2 = U_{s,max}. \end{align} \]
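Here is a minimal numerical sketch of the same fact, with made-up values of $m$, $k$, $A$ and $\phi$: the total energy computed at many different times never deviates from $\frac{1}{2}kA^2$.

<code python>
import numpy as np

m = 1.0        # assumed mass [kg]
k = 4.0        # assumed spring constant [N/m]
A = 0.5        # assumed amplitude [m]
phi = 0.3      # assumed phase constant [rad]
omega = np.sqrt(k / m)

t = np.linspace(0.0, 10.0, 1000)
x = A * np.cos(omega * t + phi)
v = -A * omega * np.sin(omega * t + phi)

E_total = 0.5 * k * x**2 + 0.5 * m * v**2
print(E_total.min(), E_total.max(), 0.5 * k * A**2)   # all approximately 0.5 [J]
</code>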

The best way to understand SHM is to visualize how the energy of the system shifts between the potential energy of the spring and the kinetic energy of the moving mass. When the spring is maximally stretched $x=\pm A$, the mass will have zero velocity and hence zero kinetic energy $K=0$. At this moment all the energy of the system is stored in the spring $E_T= U_{s,max}$. The other important moment is when the mass has zero displacement but maximal velocity $x=0, U_s=0, v=\pm A\omega, E_T=K_{max}$, which corresponds to all the energy being stored as kinetic energy.

Pendulum motion

We now turn our attention to another simple mechanical system whose motion is also described by the simple harmonic motion equations.

A pendulum consists of a mass suspended on a string which swings back and forth. Consider a mass suspended at the end of a long string of length $\ell$ in a gravitational field of strength $g$. If we start the pendulum from a certain angle $\theta_{max}$ away from the vertical position and then release it, the pendulum will swing back and forth undergoing simple harmonic motion.

The period of oscillation is given by the following formula: \[ T = 2\pi \sqrt{ \frac{\ell}{g} }. \] Note that the period does not depend on the amplitude of the oscillation (how far the pendulum swings) nor the mass of the pendulum. The only factor that plays a role is the length of the string $\ell$. The angular frequency for a pendulum of length $\ell$ is going to be: \[ \omega \equiv \frac{2\pi}{T} = \sqrt{ \frac{g}{\ell} }. \]

We describe the position of the pendulum in terms of the angle $\theta$ that it makes with the vertical. The equations of motion are described in terms of angular variables: the angular position $\theta$, the angular velocity $\omega_\theta$ and the angular acceleration $\alpha_\theta$: \[ \begin{align} \theta(t) &= \theta_{max} \: \cos\!\left( \sqrt{ \frac{g}{\ell} } t + \phi\right), \nl \omega_\theta(t) &= -\theta_{max}\sqrt{ \frac{g}{\ell} } \: \sin\!\left( \sqrt{ \frac{g}{\ell} } t + \phi\right), \nl \alpha_\theta(t) &= -\theta_{max}\frac{g}{\ell} \: \cos\!\left( \sqrt{ \frac{g}{\ell} } t + \phi\right). \end{align} \] The angle $\theta_{max}$ describes the maximum angle that the pendulum swings to. Note how we had to use a new variable name $\omega_\theta$ for the angular velocity of the pendulum $\omega_\theta(t)=\frac{d}{dt}\!\left(\theta(t)\right)$, so as not to confuse it with the constant $\omega=\sqrt{ \frac{g}{\ell} }$ inside the $\cos$ function, which describes the angular frequency of the periodic motion.

Energy

The motion of the pendulum is best understood by imagining how the energy of the system shifts between the gravitational potential energy of the mass and its kinetic energy.

The pendulum has maximum gravitational potential energy when it swings to the side by the angle $\theta_{max}$. At that angle, the vertical position of the mass is increased by a height $h$ above the lowest point, which we can calculate as follows: \[ h = \ell - \ell \cos \theta_{max}. \] The maximum gravitational potential energy of the mass is therefore: \[ U_{g,max}= mgh= mg\ell(1-\cos\theta_{max}). \]

By the conservation of energy principle, the maximum kinetic energy of the pendulum must be equal to the maximum of the gravitational potential energy: \[ mg\ell(1-\cos\theta_{max}) = U_{g,max} = K_{max} = \frac{1}{2} mv_{max}^2, \] where $v_{max}=\ell\:\omega_{\theta,max}$ is the linear velocity of the mass as it swings through the centre.

Explanations

It is worthwhile to understand how the equations of simple harmonic motion come about. In this subsection, we will discuss how the equations are derived from Newton's second law $F=ma$.

Trigonometric derivatives

The slope (derivative) of the function $\sin(t)$ varies between $-1$ and $1$. The slope is largest when $\sin$ passes through the $x$ axis and the slope is zero when it reaches its maximum and minimum values. A careful examination of the graphs of the bare functions $\sin$ and $\cos$ reveals that the derivative of the function $\sin(t)$ is described by the function $\cos(t)$ and vice versa: \[ f(t) = \sin(t) \:\qquad \Rightarrow \qquad f'(t) = \cos(t), \] \[ f(t) = \cos(t) \qquad \Rightarrow \qquad f'(t) = -\sin(t). \] When you learn more about calculus you will know how to find the derivative of any function you want, but for now just take my word that the above two formulas are true.

The chain rule for derivatives tells us that the derivative of a composite function $f(g(x))$ is given by $f'(g(x))\cdot g'(x)$, i.e., you must take the derivative of the outer function and then multiply by the derivative of the inner function. We can use the chain rule to find the derivative of the simple harmonic motion position function: \[ x(t)=A\cos(\omega t +\phi) \ \ \Rightarrow \ \ v(t) \equiv x^{\prime}(t)=-A\sin(\omega t +\phi)(\omega) = -A\omega\sin(\omega t +\phi), \] where the outer function is $f(x)=A\cos(x)$ with derivative $f'(x)=-A\sin(x)$ and the inner function is $g(x)=\omega x +\phi$ with derivative $g'(x)=\omega$.

The same reasoning is used to obtain the second derivative: \[ a(t)\equiv \frac{d}{dt}\!\left\{ v(t) \right\} =-A\omega^2 \cos(\omega t +\phi) = -\omega^2 x(t). \] Note that $a(t)=x^{\prime\prime}(t)$ is proportional to $x(t)$, but always points in the opposite direction.
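If you don't want to take my word for these derivatives, you can check them with a computer algebra system. Here is a minimal sketch using the sympy library (assuming you have it installed):

  import sympy as sp

  t, A, w, phi = sp.symbols('t A omega phi', positive=True)

  x = A*sp.cos(w*t + phi)     # position
  v = sp.diff(x, t)           # velocity = first derivative
  a = sp.diff(x, t, 2)        # acceleration = second derivative

  print(v)                           # -A*omega*sin(omega*t + phi)
  print(sp.simplify(a + w**2*x))     # 0, i.e., a(t) = -omega^2 x(t)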

I hope this clarifies for you how we obtained the functions $v(t)$ and $a(t)$: we simply took the derivative of the function $x(t)$.

Derivation of the mass-spring SHM equation

You may be wondering where the equation $x(t)=A\cos(\omega t + \phi)$ comes from. This formula looks very different from the kinematics equations for linear motion $x(t) = x_i + v_it + \frac{1}{2}at^2$, which we obtained starting from Newton's second law $F=ma$ after two integration steps.

In this section, we pulled the $x(t)=A\cos(\omega t + \phi)$ formula out of thin air, as if by revelation. Why did we suddenly start talking about $\cos$ functions and Greek letters with dubious names like phase? Are you phased by all of this? When I was first learning about simple harmonic motion, I was totally phased because I didn't see where the $\sin$ and $\cos$ came from.

The $\cos$ also comes from $F=ma$, but the story is a little more complicated this time. The force exerted by a spring is $F_{s} = -kx$. If you draw a force diagram on the mass, you will see that the force of the spring is the only force acting on it so we have: \[ \sum F = F_s =ma \qquad \Rightarrow \qquad -kx = ma. \] Recall that the acceleration is the second derivative of the position: \[ a=\frac{dv(t)}{dt} = \frac{d^2x(t)}{dt^2} = x^{\prime\prime}(t). \]

We now rewrite the equation $-kx = ma$ in terms of the function $x(t)$ and its second derivative: \[ \begin{align*} -kx(t) &= m\frac{d^2x(t)}{dt^2} \nl 0 & = m\frac{d^2x(t)}{dt^2}+ kx(t) \nl 0 & = \frac{d^2x(t)}{dt^2}+ \frac{k}{m}x(t). \end{align*} \]

This is called a differential equation. Instead of looking for an unknown number as in normal equations, in differential equations we are looking for an unknown function $x(t)$. We do not know what $x(t)$ is but we do know one of its properties, namely, that its second derivative $x^{\prime\prime}(t)$ is equal to the negative of $x(t)$ multiplied by some constant.

To solve a differential equation, you have to guess which function $x(t)$ satisfies this property. There is an entire course called Differential Equations, in which engineers and physicists learn how to do this guessing thing. Can you think of a function whose second derivative is equal to the negative of the function itself, multiplied by the constant $\frac{k}{m}$?

OK, I thought of one: \[ x_1(t)=A_1 \cos\!\left( \sqrt{ \frac{k}{m}}t \right). \] Come to think of it, there is also a second one which works: \[ x_2(t)=A_2 \sin\!\left( \sqrt{ \frac{k}{m}}t \right). \] You should try this for yourself: verify that $x^{\prime\prime}_1(t) + \frac{k}{m}x_1(t)=0$ and $x^{\prime\prime}_2(t) + \frac{k}{m}x_2(t)=0$, which means that these functions are both solutions to the differential equation $x^{\prime\prime}(t)+\frac{k}{m} x(t)=0$. Since both $x_1(t)$ and $x_2(t)$ are solutions, any combination of them must also be a solution: \[ x(t) = A_1\cos(\omega t) + A_2\sin(\omega t). \] This is kind of the answer we were looking for. I say kind of because the function $x(t)$ is specified in terms of the coefficients $A_1$ and $A_2$ instead of the usual parameters: the amplitude $A$ and a phase $\phi$.
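In case you want to see one of these checks spelled out, here is the computation for $x_1(t)$ (the check for $x_2(t)$ is identical): \[ x_1'(t)=-A_1\sqrt{ \tfrac{k}{m} } \sin\!\left( \sqrt{ \tfrac{k}{m}}t \right), \qquad x_1''(t)=-A_1 \tfrac{k}{m} \cos\!\left( \sqrt{ \tfrac{k}{m}}t \right) = -\tfrac{k}{m}\,x_1(t), \] so indeed $x_1''(t)+\frac{k}{m}x_1(t)=0$.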

Lo and behold, using the trigonometric identity $\cos(a + b)=\cos(a)\cos(b) - \sin(a)\sin(b)$ we can express the function $x(t)$ as a time-shifted trigonometric function: \[ x(t)=A\cos(\omega t + \phi) = A_1\cos(\omega t) + A_2\sin(\omega t). \] The expression on the left is the preferred way of describing SHM because the parameters $A$ and $\phi$ correspond to observable aspects of the motion.
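For completeness, here is how the two sets of parameters are related. Expanding the left-hand side with the identity gives \[ A\cos(\omega t + \phi) = A\cos\phi\,\cos(\omega t) - A\sin\phi\,\sin(\omega t), \] so matching coefficients we must have $A_1=A\cos\phi$ and $A_2=-A\sin\phi$, or equivalently $A=\sqrt{A_1^2+A_2^2}$ and $\tan\phi=-A_2/A_1$.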

Let me go over what just happened here one more time. Our goal was to find the equation of motion which predicts the position of an object as a function of time $x(t)$. To understand what is going on, let us draw an analogy with a situation which we have seen previously. In linear kinematics, uniformly accelerated motion with $a(t)=a$ is described by the equation $x(t)=x_i+v_it + \frac{1}{2}at^2$ in terms of the parameters $x_i$ and $v_i$. Depending on the initial velocity and the initial position of the object, we obtain different trajectories. Simple harmonic motion with angular frequency $\omega$ is described by the equation $x(t)=A\cos(\omega t + \phi)$ in terms of the parameters $A$ and $\phi$, which are the natural parameters for describing SHM. We obtain different harmonic motion trajectories depending on the values of the parameters $A$ and $\phi$.

Derivation of the pendulum SHM equation

To see how the SHM equation of motion arises in the case of the pendulum, we need to start from the torque equation $\mathcal{T}=I\alpha$.

The torque caused by the weight of the pendulum is proportional to the sine of the displacement angle, and it always acts to restore the pendulum towards the vertical. The diagram on the right illustrates how we can calculate the torque on the pendulum which is caused by the force of gravity as a function of the displacement angle $\theta$. Recall that the torque calculation only takes into account the $F_{\!\perp}$ component of any force, since it is the only part which causes rotation: \[ \mathcal{T}_\theta = -F_{\!\perp}\, \ell = -mg\sin\theta\, \ell, \] where the negative sign indicates that the torque acts in the direction opposite to the angular displacement. If we now substitute this into the equation $\mathcal{T}=I\alpha$, we obtain the following: \[ \begin{align*} \mathcal{T} &= I \alpha \nl -mg\sin\theta(t) \ell &= m\ell^2 \frac{d^2\theta(t)}{dt^2} \nl -g\sin\theta(t) &= \ell \frac{d^2\theta(t)}{dt^2} \end{align*} \]

What follows is something which is not mathematically rigorous, but will allow us to continue and solve this problem. When $\theta$ is a small angle we can use the following approximation: \[ \sin(\theta)\ \approx \ \theta, \qquad \qquad \text{ for } \theta \ll 1. \] This type of equation is called a small angle approximation. You will see where it comes from later on when you learn about Taylor series approximations to functions. For now, you can convince yourself of the above formula by zooming in many times on the graph of the function $\sin$ near the origin to see that $y=\sin(x)$ looks very much like $y=x$. Try this out.
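If you prefer numbers to graphs, here is a quick check of how good the approximation is for a few arbitrarily chosen angles:

  from math import sin

  for theta in [0.01, 0.1, 0.3, 0.5]:                 # angles in radians
      error = abs(sin(theta) - theta)/sin(theta)      # relative error of sin(theta) ~ theta
      print(theta, sin(theta), f"{100*error:.2f}%")
  # at 0.1 rad (about 5.7 degrees) the error is only ~0.17%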

Using the small angle approximation for $\sin\theta$ we can rewrite the equation involving $\theta(t)$ and its second derivative as follows: \[ \begin{align*} -g\sin\theta(t) &= \ell \frac{d^2\theta(t)}{dt^2} \nl -g\theta(t) &\approx \ell \frac{d^2\theta(t)}{dt^2} \nl 0 &= \frac{d^2\theta(t)}{dt^2}+ \frac{g}{\ell}\theta(t). \end{align*} \]

At this point we recognize that we are dealing with the same differential equation as in the case of the mass-spring system: $\theta^{\prime\prime}(t)+\omega^2 \theta(t)=0$, which has solution: \[ \theta(t) = \theta_{max}\cos(\omega t + \phi), \] where the constant inside the $\cos$ function is $\omega=\sqrt{\frac{g}{\ell}}$.

Examples

When asked to solve word problems, you will usually be told the amplitude $A$ or the maximum velocity $v_{max}=\omega A$ of the SHM, and the question will ask you to calculate some other quantity. Answering these problems shouldn't be too difficult provided you write down the general equations for $x(t)$, $v(t)$ and $a(t)$, fill in the known quantities, and then solve for the unknowns.

Standard example

You are observing a mass-spring system built from a $1$[kg] mass and a 250[N/m] spring. The amplitude of the oscillation is 10[cm]. Determine (a) the maximum speed of the mass, (b) the maximum acceleration, and (c) the total mechanical energy of the system.

First we must find the angular frequency for this system: $\omega = \sqrt{k/m}=\sqrt{250/1}=15.81$[rad/s]. To find (a) we use the equation $v_{max} = \omega A = 15.81 \times 0.1=1.58$[m/s]. Similarly, we can find the maximum acceleration using $a_{max} = \omega^2 A = 15.81^2 \times 0.1=25$[m/s$^2$]. There are two equivalent ways of solving (c). We can obtain the total energy of the system by considering the potential energy of the spring when it is maximally extended (compressed): $E_T=U_s(A) = \frac{1}{2}kA^2 = 1.25$[J], or we can obtain the total energy from the maximum kinetic energy: $E_T=K_{max}=\frac{1}{2}m v_{max}^2 = 1.25$[J].
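If you want to double-check the arithmetic, here is the same calculation as a few lines of Python:

  from math import sqrt

  m, k, A = 1.0, 250.0, 0.10     # [kg], [N/m], [m]

  omega = sqrt(k/m)              # 15.81 [rad/s]
  v_max = omega*A                # (a) 1.58 [m/s]
  a_max = omega**2 * A           # (b) 25.0 [m/s^2]
  E_total = 0.5*k*A**2           # (c) 1.25 [J]

  print(omega, v_max, a_max, E_total)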

Discussion

In this section we learned about simple harmonic motion, which is described by the equation $x(t)=A\cos(\omega t + \phi)$. You may be wondering what non-simple harmonic motion is. A simple extension of what we learned would be to study oscillating systems where the energy is slowly dissipating. This is known as damped harmonic motion for which the equation of motion looks like $x(t)=Ae^{-\gamma t}\cos(\omega t + \phi)$, which describes an oscillation whose magnitude slowly decreases. The coefficient $\gamma$ is known as the damping coefficient and indicates how fast the energy of the system is dissipated.

The concept of SHM comes up in many other areas of physics. When you learn about electric circuits, capacitors and inductors, you will run into equations of the form $v^{\prime\prime}(t)+\omega^2 v(t)=0$, which indicates that the voltage in a circuit is undergoing simple harmonic motion. Guess what, the same equation used to describe the mechanical motion of the mass-spring system will be used to describe the voltage in an oscillating circuit!

Links

[ Plot of the simple harmonic motion using a can of spray-paint. ]
http://www.youtube.com/watch?v=p9uhmjbZn-c

[ 15 pendulums with different lengths. ]
http://www.youtube.com/watch?v=yVkdfJ9PkRQ

Optics

Introduction

A camera consists essentially of two parts: a detector and a lens construction. The detector is some surface that can record the light which hits it. Old-school cameras used the chemical reaction of silver-oxidation under light, whereas modern cameras use electronic photo-detectors.

While the detector is important, that which really makes or breaks a camera is the lens. The lens' job is to take the light reflected off some object (that which you are taking a picture of) and redirect it in an optimal way so that a faithful image forms on the detection surface. The image has to form exactly at the right distance $d_i$ (so that it is in focus) and have exactly the right height $h_i$ (so it fits on the detector).

To understand how lenses transform light, there is just one equation you need to know: \[ \frac{1}{d_o} + \frac{1}{d_i} = \frac{1}{f}, \] where $d_o$ is the distance from the object to the lens, $d_i$ is the distance from the lens to the image and $f$ is called the focal length of the lens. This entire chapter is dedicated to this equation and its applications. It turns out that curved mirrors behave very similarly to lenses, and the same equation can be used to calculate the properties of the images formed by mirrors. Before we talk about curved mirrors and lenses, we will have to learn about the basic properties of light and the laws of reflection and refraction.

Light

Light is pure energy stored in the form of a travelling electromagnetic wave.

The energy of a light particle is stored in the electromagnetic oscillation. During one moment, light is a “pulse” of electric field in space, and during the next instant it is a “pulse” of pure magnetic energy. Think of sending a “wrinkle pulse” down a long rope – where the pulse of mechanical energy is traveling along the rope. Light is like that, but without the rope. Light is just an electro-magnetic pulse and such pulses happen even in empty space. Thus, unlike most other waves you may have seen until now, light does not need a medium to travel in: empty space will do just fine.

The understanding of light as a manifestation of electro-magnetic energy (electromagnetic radiation) is some deep stuff, which is not the subject of this section. We will get to this, after we cover the concept of electric and magnetic fields, electric and magnetic energy and Maxwell's equations. For the moment, when I say “oscillating energy”, I want you to think of a mechanical mass-spring system in which the energy oscillates between the potential energy of the spring and the kinetic energy of the mass. A photon is a similar oscillation between a “magnetic system” part and the “electric system” part, which travels through space at the speed of light.

In this section, we focus on light rays. The vector $\hat{k}$ in the figure describes the direction of travel of the light ray.

  Oh light ray, light ray! 
  Where art thou, on this winter day.

Definitions

Light is made up of “light particles” called photons:

  • $p$: a photon.
  • $E_p$: the Energy of the photon.
  • $\lambda$: the wavelength of the photon.
  • $f$: the frequency of the photon. (Denoted $\nu$ in some texts.)
  • $c$: the speed of light in vacuum. $c=2.9979\times 10^{8}$[m/s].

The speed of light depends on the material in which it travels:

  • $v_x$: the speed of light in material $x$.
  • $n_x$: the refraction index of material $x$, which tells you how much slower light travels in that material relative to the speed of light in vacuum: $v_x=c/n_x$. Air is pretty much like vacuum, so $v_{air} \approx c$ and $n_{air}\approx 1$. There are different types of glass used in lens manufacturing, with $n$ values ranging from 1.4 to 1.7.

Equations

Like all travelling waves, the propagation speed of light is equal to the product of its frequency times its wavelength. In vacuum we have \[ c = \lambda f. \]

For example, red light of wavelength $\lambda=700$[nm] has frequency $f=428.27$[THz], since the speed of light is $c=2.9979\times 10^{8}$[m/s].

The energy of a beam of light is proportional to the intensity of the light (how many photons per second are being emitted) and the energy carried by each photon. The energy of a photon is proportional to its frequency: \[ E_p = h f, \] where $h=6.626\times 10^{-34}$[J s] is Planck's constant. The above equation is a big deal, since it applies not just to light but to all forms of electromagnetic radiation. The higher the frequency, the more energy per photon there is. Einstein got a Nobel prize for figuring out the photoelectric effect, which is a manifestation of the above equation.
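As a quick sanity check of these two formulas, here is the frequency and the per-photon energy of the red light from the example above:

  c = 2.9979e8      # [m/s] speed of light
  h = 6.626e-34     # [J s] Planck's constant
  lam = 700e-9      # [m] wavelength of red light

  f = c/lam         # ~4.28e14 [Hz] = 428 [THz]
  E = h*f           # ~2.8e-19 [J] of energy per photon

  print(f, E)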

The speed of light in a material $x$ with refractive index $n_x$ is \[ v_x = \frac{c}{n_x}. \]

Here is a list of refractive indices for some common materials: $n_{vacuum}\equiv 1.00$, $n_{air} = 1.00029$, $n_{ice}=1.31$, $n_{water}=1.33$, $n_{fused\ quartz}=1.46$, $n_{NaCl}=1.54$, Crown glass 1.52-1.62, Flint glass 1.57-1.75, $n_{sapphire}=1.77$, and $n_{diamond}=2.417$.

Discussion

Visible light

Our eyes are able to distinguish certain wavelengths of light as different colours.

Color Wavelength (nm)
Red 780 - 622
Orange 622 - 597
Yellow 597 - 577
Green 577 - 492
Blue 492 - 455
Violet 455 - 390

Note that the wavelengths of visible light are tiny numbers, so they are usually expressed in nanometers $1[\textrm{nm}]=10^{-9}[\textrm{m}]$ or angstroms $1[\textrm{Å}]=10^{-10}[\textrm{m}]$.

The electromagnetic spectrum

Visible light is only a small part of the electromagnetic spectrum. Waves with frequency higher than that of violet light are called ultraviolet (UV) radiation and cannot be seen by the human eye. Also, frequencies lower than that of red light (infrared) are not seen, but can sometimes be felt as heat.

The EM spectrum extends to all sorts of frequencies (and therefore wavelengths, by $c=\lambda f$). We have different names for the different parts of the EM spectrum. The highest energy particles (highest frequency $\to$ shortest wavelength) are called gamma rays ($\gamma$-rays). We are constantly bombarded by gamma rays coming from outer space with tremendous energy. These $\gamma$-rays are generated by nuclear reactions inside distant stars.

Particles with less energy than $\gamma$-rays are called X-rays. These are still energetic enough that they easily pass through most parts of your body like a warm knife through butter. Only your bones offer some resistance, which is kind of useful in medical imaging since all bone structure can be seen in contrast when taking an X-ray picture.

The frequencies below the visible range (wavelengths longer than that of visible light) are populated by radio waves. And when I say radio, I don't mean specifically radio, but any form of wireless communication. Starting from 4G (or whatever cell phones have gotten to these days), then the top GSM bands at 2.2-2.4GHz, the low GSM bands 800-900MHz, and then going into TV frequencies, FM frequencies (87–108MHz) and finally AM frequencies (153kHz–26.1MHz). It is all radio. It is all electromagnetic radiation emitted by antennas, travelling through space and being received by other antennas.

Light rays

In this section we will study how light rays get reflected off the surfaces of objects and what happens when light rays reach the boundary between two different materials.

Definitions

The speed of light depends on the material in which it travels:

  • $v_x$: the speed of light in material $x$.
  • $n_x$: the refraction index of material $x$, which tells you how much slower light travels in that material: $v_x=c/n_x$.

When an incoming ray of light comes to the surface of a transparent object, part of it will be reflected and part of it will be transmitted. We measure all angles with respect to the normal, which is the direction perpendicular to the interface.

  • $\theta_{i}$: The incoming or incidence angle.
  • $\theta_{r}$: The reflection angle.
  • $\theta_{t}$: The transmission angle: the angle of the light ray that goes into the object.

Formulas

Reflection

Light that hits a reflective surface will bounce back exactly at the same angle as it came in on: \[ \theta_{i} = \theta_{r}. \]

Refraction

The transmission angle of light when it goes into a material with different refractive index can be calculated from Snell's law: \[ n_i \sin\theta_{i} = n_t \sin \theta_{t}. \]

Total internal reflection

Light coming in from a medium with a low refraction index into a medium with a high refraction index gets refracted towards the normal. If the light travels in the opposite direction (from high $n$ to low $n$), then it will get deflected away from the normal. In the latter case, an interesting phenomenon called total internal reflection occurs, whereby light rays incident at sufficiently large angles with the normal get trapped inside the material. The angle at which this phenomenon starts to kick in is called the critical angle $\theta_{crit}$.

Consider a light ray inside a material of refractive index $n_x$ surrounded by a material with smaller refractive index $n_y$, $n_x > n_y$. To make this more concrete, think of a trans-continental underground optical cable made of glass $n_x=1.7$ surrounded by some plastic with $n_y=1.3$. All light that hits the boundary at an angle greater than: \[ \theta_{crit} = \sin^{-1}\left( \frac{n_y}{n_{x}} \underbrace{\sin(90^\circ)}_{=1} \right) = \sin^{-1}\!\left( \frac{n_y}{n_{x}} \right) = \sin^{-1}\!\left( \frac{1.3}{1.7} \right) = 49.88^\circ, \] will get reflected every time it reaches the surface of the optical cable. Thus, if you shine a laser pointer into one end of such a fibre-optical cable in California, 100% of that laser light will come out in Japan. Most high-capacity communication links around the world are based on this amazing property of light. In other words: no total internal reflection means no internet.
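Here is the same critical-angle calculation carried out in Python, using the example numbers above:

  from math import asin, degrees

  n_x, n_y = 1.7, 1.3                   # glass core and plastic cladding (from the example)

  theta_crit = degrees(asin(n_y/n_x))   # critical angle for total internal reflection
  print(theta_crit)                     # ~49.9 [degrees]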

Examples

What is wrong in this picture?

Here is an illustration from one of René Descartes' books, which shows a man in funny pants with some sort of lantern which produces a light ray that goes into the water.

Q: What is wrong with the picture?

Hint: Recall that $n_{air}=1$ and $n_{water}=1.33$, so $n_i < n_t$.

Hint 2: What should happen to the angles of the light ray?

A: Suppose that the line $\overline{AB}$ is at a $45^\circ$ angle; then, after entering the water at $B$, the ray should be deflected towards the normal, i.e., it should pass somewhere between $G$ and $D$. If we wanted to be precise and calculate the transmission angle, we would use: \[ n_i \sin\theta_{i} = n_t \sin \theta_{t}, \] fill in the values for air and water \[ 1 \sin(45^\circ) = 1.33 \sin( \theta_{t} ), \] and solve for $\theta_{t}$ (the refracted angle) to get: \[ \theta_{t} = \sin^{-1}\left( \frac{\sqrt{2}}{2\times1.33} \right) = 32.1^\circ. \] The mistake apparently is due to Descartes' printer, who got confused and measured angles with respect to the surface of the water. Don't make that mistake: remember to always measure angles with respect to the normal. The correct drawing should have the light ray going at an angle of $32.1^\circ$ with respect to the line $\overline{BG}$.
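You can verify the refraction angle with a couple of lines of Python:

  from math import sin, asin, radians, degrees

  n_air, n_water = 1.0, 1.33
  theta_i = 45.0                        # incidence angle [degrees]

  # Snell's law: n_i sin(theta_i) = n_t sin(theta_t)
  theta_t = degrees(asin(n_air*sin(radians(theta_i))/n_water))
  print(theta_t)                        # ~32.1 [degrees]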

Explanations

Refraction

To understand refraction you need to imagine “wave fronts” perpendicular to the light rays. Because light comes in at an angle, one part of the wave front will be in material $n_i$ and the other will be in material $n_t$. Suppose $n_i < n_t$, then the part of the wavefront in the $n_t$ material will move slower so angles of the wavefronts will change. The precise relationship between the angles will depend on the refractive indices of the two materials:

\[ n_i \sin\theta_{i} = n_t \sin \theta_{t}. \]

Total internal reflection

Whenever $n_i > n_t$, we reach a certain point where the formula: \[ n_i \sin\theta_{i} = n_t \sin \theta_{t}, \] breaks down. If the formula would require a transmission angle $\theta_t$ greater than $90^\circ$, then the light will not be transmitted at all. Instead, 100% of the light ray will get reflected back into the material.

To find the critical incident angle solve for $\theta_i$ in: \[ n_i \sin\theta_{i} = n_t \sin 90^\circ, \] \[ \theta_{crit} = \sin^{-1}\left( \frac{n_t}{n_{i}} \right). \]

The summary of the “what happens when a light ray comes to a boundary”-story is as follows:

  1. If $-\theta_{crit} < \theta_i < \theta_{crit}$, then some part of the light will be transmitted at an angle $\theta_t$ and some part will be reflected at an angle $\theta_r=\theta_i$.
  2. If $\theta_i \geq \theta_{crit}$, then all the light will get reflected at an angle $\theta_r=\theta_i$.

Note that when going from a low $n$ medium into a high $n$ medium, there is no critical angle – there will always be some part of the light that is transmitted.

Parabolic shapes

The parabolic curve has a special importance in optics. Consider for example a very weak radio signal coming from a satellite in orbit. If you use just a regular radio receiver, the signal will be so weak as to be indistinguishable from the background noise. However, if you use a parabolic satellite dish to collect the power from a large surface area and focus it on the receiver, then you will be able to detect the signal. This works because of the parabolic shape of the satellite dish: all the radio waves coming in from far away will get reflected towards the same point, the focal point of the parabola. Thus, if you put your receiver at the focal point, it will have the signal power from the whole dish redirected right to it.

Depending on the shape of the parabola (which way it curves and how strong the curvature is) the focal point or focus will be at a different place. In the next two sections, we will study parabolic mirrors and lenses. We will use the “horizontal rays get reflected towards the focus”-fact to draw optics diagrams and calculate where images will be formed.

Mirrors

Definitions

To understand how curved mirrors work, we imagine some test object (usually drawn as an arrow, or a candle) and the test image it forms.

  • $d_o$: The distance of the object from the mirror.
  • $d_i$: The distance of the image from the mirror.
  • $f$: The focal length of the mirror.
  • $h_o$: The height of the object.
  • $h_i$: The height of the image.
  • $M$: The magnification $M=h_i/h_o$.

When drawing optics diagrams with mirrors, we can draw the following three rays:

  • $R_\alpha$: A horizontal incoming ray which gets redirected towards the focus after it hits the mirror.
  • $R_\beta$: A ray that passes through the focus and gets redirected horizontally after it hits the mirror.
  • $R_\gamma$: A ray that hits the mirror right in the centre and bounces back at the same angle at which it came in.

Formulas

The following formula can be used to calculate where an image will be formed, given that you know the focal length of the mirror and the distance $d_o$ of the object: \[ \frac{1}{d_o} + \frac{1}{d_i} = \frac{1}{f}. \]

We follow the convention that distances measured from the reflective side of the mirror are positive, and distances behind the mirror are negative.

The magnification is defined as: \[ M = \frac{h_i}{h_o} = \frac{|d_i|}{|d_o|}, \] which tells you how much bigger the image is compared to the object.

Though it might sound confusing, we will talk about magnification even when the image is smaller than the object; in those cases we say we have fractional magnification.

Examples

Visual examples

Mirrors reflect light, so it is usual to see an image formed on the same side as where it came from. This leads to the following convention:

  1. If the image forms on the usual side (in front of the mirror), then we say it has positive distance $d_i$.
  2. If the image forms behind the mirror, then it has negative $d_i$.

Let us first look at the kind of mirror that you see in metro tunnels: the convex mirror. These mirrors give you a very broad view, and if someone is coming around the corner, the hope is that your peripheral vision will spot them in the mirror and you won't bump into each other.

I am going to draw $R_\alpha$ and $R_\gamma$:

Note that the image is “virtual”, since it appears to form inside the mirror.





Here is a drawing of a concave mirror instead, with the rays $R_\alpha$ and $R_\gamma$ drawn again.

Can you add the ray $R_\beta$ (through the focus)? As you can see, any two rays out of the three are sufficient to figure out where the image will be: just find the point where the rays meet.

Here are two more examples where the object is placed closer and closer to the mirror.

These are meant to illustrate that the same curved surface, and the same object can lead to very different images depending on where the object is placed relative to the focal point.

Numerical example 1

OK, let's do an exercise of the “can you draw straight lines using a ruler” type now. You will need a piece of white paper, a ruler and a pencil. Go get this stuff, I will be waiting right here.

Q: A convex mirror (like in the metro) is placed at the origin. An object of height 3[cm] is placed $x=5$[cm] away from the mirror. Where will the image be formed?

Geometric answer: Instead of trying to draw a curved mirror, we will draw a straight line. This is called the thin lens approximation (in this case, thin mirror) and it will make the drawing of lines much simpler. Take out the ruler and draw the two rays $R_\alpha$ and $R_\gamma$ as I did:

Then I can use the ruler to measure out $d_i\approx 1.7$[cm].

Formula Answer: Using the formula \[ \frac{1}{d_o} + \frac{1}{d_i} = \frac{1}{f}, \] with the appropriate values filled in \[ \frac{1}{5} + \frac{1}{d_i} = \frac{1}{-2.6}, \] or \[ d_i = 1.0/(-1.0/2.6 - 1.0/5) = -1.71 \text{[cm]}. \] Nice.
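The same calculation in Python, with the sign convention that a convex mirror has a negative focal length:

  d_o, f = 5.0, -2.6            # [cm]; negative f for the convex mirror

  d_i = 1.0/(1.0/f - 1.0/d_o)   # solve 1/d_o + 1/d_i = 1/f for d_i
  print(d_i)                    # ~-1.71 [cm]: negative, so the image is virtual (behind the mirror)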

Observe that (1) I used a negative focal length for the mirror since in some sense the focal point is “behind” the mirror, and (2) the image is formed behind the mirror, which means that it is virtual: this is where the arrow will appear to the observing eye drawn in the top left corner.

Numerical example 2

Now we have a concave mirror with focal length $f=2.6$[cm] and we measure the distances the same way (positive to the left).

Q: An object is placed at $d_o=7$[cm] from the mirror. Where will the image form? What is the height of the image?

Geometric answer: Taking out the ruler, you can choose to draw any of the three rays. I picked $R_\alpha$ and $R_\beta$ since they are the easiest to draw:

Then measuring with the ruler I find that $d_i \approx 4.3$[cm], and that the image has height $h_i\approx-1.9$[cm], where negative height means that the image is upside down.

Formula Answer: With the formula now. We start from \[ \frac{1}{d_o} + \frac{1}{d_i} = \frac{1}{f}, \] and fill in what we know \[ \frac{1}{7} + \frac{1}{d_i} = \frac{1}{2.6}, \] then solve for $d_i$: \[ d_i = 1.0/(1.0/2.6 - 1.0/7.0) = 4.136 \text{[cm]}. \] To find the height of the image we use \[ \frac{h_i}{h_o} = \frac{d_i}{d_o}, \] so \[ h_i = 3 \times \frac{4.136}{7.0} = 1.77 \text{[cm]}. \] You still need the drawing to figure out that the image is inverted though.
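Again, here is the arithmetic as a Python snippet, in case you want to play with the numbers:

  d_o, f, h_o = 7.0, 2.6, 3.0   # [cm]

  d_i = 1.0/(1.0/f - 1.0/d_o)   # ~4.14 [cm]: positive, so a real image in front of the mirror
  h_i = h_o*d_i/d_o             # ~1.77 [cm]; the drawing tells us it is inverted
  print(d_i, h_i)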

Generally, I would trust the numeric answers from the formula more, but read the signs of the answers from the drawing. Distances in front of the mirror are positive whereas images formed behind the mirror have negative distance.


Lenses

Definitions

To understand how lenses work, we again imagine some test object (an arrow) and the test image it forms.

  • $d_o$: The distance of the object from the lens.
  • $d_i$: The distance of the image from the lens.
  • $f$: The focal length of the lens.
  • $h_o$: The height of the object.
  • $h_i$: The height of the image.
  • $M$: The magnification $M=h_i/h_o$.

When drawing lens diagrams, we use the following representative rays:

  • $R_\alpha$: A horizontal incoming ray which gets redirected towards the focus after it passes through the lens.
  • $R_\beta$: A ray that passes through the focus and gets redirected horizontally after the lens.
  • $R_\gamma$: A ray that passes exactly through the centre of the lens and travels in a straight line.

Formulas

\[ \frac{1}{d_o} + \frac{1}{d_i} = \frac{1}{f} \]

\[ M = \frac{h_i}{h_o} = \frac{|d_i|}{|d_o|} \]

Examples

Visual

First consider the typical magnifying glass situation. You put the object close to the lens and, looking through the lens from the other side, the object will appear magnified.

A similar setup with a diverging lens. This time the image will appear to the observer to be smaller than the object.

Note that in the above two examples, if you used the formula you would get a negative $d_i$ value since the image is not formed on the “right” side. We say the image is virtual.

Now for an example where a real image is formed:

In this example all the quantities $f$, $d_o$ and $d_i$ are positive.

Numerical

An object is placed at a distance of 3[cm] from a magnifying glass of focal length 5[cm]. Where will the object appear to be?

You should really try this on your own. Just reading about light rays is kind of useless. Try drawing the above by yourself with the ruler. Draw the three kinds of rays: $R_\alpha$, $R_\beta$, and $R_\gamma$.

Here is my drawing.

Numerically we get \[ \frac{1}{d_o} + \frac{1}{d_i} = \frac{1}{f}, \] \[ \frac{1}{3.0} + \frac{1}{d_i} = \frac{1}{5.0}, \] \[ d_i = 1.0/(1.0/5.0 - 1.0/3.0) = -7.50 \text{[cm]}. \]
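The same computation in Python:

  d_o, f = 3.0, 5.0             # [cm]; the object is inside the focal length

  d_i = 1.0/(1.0/f - 1.0/d_o)
  print(d_i)                    # -7.5 [cm]: a virtual image on the same side as the object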

As you can see, drawings are not very accurate. Always trust the formula for the numeric answers to $d_o$, $d_i$ type of questions.

Multiple lenses

Imagine that the “output” image formed by the first lens is the “input” image to a second lens.

It may look complicated, but if you solve the problem in two steps (1) how the object forms an intermediary image, and (2) how the intermediary image forms the final image you will get things right.
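Here is a small sketch of the two-step procedure in Python. The focal lengths, the lens separation and the object distance below are made-up numbers, chosen only to illustrate the bookkeeping:

  def image_distance(d_o, f):
      """Solve 1/d_o + 1/d_i = 1/f for d_i."""
      return 1.0/(1.0/f - 1.0/d_o)

  f1, f2 = 4.0, 6.0     # [cm] focal lengths of the two lenses (assumed)
  L = 20.0              # [cm] distance between the lenses (assumed)
  d_o1 = 10.0           # [cm] object distance from the first lens (assumed)

  d_i1 = image_distance(d_o1, f1)   # step 1: the intermediate image formed by lens 1
  d_o2 = L - d_i1                   # the intermediate image acts as the object for lens 2
  d_i2 = image_distance(d_o2, f2)   # step 2: the final image formed by lens 2

  print(d_i1, d_o2, d_i2)           # ~6.67, ~13.33, ~10.9 [cm]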

You can also trace all the rays as they pass through the double-lens apparatus:

We started this chapter talking about real cameras, so I want to finish on that note too. To form a clear image, with variable focus and possibly zoom functionality, we have to use a whole series of lenses, not just one or two.

For each lens though, we can use the formula and calculate the effects of that lens on the light coming in.

Note that the real world is significantly more complicated than the simple ray picture which we have been using until now. For one, each frequency of light will have a slightly different refraction angle, and sometimes the lens shapes will not be perfect parabolas, so the light rays will not be perfectly redirected towards the focal point.

Discussion

Fresnel lens

Thicker lenses are stronger, but only because their surfaces are more curved: light gets refracted more when it hits a more curved surface. The actual thickness of the glass is of no importance; the way rays get deflected depends only on the angles at which they hit the surfaces. Indeed, we can cut out all the middle part of the lens and keep only the highly curved surface parts. This is called a Fresnel lens and it is used in car headlights.

Useful links

An amazing resource !!!!!
http://www.xs4all.nl/~johanw/#formularium

List of elementary physics formulas

The following summary of elementary physics formulas is adapted from the French Wikipedia.

Physics is a science whose most precise and useful means of making predictions is the language of mathematics. Physical laws describe phenomena and observations, and their mathematical expression is often short and explicit ... for those who master the mathematical toolset.

"Physics formulas" are expressions that state the relationships between matter, energy, motion and forces in this mathematical language. Seeing many formulas together on one page can help you understand the relationships between the variables, after a basic high-school level physics course (typically offered to 16-18 year olds).

The goal of this page is to present the main relationships (formulas) in both mathematical and verbal form, so that students can understand them better. The verbal formulation of many of the relationships still needs to be added or made more precise.

Meaning of the symbols

  • $a\,$ : acceleration
  • $A\,$ : area or amplitude
  • $c\,$ : speed of light
  • $E\,$ : energy
  • $F\,$ : force
  • $F_{resultante} = \sum F_i$ : resultant (net) force
  • $f_k\,$ : kinetic friction force
  • $f_s\,$ : static friction force
  • $g\,$ : gravitational acceleration
  • $I\,$ : impulse
  • $E_c\,$ : kinetic energy
  • $m\,$ : mass
  • $\mu_c\,$ : coefficient of kinetic friction
  • $\mu_s\,$ : coefficient of static friction
  • $F_N\,$ : normal force on a surface or axis
  • $\nu \,$ : frequency
  • $\omega \,$ : angular velocity
  • $\vec{p}$ : momentum
  • $P\,$ : power
  • $Q\,$ : quantity of heat
  • $r\,$ : radius
  • $\vec{s}\,$ : distance travelled
  • $T\,$ : period
  • $t\,$ : time
  • $\theta\,$ : angle (see the notes next to each formula)
  • $E_p\,$ : potential energy
  • $V\,$ : volume
  • $V_{df}\,$ : volume of displaced fluid
  • $v_f\,$ : final velocity
  • $v_i\,$ : initial velocity
  • $x_f\,$ : final position
  • $x_i\,$ : initial position

Kinematics of uniformly accelerated motion (constant acceleration)

The kinematics formulas relate the position of an object, its velocity and its acceleration, without taking into account its mass or the forces acting on it.

$ v = \left( \frac{\Delta x}{\Delta t}\right)_{{\Delta t} \rightarrow 0} $ : the velocity of a moving object at a given instant is the derivative of the position $x(t)$ with respect to time, i.e., the slope of the tangent to the position-versus-time curve $x(t)$ at that instant.

$ a = \left( \frac{\Delta v}{\Delta t}\right)_{{\Delta t} \rightarrow 0} $ : the acceleration of a moving object at a given instant is the derivative of the velocity $v(t)$ with respect to time, i.e., the slope of the tangent to the velocity-versus-time curve $v(t)$ at that instant.

$ \Delta v = a \Delta t$, with $\Delta v = v_f - v_i$, or equivalently $v_f = v_i + a \Delta t$ : the velocity varies linearly with time.

$ \Delta x = {v_i}{t} + \frac{1}{2}{at^2}$, with $\Delta x = x_f - x_i$, or equivalently $x_f = x_i + {v_i}{t} + \frac{1}{2}{at^2}$ : the distance travelled (the position) varies quadratically (like a parabola) with time.

From these we can also deduce the relations

$ x_f = x_i + {v_i}{t} + \frac{1}{2}{at^2}$

$ x_f = x_i + \frac{(v_i+v_f)}{2}t $

$ v_f^2 = v_i^2 + {2a}{( x_f - x_i )} = v_i^2 + 2 a \Delta x $

Dynamics

Like kinematics, dynamics is concerned with motion, but this time taking into account the force acting on the objects and their mass.

$ {\vec{F}_{resultante}} = m \vec{a}\,\ $ : a force acting on an object gives it an acceleration inversely proportional to its mass. This is Newton's second law.

$F_N = m g \cos \theta\,$ ($\theta\,$ is the angle between the supporting surface and the horizontal) : the normal (perpendicular) force exerted on a body by a surface inclined at an angle $\theta\,$ to the horizontal is the projection of the body's weight onto the direction perpendicular to the surface.

$F_k = {\mu_c} F_N\,\ $ : the kinetic friction force, which appears when the contact point between the object and the support is in motion, is proportional to the normal force with which the support acts on the object.

$F_s = {\mu_s} F_N\,\ $ : the static friction force, which appears when the contact point between the object and the support is not moving, is proportional to the normal force with which the support acts on the object. The static friction force is almost always greater than the kinetic friction force (so $ {\mu_s} > {\mu_c} $).

Work, energy, and power

Work, energy and power describe the ways in which objects affect the world around them.

$ W = \int \vec{F} \cdot d\vec{s}$ – the definition of mechanical work in full generality, in particular when the force changes along the displacement. If the force is constant (in direction, sense and magnitude) over the whole displacement, this relation simplifies to $ W = \vec{F} \cdot \Delta \vec{x}$.

$ W = \Delta {E_c}\,\!$ : one expression of the work-energy theorem

$ W = -\Delta {E_p}\,\!$ : a definition of potential energy

$ E_p = mgh \,\!$ : the potential energy relative to a reference height is given by the product of the weight and the height $h$

$ E_m = E_c + E_p \,\!$ : the mechanical energy of a system is the sum of its kinetic energy and its potential energy

$ E_c = \frac{1}{2}{mv^2}\,\!$ : the definition of the kinetic energy of a body

$ P = \frac{dE}{dt} = \vec{F}\cdot \vec{v} \,\!$

$ P_{avg} = \frac{\Delta E}{\Delta t}\,\!$

Simple harmonic motion and the simple pendulum

$ \vec{F} = -k \Delta \vec{x}\,\!$ : the force exerted on a body by a spring is proportional to the spring's displacement from its equilibrium position, and points in the direction opposite to that displacement. It is a restoring force. Here $k$ is the spring constant (Hooke's law).

$ T_{spring} = 2\pi\sqrt{\frac{m}{k}}\,\!$ : the period of a mass $m$ attached to a spring of stiffness $k$ is proportional to the square root of the ratio of the mass to the stiffness

$ \nu = \frac{1}{T}\,\!$

$ \omega = \frac{2 \pi}{T} = 2 \pi \nu = \sqrt{\frac{k}{m}} $

$ E_p = \frac{1}{2}kx^2\,\!$

$ v_{max} = x\sqrt{\frac{k}{m}}\,\!$, where $x$ is the amplitude of the oscillation

$ T_{pendulum} = 2\pi\sqrt{\frac{L}{g}}\,\!$ (for a simple pendulum)

Momentum

Momentum is the quantity associated with the velocity of a mass in classical mechanics.

$ \vec{p} = m\vec{v} \,\!$ – definition: the momentum of a body is the product of its mass and its velocity.

$ I = \int F \,dt$ – definition: the impulse received by a body is the integral of the force over time; if the force is constant in time, it is simply the product of the force and the time interval.

$ \Delta p \,\! = I $ : the change in momentum of a body during a time interval $\Delta t$ is given by the impulse delivered to the body

$ m_1\vec{v_1} + m_2\vec{v_2} = m_1\vec{v_1'} + m_2\vec{v_2'} \,\!$ : in a system on which no net external impulse acts, the total momentum does not change. This is an expression of the conservation of momentum

$ \frac{1}{2}m_1v_1^2 + \frac{1}{2}m_2v_2^2 = \frac{1}{2}m_1v_1'^2 + \frac{1}{2}m_2v_2'^2 \,\!$ (Note: this holds only for elastic collisions)

Uniform circular motion and gravitation

An object, for example a satellite around a planet or a planet around the sun, moves along a circle with a speed of constant magnitude.

In this section, $a_c$ and $F_c$ denote the centripetal acceleration and the centripetal force respectively.

$ a_c = \frac{v^2}{r} = \frac{4\pi^2r}{T^2}\,\!$

$ F_c = \frac{mv^2}{r}\,\!$

$ F_g = G\frac{m_1 m_2}{r^2}\,\!$, where $r$ is the distance between the centres of the two masses : the law of universal gravitation

$ a_{gravity} = G\frac{m_{planet}}{r^2}\,\!$

$ v_{satellite} = \sqrt{\frac{Gm_{planet}}{R}}$

$ E_p^{gravitational} = -G\frac{m_1 m_2}{r}$

$ E_c^{satellite} = G\frac{m_{sun} m_{planet}}{2R}$

$ E_m^{satellite} = -G\frac{m_{sun} m_{planet}}{2R}$ (the total mechanical energy of a satellite in a circular orbit is negative)

$ \frac{T_1^2}{a_1^3} = \frac{T_2^2}{a_2^3}$ expresses one of Kepler's laws (the third)

Thermodynamics

Thermodynamics deals with the macroscopic manifestations of the energy, motion and entropy of microscopic particles.

$ Q = mc \Delta T \,\!$

$ \Delta L = L_i \alpha \Delta T \,\!$

$ \Delta V = V_i \gamma \Delta T \,\!$

$ PV = nRT \,\!$ is the ideal gas law

$ \frac{P_iV_i}{T_i} = \frac{P_fV_f}{T_f} \,\!$ is the combined gas law

$ \Delta E_{int} = \Delta Q + \Delta W \,\!$ is the first law of thermodynamics

$ e = 1-\frac{\Delta Q_{out}}{\Delta Q_{in}} \,\!$

Rotational motion

$\boldsymbol \tau=r F \sin \theta$ : the torque $ \boldsymbol \tau $ produced by a force about an axis is the product of the force, the distance to the axis, and the sine of the angle between the force and the lever arm.

$\omega = \frac{\Delta \theta}{\Delta t}$

$\alpha = \frac{\Delta \omega}{\Delta t}$

$v_{tan} = r\omega\,$ : the tangential velocity is the product of the angular velocity and the radius of the trajectory

$a_{tan} = r\alpha\,$

$a_{rad} = \omega^2r\,$

$\omega = \omega_0 + \alpha t\,$ (constant angular acceleration)

$\theta = \omega_0 t + \frac{1}{2}\alpha t^2$ (constant angular acceleration)

$\omega^2 = \omega_0^2 + 2\alpha \theta$ (constant angular acceleration)

$\omega_{avg} = \frac{\omega + \omega_0}{2}\,$ (constant angular acceleration)

$\sum \tau = I\alpha$

$E_c = \frac{1}{2}Mv^2_{CM} + \frac{1}{2}I_{CM}\omega^2$

$L = I\omega\,$

$\sum \tau = {\Delta L \over \Delta t}$

Fluid mechanics

$ F_{Archimedes} = \rho_{fluid}\, g\, V_{immersed}\,$ is Archimedes' principle: any body with a volume $V_{immersed}$ immersed in a fluid of density $\rho_{fluid}$ experiences an upward vertical force equal to the weight of the displaced fluid. This force applies at the centre of mass of the displaced fluid, and it is the resultant of all the pressure forces exerted on the body. The fluid can be a liquid or a gas.

$ p = p_{atm} + \rho g h\,$

$ p = \frac{F}{A}\,\!$

$ Q = Av\,\!$ (the volumetric flow rate through a cross-section of area $A$)
