Network protocols discussion

There is an interesting discussion about network protocols going on at Hacker News. In just a few posts, some very knowledgeable people stepped in to explain what is going on. I saved the URL in the Links section of /miniref/comp/network_programming, but it made me think about how much better informal explanations are than formal ones.

Here you have hackers talking to other hackers. Whether it is a javascript-wielding young blood or an old dude who writes server-side stuff in C, all these people need to send data over the network and know some protocols for doing that: mostly TCP:HTTP for web devs, while systems people probably think more in terms of TCP:*.

Everyone jumps in to check what is going on in the HN discussion. And then suddenly learning happens. The discussion is a bit disorganized (it is shown as a discussion tree), but let me give you the walkthrough.

First chetanahuja tells it like it is:

IP layer is for addressing nodes on the internet and routing packets to them in a stateless manner (more or less… I’m not counting routing table caches and such as “state”). TCP […] are built on top of the IP layer to provide reliable end-to-end data transfers in some sort of session based mechanism.

The key thing to know is that IP is a best-effort protocol. If you send an IP packet from computer A to computer B, the network will try to deliver it, but there are no guarantees that it will succeed. No problem though: we can build a reliable protocol (the Transmission Control Protocol) on top of the unreliable one. This is why the Internet is usually referred to as TCP/IP, not just IP, even though IP stands for Internet Protocol. TCP/IP is the internet made reliable.

TCP is important because it allows for reliable communication: when A sends some data to B, B replies with an ACKnowledgment packet to tell A that it received the data. Reliability comes from the fact that A will retransmit all the packets for which no ACKs are received (the sender assumes these packets got lost). The other thing the TCP protocol gives you is the notion of a port — a multiplexing mechanism for running multiple network services on the same machine. Port 80 is the HTTP port (web browsing). When you type 11.22.33.44 into the browser, your browser will open a TCP connection to 11.22.33.44:80, where the number after the colon is the port. Another important port is port 25, which is used for email (SMTP). When an email to user@11.22.33.44 is to be delivered, a connection will be made to 11.22.33.44:25.
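
To make the ports story concrete, here is a minimal sketch of a TCP client in Python. The hostname and the hard-coded request are placeholders, not anything from the HN thread:

import socket

host = "example.com"  # placeholder host

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # SOCK_STREAM = TCP
s.connect((host, 80))                                  # port 80 = HTTP
s.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")

response = b""
while True:
    chunk = s.recv(4096)   # TCP delivers a reliable, ordered byte stream
    if not chunk:
        break              # empty read means the server closed the connection
    response += chunk
s.close()

print(response.decode(errors="replace").splitlines()[0])  # e.g. HTTP/1.1 200 OK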

Sometimes you don’t want so much transmission control overhead. Imagine that you are sending some voice data: you want to send as many packets as possible (maybe using forward error correcting codes), but you don’t want to bother with retransmission of lost packets. The voice data simply isn’t useful if it is not delivered on time. In such cases we would prefer a more basic protocol which doesn’t isolate us from the unreliability of IP.

This protocol is called UDP (the User Datagram Protocol) and it is really barebones. UDP is basically IP with added port numbers (and an error-detection checksum). From now on we can’t simply talk about “port 80” on a host; we must say whether we mean port 80 in the TCP protocol (TCP:80) or in the UDP protocol (UDP:80).
Speaking of UDP ports, let me tell you about a really important one: UDP:53, which is used for DNS (the domain name system). When I type www.yahoo.com into the browser, my computer first sends out a DNS query to find the IP addresses that correspond to that hostname, and the DNS server answers with a list: “(206.190.36.45, 98.138.253.109, 98.139.183.24).” My browser will connect to one of these IPs (over HTTP = TCP:80) chosen at random.
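
You can replay the resolution step yourself. A quick sketch using Python’s resolver interface (the hostname is just an example):

import socket

# getaddrinfo asks the system resolver, which speaks DNS (UDP:53)
# behind the scenes, for the addresses associated with a hostname.
for family, socktype, proto, canonname, sockaddr in socket.getaddrinfo(
        "www.yahoo.com", 80, proto=socket.IPPROTO_TCP):
    print(sockaddr[0])   # one of the IPs a browser could connect to on TCP:80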

Another useful UDP service is DHCP (the dynamic host configuration protocol). This is the magical process by which you are automatically assigned an IP address when you join a network. DHCP bootstraps the communication: you joined a new network, OK, but what is the network number for this network? Who should you talk to? What IP address should you respond to? Either you know this information (a sysadmin gave it to you on a piece of paper) or you make a DHCP request (which is a UDP broadcast) and a DHCP server will respond to you, assign you an IP address, tell you what the network number is, and tell you which router to talk to to go towards the Internet (the default route).
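
A real DHCP exchange is a four-step DISCOVER/OFFER/REQUEST/ACK dance with a binary packet format, but the broadcast mechanics are easy to sketch in Python. The payload below is a stand-in, not a valid DHCP packet, and binding to port 68 requires root privileges:

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)     # SOCK_DGRAM = UDP
s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)  # allow broadcast
s.bind(("", 68))                             # DHCP clients send from UDP:68
s.sendto(b"stand-in, not a real DHCP DISCOVER",
         ("255.255.255.255", 67))            # shout to any DHCP server (UDP:67)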

Ok so everyone knows the basics now, but HN doesn’t just give you the basics — it gives you the advanced stuff too. What are the current problems with TCP/IP?

advm says:

Maybe TCP’s issues aren’t apparent when you’re using it to download page assets from AWS over your home Internet connection, but they become apparent when you’re doing large file transfers between systems whose bandwidth-delay products (BDPs) greatly exceed the upper limit of the TCP buffers on the end systems.
This may not be an issue for users of consumer grade Internet service, but it is an issue to organizations who have private, dedicated, high-bandwidth links and need to move a lot of data over large distances (equating to high latency) very quickly and often; CDNs, data centers, research institutions, or, I dunno, maybe someone like Google.
The BDP and the TCP send buffer size impose an upper limit on the window size for the connection. Ideally, in a file transfer scenario, the BDP and the socket’s send buffer size should be equal. If your send buffer size is lower than the BDP, you cannot ever transfer at a greater throughput than buffer_size / link_latency, and thus you cannot ever attain maximum bandwidth. I can explain in more detail why that’s true if you want, but otherwise here’s this: http://www.psc.edu/index.php/networking/641-tcp-tune
Unfortunately for end systems with a high BDP between them, most of the time the maximum send buffer size for a socket is capped by the system to something much lower than the BDP. This is a result of the socket implementation of these systems, not an inherent limitation of TCP.
An accepted user-level solution to this issue is to use multiple sockets in parallel, but that has its own issues, such as breaking fairness and not working well with the stream model. I can explain this more if you want, too, just let me know.

There are other problems with TCP, such as:

  • slow start being, well, slow to converge on high-BDP networks,
  • bad performance in the face of random packet loss (e.g., TCP over Wi-Fi),
  • congestion control algorithms being too conservative (IMO, not everyone needs to agree on the same congestion control protocol for it to work well, it just needs to converge to network conditions faster, better differentiate types of loss, and yield to fairness more quickly),
  • TCP features such as selective ACKs not being widely used,
  • default TCP socket settings sucking and requiring a lot of tuning to get right,
  • crap with NAT that can’t be circumvented at the user level (UDP-based stream protocols can do rendezvous connections to get around NAT), and more.

People write whole papers on all these things. The problem is that most of the public research exists as shitty academic papers you probably wouldn’t bother reading anyway, and most of the people actually studying this stuff in depth and coming up with solutions are private researchers and engineers working for companies like Google.

The last paragraph is a good reason why there should be a “No BS guide to computer systems”. I bet I could show the services at all layers of the OSI stack and really go into the details. Sockets are pretty cool.
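
For example, the send-buffer cap advm is talking about is something you can poke at directly from user space. A little Python sketch (the 4 MB target is just an illustrative number for a high-BDP link):

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Ask for a 4 MB send buffer. With window = buffer size, a 100 ms link
# tops out around 4 MB / 0.1 s = 40 MB/s (the buffer_size/link_latency
# limit from the quote above).
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4 * 1024 * 1024)

# The kernel silently caps the request (e.g., at net.core.wmem_max on
# Linux, which also reports back a doubled value for bookkeeping) --
# exactly the system-imposed limit advm describes.
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))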

pjscott describes the crypto stack in just a few sentences:

Their crypto stuff looks pretty reasonable. Key exchange uses ECDH with either the P-256 or curve25519 polynomials. Once the session key is established, it’s encrypted with AES-128 and authenticated with either GCM or HMAC-SHA256. None of this is implemented yet, but it’s at least cause for hope.
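
To give a feel for the authenticated-encryption step, here is a sketch using the Python cryptography package. It only exercises the AES-128-GCM primitive pjscott mentions; it is not the actual handshake or wire format of the protocol being discussed:

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Pretend the ECDH key exchange already happened and handed us a
# 128-bit session key (here we just generate a random one).
key = AESGCM.generate_key(bit_length=128)
aesgcm = AESGCM(key)

nonce = os.urandom(12)            # 96-bit nonce, the standard size for GCM
aad = b"packet-header"            # authenticated, but not encrypted
ciphertext = aesgcm.encrypt(nonce, b"hello over an insecure link", aad)

# decrypt() verifies the GCM authentication tag and raises InvalidTag
# if the ciphertext or the header was tampered with.
print(aesgcm.decrypt(nonce, ciphertext, aad))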

latitude also gives some other considerations about doing crypto over the Internet:

I’ll tell you a dirty little secret of the protocol design.
Say, you want to design a protocol with reliable delivery and/or loss detection. You will then have ACKs, send window and retransmissions. Guess what? If you don’t follow windowing semantics of TCP, then one of two things will happen on saturated links – either TCP will end up with all the bandwidth or you will.
So – surprise! – you have no choice but to design a TCP clone.

That said, there is a fundamental problem with TCP, when it’s used for carrying secure connections. Since TCP acts as a pure transport protocol, it has no per-packet authentication and so any connection can be trivially DoS’d with a single fake FIN or RST packet. There are ways to solve this, e.g. by reversing the TCP and security layers and running TCP over ESP-over-UDP or TLS-over-UDP (OpenVPN protocol). This requires either writing a user-space TCP library or doing some nasty tunneling at the kernel level, but even as cumbersome as this is, it’s still not a reason enough to re-invent the wheel. Also, if you want compression, it’s readily available in TLS or as a part of IPsec stack (IPcomp). If you want FEC, same thing – just add a custom transform to TLS and let the client and server negotiate if to use it or not.

I mean, every network programmer invents a protocol or two in his lifetime. It’s like a rite of passage and it’s really not a big deal.

I learned stuff today. I hope you guys learned something too.

Techzing interview

Earlier this year I launched my book on Hacker News and it resonated very positively with the hacker crowd. This HN exposure landed me an interview on the TechZing podcast to discuss my textbook project. Even though the interview was an hour and a half long, there were some things that we didn’t get to discuss. I want to take a moment now to write down my observations about the textbook business and the educational market.

This blog post is organized with the best stuff at the top so feel free to trail off at any point.

Insights

The most important things I’ve learned about the textbook business:

  1. Writing is tough, but writing down lecture notes after a lecture is easy.
  2. Teaching students is gold. By interacting with your students 1-on-1 you get feedback on your explanations.
    If you are lucky you will get a “Sorry, I didn’t get that”, which allows you to iterate.
  3. People still appreciate the printed book. Some people are willing to pay good money for a PDF.

Opportunities

Print-on-demand and eBook technology allow everyone to publish and sell books. This is a revolution on a Gutenberg scale. One of the forefathers of the Internet/WWW, when asked about the motivation behind his inventions, said he did it “so people will be able to earn a living from the fruits of their intellectual labour.” We have now finally reached the moment where this idea is practical. Could books be the missing monetization strategy for the Internet?

What have traditionally been two markets—the general audience and the educational market—are now becoming a single market of people who want to learn. Lord knows there are plenty of things to learn out there, so there is an opportunity for knowledge products. The key monetization routes will be selling organized knowledge as textbooks, ebooks, or apps.

I used the term revolution above and I stand by this choice of wording because this is what we call it when a value chain collapses from six-plus levels to three levels. The value chain in the “book business” previously looked like this:

author
  __editor
       __copy-editor
            __typesetter
                 __printer
                     __distributor[1..]
                           __book store 
                                __client

With print-on-demand the new book business will look like this:

 
author -- printer -- shipping -- client
^^^^^^                           ^^^^^^ 

Let us call this “author centered” publishing. From now on, authors can expect to get up to 50% of the profits instead of 10% (which could be as low as 5% of the list price). Good times for authors. Incentive-giving-to-move-to-a-new-publisher times.
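
To put hypothetical numbers on it (illustrative only; I am assuming a \$29 list price and roughly \$8 of printing costs, not quoting anyone’s actual contract): a traditional royalty of 5% of list pays the author about \$1.45 per copy, while 50% of the \$21 margin pays about \$10.50 per copy. Roughly seven times more per book sold.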

Even amazon looks like a dinosaur in this context:

author
  __editor
       __copy-editor
            __typesetter
                 __printer
                      __amazon
                          __shipping
                               __client

Why do you need the warehouse to store all the books? Why not ship from the printer?

There is one element of the traditional publishing value chain that we must keep. Copy editing is actually very important, because you really want someone to go through your writing and fix the mistakes in it. You can use your target audience (crowdsourced copy-editing), but nothing beats professional services.

OK, so you want to see the future of publishing? Here it is:

 author -- (1) pub.srvc. -- (2) printer  -- shipping -- client
                         \_ (3) booksite -- client


Opportunity (1) is for small publishing houses (copy editor + creative person for covers + LaTeX guy) to really come in and take over the entire market within a couple of years. You could also have larger publishers who focus on marketing the book to certain audiences, etc.

Opportunity (2) is for new print-on-demand shoppes to come up and compete with lulu.com and Lightning Source. The main advantage of these giants is the established processes they have in place, but how difficult would it be to build an “Espresso Book Machine”-like system based on a quality BW laser printer (think buying toner in gallon tubes at costco 😉) and some automation? The competitive advantage of a small print shop would be that it can offer pickup service (\$0 shipping). Currently lulu charges you \$6 for shipping to Canada (\$9 for two books, \$12 for three books, …, in general \$(3+3n) for n books). Shipping within the States is \$5, which is better, but still not free. In particular, for small books (100-200pp) it would not make sense to order from lulu: they would charge you \$5 for the printing and another \$6 for the shipping, so your cost is \$11. If you go to a local print shoppe, they will charge you \$7 for printing. Same product, nearly half the price.

The third opportunity is for high-level editorial services (think curation of content) which would collect book recommendations and let authors and readers interact. Ideally there should be independent “book blogs” for the discovery of new content — not marketplaces. Something must be done about the current app store monopoly. Every app you develop that relies on Apple for distribution is feeding the monster at 30%. Every web app you develop based on Google or FB APIs could stop working tomorrow if the API is retired. Go get hosting somewhere and build your own website. Don’t depend on anyone. Okay, sorry, I got a little off the topic of textbooks. Let’s get back on topic.

I was telling you guys about the book and the stuff from the interview. One thing we talked about a lot was the Hacker News launch.

The HN launch

I told Jason how surprised I was when I got 30 000 visitors in one day, and how I didn’t get up from my chair for a whole day. There were roughly 7000 people who clicked on one of the modals. Of these, 300 people bought the book in print. By the evening of Jan 1st and into Jan 2nd, there were also 100 PDFs purchased from gumroad.

 

I am still working out the numbers (conversion rates) and I don’t want to get too hyped up about them (ok ok, 7k –> 300 = 4.3%), because the HN audience is really VERY sympathetic to the product. I am not sure if everyone else on the internet will like it as much.

(SIDENOTE: I am finding it hard to get the analytics I want for the book page. GA reports analytics en masse, so I cannot see what individual visitors did when they came to the site. I have basic questions I need answers for, and it seems like the current state of analytics is very unimpressive (relative to my expectations). Here is what I would like to know:

  1. Which modals did my visitors look at before deciding to continue on to lulu.com/shop or gumroad?
  2. Which of the 800 people who clicked through to lulu.com/shop are the 300 that actually ended up buying the book?
  3. Which sections did they read (scroll to and stay for 4secs+)?
  4. What “path” did each visitor follow through the modals? (subquestion: did anyone see the apt-get install mechanics modal? did anyone see the integral calculus modal?)

Are there solutions for these? I think the only way I can have end-to-end information is if I run the whole show. If I want to have information about conversions, I must build my own shopping cart. Wait, we are on the Internet — I can just submit a feature request to lulu.com support and gumroad support. I am working on a full writeup of the launch experience, which will have more graphs and numbers. (/SIDENOTE)

I got a lot of feedback from the discussion on hacker news. People really like the idea. The tech crowd of Hacker News is precisely the kind of crowd that is interested in learning about advanced math and physics. Many programmers learned about calculus and mechanics at university but never actually understood these subjects. This is why the No bullshit guide to math and physics is what they really wanted, and the 29-dollar price is definitely not an obstacle for them. Several people also asked for a PG-13 version, cleaned up of the obscenities and the references to pot and alcohol. This is definitely something I will look into, because not all jokes need to be about these subjects. We can stick to the political stuff and the joke about the investment banker being dropped off a building.

What is the goal of the book?

The goal of the book, and more generally of Minireference Co., is to teach. Teach students how to get rid of exam stress during their studies. If you know the material really well, then there is nothing tricky that the teacher can do on the final. Understanding trumps memorization any day of the week. A secondary goal is to teach math to adults, so they can let go of their math complexes. There is no reason why a forty-year-old should avoid conversations about math and feel uncomfortable when their teenage daughter or son asks them about the solutions to a quadratic equation.

The third goal is to prevent the next generation of analytically minded youth from going into the defence, pharmaceutical, and finance sectors, which I consider to be detrimental to society. I grew up listening to Rage Against the Machine and I feel it is my duty to continue their work in educating the next generations about the system. By situating analytical knowledge in the context of the current world geopolitical situation, it is my hope that the next generation of Einsteins, Gates, Pages, and Zuckerbergs will make informed and moral choices. With knowledge comes responsibility, and I don’t want my students to think about the numbers without understanding what the numbers represent in the real world.

Textbook market

There are a couple of entrenched companies in the publishing world (the big five). Mainstream publishers in the educational market produce textbooks that are so expensive that we can talk about a textbook racket. The readers, subject to their teachers’ authority, are forced to buy specific textbooks, often at exorbitant prices (> \$100). Mainstream textbooks are also too long and full of fluff, like full-page photos designed to pad the pages and impress the student with the “high-endness” of the 1000-page publication. Mainstream textbooks are the kind of product that is designed by committee. They’re thick and boring.

On the other hand, there are several positive things about textbooks. Irrespective of the widespread usage of electronic formats, the “book format” remains the primary medium of intellectual discourse, of which textbooks are a subset. Textbooks are old technology, but good technology. Textbooks, as a means for acquiring knowledge, are better than most educational resources produced for the web. And it’s not just about eBooks: print is here to stay, because students don’t like the idea of eBooks replacing printed textbooks. Having a PDF to go along with your printed textbook is definitely a feature, but not a replacement.

 

Business model

The business model for Minireference Publishing Co. is quite simple: we sell math and science textbooks and PDFs. The specifics of the book “container” are not important. What is important and of value is that we offer an “information distillation” service: complicated science subjects are presented and explained in a concise coherent narrative, including all prerequisites. Instead of reading 100 wikipedia pages to learn about calculus in a month, students can read one chapter in the No BS guide and pick up the same material in a week. 

Backstory

During the interview, I had a chance to give the full story about the genesis of the book. At 7min40sec in the interview, I say how I started from a collection of notes on advanced physics subjects and that at some point decided to make those notes into a book. Jason replies to this jokingly “Wow that is a big jump!” but I totally missed his joke and just kept on blabbing.

Pivot 1: TOO ADVANCED. There are not that many physicists. We need to go for something more mainstream. New product will be a mini-reference book of formulas for all of science.

Pivot 2: FORMULAS ARE NOT ENOUGH to learn. Let’s have the formulas, but add enough context and explanations to explain where the formulas come from and how they are used.

Once you have the idea… It took two years and 200 commits. It wasn’t high intensity work: I just wrote down lecture notes and my favourite explanations after teaching. During the summer of 2012, I worked intensely to tie together and organize all the material into a coherent story with a beginning (solving equations), a middle (use equations to predict the motion of objects in physics), and an end (learn where the equations of physics arise from calculus).

Product

What is special about this book is that it contains a complete dependency graph of the topics. Each subject is explained along with all the prerequisite material.

Another thing that is special about the book is its conversational tone. The narration in the book switches from serious to joke mode and back to serious again, and is intended to keep the reader engaged. Everyone needs a little break after learning pages and pages of formulas…

Technology used

During the interview, I didn’t get a chance to discuss the technology stack I used to generate the book. The book started as a bunch of text files in DokuWiki. I then used the dokutexit plugin to export the book to LaTeX.

Another important tool in the production of the book has been the text-to-speech feature in Mac OS X, which I used for proofreading. It allowed me to catch lots of mistakes quickly.

I use lulu.com for print-on-demand and gumroad.com for the PDF distribution.

Future

Some future directions for the development of the book are:

  • Finish the linear algebra textbook
  • Write Tome II on electricity and magnetism and vector calculus
  • Future plans: Write a book about probability and stats 
  • Future plans: Make a No BS guide to Python and JavaScript

Speaking of JavaScript, I’m currently exploring using the khan-exercises framework so I could offer practice problems on the site.

The main challenge we face right now is marketing the book to a wide audience.

UPDATE: Since the publication of this post, the No Bullshit guide to math and physics has been improved and revised several times. Sales are going okay; we need more word of mouth.

 

Big data and R

Yesterday, I went to a Montreal meetup about R. The event was attended by quite a few people and the good people of Bolidea offered us beer and pizza. The talk was by the Wajam team and discussed how they make use of R for business analytics and system monitoring.

Instead of simply checking basic data like clicks, the number of searches, and API calls, they combine all this data into a “health indicator” value which is much more accurate at predicting when intervention is required. Basically, dashboards are good, but dashboards that can run machine learning algorithms are better.
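
Here is a toy sketch in Python of what I understood the “health indicator” idea to be. The metrics, the numbers, and the scoring rule are all invented by me for illustration; their actual system is written in R and is surely more sophisticated:

import statistics

history = {                       # last few readings of each metric
    "clicks":    [510, 498, 505, 490, 502],
    "searches":  [210, 205, 199, 202, 207],
    "api_calls": [90, 88, 93, 91, 89],
}
current = {"clicks": 470, "searches": 150, "api_calls": 90}

def deviation(value, series):
    """How many standard deviations the value sits from its baseline."""
    mu = statistics.mean(series)
    sigma = statistics.stdev(series) or 1.0   # avoid division by zero
    return abs(value - mu) / sigma

# Health = the worst deviation across all metrics, as a single number.
health = max(deviation(current[m], history[m]) for m in history)
print("health indicator:", round(health, 1))
if health > 3:
    print("ALERT: something is off, page a human")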

Their workflow centers around MySQL as the main data store. Web servers et al. send logging information to Scribe for aggregation and all the data is then written to MySQL. The main stats/ML stuff they use for business intelligence is written in R. R pulls the data from MySQL and then produces report graphs and alerts. All this is automated through cronjobs. They said they are not going for “realtime” but they have some jobs running every minute, which is near-realtime.

It was all very cool stuff to hear about, but I was hoping to see some R code or a demo during the presentation. Nevertheless, after the talk an interesting discussion followed which got more technical.

Some of the things mentioned were:

  • Pig: an SQL-like query language which converts your queries into map reduce jobs to run on HDFS (the Hadoop Distributed File System). Apparently it is very good. Listening to the guys talk about it made it sound like handling 50TB of data is just as easy as handling 1GB on your computer…
  • There was a discussion about running R in parallel, but I am not sure which packages they were talking about. The other day I saw a new project on HN also… so interesting things are happening on that front. Using such tools one could run “exploratory analyses” on the whole dataset instead of a subset which fits on your machine.
  • There is no central package management repository. The makers of R want to preserve the spirit of “scientific publication” and don’t want to become software developers. In this spirit, when creating an R package you have to include a documentation TeX file: think of it as publishing a paper with some code attached.
    The process for approval into CRAN takes time, so some people post their stuff on github.
  • Speaking of documentation, they talked about some literate-programming-like tools: Sweave, roxygen, and knitr.
    This is kind of cool — especially with the markdown support.
    I imagine this could be very useful for writing stats tutorials.
    Hey what about a minireference for stats?
  • Using Shiny it should be possible to make a nice web-app that teaches the basic concepts of stats in a very short time. Of course you could make it available in print also, but an interactive version would be much better, I think. Sales would come from the book, with a web tutorial (say 30% of the material) offered for free.
  • Speaking of books. One of the members of the audience said that there is an opportunity for writing a book on R.

    The old me would be like “hey, I can learn about R and then write a Minireference for R book”, but I know better now. Focus on math and phys! Don’t spread your energy too thin. How could you teach people a subject you just learned? Textbooks should be written by people who are at least two levels more advanced than the intended audience: you should know X, know what X is used for, and also know what the stuff X is used for is used for. People who can see two hops ahead in the graph of knowledge have better answers to offer for the “why do I need to know this?” question.

  • Finally, something related to the big data thread of the discussion, which I heard about this morning on hacker news: Drake is a way to automate the handling of large datasets using a “makefile”-style interface. There were links to and discussion of other projects on HN. You need to install Clojure to run Drake.

Ok! Link offload complete. Now I can finally step away from the computer and stretch a bit. You, my dear reader, should do the same. Go grab a glass of water or something and work some stretches in along the way.

Wow it is 2PM already! HN I want my morning back!!!

Hacker news launch

Two weeks ago, I posted the book on hacker news. There was a tremendous amount of interest on the first day (20k visits in one day!) and plenty of good (i.e., critical) feedback. With this post, I want to take a moment and record my impressions from surfing the hacker news wave.

Conversion rates

1. Roughly 33000 people showed up on the “product” page.
2. Of these, 7000 clicked on at least one of the modals (engagement).
3. About 1761 of them clicked on “Buy Book” and went to the print-on-demand site (lulu.com).
4. Of these, 264 ordered the book.

The engagement rate is 7000/33000 = 21%. The percentage of engaged visitors who clicked “Buy Book” is 25% (=1761/7000). The final-step conversion rate is 15% (=264/1761). Overall we have 0.21 × 0.25 × 0.15 = 0.78% conversion from visitor to client. Is this good or bad?
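
If you want to play with the funnel arithmetic, here it is as a few lines of Python (the numbers are copied from the list above):

# HN launch funnel: visitors -> engaged -> clicked "Buy Book" -> bought
funnel = [("visitors", 33000),
          ("engaged", 7000),
          ("clicked Buy Book", 1761),
          ("bought print", 264)]

for (_, prev), (label, count) in zip(funnel, funnel[1:]):
    print(f"{label}: {count / prev:.1%}")

# 264/33000 = 0.80%; the 0.78% above comes from multiplying the
# rounded per-step rates.
print(f"overall: {funnel[-1][1] / funnel[0][1]:.2%}")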

Perhaps the more interesting metric is the conversion rate of engaged visitors (clicked on at least one modal) to client, which is 3.75%.

A back-of-the-envelope calculation tells me that my expected earning per engaged visitor is about 50 cents. I feel confident that I will be able to buy some education keywords for less than that.

TODO: try mixpanel (GA is a PITA: full path of the referral url plz!), invest and test adwords.

Book product

The book — as a product — works, even if there are still some rough edges:

TODO: fix typos, add math exercises, add physics exercises.

PDF product

Some of the engaged visitors are also going to the PDF: 19% (= 847/4500). Then there is another factor of 15% = (50+37+19+7+7+3+3)/847 = one week of PDF sales / one week of clicks to gumroad. Thus 2.8% of engaged visitors went on to buy the PDF.

Overall this means that 6.55% = 3.75% + 2.8% of my engaged visitors go on to become clients.
Now that is cool!


New landing page

If you visit minireference.com you will now see a new design which conforms to the standard “book product webpage” format. I am very pleased with the result, which was an attempt to mimic other good book product pages.

The design process took me about three weeks. Most of the time was spent on the copy editing. The ability to “put stuff on the page” that you have with HTML + CSS is much more powerful than LaTeX. And with webfonts becoming the norm now, one can make very beautiful sites very quickly.

Check it out: minireference.com

The web we still have

The facebookification of the Internet brings with it a stupidification of the content that people produce and share. The old web was about blog posts (long, thought-out pieces of writing) which automatically form links to each other (through trackbacks), so that a conversation can emerge without the need for a centralized service.

Trackbacks are awesome! For example, I can make this post appear on quora if I embed some javascript (their embed code) which will ping the quora server:
Read Quote of Ivan Savov’s answer to Machine Learning: Does Topic Modeling need a training stage when using Gibbs sampling? And why does it work? on Quora

We need to cherish this kind of distributed technology, because it is the way out of the walled gardens. Trackbacks are living proof that you can have social without central.

LDA, BTW, is short for Latent Dirichlet Allocation, which is a powerful way to classify documents according to the topics they contain.
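
For the curious, here is a tiny sketch of topic modeling with scikit-learn. The documents are made up, and note that sklearn fits LDA with online variational Bayes rather than the Gibbs sampling discussed in that Quora answer:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["the stock market fell as investors sold shares",
        "the team scored a goal in the final minute",
        "bond prices and interest rates moved sharply",
        "the coach praised the players after the match"]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)               # word-count matrix: docs x terms

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Print the top words of each discovered topic
terms = vec.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = weights.argsort()[-4:][::-1]
    print("topic", k, ":", [terms[i] for i in top])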

Strang lectures on linear algebra

Professor Gilbert Strang’s video lectures on Linear Algebra have been recommended to me several times, and I am very impressed with the first lecture. He presents all the important problems and concepts of LA in the first lecture, in a completely matter-of-fact way.

The lecture presents the problem of solving n equations in n unknowns in three different ways: the row picture, the column picture and the matrix picture.

In the row picture, each equation represents a line in the xy plane. When “solving” these equations simultaneously, we are looking for the point (x,y) which lies on both lines. In the case of the two lines he has on the board (2x-y=0 and -x+2y=3) the solution is the point x=1, y=2.

The second way to look at the system of equations is to think of the column of x coefficients as a vector, and the column of y coefficients as another vector. In the column picture, solving the system of equations requires us to find the linear combination of the columns (i.e., $x$ times the first column plus $y$ times the second column) that gives us the vector on the right-hand side.
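
For the example above, the two pictures are two ways of writing the same problem, and both are solved by $(x,y)=(1,2)$:

\[
\underbrace{\begin{cases} 2x - y = 0 \\ -x + 2y = 3 \end{cases}}_{\text{row picture}}
\qquad \Leftrightarrow \qquad
\underbrace{\, x\begin{bmatrix} 2 \\ -1 \end{bmatrix}
+ y\begin{bmatrix} -1 \\ 2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 3 \end{bmatrix} \,}_{\text{column picture}}
\]

Indeed, $1\begin{bmatrix} 2 \\ -1 \end{bmatrix} + 2\begin{bmatrix} -1 \\ 2 \end{bmatrix} = \begin{bmatrix} 0 \\ 3 \end{bmatrix}$.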

If students start off with this picture, they will be much less mystified (as I was) by the time they start to learn about the column space of matrices.

As a side benefit of this initial brush with linear algebra in the “column picture”, Prof. Strang is also able to present an intuitive picture for the formula for the product between a matrix and a vector. He says “Ax is the combination of the columns of A.”  This way of explaining the matrix product is much more intuitive than the standard dot-product-of-row-times-column approach. Who has seen them dot products? What? Why? WTF?

I will definitely include the “column picture” in the introductory chapter on linear algebra in the book. In fact, I have been wondering for some time how I can explain what the matrix product Ax is. I want to talk about A as the linear transformation $T_A$ so that I can talk about the parallels between $x$, $f:R \to R$, $f^{-1}$ and $\vec{v}$, $A$, $A^{-1}$. Now I know how to fix the intro section!

Clearly, he is the master of the subject. It is funny that what started as a procrastination activity (watching a youtube video which I just wanted to link to) led to an elegant solution to a long-standing problem which was blocking my writing. Sometimes watching can be productive 😉 Thank you, Prof. Strang!

Target revenue

I did a little calculation regarding what kind of sales figures I would need to make it into the 100k income range (which is my current standard for “success” in a technical field). If I can make deals with 100 universities, and ship 100 copies of the book to each of them, then I am done:
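
The back-of-the-envelope version, assuming roughly \$10 of profit per copy (my working guess, not a firm number):

100 universities × 100 copies × \$10/copy = \$100 000 of income.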

I think it is totally doable with the MATH and PHYSICS title alone within the next couple of years. So fuck the job world. I am doing my own thing!

Showing off with python

2:57AM on a Monday. I have to be up at 8AM. The faster I get the job done the more sleep I get. Sounds like the kind of thing to motivate a person.

TASK: Parse an access.log file and produce a page-visit trace for each visitor. Ex:

11.22.33.90 on Monday at 3pm   (Montreal, Firefox 4, on Mac OS X):
  /contents          (stayed for 3 secs)
  /derivatives       (stayed for 2m20sec)
  /contents          (6 secs)
  /derivative_rules  (1min)
  /derivative_formulas  (2min)
  end

I had already found some access.log parsing code and set up a processing pipeline the last time I wanted to work on this. Here is what we have so far.

3:45AM. Here is the plan. All the log entries are in a list called entries, which I will now sort and split by IP.
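
Roughly what I have in mind (a sketch: entries is assumed to hold parsed log records with ip, timestamp, and path attributes coming out of the parsing pipeline):

from itertools import groupby
from operator import attrgetter

# Sort by (ip, timestamp) so each visitor's clicks are contiguous and
# in chronological order, then split the list into one trace per IP.
entries.sort(key=attrgetter("ip", "timestamp"))

for ip, group in groupby(entries, key=attrgetter("ip")):
    visits = list(group)
    print(ip)
    for prev, curr in zip(visits, visits[1:]):
        stayed = curr.timestamp - prev.timestamp  # time until the next click
        print(f"  {prev.path}  (stayed for {stayed})")
    print(f"  {visits[-1].path}")
    print("  end")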

4:15AM. Done. Though I have to cleanup the output some more.