### Network protocols discussion

There is an interesting discussion about network protocols going on at Hacker News. In just a few posts, some very knowledgeable people stepped in to explain what is going on. I saved the URL in the Links section of /miniref/comp/network_programming, but it also made me think about how much better informal explanations are than formal ones.

Here you have hackers talking to other hackers. Whether it is a javascript-wielding young blood or an old dude who writes server-side stuff in C, all these people need to send data over the network and know some protocols for doing that: TCP:HTTP for web devs mostly, while systems people probably think more in terms of (TCP:*).

Everyone jumps in to check what is going on in the HN discussion, and then suddenly learning happens. The discussion is a bit disorganized (it is shown as a discussion tree), so let me give you the walkthrough.

First chetanahuja tells it like it is:

IP layer is for addressing nodes on the internet and routing packets to them in a stateless manner (more or less… I’m not counting routing table caches and such as “state”). TCP […] are built on top of the IP layer to provide reliable end-to-end data transfers in some sort of session based mechanism.

The key thing to know is that IP is a best-effort protocol. If you send an IP packet from computer A to computer B, the network will try to deliver it, but there are no guarantees that it will succeed. No problem though: we can build a reliable protocol (the Transmission Control Protocol) on top of the unreliable one. This is why the Internet is usually referred to as TCP/IP, not just IP, even though IP stands for Internet Protocol. TCP/IP is the internet made reliable.

TCP is important because it allows for reliable communication: when A sends some data to B, B will reply with an ACKnowledge packet to tell A that it received the data. Reliability comes from the fact that A will retransmit all the packets for which no ACKs are received (the sender assumes these packets got lost). The other thing the TCP protocol gives you is the notion of a port — a multiplexing mechanism for running multiple network services on the same machine. Port 80 is the HTTP port (web browsing). When you type 11.22.33.44 into the browser, your browser will open a TCP connection to port 80 on that host, written 11.22.33.44:80, with the port after the colon. Another important port is port 25 (SMTP), which is used for email. When an email to user@11.22.33.44 is to be delivered, a connection will be made to 11.22.33.44:25.
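All of this is easy to poke at from code. Here is a minimal sketch in Python of "talk to port 80 over TCP" (example.com is just a stand-in host; any web server will do):

```python
import socket

# Open a TCP connection to the web server's port 80, send a bare-bones
# HTTP request, and read until the server closes the connection.
with socket.create_connection(("example.com", 80)) as s:
    s.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
    response = b""
    while chunk := s.recv(4096):
        response += chunk

print(response.split(b"\r\n")[0])  # the status line, e.g. b'HTTP/1.1 200 OK'
```

All the ACKs and retransmissions happen below this level: the socket just looks like a reliable two-way pipe.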

Sometimes you don’t want so much transmission-control overhead. Imagine that you are sending some voice data: you want to send as many packets as possible (maybe using forward error-correcting codes), but you don’t want to bother with retransmission of lost packets. The voice data simply isn’t useful if it is not delivered on time. In such cases we would prefer a more basic protocol which doesn’t isolate us from the unreliability of IP.

This protocol is called UDP and it is really barebones. UDP is basically IP with added port numbers (and error-detection checksums). From now on we can’t simply talk about “port 80” on a host; we must say whether we mean port 80 in the TCP protocol (TCP:80) or in the UDP protocol (UDP:80).
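In code, the barebones-ness really shows: no connection setup, no ACKs, just datagrams. A tiny self-contained sketch over loopback (port 9999 is an arbitrary choice):

```python
import socket

# Receiver: claim UDP:9999 on the loopback interface.
recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv.bind(("127.0.0.1", 9999))

# Sender: fire and forget -- if nobody were listening,
# the datagram would silently vanish.
send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send.sendto(b"hello", ("127.0.0.1", 9999))

data, addr = recv.recvfrom(4096)
print(data, addr)   # b'hello' ('127.0.0.1', <sender's port>)
```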
Speaking of UDP ports, let me tell you about a really important one: UDP:53, the DNS port. Before my browser can fetch a page from a hostname like www.yahoo.com, it has to ask a DNS server (over UDP:53) “what are the IP addresses for www.yahoo.com?”, and the server replies with something like “(206.190.36.45, 98.138.253.109, 98.139.183.24).” My browser will then connect to one of these IPs (over HTTP = TCP:80) chosen at random.
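You can make the same kind of lookup yourself; the Python standard library wraps the resolver (www.yahoo.com here is just the example hostname from above):

```python
import socket

# Resolve a hostname to its list of IP addresses -- the same query the
# browser fires at the DNS server before it can open any TCP connection.
hostname, aliases, ips = socket.gethostbyname_ex("www.yahoo.com")
print(ips)   # a list of IP strings; pick any one and connect to TCP:80
```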

Another useful UDP service is DHCP (Dynamic Host Configuration Protocol). This is the magical process by which you are automatically assigned an IP address when you join a network. DHCP bootstraps the communication: you joined a new network, OK, but what is the network number for this network? Who should you talk to? What IP address should you respond to? Either you know this information (someone gave it to you on a piece of paper [sysadmin]) or you make a DHCP request (which is a UDP broadcast) and a DHCP server will respond, assign you an IP address, tell you what the network number is, and tell you which router to talk to to get towards the Internet (the default route).
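The “shout and see who answers” part is just a UDP broadcast. Real DHCP speaks a specific packet format on UDP:67/68; this sketch only shows the broadcast mechanism itself (port 9999 is made up):

```python
import socket

# Enable broadcasting on a UDP socket and shout at everyone on the LAN.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
sock.sendto(b"anyone out there?", ("255.255.255.255", 9999))
```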

Ok so everyone knows the basics now, but HN doesn’t just give you the basics — it gives you the advanced stuff too. What are the current problems with TCP/IP?

Maybe TCP’s issues aren’t apparent when you’re using it to download page assets from AWS over your home Internet connection, but they become apparent when you’re doing large file transfers between systems whose bandwidth-delay products (BDPs) greatly exceed the upper limit of the TCP buffers on the end systems.
This may not be an issue for users of consumer-grade Internet service, but it is an issue for organizations who have private, dedicated, high-bandwidth links and need to move a lot of data over large distances (equating to high latency) very quickly and often: CDNs, data centers, research institutions, or, I dunno, maybe someone like Google.
The BDP and the TCP send buffer size impose an upper limit on the window size for the connection. Ideally, in a file transfer scenario, the BDP and the socket’s send buffer size should be equal. If your send buffer size is lower than the BDP, you cannot ever transfer at a greater throughput than buffer_size / link_latency, and thus you cannot ever attain maximum bandwidth. I can explain in more detail why that’s true if you want, but otherwise here’s this: http://www.psc.edu/index.php/networking/641-tcp-tune
Unfortunately for end systems with a high BDP between them, most of the time the maximum send buffer size for a socket is capped by the system to something much lower than the BDP. This is a result of the socket implementation of these systems, not an inherent limitation of TCP.
An accepted user-level solution to this issue is to use multiple sockets in parallel, but that has its own issues, such as breaking fairness and not working well with the stream model. I can explain this more if you want, too, just let me know.

There are other problems with TCP, such as:

• slow start being, well, slow to converge on high-BDP networks,
• bad performance in the face of random packet loss (e.g., TCP over Wi-Fi),
• congestion control algorithms being too conservative (IMO, not everyone needs to agree on the same congestion control protocol for it to work well; it just needs to converge to network conditions faster, better differentiate types of loss, and yield to fairness more quickly),
• TCP features such as selective ACKs not being widely used,
• default TCP socket settings sucking and requiring a lot of tuning to get right,
• crap with NAT that can’t be circumvented at the user level (UDP-based stream protocols can do rendezvous connections to get around NAT), and more.

People write whole papers on all these things. The problem is that most of the public research exists as shitty academic papers you probably wouldn’t bother reading anyway, and most of the people actually studying this stuff in depth and coming up with solutions are private researchers and engineers working for companies like Google.

The last paragraph is a good reason why there should be a “No BS guide to computer systems”. I bet I could show services at all layers of the OSI stack and really go into the details. Sockets in particular are pretty cool.
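Speaking of details: here is the buffer arithmetic from the comment above, worked out with invented numbers:

```python
# A 1 Gbps link with 100 ms round-trip time (numbers invented).
bandwidth = 1e9 / 8            # link capacity in bytes per second
rtt = 0.100                    # round-trip time in seconds

bdp = bandwidth * rtt          # bytes that must be "in flight" to fill the pipe
print(bdp / 1e6)               # 12.5 MB

# With a typical 64 KB send buffer, throughput caps at buffer_size / rtt:
buffer_size = 64 * 2**10       # bytes
print(buffer_size / rtt * 8 / 1e6)   # ~5.2 Mbps -- nowhere near 1 Gbps
```

So unless your send buffer is as large as the bandwidth-delay product, the fat pipe sits mostly empty while the sender waits for ACKs.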

pjscott sums up the crypto stack in just a few sentences:

Their crypto stuff looks pretty reasonable. Key exchange uses ECDH with either the P-256 or curve25519 polynomials. Once the session key is established, it’s encrypted with AES-128 and authenticated with either GCM or HMAC-SHA256. None of this is implemented yet, but it’s at least cause for hope.
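For flavour, here is a toy sketch of that handshake shape in Python, using the third-party cryptography package: curve25519 key exchange, then AES-128-GCM for the session. This is just the general pattern, not their protocol:

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

# Each side generates an ephemeral curve25519 key pair.
alice = X25519PrivateKey.generate()
bob = X25519PrivateKey.generate()

# They swap public keys; ECDH gives both sides the same shared secret.
shared = alice.exchange(bob.public_key())
assert shared == bob.exchange(alice.public_key())

# Derive a 128-bit session key from the shared secret.
key = HKDF(algorithm=hashes.SHA256(), length=16, salt=None,
           info=b"toy session key").derive(shared)

# Encrypt and authenticate a message with AES-128-GCM.
nonce = os.urandom(12)
box = AESGCM(key)
ciphertext = box.encrypt(nonce, b"hello, secure world", None)
print(box.decrypt(nonce, ciphertext, None))
```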

latitude also brings up some other considerations about doing crypto over the Internet.

I’ll tell you a dirty little secret of the protocol design.
Say, you want to design a protocol with reliable delivery and/or loss detection. You will then have ACKs, send window and retransmissions. Guess what? If you don’t follow windowing semantics of TCP, then one of two things will happen on saturated links – either TCP will end up with all the bandwidth or you will.
So – surprise! – you have no choice but to design a TCP clone.

That said, there is a fundamental problem with TCP when it’s used for carrying secure connections. Since TCP acts as a pure transport protocol, it has no per-packet authentication, so any connection can be trivially DoS’d with a single fake FIN or RST packet. There are ways to solve this, e.g. by reversing the TCP and security layers and running TCP over ESP-over-UDP or TLS-over-UDP (the OpenVPN protocol). This requires either writing a user-space TCP library or doing some nasty tunneling at the kernel level, but even as cumbersome as this is, it’s still not reason enough to re-invent the wheel. Also, if you want compression, it’s readily available in TLS or as part of the IPsec stack (IPcomp). If you want FEC, same thing – just add a custom transform to TLS and let the client and server negotiate whether to use it or not.

I mean, every network programmer invents a protocol or two in his lifetime. It’s like a rite of passage and it’s really not a big deal.

I learned stuff today. I hope you guys learned something too.

### Big data and R

Yesterday, I went to a Montreal meetup about R. The event was attended by quite a few people and the good people of Bolidea offered us beer and pizza. The talk was by the Wajam team and discussed how they make use of R for business analytics and system monitoring.

Instead of simply checking basic data like clicks, number of searches, and API calls, they combine all this data into a “health indicator” value which is much more accurate at predicting when intervention is required. Basically, dashboards are good, but dashboards that can run machine learning algorithms are better.

Their workflow centers around MySQL as the main data store. Web servers et al. send logging information to Scribe for aggregation, and all the data is then written to MySQL. The main stats/ML stuff they use for business intelligence is written in R: it pulls the data from MySQL and produces report graphs and alerts. All this is automated through cron jobs. They said they are not going for “realtime”, but they do have some jobs running every minute, which is near-realtime.
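Their pipeline is in R, but the pull-compute-alert shape of such a cron job is easy to picture. A rough sketch in Python with the pymysql driver; the table, column, and credential names here are all invented:

```python
import pymysql  # third-party MySQL driver; their real pipeline uses R

# Pull the last 24 hourly click counts (table/column names invented).
conn = pymysql.connect(host="localhost", user="stats",
                       password="secret", database="logs")
with conn.cursor() as cur:
    cur.execute("SELECT clicks FROM hourly_stats ORDER BY hour DESC LIMIT 24")
    clicks = [row[0] for row in cur.fetchall()]
conn.close()

# Crude "health indicator": alert when the latest hour is a >3-sigma outlier.
mean = sum(clicks) / len(clicks)
sigma = (sum((c - mean) ** 2 for c in clicks) / len(clicks)) ** 0.5
if abs(clicks[0] - mean) > 3 * sigma:
    print("ALERT: click volume looks abnormal, go check the dashboard")
```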

It was all very cool stuff to hear about, but I was hoping to see some R code or a live demo during the presentation.
Nevertheless, after the talk an interesting discussion followed which got more technical.

Some of the things mentioned were:

• Pig: an SQL-like query language which converts your queries into map-reduce jobs to run on HDFS (the Hadoop Distributed File System). Apparently it is very good. Listening to the guys talk about it made it sound like handling 50TB of data is just as easy as handling 1GB on your computer…
• There was a discussion about running R in parallel, but I am not sure which packages they were talking about. The other day I saw a new project along these lines on HN too… so interesting things are happening on that front. Using such tools one could run “exploratory analyses” on the whole dataset instead of on a subset which fits on your machine.
• There is no central package management repository. The makers of R want to preserve the spirit of “scientific publication” and don’t want to become software developers. In this spirit, when creating an R package you have to include a documentation tex file: think “I am publishing a paper with some code attached.”
The process for approval to CRAN takes time, so some people post their stuff on github instead.
• Speaking of documentation, they talked about some literate-programming-like tools: Sweave, roxygen, and knitr.
This is kind of cool — especially with the markdown support.
I imagine this could be very useful for writing stats tutorials.
Hey what about a minireference for stats?
• Using Shiny it should be possible to make a nice web app that teaches the basic concepts of stats in a very short time. Of course you could make it available in print also, but an interactive version would be much better, I think. Sales would come from the book, with a web tutorial (say 30% of the material) available for free.
• Speaking of books. One of the members of the audience said that there is an opportunity for writing a book on R.

The old me would be like “hey, I can learn about R and then write a minireference-for-R book”, but I know better now. Focus on math and phys! Don’t spread your energy too thin. How could you teach people a subject you have only just learned yourself? Textbooks should be written by people who are at least two levels more advanced than the intended audience: you should know X, know what X is used for, and also know what the stuff X is used for is used for. The reason is that people who can see two hops ahead in the graph of knowledge will have better answers to offer for the “why do I need to know this?” question.

• Finally, something related to the big-data thread of this discussion, which I heard about this morning on Hacker News: Drake is a way to automate handling large datasets using the “makefile” interface. There were links to and discussion of other projects on HN. You need to install Clojure to run Drake.

Ok! Link offload complete. Now I can finally step away from the computer and stretch a bit. You, my dear reader, should do the same. Go grab a glass of water or something and work some stretches in along the way.

Wow it is 2PM already! HN I want my morning back!!!

### Showing off with python

2:57AM on a Monday. I have to be up at 8AM. The faster I get the job done the more sleep I get. Sounds like the kind of thing to motivate a person.

TASK: Parse an access.log file and produce a page-visit trace for each visitor. Ex:

11.22.33.90 on Monday at 3pm   (Montreal, Firefox 4, on Mac OS X):
/contents          (stayed for 3 secs)
/derivatives       (stayed for 2m20sec)
/contents          (6 secs)
/derivative_rules  (1min)
/derivative_formulas  (2min)
end

I had already found some access.log parsing code and set up a processing pipeline the last time I wanted to work on this. Here is what we have so far.

3:45AM. Here is the plan. All the log entries are in a list called entries, which I will now sort and split by IP.
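The sort-and-split step is basically a job for itertools.groupby. A self-contained sketch; in the real script the entries come from the parser, and the field names here are hypothetical:

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter

# Stand-in data; the real entries come from the access.log parser.
entries = [
    {"ip": "11.22.33.90", "time": datetime(2012, 9, 3, 15, 0, 0), "path": "/contents"},
    {"ip": "11.22.33.90", "time": datetime(2012, 9, 3, 15, 0, 3), "path": "/derivatives"},
    {"ip": "44.55.66.77", "time": datetime(2012, 9, 3, 15, 1, 0), "path": "/contents"},
]

# Sort by (ip, time) so groupby sees each visitor's hits in chronological order.
entries.sort(key=itemgetter("ip", "time"))

for ip, group in groupby(entries, key=itemgetter("ip")):
    hits = list(group)
    print(f"{ip}:")
    for hit, nxt in zip(hits, hits[1:]):
        stayed = (nxt["time"] - hit["time"]).total_seconds()
        print(f"  {hit['path']:<20} (stayed for {stayed:.0f} secs)")
    print(f"  {hits[-1]['path']:<20} (last hit, duration unknown)")
    print("end")
```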

4:15AM. Done. Though I still have to clean up the output some more.

### Math problems solved on computers

Here is a link to an interesting talk/text about how we could get people to like MATH: use computers for the tedious steps.  http://news.ycombinator.com/item?id=4376306

He has a good point, but I still think children should know how to do math with only pen and paper.

Once you see what the computer can do (solve a quadratic equation, say), you should be able to solve it on your own too. View source, yes, but also explain source.
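For instance, the quadratic solver’s source is short enough to explain line by line:

```python
import math

def solve_quadratic(a, b, c):
    """Solve a*x**2 + b*x + c = 0 using the quadratic formula."""
    disc = b * b - 4 * a * c        # the discriminant decides how many real roots
    if disc < 0:
        return []                   # complex roots; none on the real line
    root = math.sqrt(disc)
    return [(-b + root) / (2 * a), (-b - root) / (2 * a)]

print(solve_quadratic(1, -5, 6))    # x**2 - 5x + 6 = 0  -->  [3.0, 2.0]
```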

Come to think of it “explain source” is quite good as a name for what Minireference is meant to be. This could be the turning point in terms of naming. Come to explain source dot com and learn about anything. For any function/operation/algorithm out there, I will give you a js/py implementation and explain the source code from first principles. For example, a physics simulation engine could be used to teach you classical mechanics, wink wink.

Ok. So I hear you saying now: “I am reading this blog and there is one guy saying that computers are good, and another guy saying that people need to understand the details and be able to do things with pen and paper. A modernist giving a TED talk, and the blogger is some sort of idealist who wants people to actually understand math.” And you are wondering where the story may be going.

In reality, these are not two opposing views; both points are important. Computers should be used to do cool math and physics demonstrations. Afterwards, the interested student can explore what is under the hood of each function/operation/algorithm whenever he wants to learn how it works. But one should definitely start with the applications of any given concept. This is the pitch part of any lesson: why would I want to know about that? People really don’t care about what you have to say in general, and on top of that you will be teaching them math, which rarely gets a warm reception since people have phobias and complexes about it. The pitch therefore becomes that much more important.

Ok, I will end this with a summary. To teach a concept c, you have to start with a really good story about c to get people interested, then show some cool applications of c (possibly using a computer for more wow effect), and then explain what c is all about. Most importantly (in math), you need to discuss the connections between c and other stuff.