The economics of writing books people want

I just saw this article on Priceonomics about profits from books (via HN). It gets some facts exactly right, gets others wrong, and misses the main point. Let me quote the best parts (with comments) and give you my interpretation of what print-on-demand and ebooks will bring about.


With publishing houses only offering a royalty rate of between 6-15%.

Let’s say more like 2%–5% because this is from the *profits*, not a percentage of the sale price.


Email marketing was critical.

You got that right!


Self-publishing is arguably a wonderful innovation. It is historically unprecedented, providing the means for millions of people around the world to bypass the elitism of the publishing establishment […] In terms of access to a worldwide marketplace, it is fantastically democratic. In terms of access to financial success, it is far from it.

The OP misunderestimates the power of a democratic marketplace. There is a process of natural selection for book products. It used to act on a time scale of years and decades, but with the ease with which information spreads now, I predict increased competition in the marketplace and unprecedented advances in book quality. As authors start to earn money from writing books, better books will be written. Also, the higher margins of self-publishing (think 50%) make solo authors and mini-publishers much more competitive than the old dinosaurs.

It is my hope that if books become better, more young people will escape having their brains crushed like a jujube by fast-moving-pixel activities and learn to think bigger thoughts, hopefully constructive ones. Could an increase in interest in books bring about a new golden age of reason?

That would definitely be nice; enough with the consumerism, warring, and financial schemes. Let’s have another renaissance or something…

Problem sets ready

Sometime in mid-December I set out to create problem sets for the book. My friend Nizar Kezzo offered to help me write the exercises for Chapter 2 and Chapter 4. My plan was to modernize the calculus questions a bit, quickly write a few more questions, and be done in a couple of weeks.

That was four months ago! Clearly, I was optimistic (read unrealistic) about my productivity. Nizar did his part right on schedule, but it took me forever to write nice questions for the other chapters and to proofread everything. After all, if the book is no bullshit, the problem sets must also be no bullshit. I’m quite happy with the results!

noBS problem sets: letter format or 2up format.

Please, if you find any typos or mistakes in the problem sets, drop me a line so I can fix them before v4.1 goes to print.

Tools

In addition to working on the problem sets, I made some updates to the main text. I also developed some scripts to use in combination with latexdiff to filter only the pages with changes. This automation saved me a lot of time: instead of paging through 400 pages of text, I only had to look at the subset of pages that had changes in them.

If you would like to see the changes made to the book from v4.0 to v4.1 beta, check out noBSdiff_v4.0_v4.1beta.pdf.

Future

Today I handed over the problems to my editor and once she has taken a look at them, I’ll merge the problems into the book and release v4.1. The coming months will be focussed on the business side. I know I keep saying that, but now I think the book is solid and complete so I will be much more confident when dealing with distributors and bookstores. Let’s scale this!

Ghetto CRM

Say you want to extract the names and emails from all the messages under a given tag in your Gmail. In my case, it’s the 60 readers who took part in the “free PDF if you buy the print version” offer. I’d like to send them an update.

I started clicking around in Gmail and compiling the list, but Gmail’s UI is NOT designed for this: you can’t select the text in the email field because a popup shows up, and yada yada… If you’re reading this, you probably got to this post because you have the same problem, so I don’t need to explain.

Yes this is horribly repetitive, and yes it can be automated using python:

import imaplib
import email
from email.utils import parseaddr
import getpass


# NOTE: updated for Python 3 (input, print(), bytes from IMAP).
# Gmail may require an app password here instead of the account password.
user = input("Enter your GMail username: ")
pwd = getpass.getpass("Enter your password: ")

m = imaplib.IMAP4_SSL('imap.gmail.com', 993)
m.login(user, pwd)

# see IMAP client
# m
# see tags (i.e. mailboxes) using
# m.list()


# select the desired tag
m.select('miniref/lulureaders', readonly=True)
typ, data = m.search(None, 'ALL')


# build a list of people (from both the FROM and TO headers)
people = []
# iterate over the message numbers returned by the search
for num in data[0].split():
    typ, msg_data = m.fetch(num, '(RFC822)')
    for response_part in msg_data:
        if isinstance(response_part, tuple):
            # response_part[1] holds the raw message as bytes
            msg = email.message_from_bytes(response_part[1])
            name1, addr1 = parseaddr( msg['to'] )
            name2, addr2 = parseaddr( msg['from'] )
            d1 = { "name":name1, "email":addr1 }
            d2 = { "name":name2, "email":addr2 }
            people.extend([d1,d2])
            # uncomment below to see wat-a-gwaan-on
            #for header in [ 'subject', 'to', 'from' ]:
            #    print('%-8s: %s' % (header.upper(), msg[header]))
            #print("-"*70)

# lots of people, duplicate entries
print(len(people))

# filter uniq
# awesome trick by gnibbler
# via http://stackoverflow.com/questions/11092511/python-list-of-unique-dictionaries
people = list({d['email']:d for d in people}.values())     # uniq by email

# just uniques
print(len(people))

# print as comma separated values for import into mailing list
for reader in people:
    print(reader['email'] + ", " + reader['name'])

# ciao!
m.close()
m.logout()
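
One caveat with printing "email, name" by hand: a name that itself contains a comma (e.g. "Doe, Jane") will confuse whatever imports the list. Here is a minimal sketch of a safer export using Python's csv module. The helper name people_to_csv and the "email"/"name" column headers are my own choices for illustration, and it assumes people is the list of dicts built by the script above. It also dedupes on the lower-cased address, which is a slight variation on the trick used above.

```python
import csv
import io


def people_to_csv(people):
    """Return CSV text with one row per unique email address.

    `people` is a list of dicts with "email" and "name" keys
    (hypothetical helper, not part of the original script).
    """
    # uniq by lower-cased email; later entries overwrite earlier ones
    unique = {p["email"].lower(): p for p in people}

    buf = io.StringIO()
    writer = csv.writer(buf)  # quotes fields containing commas
    writer.writerow(["email", "name"])
    for key in sorted(unique):
        writer.writerow([unique[key]["email"], unique[key]["name"]])
    return buf.getvalue()


sample = [
    {"name": "Doe, Jane", "email": "jane@example.com"},
    {"name": "Doe, Jane", "email": "Jane@example.com"},
    {"name": "Bob", "email": "bob@example.com"},
]
print(people_to_csv(sample))
```

Most mailing-list tools accept a CSV file with a header row, but check what column names yours expects before importing.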


A nice question for coding interviews

I was discussing mortgage calculations with a friend today and realized this calculation would make an excellent interview question.
The problem is simple enough, but still requires some thought…

Writeup: interest-rate-calculations-using-recursion (PDF)

Source: interest-rate-calculations-using-recursion.js

If there is extra time, the candidate can be asked to write a solve function that solves for the payment P given the other values, e.g., solve for P in Zr(25*12, 315000, 0.005, P) = 0.
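
For readers who want to try this without opening the linked files, here is a rough sketch in Python. It is not the linked JavaScript source. It assumes that Zr(n, Z0, r, P) means "the balance remaining after n monthly payments of P on a loan of Z0 at monthly interest rate r", defined by the recursion Z(n) = Z(n-1)*(1+r) - P with Z(0) = Z0. The function names remaining_balance and solve_payment are made up for this example, and bisection is just one simple way to solve for P.

```python
def remaining_balance(n, principal, rate, payment):
    """Balance after n payments, assuming Z(n) = Z(n-1)*(1+rate) - payment."""
    if n == 0:
        return principal
    previous = remaining_balance(n - 1, principal, rate, payment)
    return previous * (1 + rate) - payment


def solve_payment(n, principal, rate, tol=1e-6):
    """Find the payment that brings the balance to zero after n periods.

    Uses bisection: the final balance decreases as the payment increases.
    """
    lo = 0.0                         # paying nothing leaves a positive balance
    hi = principal * (1 + rate)      # this pays off the whole loan in one period
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if remaining_balance(n, principal, rate, mid) > 0:
            lo = mid                 # payment too small
        else:
            hi = mid                 # payment large enough (or too large)
    return (lo + hi) / 2


# the example from the post: 25 years of monthly payments on 315000 at 0.5%/month
P = solve_payment(25 * 12, 315000, 0.005)
print("monthly payment: %.2f" % P)
print("balance at the end: %.4f" % remaining_balance(25 * 12, 315000, 0.005, P))
```

A good follow-up question for the candidate: the recursion has a closed-form solution (a geometric series), so P can also be computed directly as r*Z0 / (1 - (1+r)^(-n)).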

Getting started with ML in python

Next week I’m interviewing for a Data Scientist position so I figured I better brush up on my machine learning skills. I found some neat YouTube tutorials [1,2] on using scikit-learn so I thought this would be a good place to start.

From experience, I was expecting that setting up the dev-environment with numpy, scipy, ipython notebook, etc., would take me half a day (compiling and debugging things that don’t work out of the box), but I was pleasantly surprised when a few pip commands later I had a fully functional environment. I’ve pasted the sequence of commands below in case you want to learn yourself some ML too.

Create a virtualenv

The first part is to create an isolated virtualenv for the project. Think of this as “basic python hygiene”: we want to isolate the python libraries used to follow the tutorial from my system-wide python library. (For most people this is just “best practices,” but in my case the system-wide site-packages contains outdated versions and/or half-broken dependencies because of the dysfunctional relationship between fink, macports, and homebrew that plays out on my computer.) To set up a virtualenv in a given directory and activate it, proceed as follows:


$ cd ~/Projects/MLpractice
$ virtualenv pyML
$ . pyML/bin/activate # . is the same as source

Install prerequisites

Next we’ll install all the prerequisite packages and scikit-learn. Note that the command line starts with (pyML) which indicates that pip will install these packages in the pyML virtualenv and not system-wide.


(pyML)$ which python
(pyML)$ which pip

(pyML)$ pip install numpy
(pyML)$ pip install pyzmq
(pyML)$ pip install ipython[all]
(pyML)$ pip install scipy
(pyML)$ pip install pyparsing

$ brew update
$ brew install freetype
$ brew link --force freetype
$ brew install libpng
$ brew link --force libpng
$ brew install libagg
(pyML)$ pip install matplotlib
(pyML)$ pip install psutil

(pyML)$ pip install scikit-learn
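
Before moving on, it can be worth checking that everything actually imports from inside the virtualenv. This is an optional sketch, not part of the original steps: check_packages is a throwaway helper written for this post, and the list of module names is my guess at what the tutorials need (note that scikit-learn is imported as sklearn).

```python
import importlib


def check_packages(names):
    """Map each module name to its version string, or None if it won't import."""
    status = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            status[name] = None
        else:
            # not every module defines __version__
            status[name] = getattr(module, "__version__", "unknown")
    return status


wanted = ["numpy", "scipy", "matplotlib", "sklearn", "IPython"]
for name, version in sorted(check_packages(wanted).items()):
    print("%-12s %s" % (name, version if version else "MISSING"))
```

Save this as a file and run it with the (pyML) python. Any line that says MISSING means the corresponding pip install step needs another look.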

Done

Now everything is set up and ready.
We can clone the repositories with the example code and start the ipython notebook as follows.


$ git clone git@github.com:jakevdp/sklearn_scipy2013.git
$ git clone git@github.com:ogrisel/parallel_ml_tutorial.git
(pyML)$ cd sklearn_scipy2013/notebooks/
(pyML)$ ipython notebook --pylab inline

Your default browser should open, showing you the IPython notebooks for the first tutorial.
Let the learning begin—both for machine and human alike!

No bullshit guide to linear algebra

I’m happy to announce the No bullshit guide to linear algebra (student edition) is ready: gum.co/noBSLA. The core chapters (the stuff that shows up on exams) are done. If you have a linear algebra exam coming up, we’ve got what you need.

For the price of a case of beer, you could have an understanding of linear algebra.

Now if you’re a cheapo like me, you’ll say “why the hell do I need to give you money, when there are free books out there?” I understand you. Perhaps you’d like this free tutorial: LA. See also MECH. I hope these short tutorials will convince you that synthesis of information (i.e. ordering the concepts and choosing an appropriate level of detail) is possible and desirable. Synthesis helps with understanding. If a subject can be summarized in just a few pages, then a full textbook on the subject shouldn’t be bigger than a couple hundred pages, including prerequisites. I call this “information distillation.”

The 1000pp+ textbooks are a scam. Don’t be duped. Get the No bullshit guide to linear algebra. It’s 1/10th the price, 1/2 the size, and 3 times better than a mainstream textbook. In the news: [HN1], [HN2]. The price is 50% OFF until April 1st.

BTW, this is the second book in the “No bullshit” series. The No bullshit guide to math and physics is the first. It covers high school math, mechanics, differential calculus, and integral calculus in 383 pages. You should definitely check it out if you’re taking one of these classes.

Opportunity costs

Recently, after conversations with friends who work in industry, I’ve been questioning my “career strategy” of pursuing a textbook publishing startup. Generally speaking, the employability of a new graduate is at its peak at graduation. Industry accepts young CS graduates and tells them “Here is 70k, write this code for us,” and after a few years they could be pulling in 130+k, which is prof-level income. Regardless of one’s future goals in life, a little injection of cash for a person in their thirties sounds like a good thing to have. In general, working is a good career move.

In the language of economics, there are opportunity costs to doing the startup thing. First, there are the short-term financial losses of not having a San Francisco software developer salary right now. Second, and perhaps more importantly, I may be sabotaging my career options should I ever decide to go to industry. Recruiters will ask “what did you do for the past two years?” So doing the startup thing (i.e. not doing the corporate thing) has multiple opportunity costs.

Though such thoughts do turn around in my head, I remained and remain undeterred. I just realized why—this is the inspiration for this post. There are opportunity costs with the corporate career too. This knowledge that I have fresh in my mind after teaching undergraduate math and physics for the last ten years will soon be forgotten. Certainly after two years in industry, I would not remember half the things I can recall off the top of my head right now.

So this is why, now I know, I subconsciously chose this path. We godda do this now and we’ll code later, si besoin.

Aside: I just previewed the latest linear algebra draft and it looks awesome! I’ve been slogging through the corrections during the past couple of weeks (actually months!) and I was feeling low on energy, but now that I see how close we are to the finished product I’m getting all enthusiastic again.

Thoughts and strategy for scaling distribution

I just received news from York University that the book sold out and they need replenishment. The McGill bookstore already sold out twice and I had to replenish their supplies. So in-store sales are working. I’m counting this as validation. Now let’s scale things!

I’ll have to equip the website with an “order a box of 10” option and make a deal with a fulfillment centre so they will take care of the shipping for me. How do I get into WorldCat? Who will print the book in large quantities (Lightning Source?)?

I’ve been busy working so much on the Linear Algebra book and preparing exercises that I lost track of the business side of things. I’m going to finish up LA, because it is so close to being done, but I’m vetoing any work on the Electricity and Magnetism title—we’ll pick that up in October and it will be ready for January 2015.

Okay Ivansky. Put on the business hat and get things done!

A scriptable future for the Web and home servers

I’m organizing papers today, and I keep finding dev-notes and plans for my big “home server” idea about being able to run all your “cloud services” on your own hardware with all the data protection this entails. But what is easy to imagine can be difficult to bring to reality. There are a lot of technological aspects to figure out (dyndns, mail, www, filesharing, apps?), but there is also the lack of interest in privacy matters of the general public.

The freedom of computing and the Internet is a question that depends on technology but also on public relations. I recently came up with a plan for one possible way to get FOSS into homes. The PR pitch for each phase is given in parentheses; the status or target year is in square brackets.

  • Phase 0: Develop FOSS clones for most popular cloud software. [100% done]
  • Phase 1: Non-tech-savvy users learn to deploy “own server” in the cloud based on a FOSS software stack. [2015]
    (Run your own Google with just one click! Customize and automate everything. Don’t let anyone tell you what to do on the Internet.)
  • Phase 2: Non-tech-savvy users move their existing “own servers” to run on their “home server.” [2020]
    (The Internet is distributed; be the Internet. Who got ur logs? Protect your privacy and that of your family and friends. Political discussion is not a crime. Unlimited storage—just add USB drives to the RAID. )

I think the two-step process for the home server is much more likely, even realistic. Both phases involve transitions to better features. The transition to Phase 1 will be interesting for power users, but if everything is scripted, then even non-tech users could “run their own” thing. For it to happen, we need to get to “same thing as … but with more ….”  Only after we have a mature system of own apps can we then move to Phase 2 where we say: “same thing as own, but at home.”

I’m a big believer in humanity and our ability to learn, adapt, and advance, so I think we will be able to “domesticate” the power of computing as we previously domesticated fire and electricity.