The school of the future

It has been a long time since I last wrote something, but I haven’t been idle altogether. I’ve been planning what to do next for the books, calculating my move, so to say. It’s slowly starting to become clearer. Given my background as linux sysadmin and tutor, it only makes sense if the revolution looks to me like an open source software stack for schools. Read on to find the general strategy. Listen to this good tune if you’re missing an accompanying soundtrack; here is some more tunes in case the blog post ends up long.

I’m going to write a toolbox for every school out there to print their own textbooks and other educational material.

The best-in-class existing exercise framework is already available as open source and can be hosted on the school’s server (or virtual on AWS for $40/mo), but since we’re going to have a “school server,” wouldn’t we want to run some other stuff on there too?

I’m thinking:

  • printable exercises sheets for doing in class
  • printable problems to give as homework
  • printable exams (different version for each student)
  • print affordable textbooks (using POD service like lulu.com)

I’m happy for all the kids with iPods and computers at home, but not every school has the budget for computers, so let’s try to keep the costs to at most $20 per class for the week. A few good laser printers could keep the school running with students learning from top notch material, at the cost of a few thousand dollars per month. Toner and paper.

Okay, but where are you going to get the content.

There’s plenty of OER out there, but there could be more and it could be made more easily accessible.

Content framework

Here I must confess, I will be biased, because as a django person, I see the world through my own prism. A very good way for presenting structured content exists already, as free software. That’s the best place to start. Everything else on the content management side, and exporting printable PDFs I can script myself. The main content pipeline will be something like .md --> .tex --> .pdf; the softcover/polytexnic toolchain is very good at this.

UX

The key is to make it easy for teachers to browse, use, and contribute content. The graph structure (and possible common core math categorization) will help the browsing, the .toPDF() methods will make the content immediately usable, and the only real problem remaining is the user experience that entices teachers to contribute content. That’s the biggie. But it can be done.

Software requirement specification

Given a collection of content items (paragraphs of text, entire sections of book, exercises, or problems), the system allows teachers to assemble custom “playlists” which consists of a sequence of content items with a `.toPDF` method.

No way I’m going to let them run the high schools. We’re taking over that business too now.

Aren’t big publishers better?

The market forces will prevail. What is better for a cash-strapped school, to order some crusty mainstream algebra textbook, that may or may not be standard-compliant, but sure is long and talks to students as retards, or to demand the printing of one copy of the best free book on the subject, for 1/10 of the cost.

How will you make money?

Nothing changes really. Minireference continues to sell university-level textbooks; we just make the high school material and the toolchain free. As for the potential loss of business due to high schools printing on their own instead of buying textbooks from me, I’d call that a win overall.

Books as tools for scaling knowledge

A great quote from Richard Hamming from his talk You and your research. It highlights the importance of the “information distillation” that books play.

In this day of practically infinite knowledge, we need orientation to find our way. Let me tell you what infinite knowledge is. Since from the time of Newton to now, we have come close to doubling knowledge every 17 years, more or less. And we cope with that, essentially, by specialization. In the next 340 years at that rate, there will be 20 doublings, i.e. a million, and there will be a million fields of specialty for every one field now. It isn’t going to happen. The present growth of knowledge will choke itself off until we get different tools. I believe that books which try to digest, coordinate, get rid of the duplication, get rid of the less fruitful methods and present the underlying ideas clearly of what we know now, will be the things the future generations will value.

Digital vs. print and the future of books

I’m reading an interesting paper by M. Julee Tanner that compares the cognitive aspect of digital vs. print delivery for book-length material. In summary, the printed book is not dead!

I’ve always thought the print medium (especially typeset by LaTeX) is far superior for learning and comprehension, but I figured this was my “old timer” ways (I’m 32). It seems I’m not the only one though:

Despite decades of work by computer and e-reader engineers and designers to improve the optics, display, and ease of navigation of virtual texts, readers still have a general preference for the print presentation, especially when it comes to longer, more challenging material.

The author states many good things print books have going for them, but the most interesting to me is the following quote:

[…] the greatest difference in metacognitive strategy was also found among the users of e-readers, in their reluctance to review previously read passages by virtually turning back pages. It seems that the perceived unwieldiness of screen-tapping to turn pages did negatively impact comprehension of expository texts on the e-reader platform (Margolin et al., 2013).
Since monitoring one’s understanding while reading, reviewing previously read material if necessary, underlining, and taking marginal notes are so vital to the comprehension of more challenging texts, it is important for students and educators to know how applicable these metacognitive strategies are to virtual texts.

Indeed, think about it—if you’re reading a complicated passage in a math book, wouldn’t you want to flip back and look at the equation which you saw five pages ago? In a printed book you could do that (you could in fact leave you finger on that spread and conveniently flip between the two pages). In a PDF read on the computer, it’s also somewhat passible to flip back (though a bit imprecisely), but on an eBook reader it’s not easy to do.

Learning math/physics (or other cognitively demanding material) from an eBook reader feels a bit like I’m placed in front of a slide deck: information comes, then it’s quickly taken away, leaving me in a disorganized state of mind.

Here’s the full reference: Tanner, M. J. (2014). Digital vs. print: Reading comprehension and the future of the book. SJSU School of Information Student Research Journal, 4(2). scholarworks.sjsu.edu/slissrj/vol4/iss2/

Internet propaganda

TL;DR: The fight on the Internet is not just about True vs False, but also True vs Noise.

I’m reading this article about Internet propaganda in China, and I can’t help but wonder how much of this exists in the West. How many PR firms employ armies of paid commenters ready to intervene and vote up (or down) any content item? How many politicians employ the services of these PR firms?

I find it very interesting to dissect the tools of the System—to try to deconstruct the methods which the powers that be use to control the People. There seems to be three forces at play for any issue X. Voices in support of X, voices in opposition to X, and noise. Assuming X is something the people want, people-opposing forces (which we’ll call the System for simplicity), have at least two options to silence the X discussion:

  1. Pay shills to post opinions against X.
  2. Produce noise to drown out the X discussion altogether.

In totalitarian regimes (think Russia), official police and secret police act to suppress the supporters of X, while in “open societies” (the West) corporate media control (which is a mix of options 1. and 2. above) are used to suppress X.

It seems China’s regime is siding with the Western approach for their Internet censorship. Here’s a quote from the above article:

It’s not clear the degree to which paid comments influence the conversation the way Communist Party members hope they do. Xiaolan says the paid commenters could be adding noise to the conversation simply to drown out normal people’s desire to converse online.

The noise strategy reminds me of the “jammer towers” (заглушителни станции) that I’ve seen in Bulgaria during the communist days. The idea was to isolate the Bulgarian people from FM transmissions from neighbouring Greece and Turkey. I’m guessing they were going for FM and/or TV, because I know AM radio is more difficult to block.

Could it be that the whole education system is intentionally dysfunctional? An informed and educated citizenry would be much harder to indoctrinate and control. No. Surely this is a crazy idea to imagine science education is intentionally restricted to a small group of people, who are indoctrinated with the “you’ll get a big paycheque”-mentality in school, and forced to join the System immediately upon graduation to repay their student debt. In any case it’s worth checking. If science textbooks were made intentionally inaccessible by the system, then making more accessible science and technology textbooks will lead to more politically active citizens, armed with metis. Let’s see.

Books vs Apps

I just read (actually the computer read it to me) a Business Insider interview with Jeff Bezos in which he talks about book pricing. Specifically, the conversation seems to be about eBooks, but the conversation seems to cover books in general—as a medium.

The most important thing to observe is that books don’t just compete against books. Books compete against people reading blogs and news articles and playing video games and watching TV and going to see movies. Books are the competitive set for leisure time. It takes many hours to read a book. It’s a big commitment. If you narrow your field of view and only think about books competing against books, you make really bad decisions. What we really have to do, if we want a healthy culture of long-form reading, is to make books more accessible.

Part of that is making them less expensive. Books, in my view, are too expensive. Thirty dollars for a book is too expensive. If I’m only competing against other $30 books, then you don’t get there. If you realize that you’re really competing against Candy Crush and everything else, then you start to say, “Gosh, maybe we should really work on reducing friction on long-form reading.” –J.B.

Good point. I never thought of it this way. Books, taken down from their high-horse position as the vehicle of intellectual thought, are forced to deliver the same entertainment-per-dollar value to the consumer.

But that’s a very mechanistic and anti-intellectual way of looking at things, don’t you think? Who said books are for entertainment? Who said delivering the feeling of “information buzz,” commonly associated with news articles and feed-based apps, is the universal goal of all media?

Aren’t books valuable precisely because they are different from the torrent of superficial-level information that is the Internet. Isn’t the main point of a book to summarize and distill information (which is plentiful and free) into a high-value package that earns the reader’s attention, not for it’s entertainment value, but for its quality of insight?

If you ask me, we should optimize for value-per-page (or value-per-unit-of-attention if you prefer) not compete on price with other media. The price of a book should be proportional to the value it brings to the reader. The Internet is an amazing tool that allows anyone to access gigabytes of information on any subject in the click of a button. This abundance of information is actually a problem for readers, as they have to separate the wheat from the chaff. This is where knowledgeable authors come in, whose curation and distillation of ideas brings order to the chaos. When we buy books, we don’t buy ideas, we buy the synthesis of ideas.


In the internet era, almost all of the tools for reading have been reducing the friction of short-form reading. The internet is perfect for delivering three paragraphs to your smartphone. The Kindle is trying to reduce friction for reading a whole book. It’s working. […] We’re making books easier to get, more affordable, more accessible. You are getting more reading. Mostly things have gotten better, and we live in a world where I hope things continue to get better. Surely making reading more affordable is not going to make authors less money. Making reading more affordable is going to make authors more money. –J.B.

Hm. That’s a big step in terms of logic. Bezos’ claim is that lower prices will lead to more readers, which in turn will lead to more revenue for authors. It’s a question of two rates: if the authors’ margins decrease at a rate higher than the rate at which their readership increases, the new publishing paradigm will be a net loss for authors’ bottom line.


Overall, I will qualify the Interview as an advanced-level PR effort to dismiss this summer’s battle between Amazon and Hachette over contol of book pricing, which in turn is tied to the existence of non-amazon sales channels. Amazon’s CEO used this interview as a platform to reiterate the same talking points—that Amazon is out, fighting for better prices on behalf of readers. This is a convenient position when you’re in the business of competing on distribution through economies of scale, and your business model is based on offering the lowest price based on the distributor discount offered by manufacturers and publishers. It’s a neat business model, which attracts customers in troves. It works. Good job. But pretending amazon is doing all this on behalf of the customer is a bit intellectually dishonest. The price on amazon is 10–15% cheaper than anywhere else so that Amazon can maintain and increase its market share, and not some sort of act of robynhoodism.

Survive the exams special

I know many students are currently worried about their upcoming CALCULUS and MECHANICS exams. Are you one of them? If so, don’t worry. Chill. Forget about reading your thick textbooks–I have a better option for you! I wrote a math textbook based on 15 years of tutoring experience.

Contents:

    1. Review of math basics (equations, functions, geometry)
    2. Differential calculus ($\lim$, $f'(x)$, $\max/\min$)
    3. Integral calculus ($\int f(x)dx$, $\sum_i a_i$)
    4. Mechanics ($\vec{F}=m\vec{a}$, $a(t)=x”(t)$, $\vec{p}=m\vec{v}$, $\sum E_i= \sum E_f$, $x(t)=A\sin(\omega t+\phi)$ )

All this in a tiny book you can read in a week. Available in PDF or in print. Checkout the preview and the free tutorials[2,3,4].

Lulu.com has a crazy special on print books this weekend. You can order the No bullshit guide to math and physics for just $19 using coupon code WQT32 (expires Dec 3rd). I’ve also decreased the price of the PDF on gumroad to match the print price.

Trust me on this one, this will be the best twenty bucks you ever spent on your science education.

Book pricing for optimal growth

I had a meeting with a business mentor last week (a McGill professor who has started dozens of companies). He helped me realize the most important thing for the company right now is growth—not margins. The success of Minireference Co. depends on how many students read the book by the end of this year, and by the end of the next. We’ll work on the margins once we have scale.

Below are some observations about book pricing for optimal growth.

Amazon technique

Since I listed the book on amazon.com/.ca/.co.uk/.de/.fr, all the print book sales have gone to them (mostly .com but a bit of .co.uk). Why would you buy the book from lulu.com for \$29 when you can get it from amazon for \$25? People are not stupid, even though I link to lulu.com from the main site, they all go to amazon to check if it’s available and order from there.

To be honest, I think the print quality of the books from lulu.com is superior to the book produced by amazon CreateSpace and IngramSpark. I think lulu.com has been in the “print on demand” business for the longest of the three companies, so they know what they’re doing. The print quality of lulu books is very uniform and crisp, unlike the CreateSpace version which had some pages in darker ink and some in lighter. The main reason why I like lulu.com is because their paper is thinner, which makes the 445pp book actually look much more approachable; we’re talking about a difference of 3-4 mm in overall thickness, but the psychological effects are important.

How can I communicate this difference to my readers? Maybe something like “I recommend you purchase the book through lulu.com, because of the higher print quality and thinner paper” could do. Though is this better print quality worth the \$4 extra, which is really \$10 extra when combined with the shipping. I guess I should tell my readers the facts, and let them decide for themselves.

Ingram channel

The other new channel I’ve been working on is the “serious” distribution to bookstores via Ingram. Printing through IngramSpark makes the book available through all kinds of online distributors (e.g. barnesandnoble.com) and also makes the book available to order from the Ingram catalog, which is how most physical bookstores order their books.

The list price is \$29, and the printing cost are \$6.7, which means there is \$22.3 of “value” to split between me, Ingram, and the bookstore. There are two options for the wholesale discount I can offer 55% wholesale discount (bookstore margin: 35%-40%) or offer 40% wholesale discount (bookstore margin: 20%–30%).

If I provide a wholesale discount of 40% (a.k.a short discount or academic discount), the wholesale price is 17.40, which leaves me with 17.40-6.7 = \$10.7 of profit per book sold.

If instead I offer 55% (industry standard, a.k.a trade discount), the wholesale price will drop to \$13.05, which leaves me with 13.05-6.7 = \$6.35 gains per book sold.

After yesterday’s conversation with my mentor, I’m switching to the 55% discount. Give everyone a cut. We’ll jack-up the prices when everyone is hooked on the knowledge buzz that learning mathematics provides 😉 .

eBook pricing

In parallel to the print book distribution, there is the question of eBook pricing. Both the print book and the eBook version have been \$29. Historically, eBook sales have been very good to me, but ever since the book has appeared on the amazons, the eBook sales have slowed to a trickle. Why would you buy a eBook for \$29 if you can get the print book for \$25?

I’m thinking of dropping the eBook price to \$19. Hopefully this will make more sense for potential customers. Surely the years of effort invested to write the best calculus book there could be is worth a twenty…

Devaluating the book

The other consideration to keep in mind in the face of these price deviations from the old price of \$29 is the psychological effects of “perceived value” of the book. How could this be a good book if it’s just \$25? Mainstream university-level math and physics textbooks cost hundreds of dollars. Specifically, \$190 for precalculus, \$209 for calculus, and \$192 for physics. How can your book cost only a fraction of that and be of comparable quality? No way, I don’t believe it!

I had picked the price \$29 to be as low as possible (looking out for the student’s interest) while still making the book business profitable for me. As the price is dropping below \$29 due to discounts, the situation with the “this book looks suspiciously cheap” problem is getting worse. I might think about bringing the price up to \$39 for the 6th edition, to better communicate the value.

Take home message

There is a general lesson to learn here, which is advice I’ve heard from several successful entrepreneurs, but I never took seriously until now. Price your products for growth not profit. You might lose some money at first, but reaching a wider audience is worth much more than short term profits.

Binary search in three languages

Hola! Regardons ensemble un peu de code. The binary_search algorithm. It will get a little technical, pero no es mucho complicado. A ver. En ingles. En anglais, parce que le code—ça va foule mieux en anglais.

Assume you’re given a array of integers sorted in increasing order [3,6,19,21,87]. You job is to write a “search” function that returns the 0-based index of query value, or -1 if val is not in array.

Binary search algorithm

The “usual” algorithm use the start and finish pointers in a weird way, which I found difficult to understand, so I wrote another one. The invariant “we’ve already checked the limits” feels more logical to me.

In JavaScript

In JavaScript the code for the binary search strategy is as follows:

SearchableArray.prototype.binary_search = function (val) {
    var data = this.data;

    if (data.length === 0) return -1;
    if (data[0] === val) return 0;
    if (data[data.length-1] === val) return data.length -1 ;

    var bin_search_limits = function(start,finish) {
        // invariant: data[start] and data[finish] have been checked already
        var mid;
        //console.log(start, finish);
        if (start === finish || start+1 === finish) 
            return -1;

        mid = start + Math.floor((finish-start)/2);

        if (data[mid]===val) {
            return mid;
        } else if (data[mid] < val) {
            return bin_search_limits(mid,finish);
        } else if (data[mid] > val) {
            return bin_search_limits(start,mid);
        }
    };
    return bin_search_limits(0, data.length-1);

};

The full javascript code (with tests;) is here.

In C

It was surprisingly easy to transform the JavaScript code into C. See the code and some basic tests here. The main functions essentially the same:

int bin_search_limits(int *data, int start, int finish, int val) {
      // invariant: data[start] and data[finish] have been checked already
      int mid;

      if (start == finish || start+1 == finish) 
          return -1;

      mid = start + (finish-start)/2;  

      if (data[mid]==val) {
          return mid;
      } else if (data[mid] < val) {
          return bin_search_limits(data,mid,finish, val);
      } else if (data[mid] > val) {
          return bin_search_limits(data,start,mid, val);
      }
  };

int binary_search(int *data, int length, int val) {
  if (length == 0) return -1;
  if (data[0] == val) return 0;
  if (data[length-1] == val) return length-1;

  return bin_search_limits(data,0,length-1, val);
};

In python

The pleasure of implementing binary search in python is left to the reader.


I’ve got to go code learn how to make a hash function to make the C test suite go faster 😉

Student workload

I just read an interesting account of what life is like for high school students. An adult shadowed a student for a whole day, going to class, taking exams, sitting all day, etc. The article is definitely worth a read for anyone in ed.

I liked the idea of structuring lessons starting from students questions. It wouldn’t work for 1-on-1 tutoring (a student may have only unknown unknowns) but collectively as a class, the set of all known unknowns probably covers a lot material that would make sense to cover in the current class.

Here’s an idea. Rather than writing advanced software that “measures” the student’s level of understanding and schedules appropriate material for them, why not let students tell you what they do or do not know, and—more importantly—what they would like to learn.

Update Oct 19: Grant posted an followup post.

Scaling sales, a quantitative appoach

Yesterday I watched an excellent video tutorial about startup growth tactics by Adora Cheung, a guest lecturer in Sam Altman’s startup class. The speaker has experience from user-testing 13 business ideas so you can tell she knows what she’s talking about. Growth, cohort analysis, and segmentation are all essential tools startup founders should know about.

It made me think about how little analytics I’ve been doing for minireference.com and this blog. What happens when a visitor comes to the homepage? Do they read the whole page? Do they click on any of the questions? Do they download the free PDF tutorials? Most important of all, do they click on one of they Buy links.

Let’s find out…

I previously evaluated the effectiveness of the landing page and found 3-4% conversion rates for the print book and similar rates for the Buy PDF link, and I remember being pleased about those numbers.

These days I see similar numbers: combined Print+PDF conversion is ~= 7.7%.

Looks good right? (Please, ignore the abysmal mobile conversion rates. I’m on bootstrap2 and everything will be fixed when I upgrade to bootstrap3 in a few weeks.)

The problem is many potential readers drop off after clicking through to lulu.com and gumroad.com. I lose contact with my potential customers as soon as they leave my site, so I don’t know how many of them actually bought something. Today I set out to calculate my real conversion rates by cross correlating the data form the google analytics, lulu.com, and gumroad.com.

Burst analysis

My main “marketing channel” is hacker news. Each time I release a new printable PDF tutorials, I post it to HN. It’s an elaborate ploy to gain readers’ trust by gifting them something useful (e.g. a 4-page printable tutorial) and upsell them to buy one of the books. I consider this to be ethical advertisement.

Because of this mono-site marketing strategy, the traffic on minireference.com is generally calm, but every now and then there is a huge spike that occurs when I post something to HN. This bursty nature of the traffic allows us to do a deeper analysis of the conversion rates.

Using google analytics, The visitors, and the two conversions goals are plotted below:

I chose to analyze the events surrounding two bursts of Aug 29th and Oct 3rd. The Aug 29th spike is thanks to the announcement of the
SymPy tutorial on HN.

I was able to calculate the post-minireference.com conversion rate for the print book:

Buy book link --> ordered from lulu: 14%(3/21) on Aug 29 and 10%(3/29) on Oct 3

The post-minireference.com conversion rate for the PDF is:

Buy PDF link --> ordered from gumroad: 33%(3/12) on Aug 29 and 21%(3/14) on Oct 3

Not cool y’all! I better work on this. What do people not like about lulu.com? Is it because they’re not used to it, should I put an Amazon link there?

What are dem visitors doing?

Another question that’s pertinent is how far down the page to users scroll.
We can obtain this information from the graph of scroll-depth events from google analytics (I have some .js the fires events at 0% (baseline), 25%, 50%, and 100% scroll depth.

It’s hard to read anything from that graph, but I’m saving the data for posterity—I want to have something to compare with when I switch to the new landing page…

Does the free tutorial marketing strategy work?

It took me more than a mont of part-time work to write the SymPy tutorial. It’s almost like a little book, since it covers so many topics. I also incurred $90 in copy-editing costs to make sure the writing is solid, since the tutorial was to become an appendix in the book. Was this effort worth it? Let’s see the traffic that resulted.

A total of 10k people downloaded the PDF. I had to use the server logs to get this data, because many people linked directly to the PDF, which means google analytics won’t see these hits. Using zgrep and wc the count the number of lines in the logs that contain the pdf url and HTTP code 200:

  zgrep '"GET /static/tutorials/sympy_tutorial.pdf HTTP/1.1" 200' mr.access.log mr.access.log* | wc
  10058  203840 2369042
  ^^^^^

The initial link to HN pointed to the blog post announcement, so we can see some part of that traffic on the blog:

Ultimately, what is of interest is how much traffic to minireference.com did the SymPy tutorial generate. We see the tutorial led to a spike of about 300 visitors to the main page.

From the numbers we have so far, we can estimate the conversion rate for the referral strategy via free PDF tutorials to be 3% = 300/10000. The SymPy tutorial also led to a “livelying” effect of the traffic in the following days, as can be seen in the graph. Clearly, more people are hearing about minireference.com and coming to check out the site.

The final profit from sales for this spike is \$150 so I’ve recouped the external expenses, and earned a salary of \$60/month, which is not great but still positive. The 3% conversion rate is very interesting, IMHO. I’ll pursue this strategy further, because it has great potential for viral growth—wouldn’t you send a kick-ass PDF tutorial on subject X to your classmates? (caveat: some engineering schools grade “on the curve” so for these students, it is actually game-theoretically disadvantageous to share quality learning material with their peers)

Conclusions and further research

The purpose of this blog post was to reduce the level of guilt I felt about not playing enough with analytics for my web properties. I feel I’ve achieved some level of guilt-diminishment and, more importantly, now I’ve got numbers that I can use as the baseline for comparison with the new homepage. Surely the conversion rates can be improved; the book is great, I just need to have simple messaging that explains how great the book is, and how it is good value-for-money for students, and also good knowledge-buzz-delivered-per-unit-time for adult learners.

Questions to followup on:

  1. How many landing pages do I need? There are essentially three “touching points” with my potential readers. A cold visit to the homepage (e.g. google search), a warm visit to the homepage (via recommendation), a tutorial referral visitor (which is super warm). Will the same marketing message work for all three types of traffic, or should I have three landing pages?
  2. What is the optimal order of the sales pitch? (A/B/../Z-test of section ordering.
  3. Should each book have its own landing page (/noBSmath and /noBSLA) or focus on a single page with two products?
  4. A/B test lulu.com vs amazon.com for the “Buy Book” link.
  5. What channels to develop next?

Okay, enough blogging. Let’s go write some kick-ass marketing copy. And from now on, we’ll be tracking them sales!