January Optimism about OER

Last year in March I did a lot of soul searching about my mission in the EdTech space. At the time, figuring out the incentives for authors and teachers to produce open educational resources (OER) seemed like an insurmountable mountain to climb. I didn’t see a clear path for interoperability between content sources. OER yes, but OER how?

Since then I’ve learned a lot more about the open content landscape and I’m starting to feel more optimistic about the prospects for OER. Could 2018 be the year when the switch-over happens? I think so.


Tech Strategy for 2018

Below is a list of technology building blocks that will be half-built by Mar 2018 and fully built by Mar 2019. Together they represent the foundation for OER becoming a mainstream phenomenon.

Content library

One year from now, accessing all the CC-licensed educational material from the web will be a solved problem. When a school principal wants to set up a local OER server, they’ll be able to import educational content from a choice of multiple free content libraries. (Technical details such as the data format (wget+zip, web archive, OpenZIM, cartridge, kchannel, etc.) and the browsing and search interfaces can be solved by something like a pandoc for OER content, plus integration with listing, syndication, and indexing services.)

Editing tools

Given a global content channel like Khan Academy, a teacher might want to edit the content structure to: change folder titles and descriptions; reorder items within folders; cut, copy, and paste items; and add new items by adding files or creating them from scratch with the authoring tools.
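
To make these editing operations concrete, here is a minimal sketch of a content tree that supports them. The `Node` class and its method names are hypothetical, made up for illustration; they are not the API of any existing tool.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A folder or content item in a channel (hypothetical structure)."""
    title: str
    description: str = ""
    children: List["Node"] = field(default_factory=list)

    def rename(self, title: str, description: Optional[str] = None) -> None:
        # Change the folder title and, optionally, its description.
        self.title = title
        if description is not None:
            self.description = description

    def reorder(self, old_index: int, new_index: int) -> None:
        # Move a child to a new position within the same folder.
        self.children.insert(new_index, self.children.pop(old_index))

    def cut(self, index: int) -> "Node":
        # Remove a child and hand it back so it can be pasted elsewhere.
        return self.children.pop(index)

    def copy_child(self, index: int) -> "Node":
        # Deep copy, so edits to the copy don't affect the original.
        return copy.deepcopy(self.children[index])

    def paste(self, node: "Node", index: Optional[int] = None) -> None:
        # Append by default, or insert at a given position.
        if index is None:
            self.children.append(node)
        else:
            self.children.insert(index, node)
```

Adding a new item is then just a `paste` of a freshly created `Node`.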

Authoring tools

A lot of the OER content out there isn’t that good. If learning will happen on a new medium like a tablet, then it might be easier to start from scratch and produce new content adapted for tablets. We need more tools for educators, teachers, and students to produce learning activities.

Make some sushi and feed the kids for a day, or teach them how to make sushi on their own and feed them for life.

Diff tools

Since content channels are not static, we need an easy-to-use system for distributing and applying updates: “The source channel that you imported has changed. Do you want to update your local copy from the source channel?” Click Yes, or review the changes by looking at the diff between the old content and the new content: nodes added, nodes removed, nodes moved, and nodes whose content was edited.
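
The four kinds of changes can be computed with plain set operations, as long as each node keeps a stable identifier across versions. That is an assumption of this sketch, and so is the dictionary layout (`node_id -> {"parent": ..., "content": ...}`); the function name `diff_channels` is hypothetical.

```python
def diff_channels(old, new):
    """Compare two versions of a channel.

    Both arguments map node_id -> {"parent": parent_id or None, "content": str}.
    Node ids are assumed to be stable between versions.
    """
    old_ids, new_ids = set(old), set(new)
    common = old_ids & new_ids
    return {
        "added": sorted(new_ids - old_ids),
        "removed": sorted(old_ids - new_ids),
        # A node is "moved" when it still exists but hangs under a different parent.
        "moved": sorted(n for n in common if old[n]["parent"] != new[n]["parent"]),
        # A node is "edited" when its content changed, wherever it lives.
        "edited": sorted(n for n in common if old[n]["content"] != new[n]["content"]),
    }
```

A node can show up as both moved and edited, which is probably what a reviewer would want to see.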

When looking at individual content items that consist mostly of text, there could be copyediting and typo correction tools. Any contributor could fix a typo in the description of a content item and submit a typo fix “pull request” upstream to the original content node owner (use word-level diffs, e.g. `git diff --color-words ...`, for showing text changes).
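
For text that doesn't live in a git repository, a word-level diff can be sketched with Python's standard `difflib` module. The function name `word_diff` is my own, and the `[-removed-]` / `{+added+}` markers are borrowed from git's plain word-diff output.

```python
import difflib


def word_diff(old: str, new: str) -> str:
    """Return a word-level diff with [-removed-] and {+added+} markers."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
            continue
        # "replace", "delete" and "insert" all reduce to these two cases.
        if i1 < i2:
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if j1 < j2:
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


print(word_diff("the quick brwon fox", "the quick brown fox"))
# the quick [-brwon-] {+brown+} fox
```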

Standards alignment tool

Imagine an editing interface for manipulating educational standards and setting parallels. For example, let MATH.US and MATH.MX be the math education standards for the US and Mexico. We can set up links that say “Every content item tagged with MATH.US.tagP should be automatically tagged with MATH.MX.tagQ” and this rule can be applied whenever importing content (i.e. do the standards alignment once up front, instead of doing it for every content item).
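
Such a rule could be stored as a simple lookup table and applied at import time. In this sketch the `ALIGNMENT` table and the `apply_alignment` function are hypothetical, and `tagP` / `tagQ` are the placeholder tags from the example above.

```python
# Alignment rules: source tag -> tags that should be added automatically.
ALIGNMENT = {
    "MATH.US.tagP": ["MATH.MX.tagQ"],
}


def apply_alignment(tags, rules):
    """Return the item's tags plus every tag implied by the alignment rules."""
    result = set(tags)
    for tag in tags:
        result.update(rules.get(tag, ()))
    return result


print(sorted(apply_alignment({"MATH.US.tagP"}, ALIGNMENT)))
# ['MATH.MX.tagQ', 'MATH.US.tagP']
```

This applies each rule once; chains of rules (US to MX to a third standard) would need repeated application until nothing changes.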

Student backpack (like in RPG games)

I’m generally against gamification techniques because I don’t see what’s the point of introducing metaphorical rewards like points and badges. However, if achievement rewards are related to the content matter then things could be very interesting.

What if badges were awarded when students pass some exam or “validation test” that confirms they know the concept inside out? You get the badge for X when you’ve completed all the tests required for X. For example the QEQN badge can be awarded whenever students have proved they know the formula x = (-b +/- sqrt(b^2-4ac))/(2*a) for obtaining the solution set to the equation ax^2 + bx + c = 0.

When you unlock the QEQN badge, the “quadratic equation tool” will be available for all the problems you will solve in the future. By earning the QEQN badge you proved you know the rule x = (-b +/- sqrt(b^2-4ac))/(2*a) by doing all the exercises, therefore from now we’ll stop forcing you to do this calculation by hand and let you use the quadratic equation tool when solving problems (click view source to see how it works).
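
As a rough sketch of what such a tool could look like behind the scenes: the solver below applies the formula directly, and a small wrapper checks the student's backpack for the badge. Both function names are hypothetical, as is the idea of storing badges in a plain set.

```python
import cmath
import math


def solve_quadratic(a, b, c):
    """Return both solutions of a*x^2 + b*x + c = 0 using the quadratic formula."""
    if a == 0:
        raise ValueError("not a quadratic equation: a must be nonzero")
    discriminant = b * b - 4 * a * c
    # Fall back to a complex square root only when the discriminant is negative.
    if discriminant >= 0:
        root = math.sqrt(discriminant)
    else:
        root = cmath.sqrt(discriminant)
    return ((-b + root) / (2 * a), (-b - root) / (2 * a))


def use_quadratic_tool(badges, a, b, c):
    """Hypothetical backpack check: the tool is unlocked by the QEQN badge."""
    if "QEQN" not in badges:
        raise PermissionError("earn the QEQN badge to unlock this tool")
    return solve_quadratic(a, b, c)


print(use_quadratic_tool({"QEQN"}, 1, -3, 2))
# (2.0, 1.0)
```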

I’m not sure every skill can be turned into an applet, so we could instead give them a “knowledge scroll” as the achievement—a short document that summarizes the concept that students can use whenever they need to use an X-related formula. Another option would be to earn “effort” badges for investing a lot of time to read/practice.


Perhaps I’m being too optimistic and the OER revolution is still far away, but I don’t think so. Now that I know the good folks at LE are thinking about these problems, I feel free software for universal education is coming up soon! It will take a combination of vision, technical expertise, implementation experience, and partnerships to make this work. As we say in Bulgaria, “Сговорна дружина, планина повдига,” which roughly translates to “An organized group can lift a mountain.” If you’re interested in lifting some mountains with us, check out the LE jobs page.

Impressions from NIPS 2015

Last week I attended the NIPS conference and it felt like a grappa shot: intense but good for brain function. There are so many advances in research, and industry is shipping ML in products, and GPUs make previously-impossible things possible. Definitely an exciting time to be.


The opening talk was about deep learning. In fact, a lot of the conference was about deep learning. Non-conformist as I am, I tried not to focus too much on that. All the new applications and interest from industry are great, but I don’t think the research is that revolutionary. I read this review paper Deep learning (paywall, door) and I’m going to limit myself to this level of understanding for now. With 4000 people in one place and 500+ posters to look at, it’s hard enough to keep track of the topic-modelling topics covered!


I attended the Bayesian Nonparametrics workshop, which was a who’s who of the community. I figured that was my only chance to be in a community where I’ll understand more than every second word said. The morning started with a very interesting “theory” talk by Peter Orbanz. I’m sure he’ll post the slides at some point, but in the meantime I found a 100pp PDF of lecture notes by him: Notes on Bayesian Nonparametrics. There’s also a video of a workshop from 4 years ago. This guy knows his stuff, and knows how to explain it too.

Another excellent talk was by Mike Hughes on Scalable variational inference that adapts the number of clusters. These looked like good ideas for managing fragmentation (too many topics), and they finally start to show BNP’s killer app: automatically learning the right number of topics for a given corpus.

During the discussion panel, the question of open source code for BNP arose and the following projects were mentioned: bnpy and BNP.jl.

Around lunch time I caught part of the talk by David Blei, who talked about the papers Black Box Variational Inference and Hierarchical Variational Models. Very interesting general-purpose methods. I should look into some source code, to see if I can understand things a bit better.

In the afternoon, Amr Ahmed gave an interesting talk about large-scale LDA and efficient LDA sampling using the alias method. First, for data parallelism, the workload can be split across thousands of machines, and each machine keeps a topic-sparse word-in-topic “counts replica” (which syncs asynchronously with the shared global state). If the global topic model knows about $K$ different topics, the local node $x$ needs to know only about the $k_x$ topics that occur in the documents it will be processing. Since $k_x \ll K$, this makes it possible to push $K$ higher. Very neat. Another interesting trick they use is alias sampling, which performs some preprocessing of any n-dimensional multinomial distribution so that samples can be drawn from it efficiently. It doesn’t make sense if you want just one sample, but if you’re taking many samples then the upfront cost of creating the “alias distribution” is amortized over all of them. It feels like we’re seeing 3rd-generation parallel LDA ideas start to come up.
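
To see why the preprocessing pays off, here is a small sketch of the alias method in its textbook form (Vose's construction). This is not the code from the talk, and the function names are my own. Building the table takes time linear in n, after which each sample costs one random index and one coin flip.

```python
import random


def build_alias_table(probs):
    """Preprocess a discrete distribution (probs must sum to 1) into alias tables."""
    n = len(probs)
    scaled = [p * n for p in probs]   # rescale so the average bucket weight is 1
    prob = [0.0] * n                  # chance of keeping bucket i's own outcome
    alias = list(range(n))            # outcome used when the coin flip fails
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s] = scaled[s]
        alias[s] = l
        # The large outcome donates mass to fill the rest of bucket s.
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    # Leftovers are exactly full, up to floating point rounding.
    for i in small + large:
        prob[i] = 1.0
    return prob, alias


def sample(prob, alias, rng=random):
    """Draw one outcome in constant time: pick a bucket, then flip a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

For a single draw, a linear scan over the cumulative probabilities is cheaper than building the table, which is the amortization point made above.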

The school of the future

It has been a long time since I last wrote something, but I haven’t been idle altogether. I’ve been planning what to do next for the books, calculating my move, so to say. It’s slowly starting to become clearer. Given my background as a Linux sysadmin and tutor, it only makes sense if the revolution looks to me like an open source software stack for schools. Read on to find the general strategy. Listen to this good tune if you’re missing an accompanying soundtrack; here are some more tunes in case the blog post ends up long.

I’m going to write a toolbox for every school out there to print their own textbooks and other educational material.

The best-in-class existing exercise framework is already available as open source and can be hosted on the school’s server (or virtual on AWS for $40/mo), but since we’re going to have a “school server,” wouldn’t we want to run some other stuff on there too?

I’m thinking:

  • printable exercise sheets for doing in class
  • printable problems to give as homework
  • printable exams (different version for each student)
  • affordable printed textbooks (using a POD service like lulu.com)
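
The “different version for each student” part is easy to sketch: seed a random generator with the exam name and the student's identifier, then draw questions from a problem bank. The function name, the parameters, and the idea of a flat problem bank are all assumptions made for illustration.

```python
import random


def exam_version(problems, student_id, num_questions, exam_name="exam"):
    """Pick a reproducible, student-specific selection of problems.

    Seeding with the exam name and student id means the same student always
    gets the same version, so an exam can be reprinted or graded later.
    """
    rng = random.Random(f"{exam_name}:{student_id}")
    # sample() draws without replacement and also randomizes the order.
    return rng.sample(problems, num_questions)


bank = [f"Problem {i}" for i in range(1, 11)]
version = exam_version(bank, "student-042", 4, exam_name="algebra-midterm")
```

Two students can still end up with the same selection by chance, so a real tool might want to check for collisions within a class.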

I’m happy for all the kids with iPods and computers at home, but not every school has the budget for computers, so let’s try to keep the costs to at most $20 per class for the week. A few good laser printers could keep the school running with students learning from top notch material, at the cost of a few thousand dollars per month. Toner and paper.

Okay, but where are you going to get the content?

There’s plenty of OER out there, but there could be more and it could be made more easily accessible.

Content framework

Here I must confess, I will be biased, because as a django person, I see the world through my own prism. A very good way for presenting structured content exists already, as free software. That’s the best place to start. Everything else on the content management side, and exporting printable PDFs I can script myself. The main content pipeline will be something like .md --> .tex --> .pdf; the softcover/polytexnic toolchain is very good at this.


The key is to make it easy for teachers to browse, use, and contribute content. The graph structure (and possible common core math categorization) will help the browsing, the .toPDF() methods will make the content immediately usable, and the only real problem remaining is the user experience that entices teachers to contribute content. That’s the biggie. But it can be done.

Software requirement specification

Given a collection of content items (paragraphs of text, entire sections of a book, exercises, or problems), the system allows teachers to assemble custom “playlists” which consist of a sequence of content items with a `.toPDF` method.
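
Here is one way the playlist idea could be modelled. This is only a sketch: the class names are hypothetical, and it stops at generating LaTeX source. I'm assuming a `.toPDF` step would hand that source to a LaTeX compiler, which is left out here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContentItem:
    """A paragraph, section, exercise, or problem (body is LaTeX-ready text)."""
    title: str
    body: str

    def to_latex(self) -> str:
        return f"\\section*{{{self.title}}}\n{self.body}\n"


@dataclass
class Playlist:
    """An ordered sequence of content items assembled by a teacher."""
    title: str
    items: List[ContentItem] = field(default_factory=list)

    def to_latex(self) -> str:
        # Wrap the items, in playlist order, in a minimal LaTeX document.
        parts = [
            "\\documentclass{article}",
            f"\\title{{{self.title}}}",
            "\\begin{document}",
            "\\maketitle",
        ]
        parts += [item.to_latex() for item in self.items]
        parts.append("\\end{document}")
        return "\n".join(parts)


homework = Playlist("Week 3 homework", [
    ContentItem("Warm-up", "Solve $x^2 - 3x + 2 = 0$."),
    ContentItem("Challenge", "Prove that $\\sqrt{2}$ is irrational."),
])
source = homework.to_latex()
```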

No way I’m going to let them run the high schools. We’re taking over that business too now.

Aren’t big publishers better?

The market forces will prevail. What is better for a cash-strapped school: to order some crusty mainstream algebra textbook that may or may not be standard-compliant, but sure is long and talks down to students, or to demand the printing of one copy of the best free book on the subject, for 1/10 of the cost?

How will you make money?

Nothing changes really. Minireference continues to sell university-level textbooks; we just make the high school material and the toolchain free. As for the potential loss of business due to high schools printing on their own instead of buying textbooks from me, I’d call that a win overall.

Digital vs. print and the future of books

I’m reading an interesting paper by M. Julee Tanner that compares the cognitive aspect of digital vs. print delivery for book-length material. In summary, the printed book is not dead!

I’ve always thought the print medium (especially typeset by LaTeX) is far superior for learning and comprehension, but I figured this was just my “old timer” ways (I’m 32). It seems I’m not the only one though:

Despite decades of work by computer and e-reader engineers and designers to improve the optics, display, and ease of navigation of virtual texts, readers still have a general preference for the print presentation, especially when it comes to longer, more challenging material.

The author states many good things print books have going for them, but the most interesting to me is the following quote:

[…] the greatest difference in metacognitive strategy was also found among the users of e-readers, in their reluctance to review previously read passages by virtually turning back pages. It seems that the perceived unwieldiness of screen-tapping to turn pages did negatively impact comprehension of expository texts on the e-reader platform (Margolin et al., 2013).
Since monitoring one’s understanding while reading, reviewing previously read material if necessary, underlining, and taking marginal notes are so vital to the comprehension of more challenging texts, it is important for students and educators to know how applicable these metacognitive strategies are to virtual texts.

Indeed, think about it: if you’re reading a complicated passage in a math book, wouldn’t you want to flip back and look at the equation which you saw five pages ago? In a printed book you could do that (you could in fact leave your finger on that spread and conveniently flip between the two pages). In a PDF read on the computer, it’s also somewhat possible to flip back (though a bit imprecisely), but on an eBook reader it’s not easy to do.

Learning math/physics (or other cognitively demanding material) from an eBook reader feels a bit like I’m placed in front of a slide deck: information comes, then it’s quickly taken away, leaving me in a disorganized state of mind.

Here’s the full reference: Tanner, M. J. (2014). Digital vs. print: Reading comprehension and the future of the book. SJSU School of Information Student Research Journal, 4(2). scholarworks.sjsu.edu/slissrj/vol4/iss2/

Internet propaganda

TL;DR: The fight on the Internet is not just about True vs False, but also True vs Noise.

I’m reading this article about Internet propaganda in China, and I can’t help but wonder how much of this exists in the West. How many PR firms employ armies of paid commenters ready to intervene and vote up (or down) any content item? How many politicians employ the services of these PR firms?

I find it very interesting to dissect the tools of the System, to try to deconstruct the methods which the powers that be use to control the People. There seem to be three forces at play for any issue X: voices in support of X, voices in opposition to X, and noise. Assuming X is something the people want, people-opposing forces (which we’ll call the System for simplicity) have at least two options to silence the X discussion:

  1. Pay shills to post opinions against X.
  2. Produce noise to drown out the X discussion altogether.

In totalitarian regimes (think Russia), official police and secret police act to suppress the supporters of X, while in “open societies” (the West) corporate media control (which is a mix of options 1. and 2. above) is used to suppress X.

It seems China’s regime is siding with the Western approach for their Internet censorship. Here’s a quote from the above article:

It’s not clear the degree to which paid comments influence the conversation the way Communist Party members hope they do. Xiaolan says the paid commenters could be adding noise to the conversation simply to drown out normal people’s desire to converse online.

The noise strategy reminds me of the “jammer towers” (заглушителни станции) that I’ve seen in Bulgaria during the communist days. The idea was to isolate the Bulgarian people from FM transmissions from neighbouring Greece and Turkey. I’m guessing they were going for FM and/or TV, because I know AM radio is more difficult to block.

Could it be that the whole education system is intentionally dysfunctional? An informed and educated citizenry would be much harder to indoctrinate and control. No. Surely it is a crazy idea to imagine that science education is intentionally restricted to a small group of people, who are indoctrinated with the “you’ll get a big paycheque”-mentality in school, and forced to join the System immediately upon graduation to repay their student debt. In any case it’s worth checking. If science textbooks were made intentionally inaccessible by the system, then making more accessible science and technology textbooks will lead to more politically active citizens, armed with metis. Let’s see.