Git for authors

Version control is very useful for storing text documents like papers and books. It’s amazing how easy it is to track changes to documents and communicate these changes to other authors. In my career as a researcher, I’ve had the chance to introduce many colleagues to Mercurial and git for storing paper manuscripts. Also, when working on my math books, I’ve had the good fortune to work with an editor who understands version control and made her edits directly in the books’ source repo. This blog post is a brainstorming session on what a git user interface specific to authors’ needs could look like.

The other day I was onboarding a new author and had a chance to explain the basics of git to him, and I realized how complicated the action verbs are. To save your work, you need to put files in the staging area using git add <filename>, commit the change to the local repo using git commit, then push the changes to the remote repo using git push. These commands, and the corresponding commands for pulling changes from the remote repo to your local one and updating your working directory from the local repo, are very logical once you get used to them, and represent necessary complexity. The diagram below illustrates well the different git verbs that newcomers need to get used to.

[Diagram: git verbs explained]

(Credit: Kieran Healy’s excellent guide to git)

So what would git for authors look like?

It’s my non-expert opinion that this is too much complexity for the average non-technical person. Imagine a teacher who wants to use an OER textbook with her students, and in the process of producing the document for her class she finds some typos and wants to contribute the corrections back to the OER textbook project. Let’s do a thought experiment and imagine a humane interface that would make sense for this task. To make the thought experiment more concrete, we’ll personify the teacher as Jane, a university professor who is in charge of a first-year physics class.

We’ll assume github is used as the storage backend, but most of the authors’ OER browsing and collaboration happens on a different site (say ezOER.com) whose users are authors, teachers, students, and parents. Suppose the OER book that Jane wants to use is College Physics by OpenStax, and this book is available in “source” format from the github repo openstax/physics, which we’ll refer to as upstream below. Given this preexisting setup, here are the steps the teacher would follow:

  1. Log in to ezOER.com
  2. Copy openstax/physics to janesmith/physicsbook (note we don’t say “fork” because it has a different connotation as to the permanence and authority of the repo)
  3. Clone janesmith/physicsbook to her ~/Documents/School/Textbooks/OpenStaxPhysics
  4. Follow the instructions for “building” the book locally (e.g. running pdflatex three times)
  5. Perform customizations like:
    1. Change cover page
    2. Remove chapters she doesn’t plan on covering in her class
    3. Add a custom preface with references specific to her class
    4. Choose values for “configuration variables” like font size, paper size, etc.
  6. Generate custom book for her class (PDF for print, PDF for screen, .epub, and .mobi)
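
The “building” step could be wrapped in a small script so Jane never has to remember how many times to run pdflatex. Here is a minimal Python sketch of that idea. The file name physicsbook.tex is hypothetical, and the sketch only covers the PDF build, not the .epub and .mobi outputs:

```python
from typing import List


def build_commands(main_tex: str, passes: int = 3) -> List[List[str]]:
    """Return the pdflatex invocations needed to build the book's PDF.

    Repeated passes give LaTeX a chance to resolve cross-references
    and the table of contents.
    """
    # -interaction=nonstopmode keeps pdflatex from stopping to ask questions.
    command = ["pdflatex", "-interaction=nonstopmode", main_tex]
    return [list(command) for _ in range(passes)]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    # "physicsbook.tex" is a hypothetical name for the book's main file.
    for cmd in build_commands("physicsbook.tex"):
        print(" ".join(cmd))
```

To actually build the book, each command in the list can be passed to subprocess.run(cmd, check=True).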

At this point, she can distribute the eBooks to her students using her school’s LMS’s “file uploads” feature and set up the print PDF for print-on-demand using lulu.com, so students will be able to order the book in print. Her students will benefit from a world-class textbook for $20-30 when printed as a two-volume softcover, black-and-white book. No payment or further engagement with ezOER.com would be required.

If she doesn’t like her school’s LMS, she could “host” her custom book on ezOER.com. These are the steps she would take to publish her changes to her public-copy repository ezOER.com/janesmith/physicsbook:

  1. git save: combines the effect of git add and git commit using a two-prompt wizard
  2. git publish
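
Neither git save nor git publish exists in git today, so here is a rough Python sketch of how the two author verbs might map onto real git commands. The function names are made up for illustration, staging every change with git add --all is my assumption, and the two-prompt wizard itself is not modeled (the message is assumed to come from it):

```python
from typing import List


def save_commands(message: str) -> List[List[str]]:
    """Hypothetical 'git save': stage every change, then commit it."""
    message = message.strip()
    if not message:
        raise ValueError("a short description of the change is required")
    return [
        ["git", "add", "--all"],            # assumption: save everything that changed
        ["git", "commit", "-m", message],
    ]


def publish_commands(remote: str = "origin", branch: str = "master") -> List[List[str]]:
    """Hypothetical 'git publish': push saved commits to the author's public copy."""
    return [["git", "push", remote, branch]]


if __name__ == "__main__":
    # Dry run: show what the two author verbs would expand to.
    for cmd in save_commands("Remove chapters 9-12") + publish_commands():
        print(" ".join(cmd))
```

Keeping the verbs as thin wrappers means a curious author can always see which standard git commands ran on her behalf.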

She could now give her students links to the “build” directory of ezOER.com/janesmith/physicsbook.

Now suppose that halfway through the course, she finds some typos in Chapter 2 of the book, which she wants to correct, and furthermore she wants to share her corrections with the “upstream” copy of the textbook. (Bear with me on this scenario; we’ll have to think more about good incentives to share your corrections with others, but for the purpose of this thought experiment let’s assume Jane is feeling altruistic today.) These are the commands she’ll have to use to “suggest edits” to the upstream authors who manage openstax/physics:

  1. Make the corrections in her working directory
  2. git save
  3. git publish (to her copy)
  4. git suggestedits, which pops up a wizard asking her to give a short label for her edit suggestions and to pick the commits that should be part of the “suggested edit” (a pull request behind the scenes). The suggestedits command will perform the following steps behind the scenes:
    – git checkout -b typoFixesChapter2
    – git rebase -i (choosing only the correction commits, and not the customization commits)
    – open a github pull request
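
Here is one way the behind-the-scenes steps could be sketched in Python. Note that this sketch swaps the interactive rebase for git cherry-pick onto a branch started from upstream’s master, since cherry-picking the commits Jane selected in the wizard needs no interactive editor. The function names and the remote name upstream are assumptions, and the last function only prepares the fields for GitHub’s “create a pull request” API call without sending anything:

```python
import re
from typing import Dict, List


def branch_name(label: str) -> str:
    """Turn a label like 'typo fixes chapter 2' into 'typoFixesChapter2'."""
    words = re.findall(r"[A-Za-z0-9]+", label)
    if not words:
        raise ValueError("label must contain at least one letter or digit")
    first, rest = words[0], words[1:]
    return first[0].lower() + first[1:] + "".join(w[0].upper() + w[1:] for w in rest)


def suggestedits_commands(label: str, commits: List[str], upstream: str = "upstream",
                          origin: str = "origin", base: str = "master") -> List[List[str]]:
    """Hypothetical 'git suggestedits': git commands run on the author's behalf."""
    branch = branch_name(label)
    cmds = [
        ["git", "fetch", upstream],
        # Start the hidden branch from upstream so customizations are left out.
        ["git", "checkout", "-b", branch, f"{upstream}/{base}"],
    ]
    # Copy over only the correction commits picked in the wizard (oldest first).
    cmds += [["git", "cherry-pick", sha] for sha in commits]
    cmds += [
        ["git", "push", origin, branch],
        ["git", "checkout", base],  # the author always ends up back on master
    ]
    return cmds


def pull_request_payload(label: str, owner: str, branch: str, base: str = "master") -> Dict[str, str]:
    """Fields for GitHub's POST /repos/{owner}/{repo}/pulls endpoint."""
    return {"title": label, "head": f"{owner}:{branch}", "base": base}


if __name__ == "__main__":
    # Dry run with placeholder commit ids.
    for cmd in suggestedits_commands("typo fixes chapter 2", ["abc1234", "def5678"]):
        print(" ".join(cmd))
    print(pull_request_payload("typo fixes chapter 2", "janesmith", "typoFixesChapter2"))
```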

To keep things simple, Jane will never be shown the typoFixesChapter2 branch, and the rest of the workflow will be done entirely through the ezOER.com web interface. For example, if the upstream maintainers want her to change something in her “suggested edits” (pull request), she’ll have to make these changes through the web interface, rather than edit the branch typoFixesChapter2 and push again. For all intents and purposes, Jane is always working on the master branch of her copy of the book.

I think the new verbs save, publish, and suggestedits would be easier to use and would correspond more closely to authors’ needs.

More power tools for authors

Assuming the source format is text based, git’s basic diff functionality will prove to be useful for “watching” changes made to large collections of text. If the source is LaTeX documents, ezOER could run latexdiff to generate diff documents showing “rendered” differences between revisions, also known as red-blue diffs.
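
As a rough illustration, a red-blue diff between two revisions could be produced by extracting both versions of a file with git show and handing them to the latexdiff tool. The Python sketch below assumes the book lives in a single .tex file (a multi-file book would need extra handling), and the function names and the output file names old.tex, new.tex, and diff.tex are arbitrary choices of mine:

```python
import subprocess
from typing import List, Tuple

# (command, file that receives the command's standard output)
Step = Tuple[List[str], str]


def red_blue_diff_steps(tex_path: str, old_rev: str, new_rev: str) -> List[Step]:
    """Steps for rendering the changes to tex_path between two git revisions.

    tex_path is given relative to the top of the repository.
    """
    return [
        (["git", "show", f"{old_rev}:{tex_path}"], "old.tex"),
        (["git", "show", f"{new_rev}:{tex_path}"], "new.tex"),
        (["latexdiff", "old.tex", "new.tex"], "diff.tex"),
        (["pdflatex", "-interaction=nonstopmode", "diff.tex"], "diff-build.log"),
    ]


def run_steps(steps: List[Step]) -> None:
    """Execute each step, saving its standard output to the named file."""
    for cmd, out_file in steps:
        with open(out_file, "w") as handle:
            subprocess.run(cmd, stdout=handle, check=True)


if __name__ == "__main__":
    # Dry run: print the steps in shell notation without executing them.
    for cmd, out_file in red_blue_diff_steps("physicsbook.tex", "HEAD~1", "HEAD"):
        print(" ".join(cmd), ">", out_file)
```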

The build process could be automated using a generic continuous integration server. A script could run after each commit to regenerate the book in various PDF and eBook formats, and also generate diffs. We could even have some “language check” scripts that act like linters for text.
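
To give a flavour of what a “language check” might look like, here is a toy Python sketch that flags two common slips: accidentally repeated words and multiple spaces between words. The two rules and the function name are just examples I picked; a real checker would need many more rules and some awareness of LaTeX markup:

```python
import re
from typing import List, Tuple

# The same word twice in a row, e.g. "the the".
DOUBLED_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)
# Two or more spaces between non-space characters (indentation is ignored).
DOUBLE_SPACE = re.compile(r"(?<=\S) {2,}(?=\S)")


def check_text(text: str) -> List[Tuple[int, str]]:
    """Return (line number, message) pairs for each problem found."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in DOUBLED_WORD.finditer(line):
            problems.append((number, f"repeated word: {match.group(1)!r}"))
        if DOUBLE_SPACE.search(line):
            problems.append((number, "multiple spaces between words"))
    return problems


if __name__ == "__main__":
    sample = "This is is fine.\nTwo  spaces here.\nClean line."
    for number, message in check_text(sample):
        print(f"line {number}: {message}")
```

A continuous integration job could run such a script on every commit and report the list of problems next to the generated PDFs.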

I’ve thought about this previously, but now the “authoring workflow” is becoming clearer. I need something like this for managing Minireference Co.’s (closed-source) content, but I plan to build all the tooling as open source. I would love to hear your feedback about this idea in the comments below.

Digital vs. print and the future of books

I’m reading an interesting paper by M. Julee Tanner that compares the cognitive aspect of digital vs. print delivery for book-length material. In summary, the printed book is not dead!

I’ve always thought the print medium (especially typeset by LaTeX) is far superior for learning and comprehension, but I figured this was just my “old timer” ways (I’m 32). It seems I’m not the only one, though:

Despite decades of work by computer and e-reader engineers and designers to improve the optics, display, and ease of navigation of virtual texts, readers still have a general preference for the print presentation, especially when it comes to longer, more challenging material.

The author states many good things print books have going for them, but the most interesting to me is the following quote:

[…] the greatest difference in metacognitive strategy was also found among the users of e-readers, in their reluctance to review previously read passages by virtually turning back pages. It seems that the perceived unwieldiness of screen-tapping to turn pages did negatively impact comprehension of expository texts on the e-reader platform (Margolin et al., 2013).
Since monitoring one’s understanding while reading, reviewing previously read material if necessary, underlining, and taking marginal notes are so vital to the comprehension of more challenging texts, it is important for students and educators to know how applicable these metacognitive strategies are to virtual texts.

Indeed, think about it: if you’re reading a complicated passage in a math book, wouldn’t you want to flip back and look at the equation which you saw five pages ago? In a printed book you could do that (you could in fact leave your finger on that spread and conveniently flip between the two pages). In a PDF read on the computer, it’s also somewhat possible to flip back (though a bit imprecisely), but on an eBook reader it’s not easy to do.

Learning math/physics (or other cognitively demanding material) from an eBook reader feels a bit like I’m placed in front of a slide deck: information comes, then it’s quickly taken away, leaving me in a disorganized state of mind.

Here’s the full reference: Tanner, M. J. (2014). Digital vs. print: Reading comprehension and the future of the book. SJSU School of Information Student Research Journal, 4(2). scholarworks.sjsu.edu/slissrj/vol4/iss2/

Dwarslezer

I’m visiting Amsterdam and I saw this young lady on the ferry who was reading a small book. The young lady was stunningly beautiful, but ferries being public transport and all, I wasn’t about to chat her up. The tiny book continued to intrigue me though, so I mustered the courage to go talk to her. “This is about the business after all, not a pick-up line,” I said to myself.


She turned out to be the nicest girl ever and explained to me this book format is called DWARSLEZER, which roughly translates to cross-reader. She even wrote it down for me—because let’s face it, Dutch is a pretty incomprehensible language for anyone non-Dutch.

It seems the first publisher to use this format is Jongbloed, which called it the “dwarsligger”, meaning “cross-beam” or “cross-bar”. Other publishers (AW Bruna Uitgevers, Dutch Media, and Nieuw Amsterdam) have released books in this format and there might be some legal action going on.

This format is a great idea because it halves the overall size of “the object you carry” or, equivalently, doubles the size of the page you read. Also, the book she was reading was 500 pages long but no thicker than 1.5 cm, so the “bible paper” helps to make the format compact.

Watch out for a dwarslezer edition of the No bullshit guide to math and physics coming soon!