Using version control is very useful for storing text documents like papers and books. It’s amazing how easy it is to track changes to documents, and communicate these changes with other authors. In my career as a researcher, I’ve had the chance to initiate many colleagues to the use of mercurial and git for storing paper manuscripts. Also, when working on my math books, I’ve had the fortune to work with an editor who understands version control and performed her edits directly to the books’ source repo. This blog post is a brainstorming session on the what a git user interface specific to author’s needs could look like.

The other day I was onboarding a new author and had a chance to explain to him the basics of git, and I realized how complicated the action verbs are. To save some work, you need to put files in the staging area using git add <filename>, commit the change to the local repo, then push the changes to the remote repo. These commands, and the corresponding commands for pulling changes from the remote repo to your local one, and updating your working directory from the local repo, are very logical after you get used to them, and represent necessary complexity. The diagram below illustrates well the different git verbs newcomers to git need to get used to.

git verbs explained

(Credit: Kieran Healy‘s excellent guide to git)

 

So what would git for authors look like?

It’s my non-expert opinion that this is too much complexity for the average non-technical person. Imagine a teacher who wants to use an OER textbook with her students, and in the process of producing the document for her class she finds some typos, which she wants to contribute back to the OER textbook project. Let’s do a thought experiment and imagine a humane interface that would make sense for this task. To make the thought experiment more concrete, we’ll personify the teacher as Jane,  a university professor who is in charge of a first-year physics class.

We’ll assume github is used as the storage backend, but most of author’s OER browsing,  and collaboration happens on a different site (say ezOER.com) whose users are authors, teachers, students, and parents. Suppose the OER book that Jane wants to use is College Phyisics by OpenStax, and this book is available in “source” format from the github repo openstax/physics, which we’ll refer to as upstream below. Given this preexisting setup, here are the steps the teacher would use:

  1. Login to ezOER.com
  2. Copy openstax/physics  to janesmith/physicsbook  (note we don’t say “fork” because it has different connotation as to the permanence and authority of the repo)
  3. Clone janesmith/physicsbook to her ~/Documents/School/Textbooks/OpenStaxPhysics
  4. Follow instructions for “building” the book locally. (e.g. running pdflatex three times)
  5. Performs customization like:
    1. Change cover page
    2. Remove chapters she doesn’t plan on covering in her class
    3. Add a custom preface with references specific to her class
    4. Choose values for “configuration variables” like font size, paper size, etc.
  6. Generate custom book for her class (PDF for print, PDF for screen, .epub, and .mobi)

At this point, she can distribute the eBooks to her students using her school’s LMS’ “file uploads” feature and setup the print PDF for print-on-demand using lulu.com, so students will be able to order the book in print. Her students will benefit from a world-class textbook for $20-30 when printed as a two-tome softcover, black-and-white print book. No payment or further engagement with ezOER.com would be required.

If she doesn’t like her school’s LMS system she could “host” her custom book on ezOER.com. These are the steps she would take to publish her changes to her public-copy repository ezOER.com/janesmith/physicsbook:

  1. git save: combines the effect of git add and git commit using a two-prompt wizard
  2. git publish

She could now give the links to the “build” directory of ezOER.com/janesmith/physicsbook.

Now suppose that halfway through the course, she finds some typos in Chapter 2 of the book, which she wants to correct, and furthermore she wants to share her corrections with the “upstream” copy of the textbook. (Bear with me with this scenario, we’ll have to think more about good incentives to share your corrections with others, but for the purpose of this thought experiment let’s assume Jane is feeling altruistic today). These are the commands she’ll have to use to “suggest edits” to the upstream authors who manage openstax/physics:

  1. Make the corrections in her working directory
  2. git save
  3. git publish (to her copy)
  4. git suggestedits which pops up a wizard asking her to give a short label for her edit suggestions, and pick the commits that should be part of the “suggested edit” (a pull request behind the scenes). The suggestedits command will perform the following steps behind the scenes.
    git checkout -b typoFixesChapter2
    git rebase -i   (choosing only corrections commits, and not the customization commits)
    – open github pull request

To keep things simple, Jane will never be shown the typoFixesChapter2 branch, and for all intents and purposes the rest of the workflow will be done entirely through the ezOER.com web interface. For example, if the upstream maintainers wants her to change something in her “suggested edits” (pull request), she’ll have to make these changes through the web interface, rather than edit the branch typoFixesChapter2 and push again. For all intents and purposes, Jane is always working on the master branch of her copy of the book.

I think introducing the new verbs save, publish, and suggestedits would be easier to use and correspond more closely to authors’ needs.

More power tools for authors

Assuming the source format is text based, git’s basic diff functionality will prove to be useful for “watching” changes made to large collections of text.  If the source is LaTeX documents, ezOER could run latex-diff to generate diff documents showing “rendered” differences between revisions, also know as red-blue diffs.

The build process could be automated using a generic continuous integration server. A script could run after each commit to regenerate the book in various PDF and eBook formats, and also generate diffs. We could even have some “language checks” scripts, that act like linters for text.


 

I’ve thought about this previously, but now the “authoring workflow” is becoming clearer. I need something like this for managing Minireference Co.’s (closed-source) content, but I plan to build all the tooling as open source. Would love to hear you feedback about this idea in the comments below.

3 thoughts on “Git for authors

  1. Annual general update – Minireference blog
  2. We definitely need to brainstorm about OER production, at some point. For instance, there’s a whole lot to be said about the role of Learning Management Systems, especially in the move towards what Michael Feldstein and Phil Hill call the “post-LMS”. Open Apereo’s Open Summit left a lot of room for this type of thing, at the junction of FLOSS, Open Education, Open Access, Open Data, and Open Research.

    As you can guess, many people are encountering very similar ideas and there are many resources (software, human, conceptual…) already available to improve on existing workflows and practices.
    Won’t flood this comment box with links (your inbox already got enough of those). The next step would be to get people together to hash these things out.

    In other words, if other people read this, let’s get in touch to move things forward.

  3. The state of open educational resources in 2017 – Minireference blog

Leave a Reply

Your email address will not be published. Required fields are marked *

css.php