Next week I’m interviewing for a Data Scientist position, so I figured I had better brush up on my machine learning skills. I found some neat YouTube tutorials [1,2] on using scikit-learn, so I thought this would be a good place to start.
From experience, I expected that setting up the development environment with NumPy, SciPy, the IPython notebook, etc. would take me half a day (compiling and debugging things that don’t work out of the box), but I was pleasantly surprised when, a few pip commands later, I had a fully functional environment. I’ve pasted the sequence of commands below in case you want to learn yourself some ML too.
Create a virtualenv
The first step is to create an isolated virtualenv for the project. Think of this as “basic Python hygiene”: we want to isolate the Python libraries used to follow the tutorial from my system-wide Python libraries. (For most people this is just “best practice,” but in my case my system-wide site-packages contains outdated versions and/or half-broken dependencies because of the dysfunctional relationship between Fink, MacPorts, and Homebrew that plays out on my computer.) To set up a virtualenv in a given directory and activate it, proceed as follows:
$ cd ~/Projects/MLpractice
$ virtualenv pyML
$ . pyML/bin/activate # . is the same as source
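Activating the environment sets the VIRTUAL_ENV variable to the environment’s directory and puts pyML/bin at the front of your PATH; running deactivate undoes both. If you lose track of which environment a shell is in, a small helper can tell you. This is only a sketch: venv_status is a made-up name, and all it does is read the VIRTUAL_ENV variable that virtualenv’s activate script exports.

```shell
# venv_status: hypothetical helper that reports the active virtualenv, if any.
# It relies only on the VIRTUAL_ENV variable exported by virtualenv's activate script.
venv_status() {
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    # Print just the environment's directory name, e.g. "pyML".
    echo "active: $(basename "$VIRTUAL_ENV")"
  else
    echo "no virtualenv active"
  fi
}
```

After the activate step, running venv_status should print "active: pyML".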
Install prerequisites
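Next we’ll install all the prerequisite packages and scikit-learn. Note that the prompt starts with (pyML), which indicates that pip will install these packages in the pyML virtualenv and not system-wide. The two which commands are a quick sanity check: both should print paths inside pyML/bin. The brew commands are the exception: Homebrew installs system-wide, outside the virtualenv, and here it provides C libraries such as freetype and libpng that matplotlib needs in order to compile.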
(pyML)$ which python
(pyML)$ which pip
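Eyeballing the output of which works fine, but the check can also be scripted. Here is a minimal sketch, assuming a shell such as bash or zsh; check_venv_tools is a hypothetical helper, not something virtualenv provides. It confirms that each named command resolves to something inside $VIRTUAL_ENV/bin.

```shell
# check_venv_tools: hypothetical helper. Returns 0 if every named command
# resolves to an executable under $VIRTUAL_ENV/bin, 1 otherwise.
check_venv_tools() {
  if [ -z "${VIRTUAL_ENV:-}" ]; then
    echo "no virtualenv active" >&2
    return 1
  fi
  local rc=0 tool resolved
  for tool in "$@"; do
    # command -v prints the path the shell would run, or fails if not found.
    resolved=$(command -v "$tool") || resolved=""
    case "$resolved" in
      "$VIRTUAL_ENV"/bin/*) echo "ok: $tool -> $resolved" ;;
      *) echo "outside virtualenv: $tool -> ${resolved:-not found}" >&2; rc=1 ;;
    esac
  done
  return $rc
}
```

Inside the activated pyML environment, check_venv_tools python pip should print an ok line for each and return 0; a nonzero return means the shell is still picking up a system-wide copy. The variable names deliberately avoid path and status, which zsh treats specially.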
(pyML)$ pip install numpy
(pyML)$ pip install pyzmq
(pyML)$ pip install "ipython[all]"  # quotes stop zsh from treating [all] as a glob
(pyML)$ pip install scipy
(pyML)$ pip install pyparsing
$ brew update
$ brew install freetype
$ brew link --force freetype
$ brew install libpng
$ brew link --force libpng
$ brew install libagg
(pyML)$ pip install matplotlib
(pyML)$ pip install psutil
(pyML)$ pip install scikit-learn
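A clean pip run doesn’t guarantee that every package imports cleanly (a compiled extension can still fail at import time), so a quick import check is worthwhile. The following is a minimal sketch: check_imports is a hypothetical helper, and the PYTHON override is my own addition, defaulting to whichever python is first on the PATH (the virtualenv’s, once it is activated). Some packages import under a different name than they install under: scikit-learn is sklearn, pyzmq is zmq, and ipython is IPython.

```shell
# check_imports: hypothetical helper. Tries to import each named module with
# $PYTHON (default: python on the PATH); returns 1 if any import fails.
check_imports() {
  local py="${PYTHON:-python}"
  local rc=0 mod
  for mod in "$@"; do
    if "$py" -c "import $mod" 2>/dev/null; then
      echo "ok: $mod"
    else
      echo "FAILED: $mod" >&2
      rc=1
    fi
  done
  return $rc
}
```

With the pyML environment active, check_imports numpy scipy IPython zmq pyparsing matplotlib psutil sklearn should report ok for every module.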
Done
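Now everything is set up and ready.
We can clone the repositories with the example code and start the IPython notebook as follows. (The git@github.com URLs assume you have an SSH key registered with GitHub; the equivalent https://github.com/jakevdp/sklearn_scipy2013.git form works without one.)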
$ git clone git@github.com:jakevdp/sklearn_scipy2013.git
$ git clone git@github.com:ogrisel/parallel_ml_tutorial.git
(pyML)$ cd sklearn_scipy2013/notebooks/
(pyML)$ ipython notebook --pylab inline
Your default browser should open and show the IPython notebooks for the first tutorial.
Let the learning begin, for machine and human alike!