Next week I’m interviewing for a Data Scientist position, so I figured I’d better brush up on my machine learning skills. I found some neat YouTube tutorials [1,2] on using scikit-learn, so I thought this would be a good place to start.

From experience, I expected that setting up the dev environment with numpy, scipy, IPython Notebook, etc. would take me half a day (compiling and debugging things that don’t work out of the box), but I was pleasantly surprised when, a few pip commands later, I had a fully functional environment. I’ve pasted the sequence of commands below in case you want to learn yourself some ML too.

Create a virtualenv

The first step is to create an isolated virtualenv for the project. Think of this as “basic Python hygiene”: we want to isolate the Python libraries used to follow the tutorial from my system-wide Python libraries. (For most people this is just “best practice”, but in my case my system-wide site-packages contains outdated versions and/or half-broken dependencies because of the dysfunctional relationship between Fink, MacPorts, and Homebrew that plays out on my computer.) To set up a virtualenv in a given directory and activate it, proceed as follows:

```shell
$ cd ~/Projects/MLpractice
$ virtualenv pyML
$ . pyML/bin/activate    # . is the same as source
```

Install prerequisites

Next we’ll install all the prerequisite packages and scikit-learn. Note that the command-line prompt now starts with (pyML), which indicates that pip will install these packages in the pyML virtualenv rather than system-wide. The two `which` commands let you confirm that python and pip resolve to the copies inside pyML. The brew commands (macOS with Homebrew) install the freetype and libpng libraries that matplotlib needs to build.

```shell
(pyML)$ which python
(pyML)$ which pip
(pyML)$ pip install numpy
(pyML)$ pip install pyzmq
(pyML)$ pip install "ipython[all]"    # quoted so shells such as zsh don't expand the brackets
(pyML)$ pip install scipy
(pyML)$ pip install pyparsing
$ brew update
$ brew install freetype
$ brew link --force freetype
$ brew install libpng
$ brew link --force libpng
$ brew install libagg
(pyML)$ pip install matplotlib
(pyML)$ pip install psutil
(pyML)$ pip install scikit-learn
```

Done

Now everything is set up. We can clone the repositories with the example code and start the IPython notebook as follows:

```shell
$ git clone git@github.com:jakevdp/sklearn_scipy2013.git
$ git clone git@github.com:ogrisel/parallel_ml_tutorial.git
(pyML)$ cd sklearn_scipy2013/notebooks/
(pyML)$ ipython notebook --pylab inline
```

Your default browser should open, showing the IPython notebooks for the first tutorial.
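If the notebook fails to start, or an import fails inside it, two likely causes are a shell that isn't using the virtualenv's interpreter and a package that didn't install cleanly. The script below is a minimal sketch of two quick checks, assuming bash and that pyML was activated with its `activate` script (which exports `VIRTUAL_ENV`). The function names `in_virtualenv` and `check_imports` are hypothetical helpers, not part of virtualenv or pip.

```shell
#!/usr/bin/env bash
# Hypothetical sanity checks for the pyML virtualenv (a sketch, not part of the setup above).
# Assumes the virtualenv was activated with ". pyML/bin/activate", which exports VIRTUAL_ENV.

# Succeeds only if a virtualenv is active AND the `python` found on PATH lives inside it.
in_virtualenv() {
    [ -n "${VIRTUAL_ENV:-}" ] || return 1
    local py
    py=$(command -v python) || return 1
    case "$py" in
        "$VIRTUAL_ENV"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Tries to import each named module with the active python and prints the ones that fail.
# Returns the number of failed imports, so 0 means every module imported.
check_imports() {
    local missing=0 mod
    for mod in "$@"; do
        if ! python -c "import $mod" 2>/dev/null; then
            echo "missing: $mod"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}
```

For example, after activating pyML you could source the script (saved under a hypothetical name such as `check_env.sh`), run `in_virtualenv && echo ok`, and then run `check_imports numpy scipy matplotlib sklearn IPython zmq psutil pyparsing`. Some import names differ from the pip package names: scikit-learn imports as `sklearn`, pyzmq as `zmq`, and ipython as `IPython`.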
Let the learning begin, for machine and human alike!