{"id":374,"date":"2014-03-13T21:08:34","date_gmt":"2014-03-14T01:08:34","guid":{"rendered":"http:\/\/minireference.com\/blog\/?p=374"},"modified":"2014-03-13T21:08:34","modified_gmt":"2014-03-14T01:08:34","slug":"getting-started-with-ml-in-python","status":"publish","type":"post","link":"https:\/\/minireference.com\/blog\/getting-started-with-ml-in-python\/","title":{"rendered":"Getting started with ML in python"},"content":{"rendered":"<p>Next week I&#8217;m interviewing for a Data Scientist position so I figured I&#8217;d better brush up on my machine learning skills. I found some neat YouTube tutorials [<a href=\"http:\/\/www.youtube.com\/watch?v=r4bRUvvlaBw\">1<\/a>,<a href=\"http:\/\/www.youtube.com\/watch?v=iFkRt3BCctg\">2<\/a>] on using <a href=\"http:\/\/scikit-learn.org\/stable\/index.html\">scikit-learn<\/a>, so I thought this would be a good place to start.<\/p>\n<p>From experience, I was expecting that setting up the dev-environment with numpy, scipy, ipython notebook, etc., would take me half a day (compiling and debugging things that don&#8217;t work out of the box), but I was pleasantly surprised when a few pip commands later I had a fully functional environment. I&#8217;ve pasted the sequence of commands below in case you want to learn yourself some ML too.<\/p>\n<h3>Create a virtualenv<\/h3>\n<p>The first step is to create an isolated virtualenv for the project. Think of this as &#8220;basic python hygiene&#8221;: we want to isolate the python libraries used to follow the tutorial from my system-wide python library. (For most people this is just &#8220;best practices&#8221;, but in my case the system-wide site-packages contains outdated versions and\/or half-broken dependencies because of the dysfunctional relationship between fink, macports, and homebrew that plays out on my computer.) 
To set up a virtualenv in a given directory and activate it, proceed as follows:<\/p>\n<p><code><br \/>\n$ cd ~\/Projects\/MLpractice<br \/>\n$ virtualenv pyML<br \/>\n$ . pyML\/bin\/activate    # . is the same as source<br \/>\n<\/code><\/p>\n<h3>Install prerequisites<\/h3>\n<p>Next, we&#8217;ll install all the prerequisite packages and scikit-learn. Note that the command line starts with <code>(pyML)<\/code>, which indicates that pip will install these packages in the pyML virtualenv and not system-wide.<\/p>\n<p><code><br \/>\n(pyML)$ which python<br \/>\n(pyML)$ which pip<\/p>\n<p>(pyML)$ pip install numpy<br \/>\n(pyML)$ pip install pyzmq<br \/>\n(pyML)$ pip install ipython[all]<br \/>\n(pyML)$ pip install scipy<br \/>\n(pyML)$ pip install pyparsing<\/p>\n<p>$ brew update<br \/>\n$ brew install freetype<br \/>\n$ brew link --force freetype<br \/>\n$ brew install libpng<br \/>\n$ brew link --force libpng<br \/>\n$ brew install libagg<br \/>\n(pyML)$ pip install matplotlib<br \/>\n(pyML)$ pip install psutil<\/p>\n<p>(pyML)$ pip install scikit-learn<\/p>\n<p><\/code><\/p>\n<h3>Done<\/h3>\n<p>Now everything is set up and ready.<br \/>\nWe can clone the repositories with the example code and start the ipython notebook as follows.<\/p>\n<p><code><br \/>\n$ git clone git@github.com:jakevdp\/sklearn_scipy2013.git<br \/>\n$ git clone git@github.com:ogrisel\/parallel_ml_tutorial.git<br \/>\n(pyML)$ cd sklearn_scipy2013\/notebooks\/<br \/>\n(pyML)$ ipython notebook --pylab inline<br \/>\n<\/code><\/p>\n<p>Your default browser should open, showing you the IPython notebooks for the first tutorial.<br \/>\nLet the learning begin, for machine and human alike!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Next week I&#8217;m interviewing for a Data Scientist position so I figured I&#8217;d better brush up on my machine learning skills. I found some neat YouTube tutorials [1,2] on using scikit-learn, so I thought this would be a good place to start. 
From experience, I was expecting that setting up the dev-environment with numpy, scipy, ipython notebook, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,15],"tags":[],"class_list":["post-374","post","type-post","status-publish","format-standard","hentry","category-computers","category-tools"],"_links":{"self":[{"href":"https:\/\/minireference.com\/blog\/wp-json\/wp\/v2\/posts\/374","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/minireference.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/minireference.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/minireference.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/minireference.com\/blog\/wp-json\/wp\/v2\/comments?post=374"}],"version-history":[{"count":0,"href":"https:\/\/minireference.com\/blog\/wp-json\/wp\/v2\/posts\/374\/revisions"}],"wp:attachment":[{"href":"https:\/\/minireference.com\/blog\/wp-json\/wp\/v2\/media?parent=374"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/minireference.com\/blog\/wp-json\/wp\/v2\/categories?post=374"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/minireference.com\/blog\/wp-json\/wp\/v2\/tags?post=374"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}