Machine Learning in Six Lines

I’ve only very recently started experimenting with machine learning, but Python has made it super simple. First, set up a scikit-learn environment (I used Anaconda) and import the tree module, which provides the decision tree classifier.

from sklearn import tree

And that’s line 1. Run this Python script, and if there are no errors, the environment is set up. Now let’s get some data. In the following, features is a list of single-element lists (scikit-learn expects one row per sample) and labels is a plain list with one entry per row. Consider a phone app that saves the name of each contact I called along with the time I called them, written as HH.MM.

features = [[10.00], [10.30], [12.10], [12.55], [14.00], [15.00], [18.00], [18.07], [20.00], [21.00]]
labels = ['Mom', 'Mom', 'Doctor', 'Doctor', 'Friend', 'Friend', 'Girlfriend', 'Girlfriend', 'Mom', 'Mom']
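In a real app, these two lists would be derived from the call history rather than typed by hand. Here is a minimal sketch of that step, assuming a hypothetical call_log of (time, name) pairs. The names call_log and to_feature are made up for illustration and are not part of any phone API.

```python
# Hypothetical call history: (time as "HH:MM", contact name) pairs.
call_log = [("10:00", "Mom"), ("10:30", "Mom"), ("12:10", "Doctor"), ("12:55", "Doctor")]


def to_feature(hhmm):
    """Turn "HH:MM" into the HH.MM float used in this post, e.g. "12:10" -> 12.1."""
    return float(hhmm.replace(":", "."))


# scikit-learn expects one row per sample, so each time is wrapped in its own list.
features = [[to_feature(t)] for t, _ in call_log]
labels = [name for _, name in call_log]

print(features)
print(labels)
```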

That brings us to line 3. In practice, these lists could be populated from your phone’s call history, with each label matched to the feature at the same position, and we use this information to predict who you might want to call. Let’s set up a classifier, in this case the decision tree classifier, fit it to the data, and make a prediction.

clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)
print(clf.predict([[15.30]]))

And that’s it. When you execute this script, you’ll get the following output:

['Friend']

Which is precisely what we were aiming for. Even though we never explicitly told the computer who we might want to call at 3:30 pm, it used the calling pattern to come up with this answer. That’s machine learning in six lines.
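To see the pattern the classifier picked up, you can print the fitted tree. The sketch below uses scikit-learn’s export_text helper, which as far as I know is available from release 0.21 onward. The feature name "time" is just a display label chosen here for readability.

```python
from sklearn import tree

features = [[10.00], [10.30], [12.10], [12.55], [14.00], [15.00], [18.00], [18.07], [20.00], [21.00]]
labels = ['Mom', 'Mom', 'Doctor', 'Doctor', 'Friend', 'Friend', 'Girlfriend', 'Girlfriend', 'Mom', 'Mom']

clf = tree.DecisionTreeClassifier().fit(features, labels)

# Print the learned if/else rules; "time" is only a display name for the single feature.
rules = tree.export_text(clf, feature_names=["time"])
print(rules)

# Take the first (and only) prediction out of the returned array.
prediction = clf.predict([[15.30]])[0]
print(prediction)
```

One of the splits sits halfway between 15.00 ('Friend') and 18.00 ('Girlfriend'), at 16.5, which is why 15.30 lands on the 'Friend' side.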

Here is what a simple application of this could look like. We format the current time as HH.MM, matching the features above, and print who you might want to call right now.

from datetime import datetime
from sklearn import tree

# Call times in HH.MM form and the contact called at each time.
features = [[10.00], [10.30], [12.10], [12.55], [14.00], [15.00], [18.00], [18.07], [20.00], [21.00]]
labels = ['Mom', 'Mom', 'Doctor', 'Doctor', 'Friend', 'Friend', 'Girlfriend', 'Girlfriend', 'Mom', 'Mom']

clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)

# Format the current time as HH.MM (e.g. 15:30 -> 15.3) to match the training features.
now = float("{:%H:%M}".format(datetime.now()).replace(":", "."))
print(clf.predict([[now]]))
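One caveat with the HH.MM encoding: it isn’t a true decimal, so 10:59 becomes 10.59 and 11:00 becomes 11.00, a gap of 0.41 for a single minute. The tree only compares values against thresholds, so the ordering is preserved and the example works either way, but if you want evenly spaced values, fractional hours are one option. A minimal sketch, where to_decimal_hours is a hypothetical helper name and the date passed in is arbitrary:

```python
from datetime import datetime


def to_decimal_hours(moment):
    """Convert a datetime's time of day to fractional hours, e.g. 15:30 -> 15.5."""
    return moment.hour + moment.minute / 60


# The date part is arbitrary; only the hour and minute are used.
half_past_three = to_decimal_hours(datetime(2000, 1, 1, 15, 30))
print(half_past_three)

# The same helper applied to the current time.
print(to_decimal_hours(datetime.now()))
```

If you go this route, the training features need to be encoded the same way, so 10:30 would be stored as 10.5 rather than 10.30.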