Statistics and Machine Learning

I first heard about Machine Learning sometime between 2002 and 2006, when I was working on my PhD in statistics. At one conference I attended there was an academic using Machine Learning, and people talked about him as someone doing something "a bit odd". If you have ever watched Monty Python's Life of Brian, you can think of Frequentists as the "Judean People's Front", Bayesians as the "People's Front of Judea" and this guy as the "Judean Popular People's Front".

By the time I co-founded an edtech startup in 2014, Machine Learning had become mainstream and it was pretty hard to find anyone with a pitch deck that didn't include at least one slide claiming that they would disrupt industry X with Machine Learning.

At this point I got a bit sceptical. How was this different from statistics? Was it different from statistics? I signed up for the Andrew Ng Machine Learning course on Coursera to find out more. Andrew Ng is passionate about Machine Learning and is exceptionally good at explaining the intuitions behind the mathematics. Early in the course you are introduced to Linear Regression. This is pretty comfortable territory for a statistician, and I began to suspect that Machine Learning and Statistics differed only in naming conventions, e.g. "features" instead of "predictors".

As I worked through the rest of the course the lectures started to cover topics I had not seen before and, more than that, the focus seemed a little different. I enrolled in a couple of other online courses, but it was not until I came across the 2001 paper "Statistical Modeling: The Two Cultures" by Leo Breiman that I felt I started to understand the philosophical differences. It is a wonderful paper, not least because it also has comments from leading Statisticians Cox and Efron.

Breiman started as an academic Statistician, then became a consultant where he found Machine Learning to be far more successful for the practical problems he was faced with. He returned to academia where he argued for algorithmic modelling over data modelling approaches. Breiman is very critical of data modelling, claiming it is unrealistic that a Statistician...

...can invent a reasonably good parametric class of models for a complex mechanism devised by nature.

Both Machine Learning and Statistics exist to derive insights from data. Although they have some simple models in common (e.g. Linear and Logistic Regression), they come from fundamentally different places. Broadly speaking, Statistics gives up some predictive accuracy in exchange for interpretability, while Machine Learning gives up interpretability in exchange for higher accuracy.

In Statistics we assume the data comes from a distribution and we use the data to estimate the parameters of that distribution. It is the distribution assumption that means we can interpret our model. We can check the "goodness of fit" of this assumed model.
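
To make the data modelling idea concrete, here is a minimal sketch in Python rather than a definitive recipe. It assumes simulated data from a straight line with Normal noise (the true intercept of 2, slope of 3 and noise level are made up for illustration), estimates the two parameters by least squares, and uses R-squared as one simple goodness-of-fit check. The function names are illustrative only.

```python
import random


def fit_simple_linear_model(xs, ys):
    """Least-squares estimates of the intercept and slope in y = a + b*x + noise."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Assumed data-generating model (illustrative): y = 2 + 3x + Normal(0, 1) noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]

intercept, slope = fit_simple_linear_model(xs, ys)
fit = r_squared(xs, ys, intercept, slope)

# The estimates are directly interpretable: the slope is the expected
# change in y for a one-unit change in x, under the assumed model.
print(f"intercept={intercept:.2f}, slope={slope:.2f}, R^2={fit:.3f}")
```

The point of the sketch is that the estimated parameters mean something only because we committed to a model for how the data was generated.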

In Machine Learning we do not assume a distribution. We take the inputs and outputs and learn the black box function from the data. We check the result by seeing how well it predicts a test set that was held back from training.
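
For contrast, the algorithmic approach can be sketched in the same way. This example uses k-nearest neighbours, chosen here only as one simple method that assumes no distribution; it is not taken from Breiman's paper or any particular course. The simulated sine-wave data, the 200/100 train/test split and the choice of k = 5 are all assumptions made for illustration.

```python
import math
import random


def knn_predict(train_x, train_y, x, k=5):
    """Predict y at x by averaging the outputs of the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative data: the learner is never told that the truth is a sine wave.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(300)]
ys = [math.sin(x) + rng.gauss(0, 0.1) for x in xs]

# Hold back the last 100 points as a test set.
train_x, test_x = xs[:200], xs[200:]
train_y, test_y = ys[:200], ys[200:]

predictions = [knn_predict(train_x, train_y, x) for x in test_x]
test_mse = mean_squared_error(test_y, predictions)

# Baseline for comparison: always predict the mean of the training outputs.
train_mean = sum(train_y) / len(train_y)
baseline_mse = mean_squared_error(test_y, [train_mean] * len(test_y))

print(f"k-NN test MSE={test_mse:.4f}, baseline test MSE={baseline_mse:.4f}")
```

There are no parameters to interpret here. The only evidence that the method works is its error on data it has not seen.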

Comparing these two fields and their approaches to Linear Regression is unhelpful because it is where they overlap the most. When we look more widely we can discern the differences more clearly.

An area I am researching at the moment is Statistical Learning, which is an attempt, led by Hastie and Tibshirani, to develop a mathematical framework for Machine Learning.