he answer to the question “What machine learning algorithm should I use?” is always “It depends.” It depends on the size, quality, and nature of the data. It depends on what you want to do with the answer. It depends on how the math of the algorithm was translated into instructions for the computer you are using. And it depends on how much time you have. Even the most experienced data scientists can’t tell which algorithm will perform best before trying them.
Machine Learning Studio provides state-of-the-art algorithms, such as Scalable Boosted Decision trees, Bayesian Recommendation systems, Deep Neural Networks, and Decision Jungles developed at Microsoft Research. Scalable open-source machine learning packages, like Vowpal Wabbit, are also included. Machine Learning Studio supports machine learning algorithms for multiclass and binary classification, regression, and clustering. See the complete list of Machine Learning Modules. The documentation provides some information about each algorithm and how to tune parameters to optimize the algorithm for your use.
The Machine Learning Algorithm Cheat Sheet
The Microsoft Azure Machine Learning Studio Algorithm Cheat Sheet helps you choose the right machine learning algorithm for your predictive analytics solutions from the Azure Machine Learning Studio library of algorithms. This article walks you through how to use this cheat sheet.
To download the cheat sheet and follow along with this article, go to Machine learning algorithm cheat sheet for Microsoft Azure Machine Learning Studio.
This cheat sheet has a very specific audience in mind: a beginning data scientist with undergraduate-level machine learning, trying to choose an algorithm to start with in Azure Machine Learning Studio. That means that it makes some generalizations and oversimplifications, but it points you in a safe direction. It also means that there are lots of algorithms not listed here.
These recommendations are compiled feedback and tips from many data scientists and machine learning experts. We didn’t agree on everything, but we’ve tried to harmonize our opinions into a rough consensus. Most of the statements of disagreement begin with “It depends…”
How to use the cheat sheet
Read the path and algorithm labels on the chart as “For <path label>, use <algorithm>.” For example, “For speed, use two class logistic regression.” Sometimes more than one branch applies. Sometimes none of them are a perfect fit. They’re intended to be rule-of-thumb recommendations, so don’t worry about it being exact. Several data scientists we talked with said that the only sure way to find the very best algorithm is to try all of them.
Here’s an example from the Azure AI Gallery of an experiment that tries several algorithms against the same data and compares the results: Compare Multi-class Classifiers: Letter recognition.
To download an easy-to-understand infographic overview of machine learning basics to learn about popular algorithms used to answer common machine learning questions, see Machine learning basics with algorithm examples.
Flavors of machine learning
Supervised learning algorithms make predictions based on a set of examples. For instance, historical stock prices can be used to make guesses about future prices. Each example used for training is labeled with the value of interest—in this case the stock price. A supervised learning algorithm looks for patterns in those value labels. It can use any information that might be relevant—the day of the week, the season, the company’s financial data, the type of industry, the presence of disruptive geopolitical events—and each algorithm looks for different types of patterns. After the algorithm has found the best pattern it can, it uses that pattern to make predictions for unlabeled testing data—tomorrow’s prices.
Supervised learning is a popular and useful type of machine learning. With one exception, all the modules in Azure Machine Learning Studio are supervised learning algorithms. There are several specific types of supervised learning that are represented within Azure Machine Learning Studio: classification, regression, and anomaly detection.
- Classification. When the data are being used to predict a category, supervised learning is also called classification. This is the case when assigning an image as a picture of either a ‘cat’ or a ‘dog’. When there are only two choices, it’s called two-class or binomial classification. When there are more categories, as when predicting the winner of the NCAA March Madness tournament, this problem is known as multi-class classification.
- Regression. When a value is being predicted, as with stock prices, supervised learning is called regression.
- Anomaly detection. Sometimes the goal is to identify data points that are simply unusual. In fraud detection, for example, any highly unusual credit card spending patterns are suspect. The possible variations are so numerous and the training examples so few, that it’s not feasible to learn what fraudulent activity looks like. The approach that anomaly detection takes is to simply learn what normal activity looks like (using a history of non-fraudulent transactions) and identify anything that is significantly different.
In unsupervised learning, data points have no labels associated with them. Instead, the goal of an unsupervised learning algorithm is to organize the data in some way or to describe its structure. This can mean grouping it into clusters or finding different ways of looking at complex data so that it appears simpler or more organized.
In reinforcement learning, the algorithm gets to choose an action in response to each data point. The learning algorithm also receives a reward signal a short time later, indicating how good the decision was. Based on this, the algorithm modifies its strategy in order to achieve the highest reward. Currently there are no reinforcement learning algorithm modules in Azure Machine Learning Studio. Reinforcement learning is common in robotics, where the set of sensor readings at one point in time is a data point, and the algorithm must choose the robot’s next action. It is also a natural fit for Internet of Things applications.
Considerations when choosing an algorithm
Getting the most accurate answer possible isn’t always necessary. Sometimes an approximation is adequate, depending on what you want to use it for. If that’s the case, you may be able to cut your processing time dramatically by sticking with more approximate methods. Another advantage of more approximate methods is that they naturally tend to avoid overfitting.
The number of minutes or hours necessary to train a model varies a great deal between algorithms. Training time is often closely tied to accuracy—one typically accompanies the other. In addition, some algorithms are more sensitive to the number of data points than others. When time is limited it can drive the choice of algorithm, especially when the data set is large.
Lots of machine learning algorithms make use of linearity. Linear classification algorithms assume that classes can be separated by a straight line (or its higher-dimensional analog). These include logistic regression and support vector machines (as implemented in Azure Machine Learning Studio). Linear regression algorithms assume that data trends follow a straight line. These assumptions aren’t bad for some problems, but on others they bring accuracy down.