Student and Faculty Guide - 10 easy steps to get up and running with Azure Machine Learning

11/25/2015

My colleague Amy Nicholson is the UK expert on Azure Machine Learning. This blog post is the result of a quizzing session with her on how to get started with Azure Machine Learning.

Step 1: Set up your Azure Machine Learning account/service

1. Microsoft Azure for DreamSpark: create a Microsoft Account, use it to set up a DreamSpark account, verify your DreamSpark account, and register for Microsoft Azure for DreamSpark. Then go to Azure Machine Learning Studio and click the "Get Started" link. The "Microsoft, DreamSpark and Azure Account Set-up" instructional video walks through these steps.

2. Azure for Education is for Faculty running courses using Azure, including Azure ML. Each student receives $100 of Azure credit per month, for 6 months. The Faculty member receives $250 per month, for 12 months. You can apply anytime at https://www.microsoftazurepass.com/azureu.

3. Azure Machine Learning for Research is for university faculty running data science courses who may need greater amounts of Azure storage and additional services such as HDInsight (Hadoop and Spark) or DocumentDB (NoSQL). Proposals are accepted every two months; you can find out more and apply at https://research.microsoft.com/en-us/projects/azure/ml.aspx.

4. Azure Passes: I have access to a limited number of $100, one-month Azure passes, so if you're interested in running a class, tutorial or session on machine learning at a UK institution, please get in touch @lee_stott.

Step 2: Understand the basics

The Azure Machine Learning team provides a walkthrough tutorial that covers many of the basics.

This tutorial is really useful as it takes you through the entire process of creating an AzureML workspace, uploading data, creating an experiment to predict someone’s credit risk, building, training, and evaluating the models, publishing your best model as a web service, and calling that web service.

Step 3: Getting Data to work with

Now you need to learn how to import a data set into Azure Machine Learning, and where to find interesting data to build something amazing.

You can upload local data (like a .csv file) from your machine or access data from elsewhere on the internet (like an OData feed provider).

Good sources of example data include the Azure Machine Learning sample datasets:

https://azure.microsoft.com/en-us/documentation/articles/machine-learning-use-sample-datasets/

and the Azure DataMarket: https://azure.microsoft.com/en-us/marketplace/?source=datamarket
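Before uploading a local .csv file, it can be worth a quick check of how many rows it has and where values are missing. Below is a minimal sketch using only Python's standard csv module; it assumes the file has a header row, and the column names and sample rows are made up for illustration.

```python
import csv
import io

def summarize_csv(text):
    """Return (row_count, column_names, missing_per_column) for CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    missing = {name: 0 for name in columns}
    row_count = 0
    for row in reader:
        row_count += 1
        for name in columns:
            # Empty or whitespace-only cells, and cells absent from short rows (None), count as missing.
            value = row.get(name)
            if value is None or value.strip() == "":
                missing[name] += 1
    return row_count, columns, missing

# Hypothetical sample data; for a real file use open("your_file.csv", newline="").read().
sample = "age,income,credit_risk\n34,52000,low\n,41000,high\n51,,low\n"
print(summarize_csv(sample))
```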

Step 4: Create your first Machine Learning Experiment

Many predictive experiments using supervised learning (regression, classification, or anomaly detection) will follow this basic pattern.

Drag the data set you chose in Step 3 onto the experiment canvas in your AzureML workspace. You may then want to use the Data Transformation modules to clean or reformat your data (for example, removing rows with missing data).
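Removing rows with missing data is one of the simplest cleaning steps. The sketch below shows the idea in plain Python so you can see exactly what it does; it is not the Azure ML module itself, and the function names, column names and rows are hypothetical.

```python
def is_missing(value):
    # None (absent cell) or an empty/whitespace-only string counts as missing.
    return value is None or str(value).strip() == ""

def drop_rows_with_missing(rows, columns=None):
    """Keep only rows where every checked column has a value.

    rows: list of dicts (for example from csv.DictReader).
    columns: column names to check; defaults to every key in each row.
    """
    cleaned = []
    for row in rows:
        keys = columns if columns is not None else row.keys()
        if not any(is_missing(row.get(k)) for k in keys):
            cleaned.append(row)
    return cleaned

data = [
    {"age": "34", "income": "52000", "credit_risk": "low"},
    {"age": "", "income": "41000", "credit_risk": "high"},
    {"age": "51", "income": "", "credit_risk": "low"},
]
print(len(drop_rows_with_missing(data)))                   # → 1
print(len(drop_rows_with_missing(data, columns=["age"])))  # → 2
```

Checking only selected columns lets you keep rows whose gaps are in columns you will not use anyway.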

Next, split your data set into a training set and a test set.

A common rule of thumb is to use 75% of the data for training and 25% for testing.
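In Studio the Split Data module does this for you. To make the mechanics concrete, here is a minimal sketch of a randomized 75/25 split; the function name and the fixed seed (used so the split is repeatable) are assumptions for illustration.

```python
import random

def train_test_split(rows, train_fraction=0.75, seed=42):
    """Randomly split rows into (train, test); train gets round(train_fraction * n) rows."""
    shuffled = list(rows)                 # copy, so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # shuffle so the split does not depend on file order
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(range(100))
print(len(train), len(test))  # → 75 25
```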

Why do we have to split it? Remember that supervised learning needs labeled examples. We give most of the labeled data to the “Train Model” module, which processes it to figure out the relationships between the inputs and outputs, but we hold back some of the labeled data to test the model we built. We can then compare the predictions the trained model generates for the held-back data (in the “Score Model” module) against the actual labels to see how well the model is performing. We can't use the same data for both: the model was built from the training data, so it will look very accurate on that data, and only unseen data gives an honest measure of performance.
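A quick way to see why scoring on the training data is misleading: a 1-nearest-neighbour model simply memorises its training rows, so it is perfect on them but not on new data. The sketch below uses synthetic, deliberately noisy data (20% of labels flipped); everything in it is made up for illustration.

```python
import random

def nearest_neighbour_predict(train, x):
    """1-nearest-neighbour: return the label of the closest training point (1-D feature)."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def accuracy(train, data):
    correct = sum(1 for x, label in data if nearest_neighbour_predict(train, x) == label)
    return correct / len(data)

# Synthetic labelled data: label is 1 when x > 0.5, with 20% of labels flipped as noise.
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.random()
    label = int(x > 0.5)
    if rng.random() < 0.2:
        label = 1 - label
    data.append((x, label))

train, test = data[:150], data[150:]  # 75% / 25%
print("accuracy on training data:", accuracy(train, train))  # 1.0: every training row is memorised
print("accuracy on held-out test data:", accuracy(train, test))
```

The test-set number is the one that tells you how the model is likely to behave on new data.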

Finally, the “Evaluate Model” module calculates performance metrics for the scored results and lets us compare two models against each other to determine which performs better for our needs.

Step 5: Choosing the right Algorithm

There are 4 categories of algorithms currently supported in Azure Machine Learning:

Clustering: grouping similar data together
Regression: predicting a value
Classification: predicting a discrete category
Anomaly detection: identifying data that is outside of the norm

Once you determine the category of algorithm that makes sense for your problem, you need to choose a specific algorithm within that category.

The best resource for this is the Azure Machine Learning Cheat Sheet, a useful flowchart that helps you analyze your data and figure out which algorithm may perform best.
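The first decision, which of the four categories fits your problem, comes down to a couple of questions about your data. Here is a hypothetical, much simplified helper that encodes those questions; it is only an illustration of the reasoning, not a reproduction of the Cheat Sheet.

```python
def algorithm_category(looking_for_outliers, has_labels, target_is_numeric=False):
    """Hypothetical, simplified mapping from a problem description to one of the
    four algorithm categories listed above."""
    if looking_for_outliers:
        return "Anomaly detection"  # identify data that is outside the norm
    if not has_labels:
        return "Clustering"         # no labels: group similar data together
    if target_is_numeric:
        return "Regression"         # labelled, numeric target: predict a value
    return "Classification"         # labelled, discrete target: predict a category

print(algorithm_category(False, True))        # credit risk (low/high) → Classification
print(algorithm_category(False, True, True))  # a price or amount → Regression
```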

Step 6: Refine your model

Each algorithm has a number of initial parameters, and tweaking them can greatly improve your results. The “Sweep Parameters” module can help by trying many different parameter values for you, and you can specify the metric that you want to optimize for (such as accuracy, precision or recall).
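The idea behind a parameter sweep can be sketched as an exhaustive grid search: try every combination of candidate values, score each one with the metric you care about, and keep the best. This is a minimal illustration under that assumption, not the module's implementation; the toy threshold classifier and its validation data are hypothetical.

```python
from itertools import product

def sweep_parameters(param_grid, train_and_score):
    """Exhaustive grid search over every combination of candidate values.

    param_grid: dict mapping parameter name -> list of candidate values.
    train_and_score: function(params) -> metric value (higher is better),
        ideally measured on validation data rather than the final test set.
    Returns (best_params, best_score).
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_score(params)
        if score > best_score:  # ties keep the first combination that reached the best score
            best_params, best_score = params, score
    return best_params, best_score

# Toy example: a one-parameter threshold classifier scored by accuracy.
validation = [(0.1, 0), (0.3, 0), (0.35, 0), (0.45, 1), (0.6, 1), (0.8, 1)]

def accuracy_for(params):
    correct = sum(1 for x, label in validation if int(x >= params["threshold"]) == label)
    return correct / len(validation)

print(sweep_parameters({"threshold": [0.2, 0.4, 0.5, 0.7]}, accuracy_for))
# → ({'threshold': 0.4}, 1.0)
```

Optimizing for a different metric (precision, recall, ...) just means passing a different scoring function.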

Changing algorithms and adjusting their initial parameters can greatly affect your results. Here are some resources to help you learn to perfect your model:

How to choose parameters to optimize your algorithms in Azure Machine Learning

"Run and Fine-Tune Multiple Models" video by Data Science Dojo

To evaluate your model, right-click on the output node of the “Evaluate Model” module and select “Visualize”.

The metrics shown depend on the category of algorithm you are using:

Regression models give you the mean absolute error, root mean squared error, relative absolute error, relative squared error, and the coefficient of determination. You want the errors to be as close to 0 as possible, and you want the coefficient of determination to be as close to 1 as possible.
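To make these five numbers less abstract, here is a sketch that computes them from actual and predicted values, assuming the usual textbook definitions (relative errors are measured against simply predicting the mean; the coefficient of determination is 1 minus the relative squared error). Studio's own computation may differ in minor details.

```python
import math

def regression_metrics(actual, predicted):
    """Usual definitions; needs at least two distinct actual values (else division by zero)."""
    n = len(actual)
    mean_actual = sum(actual) / n
    errors = [a - p for a, p in zip(actual, predicted)]
    abs_err = sum(abs(e) for e in errors)
    sq_err = sum(e * e for e in errors)
    abs_dev = sum(abs(a - mean_actual) for a in actual)    # error of always predicting the mean
    sq_dev = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mean_absolute_error": abs_err / n,
        "root_mean_squared_error": math.sqrt(sq_err / n),
        "relative_absolute_error": abs_err / abs_dev,
        "relative_squared_error": sq_err / sq_dev,
        "coefficient_of_determination": 1 - sq_err / sq_dev,  # can go negative for a poor model
    }

metrics = regression_metrics([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])
print({name: round(value, 3) for name, value in metrics.items()})
```

Errors near 0 and a coefficient of determination near 1 both mean the predictions track the actual values closely.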

Binary (two-class) classification models provide metrics on accuracy, precision, recall, F1 score (the harmonic mean of precision and recall), and AUC (the area under the ROC curve). You want all of these numbers to be as close to 1 as possible. They also provide the number of true positives, false positives, false negatives and true negatives: you want true positives and true negatives to be high, and false positives and false negatives to be low.
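Here is a sketch of how those binary metrics relate to each other, computed from actual labels (0/1) and model scores. It assumes a 0.5 threshold for turning scores into predicted labels, and computes AUC as the probability that a randomly chosen positive row scores higher than a randomly chosen negative one; the sample data is made up.

```python
def binary_metrics(actual, scores, threshold=0.5):
    """actual: 0/1 labels; scores: model scores. Needs at least one positive and one negative row."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # AUC: fraction of (positive, negative) pairs ranked correctly; ties count as half.
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / len(actual),
        "precision": precision, "recall": recall, "f1": f1,
        "auc": wins / (len(pos) * len(neg)),
    }

print(binary_metrics([1, 0, 1, 1, 0, 0], [0.9, 0.4, 0.35, 0.8, 0.6, 0.1]))
```

Note that accuracy, precision, recall and F1 depend on the threshold, while AUC looks only at how the scores rank the rows.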

Multiclass classification models provide a confusion matrix of actual vs. predicted instances.
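A confusion matrix is simply a count of (actual, predicted) pairs laid out as a grid. A minimal sketch, with made-up class names:

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Return (labels, matrix) where matrix[i][j] counts rows whose actual class is
    labels[i] and whose predicted class is labels[j]."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix

labels, matrix = confusion_matrix(["cat", "dog", "cat", "bird"], ["cat", "cat", "cat", "bird"])
print(labels)  # → ['bird', 'cat', 'dog']
for label, row in zip(labels, matrix):
    print(label, row)
```

Correct predictions sit on the diagonal; every off-diagonal count is one class being mistaken for another (here, one dog predicted as a cat).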

Here are some resources to help you with evaluating your model:

How to evaluate model performance in Azure Machine Learning

How to interpret model results in Azure Machine Learning

Step 7: Publish your model as a web service

To publish your model, click the “SET UP WEB SERVICE” button in the bottom toolbar in Azure Machine Learning Studio. If there are multiple trained models in your experiment, select the “Train Model” module for the algorithm/trained model you want to use before clicking the button.

Choose “Predictive Web Service”. The tool will generate a new experiment with web service inputs and outputs. Verify that all of your data preprocessing modules still make sense when the service is called with new data. You can also use the “Project Columns” module to remove some columns from the web service inputs and outputs. Then run your predictive experiment and click “DEPLOY WEB SERVICE”.

Further documentation on publishing your web service is available in the official Azure Machine Learning documentation. (This step is also covered in the walkthrough tutorial from Step 2.)

Step 8: Call your web service

Next, you need to write a little code (or grab some sample code) to call your web service. The Azure web service that you created can operate in two different ways:

Request/Response - The user sends one or more rows of credit data to the service over HTTP, and the service responds with a set of results.
Batch Execution - The user sends to the service the URL of an Azure blob that contains one or more rows of credit data. The service stores the results in another blob and returns the URL of that container.

When you published the web service in the previous step, you were taken to a webpage documenting the different ways to call your service. Sample code is provided in C#, Python, and R. An Excel spreadsheet with macros to call the web service is also provided.
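For a feel of what a Request/Response call involves, here is a sketch using only Python's standard library. The JSON layout ("Inputs" / "ColumnNames" / "Values") and the Bearer-token header are assumptions modelled on the classic Studio sample code; always check them against the sample code on your own service's page. The URL, API key, input name "input1" and column names are placeholders.

```python
import json
import urllib.request

def build_request(url, api_key, column_names, rows):
    """Build (but do not send) a POST request for a Request/Response call.
    Body layout is an assumption: compare it with your service's own sample code."""
    body = {
        "Inputs": {
            "input1": {  # placeholder input name; use the one shown for your service
                "ColumnNames": column_names,
                "Values": rows,
            }
        },
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Bearer " + api_key},
        method="POST",
    )

def call_service(request):
    """Send the request and decode the JSON response (needs a real URL and key)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

# Placeholders: copy the real request URL and API key from your web service's page.
req = build_request(
    "https://example.invalid/YOUR-REQUEST-URI",
    "YOUR_API_KEY",
    ["age", "income"],
    [["34", "52000"]],
)
# result = call_service(req)  # uncomment once the URL and key are real
```

Keeping the request-building separate from sending makes it easy to inspect the payload before you spend a call on it.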

The official Azure Machine Learning documentation covers calling your web service in more detail.

Step 9: Retrain your model over time

You may have new data coming in continually, and want to occasionally retrain your ML model based on that new data.

The official documentation explains how to retrain machine learning models programmatically.
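Deciding when "occasionally" has arrived is up to you. One simple approach, shown below as a purely hypothetical policy (the function name and both thresholds are made up), is to retrain once enough new labelled rows have accumulated, or sooner if the model's metric on recent data has dropped noticeably below what it scored at training time.

```python
def should_retrain(new_rows_since_training, recent_metric, baseline_metric,
                   min_new_rows=1000, max_drop=0.05):
    """Hypothetical retraining trigger.

    new_rows_since_training: labelled rows collected since the last training run.
    recent_metric / baseline_metric: same metric (e.g. accuracy) on recent data
        and at training time; higher is assumed to be better.
    """
    if new_rows_since_training >= min_new_rows:
        return True                                    # enough fresh data to learn from
    return baseline_metric - recent_metric > max_drop  # performance has degraded

print(should_retrain(1200, 0.90, 0.91))  # → True (enough new rows)
print(should_retrain(200, 0.90, 0.91))   # → False
print(should_retrain(200, 0.80, 0.91))   # → True (metric dropped by more than 0.05)
```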

Hopefully you have learned something. You can also help others benefit from your knowledge, troubleshooting efforts and lessons learned by sharing your machine learning model to the Azure Machine Learning Gallery with a button click from the bottom toolbar in AzureML Studio.