Tips & tricks for using the Data Science Virtual Machine (DSVM) with GPU support for machine learning on Azure

Wow, what a wordy title.  In this post, I want to share the tricks I’ve learned for using the Data Science Virtual Machine on Azure with GPU hardware. 

Rationale

To begin, why would you want to do this?  Here’s the value prop:

  • Data Science Virtual Machine: the DSVM enables rapid development for data scientists.  Say that you usually use { TensorFlow | CNTK | PyTorch | etc } for your deep learning framework, but you found a great sample in one of the other deep learning frameworks.  You’d like to quickly try your data with that sample code, but don’t want the overhead of getting a new deep learning framework with all of its dependencies set up on your machine.  Use the DSVM!  It has a huge list of popular machine learning tools already installed and configured.  It is also great for scaling out training of your models on VMs. 
  • GPU: I’m not going to do an in-depth processor comparison here, but essentially GPUs have great parallel-processing capabilities and can perform faster than CPUs in many batch-processing/data-intensive scenarios.  For example, I ran the following two commands – the first with GPU support and the second with CPU only – to train a simple machine learning model in Torch (the commands use Torch’s Lua-based th runner), and you can see the resulting speedup from about 56 minutes with CPU to less than 16 minutes with GPU.

GPU: 15m38.556s

th train.lua -input_h5 data/tiny-shakespeare.h5 -input_json data/tiny-shakespeare.json

CPU: 55m51.655s

th train.lua -input_h5 data/tiny-shakespeare.h5 -input_json data/tiny-shakespeare.json -gpu -1
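
To put a number on that speedup, the two timings above can be converted to seconds and divided.  This is just a small Python sketch of the arithmetic: the helper name parse_duration is made up for illustration, and it assumes the timings are written in the "15m38.556s" style shown above.

```python
import re

def parse_duration(text):
    """Convert a string like '15m38.556s' to a number of seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

# Timings reported above for the GPU run and the CPU-only run.
gpu_seconds = parse_duration("15m38.556s")
cpu_seconds = parse_duration("55m51.655s")

# How many times faster the GPU run finished.
speedup = cpu_seconds / gpu_seconds
print(f"GPU: {gpu_seconds:.1f}s, CPU: {cpu_seconds:.1f}s, speedup: {speedup:.2f}x")
```

For these two runs the ratio works out to roughly 3.6x.  That is one model on one VM, so treat it as an anecdote rather than a benchmark.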

Data Science Virtual Machine versions

There are multiple versions of the Data Science Virtual Machine.  They all include similar tools, but there are differences: for example, the Ubuntu image contains additional deep learning frameworks that aren’t supported on Windows.

  • Windows Server 2016 DSVM
  • Windows Server 2012 DSVM
  • Ubuntu DSVM
  • CentOS DSVM
  • Deep Learning VM (DLVM): this is a variant of the DSVM with GPU support ready to go (so you don’t need to understand the tips & tricks below!).  It is available with a Windows Server 2016 or Ubuntu base image.  The DLVM actually uses the same core VM images as the DSVM, but the main differences are that the setup wizard is optimized for easy provisioning on GPU and the DLVM auto-downloads a set of end-to-end deep learning samples from GitHub when the VM instance is created.
  • Geo AI DSVM: this is a Windows Server 2016 DSVM with extra support for geospatial analytics.  It comes preinstalled with ESRI’s ArcGIS Pro software and several geospatial code samples.  


Tips & tricks

To get GPU support, you need both hardware with GPUs in a datacenter and the right software – namely, a virtual machine image that includes GPU drivers so you can use the GPU.
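
Once a VM is provisioned, one quick way to sanity-check both halves (GPU hardware plus a working driver) is to ask NVIDIA’s nvidia-smi tool to list the GPUs it can see.  The Python sketch below assumes an NVIDIA GPU and that nvidia-smi is on the PATH when the driver is installed; the function name list_gpus is just for illustration.

```python
import shutil
import subprocess

def list_gpus():
    """Return the lines printed by `nvidia-smi -L`, or [] if unavailable."""
    # If the driver tools are not installed, nvidia-smi will not be on PATH.
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    # A non-zero exit code usually means the driver cannot talk to a GPU.
    if result.returncode != 0:
        return []
    return [line for line in result.stdout.splitlines() if line.strip()]

gpus = list_gpus()
if gpus:
    print("\n".join(gpus))
else:
    print("No NVIDIA GPU detected (driver missing or no GPU hardware).")
```

An empty result on a VM that should have a GPU usually points at one of the gotchas below: an image without drivers, or a VM size without GPU hardware.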

The biggest tip is to use the Deep Learning Virtual Machine!  The provisioning experience has been optimized to filter to the options that support GPU (the NC series – see below), which makes it easier to set up correctly.

Outside of the Deep Learning Virtual Machine, the big gotchas when creating a vanilla Data Science Virtual Machine for deep learning on GPU are:

[Screenshot: NC-series]

NOTE: the screenshot might not be accurate for you, future reader!  I also only had the US and Canada regions selected, and there are many more datacenters available.  Change the region filters to get the latest data.

  • You need to use an image with GPU drivers installed.  As documented here, GPU drivers are provided on the following images: Linux (Ubuntu), Linux (CentOS), Windows Server 2016, and the Deep Learning VM.  The Windows Server 2012 DSVM does not have GPU support.
  • You need to know whether your VM series requires HDD or SSD.  Even though solid state drives seem “better”, not all GPU machines support them.  The different VM series have different requirements for their Azure storage disk support.  NC and NV VMs only support VM disks that are backed by Standard Disk Storage (HDD).  NCv2, ND, and NCv3 VMs only support VM disks that are backed by Premium Disk Storage (SSD).
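
The disk rule in the last bullet is easy to get wrong in the provisioning wizard, so here is the same information as a small Python lookup.  It only restates the series listed above as of this writing (Azure may have changed the requirements since), and the names DISK_SUPPORT and required_disk are made up for this sketch.

```python
# Disk support per GPU VM series, exactly as listed in the bullet above.
DISK_SUPPORT = {
    "NC": "Standard Disk Storage (HDD)",
    "NV": "Standard Disk Storage (HDD)",
    "NCv2": "Premium Disk Storage (SSD)",
    "ND": "Premium Disk Storage (SSD)",
    "NCv3": "Premium Disk Storage (SSD)",
}

def required_disk(series):
    """Return the disk type a GPU VM series supports, per the list above."""
    try:
        return DISK_SUPPORT[series]
    except KeyError:
        raise ValueError(f"series not covered in this post: {series!r}") from None

for series in ("NC", "NCv3"):
    print(f"{series}: {required_disk(series)}")
```

If provisioning fails or the VM size you want is greyed out, a mismatch between the selected disk type and the series is one of the first things to check.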

Connecting to the DSVM

If you are using a Windows data science virtual machine, once the DSVM is provisioned, you can remote desktop into it. 

If you are using a Linux data science virtual machine, once the DSVM is provisioned, you have a couple of choices on how to connect to it.  More details are here, but the quick summary is that you can use any of these options:

  • For terminal/console sessions: use SSH.  In the Azure portal, after provisioning your VM, you can click on the “Connect” button to get the exact ssh command.  If you have Bash on Windows support*, you can use ssh right from the Bash app in Windows.  Otherwise, you can download a third-party tool like PuTTY.
  • For graphical sessions: use the X2Go client
  • For Jupyter notebooks, use JupyterHub by browsing to https://your-vm-ip:8000 or JupyterLab by browsing to https://your-vm-ip:8000/lab (fill in the appropriate IP address for your virtual machine). 

* In Windows 10, you can enable the Windows subsystem for Linux by running this PowerShell command as administrator and rebooting: “Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux”. 
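
For a Linux DSVM, the connection options above all hang off the VM’s IP address, so they can be generated in one place.  This Python sketch is only a convenience: the function name connection_info, the user name, and the IP address (taken from a range reserved for documentation) are placeholders, and the “Connect” button in the Azure portal remains the authoritative source for the ssh command.

```python
def connection_info(ip, user):
    """Build the ssh command and Jupyter URLs described above for one VM."""
    return {
        "ssh": f"ssh {user}@{ip}",
        "jupyterhub": f"https://{ip}:8000",
        "jupyterlab": f"https://{ip}:8000/lab",
    }

# Placeholder values: substitute your own VM's user name and public IP.
info = connection_info("203.0.113.10", "dsvmuser")
for name, value in info.items():
    print(f"{name}: {value}")
```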

Resources

Azure Data Science Virtual Machine documentation: don’t forget to explore the whole tree in the left-hand pane 

Data Science Virtual Machine Plans and Pricing: note that this is for the Windows Server 2016 version specifically

Intro to Deep Learning VM

Create a Deep Learning VM

Get the Deep Learning VM

Virtual Machines with GPU support

Get to know your DSVM: shows all of the tools, platforms, utilities, and samples that are included in the Data Science Virtual Machine, neatly organized by category

Data Science Virtual Machine product webpage: this is more of a high-level overview