Using GPU-powered Virtual Machines in the Cloud for your Machine Learning and Deep Learning workloads

Until a few years ago, running Machine Learning (ML) and Deep Neural Network workloads required owning really powerful computers with lots of CPUs, or even clusters of machines. With the adoption of GPUs for non-graphics workloads, new opportunities became available. You can now run many of your workloads in a much shorter time by making use of modern GPUs, possibly ones you already have, instead of CPUs. The good thing is, you do not even have to own servers equipped with pricey graphics hardware any more. Just as we have become used to with CPU-powered machines, we can now also get GPU-powered machines on demand in the cloud.

Microsoft Azure offers two categories of GPU-powered machines with our N-Series. NC VMs feature NVIDIA Tesla K80 GPUs for compute and machine learning workloads. NV VMs are tailored towards 3D workloads and rendering and feature NVIDIA Tesla M60 GPUs. Learn more about these VM sizes in this announcement.

So how do you get started with Machine Learning and Deep Learning workloads?

The easiest way to get going (after getting your Azure account) is to start up a GPU-powered VM based on our Data Science VM template. This VM template has the NVIDIA driver, the CUDA Toolkit and cuDNN already installed. The template is available for both Linux and Windows. There are small differences in the installed software, but both feature popular runtimes and almost all relevant tools for Machine Learning and Deep Learning on GPUs and CPUs. Check this website for a full list of preinstalled tools. The VM is really easy to set up, and you can choose between lots of VM sizes, with or without GPUs. The only thing to keep in mind is that GPU VMs are not available in all Azure regions. Check this overview for the availability of NC-Series VMs.

I don't need all these tools

What if you do not need all the tools included in the Data Science VM and you want to get started with your own VM and tools?

There are just a few steps to take and a few choices to make. I collected the most relevant links for you and will describe most of the steps to get going with a GPU-powered Linux VM here:

  • Start your own NC-Series VM using the Azure CLI or the Azure Portal. I recommend using Ubuntu Linux, as most of the documentation has detailed steps for Ubuntu and you will get ready-to-use packages for most tools.
  • Log into your VM using SSH (on Windows you can use PuTTY or Bash on Ubuntu on Windows)
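
If you prefer to script the first step, here is a minimal Python sketch that only assembles an `az vm create` call and prints it, so you can review it before running it yourself. The function name, resource group, VM name and admin user are hypothetical placeholders. The size `Standard_NC6` and the image alias `Ubuntu2204` are assumptions that may not be valid for your subscription or region, so check them with `az vm list-sizes` and `az vm image list` first.

```python
import shlex


def build_vm_create_command(resource_group, vm_name,
                            size="Standard_NC6",    # assumed NC-Series size
                            image="Ubuntu2204",     # assumed Ubuntu image alias
                            admin_user="azureuser"):
    """Assemble (but do not run) an `az vm create` call for a GPU VM."""
    return [
        "az", "vm", "create",
        "--resource-group", resource_group,
        "--name", vm_name,
        "--size", size,
        "--image", image,
        "--admin-username", admin_user,
        "--generate-ssh-keys",  # create an SSH key pair if none exists yet
    ]


if __name__ == "__main__":
    # Hypothetical names; replace them with your own resource group and VM.
    command = build_vm_create_command("my-gpu-rg", "my-gpu-vm")
    # Print a copy-and-paste friendly command line instead of executing it.
    print(shlex.join(command))
```

Printing instead of executing is a deliberate choice here: creating a GPU VM costs money, so it is worth looking at the command once before you run it.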

! You might need to download some of the following tools with a desktop browser and copy them to your VM via SCP !

  • Install the NVIDIA GPU drivers for Linux inside your VM. You can find a detailed guide on the Azure website. Make sure that you install the latest version of the drivers, as the Azure website could be outdated. You can also find a manual for installing the driver on the NVIDIA website.
  • If required by your workload or application, you might also want to install cuDNN for running deep neural networks on your VM. This download requires signing up for the NVIDIA developer program (which is free). For me, downloading the drivers involved jumping through some additional hoops. The problems and solutions are described on this website. The wrong file extension was especially confusing, but easy to fix.
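
After the driver installation, it is worth checking that the driver actually sees the GPU. The sketch below is one way to do that from Python: it calls `nvidia-smi -L`, which lists the detected GPUs one per line, and parses the result. The function names are my own, and the expected line format (`GPU 0: Tesla K80 (UUID: ...)`) is an assumption based on typical `nvidia-smi` output that could differ between driver versions.

```python
import re
import shutil
import subprocess

# Assumed line format of `nvidia-smi -L`, e.g. "GPU 0: Tesla K80 (UUID: GPU-1234)"
GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (\S+)\)$")


def parse_gpu_list(text):
    """Turn the text output of `nvidia-smi -L` into a list of dicts."""
    gpus = []
    for line in text.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            gpus.append({
                "index": int(match.group(1)),
                "name": match.group(2),
                "uuid": match.group(3),
            })
    return gpus


def list_gpus():
    """Return the GPUs the driver reports, or [] if nvidia-smi is missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(["nvidia-smi", "-L"],
                            capture_output=True, text=True, check=False)
    return parse_gpu_list(result.stdout)


if __name__ == "__main__":
    found = list_gpus()
    if found:
        for gpu in found:
            print(f"GPU {gpu['index']}: {gpu['name']}")
    else:
        print("No NVIDIA GPU visible; check the driver installation.")
```

An empty result usually means that the driver is not installed correctly or that the VM size has no GPU attached.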

Now that the required drivers and tools for GPU workloads are installed, you can get started with your workloads based on frameworks like TensorFlow, CNTK or Caffe.

Let's use containers

A quick way to get started after installing the NVIDIA driver is using Docker containers that already have the required tools installed. Luckily, there is an extension for Docker that allows us to make use of an NVIDIA GPU inside our Docker containers: nvidia-docker.

If you are using Ubuntu, you can quickly get going with GPU-powered container workloads.

Many deep learning tools are already available as Docker images. One example is H2O, which already offers a Docker container including its GPU-powered Deep Learning environment. You will find all information on H2O Deepwater on GitHub. What is Deepwater (from the linked website)?

  • Native implementation of Deep Learning models for GPU-optimized backends (MXNet, Caffe, TensorFlow, etc.)
  • Train user-defined or pre-defined deeplearning models for image/text/H2OFrame classification from Flow, R, Python, Java, Scala or REST API

You can pull and start your H2O Deepwater container via: nvidia-docker run -it --rm opsh2oai/h2o-deepwater

Whether you are inside your container or on your GPU-equipped host VM, you can monitor processes and GPU utilization with the nvidia-smi tool. You can monitor it live using: watch -d -n 1 nvidia-smi
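
If you would rather collect these numbers in a script than watch them in a terminal, `nvidia-smi` can also print selected fields as CSV. The following is a minimal sketch under a few assumptions: the helper names are hypothetical, and the chosen query fields are just one selection from the fields listed by `nvidia-smi --help-query-gpu`.

```python
import csv
import io
import shutil
import subprocess

# Fields requested from nvidia-smi; memory values are reported in MiB.
QUERY_FIELDS = ["index", "name", "utilization.gpu", "memory.used", "memory.total"]


def parse_gpu_stats(csv_text):
    """Parse the output of `nvidia-smi --format=csv,noheader,nounits`."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(record) != len(QUERY_FIELDS):
            continue  # skip blank or unexpected lines
        rows.append(dict(zip(QUERY_FIELDS, (field.strip() for field in record))))
    return rows


def query_gpu_stats():
    """Run nvidia-smi once and return one dict of strings per GPU."""
    command = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; run this on the GPU VM or in the container.")
    else:
        for gpu in query_gpu_stats():
            print(f"GPU {gpu['index']} ({gpu['name']}): "
                  f"{gpu['utilization.gpu']} % busy, "
                  f"{gpu['memory.used']} / {gpu['memory.total']} MiB")
```

The values are kept as strings on purpose, because `nvidia-smi` can report placeholders instead of numbers for fields a GPU does not support.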

I hope this information was useful for you to get started with GPU-powered Deep Learning workloads in the cloud. And don't forget to shut down your virtual machines after finishing your jobs, or to use the auto-shutdown feature in Azure VMs.

Links: