CPUs, GPUs, TPUs, oh my!

Introduction

With the rebirth of AI, the demand for computing has gone through the roof. If you broadly break an AI program into learning and inference parts, many of the learning algorithms take weeks to execute on traditional CPUs as more and more pertinent data becomes available.

CPU clock speeds are flattening, and it is no longer viable to run these learning algorithms on CPUs even when they have multiple cores and highly optimized execution pipelines. Enter the Graphics Processing Unit (GPU).

What is a GPU?

If you're unfamiliar with what a GPU is, there is a short, to-the-point, and entertaining two-minute video created by the same folks who bring you the popular show MythBusters: https://www.youtube.com/watch?v=-P28LKWTzrI.

Essentially, the difference between a CPU and a GPU is that a CPU has a few cores optimized for sequential processing, whereas a GPU has thousands of smaller, more efficient cores suited to handling many tasks in parallel.
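
To make the contrast concrete, here is a minimal Python sketch of the same elementwise work expressed two ways. The explicit loop mirrors what a single CPU core does sequentially; the whole-array operation is the kind of data-parallel work a GPU spreads across thousands of cores. The use of NumPy and the optional CuPy library is purely illustrative (neither is mentioned above) and assumes a CUDA-capable GPU is available.

    import numpy as np

    a = np.random.rand(1_000_000)
    b = np.random.rand(1_000_000)

    # Sequential view: one CPU core walks the elements one at a time.
    c_cpu = np.empty_like(a)
    for i in range(len(a)):
        c_cpu[i] = a[i] + b[i]

    # Parallel view: the same work written as a single array operation.
    # On a GPU (here via the CuPy library, if installed), each element can
    # be handled by its own lightweight thread, all at roughly the same time.
    try:
        import cupy as cp                  # assumption: CuPy and a CUDA GPU are present
        c_gpu = cp.asnumpy(cp.asarray(a) + cp.asarray(b))
    except ImportError:
        c_gpu = a + b                      # NumPy fallback, still vectorized on the CPU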

Utilizing these GPUs can speed up some training algorithms from weeks to days, or sometimes even hours, so much so that GPUs have become the norm for compute-intensive tasks that exploit parallelism.

The GPU Tech Conference

I attended the GPU Technology Conference, billed as the #1 GPU developer event, held at the San Jose Convention Center in May 2017. My hope was to take the learnings back and help my customers, especially those in the research community.

The conference was a multi-day event with NVIDIA taking center stage. It was a very well attended event with a main keynote, breakout sessions, Hands-On Labs, posters, and plenty of nerd hallway talk. A variety of topics were covered, including robots, self-driving vehicles, Virtual Reality (VR), High Performance Computing (HPC), and so on.

I attended a number of sessions, including a Hands-On Lab on the Microsoft Cognitive Toolkit, which takes advantage of GPUs when they are present. The session replays should be available shortly. In the meantime, the 2016 session replays are at https://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php.

Jensen Huang, the CEO of NVIDIA, delivered the main keynote, which was packed with announcements around AI, deep learning, and of course the next-generation GPU, codenamed Volta, capable of delivering 100 teraflops (TFLOPS). The entire video is in multiple parts, starting with https://www.youtube.com/watch?v=ddBIF1fnvIM&index=5&list=PLZHnYvH1qtOZPJtv1WNYk0TU4L3M_rnj4, and the complete transcript is at https://blogs.nvidia.com/blog/2017/05/10/live-jensen-huang-gpu-technology-conference-2017/.

Jason Zander, VP at Microsoft, and Matt Wood, GM at Amazon, both announced their intention to provide Volta on their respective clouds soon. Jason Zander in fact missed the //build conference to attend, a signal of how closely the companies are collaborating to make GPUs mainstream for developers.

Based on the attendees and the buzz at the conference, it looks like GPUs have gone mainstream. As one of my fellow attendees remarked, "The lunch service made my head spin because it was so large and overwhelming (imagine a building the size of an aircraft hangar, completely full of people, everywhere, practically piled up!)."

GPUs on Azure

Due to our close partnership with NVIDIA, GPUs were announced for general availability (GA) on Azure last year, as detailed in the blog post https://azure.microsoft.com/en-us/blog/azure-n-series-general-availability-on-december-1/.

The NV series VMs are suitable for running hardware-accelerated workstation applications, designing the next concept car, or creating the next blockbuster movie. These instances support applications utilizing both DirectX and OpenGL.

The NC series VMs are suitable for demanding HPC and AI workloads. Additionally, RDMA (Remote Direct Memory Access) over InfiniBand provides close to bare-metal performance even when scaling out to tens, hundreds, or even thousands of GPUs across hundreds of machines with platforms such as Microsoft Cognitive Toolkit (CNTK), Caffe, or TensorFlow. RDMA is available on the NC24r VM size.
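
As a flavor of how one of these platforms hands work to a GPU on such a VM, here is a minimal sketch using the TensorFlow 1.x API that was current at the time. The matrix sizes are made up for illustration, and scaling out across many GPUs and RDMA-connected nodes requires additional distributed setup not shown here.

    import tensorflow as tf   # TensorFlow 1.x style API (circa 2017)

    # Pin two large matrices and their product to the first GPU on the VM.
    with tf.device('/gpu:0'):
        a = tf.random_normal([4096, 4096])
        b = tf.random_normal([4096, 4096])
        c = tf.matmul(a, b)

    # log_device_placement prints which device (CPU or GPU) each op ran on;
    # allow_soft_placement falls back to the CPU if no GPU is found.
    config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)
    with tf.Session(config=config) as sess:
        sess.run(c)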

More N-series VMs based on the more recent NVIDIA GPUs have been announced for Microsoft Azure.

There are many case studies of successful adoption of GPUs on Microsoft Azure.

TPUs

Google recently announced Tensor Processing Units (TPUs), which significantly accelerate one aspect of the neural networks used in machine learning: multiplying matrices.
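
To see why matrix multiplication is the operation worth accelerating, here is a minimal NumPy sketch of a single fully connected neural network layer (the layer sizes are made up for illustration); nearly all of its arithmetic is one matrix multiply, and deep networks stack many such layers.

    import numpy as np

    # One fully connected (dense) layer: outputs = inputs x weights + bias.
    # Hypothetical sizes: a batch of 128 examples, 784 inputs, 256 hidden units.
    batch   = np.random.rand(128, 784).astype(np.float32)   # input activations
    weights = np.random.rand(784, 256).astype(np.float32)   # learned parameters
    bias    = np.zeros(256, dtype=np.float32)

    hidden = np.matmul(batch, weights) + bias   # the matrix multiply TPUs accelerate
    hidden = np.maximum(hidden, 0.0)            # ReLU non-linearity

    # Training and inference time in deep networks is dominated by these
    # matmuls, which is why a chip built around them pays off.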

TPUs came about because many machine learning algorithms were taking too long on traditional CPUs, and GPUs were expensive at scale. The ultimate goal remains the same: accelerate machine learning by exploiting parallelism.

Summary and Conclusions

The inevitable conclusion that most attendees and I came to was that GPUs are becoming mainstream and are no longer esoteric to graphics or game development. GPUs are not replacing CPUs (each serves a different purpose), but if you are a decision maker or a developer/architect still trying to figure out how to leverage GPUs, the cloud provides an easy way to deploy HPC clusters built around these GPUs without having to make a significant capital investment.

Even among GPUs, there is no one size fits all. For instance, Azure offers multiple N-series VMs: some good for compute, some for visualization, and some for message-passing workloads.

Many companies and research institutions, large and small, are using or investigating machine learning. GPUs will significantly help in these efforts, so much so that the face of computing may look quite different a decade from now.