Introducing Shevlin Park: A Proof of Concept C++ AMP Implementation on OpenCL

Our friends at Intel have been working on Shevlin Park, a proof of concept project that augments Clang and LLVM with C++ AMP and uses OpenCL as its underlying compute driver substrate.  Dillon Sharlet from Intel presented this work on November 8th at the LLVM Developer’s Meeting.

It is important to stress that Shevlin Park is a proof of concept project and that Intel has not announced any release plans for the Shevlin Park technology.  Intel has used it on Intel hardware and Intel drivers to compare the potential efficiency of implementations built on different driver APIs.  There’s more about that in Dillon’s presentation, and I’ll touch on it toward the end of this post.

When Microsoft announced C++ AMP back in June 2011, we told you that we would release the C++ AMP specification under the Microsoft Community Promise – essentially opening up the specification to allow any C++ compiler implementer to add C++ AMP to their compiler.  Shevlin Park serves as an example of the platform portability potential intended by the Community Promise.

We are delighted by this announcement for several reasons.

First, while we have always claimed that the C++ AMP specification is platform neutral (i.e., independent of DirectX), the fact is we at Microsoft have never verified this claim by attempting to implement it on top of any other substrate.  The Shevlin Park project suggests that the mandatory parts of the open specification are indeed platform neutral by demonstrating a working implementation on top of OpenCL.

Second, it validates the simplicity and approachability of the C++ AMP programming model.  As discussed in the presentation, C++ AMP “integrates host and device in the same programming language”, is “simpler to program”, and “hides the driver API”.

Third, it validates portability.  Although the Shevlin Park project and its experiments were all performed on Windows, there is now very little doubt that the implementation can be ported to other operating systems easily, if not effortlessly.  In a cross-platform world, this demonstrates that using C++ AMP does not tie customers to a single deployment platform.

Since the main purpose of Shevlin Park was to enable experimentation and comparison of GPU compute technologies, Dillon spent a couple of slides talking about the performance difference between C++ AMP and OpenCL on their Ivy Bridge hardware.  In Dillon’s GPU results (slide 26), Shevlin Park and Microsoft’s C++ AMP are pretty close on the GEMM and Convolution benchmarks; however, Shevlin Park handily outperforms Microsoft’s C++ AMP on Histogram and FFT.  Dillon attributes this difference to the fact that Intel’s OpenCL implementation (on which Shevlin Park sits) takes advantage of Ivy Bridge’s shared memory architecture, which can avoid copying memory between CPU and GPU.  Since DirectX does not have this capability, Microsoft’s C++ AMP on DirectX cannot yet avoid those copies.  To summarize, these kinds of performance differentials illustrate how rapidly the GPU computing space is evolving, and that it can take time for platforms to keep pace.

If you’re interested in the technology behind a C++ AMP implementation, I encourage you to have a look at Dillon’s Shevlin Park presentation.  Further inquiries about Shevlin Park can be directed to Arnon Peleg at Intel, or through Intel’s Visual Computing Source OpenCL forum.