C++ AMP for the CUDA Programmer


Hello CUDA programmers!

If you’re familiar with CUDA, then this blog post and the attached guide may be for you.  This learning guide will introduce you to C++ AMP by mapping familiar CUDA terms and concepts to similar terms and concepts in C++ AMP.  It’s divided into two parts:

  1. A step-by-step guide to rewriting a CUDA implementation of matrix multiplication in C++ AMP.  Because the algorithm is familiar, you can focus on the porting process rather than on the algorithm itself.
  2. A non-exhaustive set of tables that map the most common CUDA features to C++ AMP features.

Please keep in mind that this guide is only an introduction to C++ AMP.  It covers only the most common features of CUDA and C++ AMP and omits many other important topics.  It provides no guidance on how to use C++ AMP effectively or how to port your overall solution, which may rely on additional libraries.  In addition, this guide is not intended as a comparison of the merits of C++ AMP and CUDA.  By presenting it, I am not suggesting that you drop what you are using and adopt C++ AMP.  Rather, its purpose is to help you use your existing CUDA knowledge to ramp up on C++ AMP more quickly, should you be interested in this novel approach to accelerating general-purpose computation on the GPU.  You can learn more about C++ AMP by following the links on the right-hand side of our C++ AMP blog.

With those caveats out of the way, please download and enjoy the PDF guide:

C++ AMP for the CUDA Programmer

We encourage and welcome your questions and feedback.  For questions and feedback specific to this guide, please use the comments section below.  In response to a question, we have also written a blog post on CUDA libraries.  For questions and feedback related to C++ AMP in general, please continue to use our MSDN forum.

Please note that similar guides are available for OpenCL and DirectCompute programmers.

Comments (3)

  1. Jack says:

    Great work, thanks.

  2. Tommy says:

    Hi Steve,

    Thanks for sharing the PDF, which shows many CUDA keywords that can be replaced with C++ AMP code. Still, I wonder how to replace several other CUDA keywords. Can you advise which C++ AMP features can replace the following CUDA code?

    1. __align__(16)

    2. Cuda::HostMemoryReference1D<int>

    3. DeviceMemoryLinear2D<int>

    4. DeviceMemoryPitched3D<int>

    5. cuda_safe_call() related

    5.1 cuda_safe_call( cudaGetLastError() );

    5.2 cuda_safe_call( cudaUnbindTexture(…) );

    5.3 cuda_safe_call( cudaDeviceSynchronize() );

    6. cudaBindTexture()

    Thanks in advance!

  3. Amit says:

    1. __align__(16)

    >> The equivalent capability in the Microsoft Visual C++ compiler is __declspec(align(#)), which lets you control the alignment of user-defined data.

    2. Cuda::HostMemoryReference1D<int>

    >> The C++ AMP concurrency::array_view type enables multidimensional views over existing CPU memory.

    3. DeviceMemoryLinear2D<int>

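    >> The closest C++ AMP equivalents are the concurrency::array type, a multidimensional container whose data resides on an accelerator, and the concurrency::array_view type, a multidimensional view over data held in an array or in host memory.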

    4. DeviceMemoryPitched3D<int>

    >> There is no direct equivalent of this in C++ AMP. However, if you want multidimensional data with a specific pitch, you can achieve the same effect with the "section" capability of the array and array_view types: allocate the data with padded rows and work through a section that covers only the unpadded region. Note that in this case you are responsible for choosing the pitch, unlike CUDA pitched allocations, where the CUDA runtime determines it.

    5. cuda_safe_call() related

    5.1 cuda_safe_call( cudaGetLastError() );

    >> AFAIK, cuda_safe_call is just a macro that provides better error diagnostics for CUDA API calls in debug mode. C++ AMP reports runtime errors by throwing exceptions (concurrency::runtime_exception and types derived from it), and when you compile C++ AMP programs in debug mode, you automatically get detailed diagnostics to help you understand the error.

    5.2 cuda_safe_call( cudaUnbindTexture(…) );

    6. cudaBindTexture()

    >> Please refer to our blog post on textures to learn about texture support in C++ AMP.

    5.3 cuda_safe_call( cudaDeviceSynchronize() );

    >> accelerator_view::wait is the equivalent C++ AMP API; it blocks until all commands submitted to that accelerator_view have completed.

    Please feel free to ask any further questions on our MSDN forum.