Learn C++ AMP

So you are a newbie to C++ AMP – you know nothing about it yet and want to get started quickly. We have you covered – keep on reading!

Step 1 – Do this first!

While it is typical to learn a new technology by writing your own "Hello World", if you don’t want to wander off into the minefield of GPU programming on your own, please start with the articles and videos we provide.

There are two articles specifically aimed at newcomers:

When you are done with the articles, please invest 68 minutes to sit back and watch 4 screencast videos:


Step 2 – optional if you are already an expert in GPU computing

At this point you’ve spent about 2 hours, and you understand not only how to hit the ground running with your own algorithms, but also how C++ AMP is positioned. This is important, so if you skipped step 1, please go back and complete it before reading on!

Now, if you are already familiar with other approaches, you may optionally want to browse one of our three learning guides: C++ AMP for the OpenCL Programmer, C++ AMP for the CUDA Programmer, and C++ AMP for the HLSL/DirectCompute Programmer.


Step 3 – Dive into code

This is probably the step most of you wanted to start with, and I hope you resisted until now. Now that you have the necessary theoretical background, it is time to explore our many samples and also the code in the open source libraries.

Both of these will help you internalize patterns for bringing C++ AMP into your own code base, and some of it may even be code that you can use as-is. Go ahead and write your own algorithms, or convert your existing CPU code to use C++ AMP. If you run into situations where you need help, ask questions in our MSDN forum – given that you have consumed the resources above, rest assured you won’t be asking newbie questions 😉
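To give a flavor of what such a conversion looks like, here is the canonical vector-addition loop moved onto the accelerator – a minimal sketch, assuming Visual C++ 2012 or later with the C++ AMP headers available (the function name `vector_add` is just for illustration):

```cpp
#include <amp.h>      // C++ AMP: requires Visual C++ 2012+ on Windows
#include <vector>

using namespace concurrency;

// CPU version:  for (int i = 0; i < n; ++i) sum[i] = a[i] + b[i];
// The AMP version below runs the same loop body on the default accelerator.
void vector_add(const std::vector<float>& a,
                const std::vector<float>& b,
                std::vector<float>& sum)
{
    const int n = static_cast<int>(sum.size());

    // array_views wrap existing host memory; data is copied to the
    // accelerator on demand.
    array_view<const float, 1> av(n, a);
    array_view<const float, 1> bv(n, b);
    array_view<float, 1> sv(n, sum);
    sv.discard_data();  // no need to copy sum's stale contents over

    // One logical thread per element; restrict(amp) marks the lambda
    // as compilable for the accelerator.
    parallel_for_each(sv.extent, [=](index<1> idx) restrict(amp) {
        sv[idx] = av[idx] + bv[idx];
    });

    sv.synchronize();   // copy the results back to the CPU
}
```

The interesting part of the conversion is that the loop body is unchanged – only the loop header becomes a `parallel_for_each`, and the raw arrays become `array_view`s so the runtime knows what to copy.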


Step 4 – Become an expert

After steps 1 through 3 and cutting your own code, you may feel that you are ready to take it to the next level and want to consume more advanced resources – here are some pointers.

Comments (9)

  1. jVangsnes says:

Thank you for this useful post! The only things that keep me from trying C++ AMP are cross-platform considerations and whether it has a real advantage in comparison to OpenCL. Maybe someone could do a blog post about what the benefits of using C++ AMP instead of OpenCL, CUDA, etc. would be in the future. Thanks again!

  2. dmccrady says:

    Thanks, we're glad you found this useful.  We'll consider your suggestion about showing the competitive advantages & disadvantages compared to other platforms.  However, one thing to note right up front is that this is a fast-moving space, and the advantages & disadvantages of any given platform — including C++ AMP — are likely to change very quickly.  Rest assured that we're committed to making sure that C++ AMP fulfills its ultimate goals of being portable, productive, and performant in the coming years.

  3. Bruce Bagwell says:

We have found your blogs and videos excellent for showing us how to leverage C++ AMP.  One problem we have encountered in introducing AMP code into our projects is that many of our applications need to be compiled with the /Zc:wchar_t- switch (native wchar_t turned off).  When we add amp.h to this code it ultimately generates a number of linker errors.  Is there a workaround for this?  I suspect that this is, and will be, an issue for many developers.

  4. bobyg says:

    Bruce, as noted in the response to your query in the forum, this is a known issue, and we were wondering whether you can segregate the parts of the project that use AMP so that they compile differently. We would also like to know more about your scenario – please contact me (bobyg AT Microsoft dot com) with the details.


  5. Hi Daniel, is AMP compatible with OpenMP? For example, can I put AMP code inside an OpenMP-style for loop?

  6. Hi Daniel, a follow-up to my previous question about the compatibility between AMP and OpenMP. I have actually tried putting AMP code, i.e. MatAdd.cpp (the example given in the article "A Code-Based Introduction to C++ AMP"), inside an OpenMP-style for loop, and it performed with no problem. I wonder whether it always performs OK?

  7. Hi Y.T. Tommy, whether OpenMP and AMP work together depends on your scenario. In general, OpenMP executes the code_block on multiple threads in parallel. If you put a parallel_for_each inside the code_block of an OpenMP-style for loop, you now have two levels of parallelism: first you parallelize on multiple CPU threads via OpenMP, and then, within each thread, you parallelize on a data-parallel accelerator (e.g. a GPU) via AMP. As long as there is no data race between the instances of the code_block, it should work.
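     A minimal sketch of that two-level pattern, assuming Visual C++ with /openmp and the C++ AMP headers (the function name `scale_in_chunks` and the even chunking are illustrative assumptions) – each OpenMP thread launches its own parallel_for_each over a disjoint slice, so the instances cannot race:

     ```cpp
     #include <amp.h>   // requires the Visual C++ toolchain
     #include <vector>

     using namespace concurrency;

     // Outer level: OpenMP threads on the CPU, one per chunk.
     // Inner level: a parallel_for_each per chunk on the accelerator.
     void scale_in_chunks(std::vector<float>& data, float factor, int chunks)
     {
         const int n = static_cast<int>(data.size());
         array_view<float, 1> dv(n, data);
         const int chunk = n / chunks;   // assume chunks divides n evenly

         #pragma omp parallel for
         for (int c = 0; c < chunks; ++c) {
             // Disjoint sections of the same array_view: no data race.
             array_view<float, 1> slice = dv.section(c * chunk, chunk);
             parallel_for_each(slice.extent,
                               [=](index<1> idx) restrict(amp) {
                 slice[idx] *= factor;
             });
         }
         dv.synchronize();  // copy results back once all chunks are done
     }
     ```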

  8. KirkPatrick says:

    This is great info, and allows divide-and-conquer GPGPU techniques normally driven by shaders to be applied dynamically.

    However, saying it should only take 2 hours to get to step 3 seems like best case scenario for sure. For example, it doesn't "take 68 minutes" to watch 4 videos that total 69 minutes, 33 seconds. It also takes time to open the links themselves, and to get a cup of coffee (recommended).

Next, the first 2 links explode to 7 if you need to learn about lambdas (and functors). So there are only 52 minutes allotted to read (up to) 26 pages of heavy, source-code-based examples (2 hours minus 68 minutes). The reading, meanwhile, is about as light as a SIGGRAPH article (to the author's credit).

    Finally, it's almost comical to see that – 12 minutes into a (source-code accelerated) video explaining how AMP-lifying ™ a one line for-loop required several changes to the C++ language itself – the narrator says "See? Simple!"

I suspect there were metrics involved, underlying an agenda to minimize the perceived overhead of all this. While this might fall slightly short of a full-blown conspiracy, readers might still apply the "multiply by 3" standard for software estimates to the author's time estimates. I'm sure hoping my boss did when he assigned this to me yesterday.

  9. andrei says:

    hello :)

    Can anybody give me some examples (code-based) of how to use a multi-GPU platform?

    I am trying to perform different types of operations (like matrix summation or matrix multiplication) in a multi-GPU environment in order to see the improvement from using more than one GPU. It's giving me a hard time because of the lack of examples and documentation in this particular area of how to use multiple GPUs.

    I would be very thankful if you can help me in this area.

    have a good day!
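For readers with the same question as the last commenter, here is a minimal multi-accelerator sketch – not an official sample, just an illustration assuming Visual C++ 2012+ with amp.h, at least one capable GPU, and a data size that divides evenly across the accelerators (`scale_on_all_gpus` is a made-up name):

```cpp
#include <amp.h>
#include <vector>

using namespace concurrency;

// Scale a vector by splitting it evenly across every capable accelerator.
void scale_on_all_gpus(std::vector<float>& data, float factor)
{
    // Enumerate accelerators, skipping the CPU fallback device,
    // which cannot execute AMP kernels.
    std::vector<accelerator> accs;
    for (const accelerator& a : accelerator::get_all())
        if (a.device_path != accelerator::cpu_accelerator)
            accs.push_back(a);

    const int per = static_cast<int>(data.size() / accs.size());

    std::vector<array_view<float, 1>> slices;
    for (std::size_t i = 0; i < accs.size(); ++i) {
        // Each slice wraps a disjoint piece of the host vector.
        array_view<float, 1> slice(per, data.data() + i * per);
        // Passing accs[i].default_view targets that specific device;
        // launches are asynchronous, so the devices work concurrently.
        parallel_for_each(accs[i].default_view, slice.extent,
                          [=](index<1> idx) restrict(amp) {
            slice[idx] *= factor;
        });
        slices.push_back(slice);
    }
    for (auto& s : slices)
        s.synchronize();  // wait for each device and copy results back
}
```

The key idea is that the overload of parallel_for_each taking an accelerator_view lets you direct each launch to a different device, and synchronizing each array_view afterwards gathers the results.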