What features do you want in C++ AMP V3?

A little over a year ago, we released the first version of C++ AMP as part of Visual Studio 2012. In Visual Studio 2013, we are on track to deliver the next version of C++ AMP. Hopefully, by now, you have had a chance to learn about what is new for C++ AMP in Visual Studio 2013. We are delighted to see how C++ AMP has been received by the community. C++ AMP is being used in demos such as the BUILD 2013 keynote demo, and by various customers including Aviary and Kinect Fusion. Additionally, it’s encouraging to see partners like Intel building a proof-of-concept implementation of the C++ AMP open specification.

With the second version nearly ready for release, the team is actively planning new features and improvements for the next version of C++ AMP. We want to reach out to the community for suggestions and feature requests, which play an integral role in our planning. If you have a feature request or suggestion, we would love to hear about it. Please let us know either in the comments on this blog or on the MSDN forum.

Comments (20)

  1. Harry Dev says:

    Support for handling int8, uint8, int16, uint16, int32, uint32 vectorized on CPU and/or GPU, e.g. for doing uint8 image processing.

  2. ronag says:

    Unified memory textures.

  3. cpu says:

    Just make it like CUDA in terms of classes/structs support

  4. Tom Kirby-Green says:

    Jim mentions there that future versions of the autovectorizing C++ compiler would not just target many conventional cores and vector units across those cores but also GPGPU cores.

    How does AMP fit into this picture? If we can have a general mapping of regular C++ code onto GPGPU cores, does that mean we won't have to decorate C++ functions in the future with the 'amp' highlight?

  5. Josh Reuben says:

    As C++ AMP is an open spec and is already a great abstraction, why not propose to the C++ standards committee to set up a working group to roll it into C++17 proper?

  6. Oscarbg says:

    Just allow the same feature set as CUDA and OpenCL 2.0: nested parallelism (i.e. kernels calling kernels), the pipes concept in OpenCL 2.0 for passing data between kernels, and pointers, which have shipped since CUDA 1.0. Also recursion, as supported in CUDA since Fermi; function pointers; new GPU instructions such as the intra-warp shuffle instructions in Kepler; 64-bit atomics and float atomics; and named barriers, i.e. threads in a thread block synchronize only with threads that have the same barrier ID (see the CudaDMA project for a use case). And graphics features such as support for the new tiled sparse textures in DX11.2, depth textures, MSAA textures, mipmaps, cubemap textures, etc.

    I also hope we get bindless textures in DX12 and that AMP exposes them too.

    Also, last year you announced a spec with a roadmap; I hope next year we get what is called 2.0 in the spec.

  7. adhie says:

    Just more synchronization, please 🙂

  8. LKeene says:

    Some manner – other than launching a new kernel – of waiting for all device threads to complete before continuing.

  9. antediluvian says:

    Complex numbers? I haven't really tried the FFT library yet, though.

  10. A compiler option to generate warnings on all usage of doubles in AMP lambdas would be very useful. For example, manually searching for double literals when porting code is a pain, and if you miss even a single one, colleagues developing on Win7+WARP will get a crash. It would also help avoid inadvertently running (parts of some lambdas) at <5% of possible speed on many cards…

    Ability to query if ECC is supported on accelerator (and enabled in driver). Please distinguish full ECC support (memory and registers etc) from the partial support available in some recent cards.

    A "C++AMP standard library" of highly-optimized utility functions (available directly as part of VS). Common reductions, prefix sum, sort, etc… I know there are parts of this available on the web, I just think it would be reasonable to have it as part of VS. Also, wrt quality/regulatory issues, getting it as part of VS is *much* nicer than "something-I-found-on-the-web".

    64-bit integers and atomic operations (already mentioned). (My usage: there are no atomic operations for floats, and even if there were, they would not be deterministic; sometimes you can use fixed-point math and int atomics instead, but 32-bit ints are often too limited).

    8/16-bit datatypes (already mentioned) (the lack of such datatypes is actually limiting real use cases involving 3D data due to out-of-memory issues).

    Some kind of automatic lambda capture clause (everything by value except arrays).

    I have noticed that lambdas using WARP are sometimes slower than (very, very similar) lambdas executed by PPL. Maybe the JIT compiler can recognize such cases and only use WARP when beneficial.

    I am already very happy with C++AMP. Looking forward to v3…

  11. David Cuccia says:

    2D and 3D (and higher!) texture & array/array_view interpolation à la CUDA tex2D/tex3D. Cubic/Hermite spline interpolation in 3D or higher, up to 6D, would be amazing.

  12. bobyg says:

    Thanks, everyone, for your feedback. Keep it coming.

  13. Dan Ritchie says:

    Code that can execute for days or weeks without interruption, i.e. true multitasking. I don't think the hardware supports it yet, but it seems like it's coming along in a year or so. We need to get those 15 hours per frame that Pixar is using down a bit. You know, render farms would benefit from GPUing.

  14. Dan Ritchie says:

    >A "C++AMP standard library" of highly-optimized utility functions (available directly as part of VS). Common reductions, prefix sum, sort, etc… I know there are parts of this available on the web, I just think it would be reasonable to have it as part of VS.

    I agree.  We normally try not to use code we "find on the web" because of the difficulty and legality of adding the proper credits to our about box.  It sounds like a stupid problem, but it's a problem.

  15. Arman Schwarz says:

    Remove the ban on zero-length Concurrency::array; it seems arbitrary and impedes readability of code where such arrays are needed (e.g. varying-topology neural networks).

  16. 3DMashUp says:

    APIs to monitor device performance, throughput and state? How can we deploy C++AMP applications as compute 'services' if we can't monitor and measure their performance remotely or determine when they hang?

  17. xianghe says:

    It would be nice if I could write pixel/vertex/geometry/hull/domain shaders with C++ AMP.

    I really love the feature that C++ AMP runtime takes care of copying data between CPU/GPU for me.

    Right now C++ AMP is limited to compute shaders.

    It would be nice if I could do the same for the other shaders mentioned above.

  18. Ivan says:

    It'd be really cool to see some HLSL intrinsics like normalize(), mul(), dot() and so on.

  19. Dan Ritchie says:

    I would really like native uint support.

    I would still love to see the API contain Perlin noise in 2 and 3 dimensions (it already has 1 dimension).

    I would still love to see native support for random numbers.

  20. Dan Ritchie says:

    Sorry, should have read "native uchar"