New Release of the AMP Algorithms Library

If you are fond of high-performance algorithms, you will be pleased to find out that our friend Ade Miller has just issued a new iteration of the AMP Algorithms Library. As usual, Ade's work is top notch, and it brings notable improvements across the board; in his own words:

Finally, there is a new release of the C++ AMP Algorithms Library! It has taken a while, largely due to other things, like CppCon taking up my time. This release contains the following:

  • New C++ AMP features:

    • AMP and STL algorithms no longer depend on the DirectX scan implementation.

    • New implementation of amp_algorithms::scan that does not have a direct dependency on the ID3DX11Scan and ID3DX11SegmentedScan interfaces.

    • The amp_stl_algorithms::copy_if and remove_if algorithms use the new scan implementation now, for improved performance.

    • Implementation of radix sort, amp_algorithms::radix_sort.

    • New utility functions: log2, is_power_of_two, count_bits, padded_read, padded_write, pack_byte and unpack_byte.

    • New namespace added for DirectX-dependent features, amp_algorithms::direct3d. All DirectX code is now in a separate header file, amp_algorithms_direct3d.h.

  • New C++ AMP STL features:

    • inner_product

    • minmax

    • pair<T1, T2>

    • rotate_copy

  • New SAXPY example.
  • Reorganized unit tests, consistent names and test categories.
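To see why copy_if and remove_if benefit from a good scan, here is a minimal sequential sketch of scan-based compaction in standard C++. It is not the library's code and does not use C++ AMP; the names exclusive_scan_sketch and copy_if_by_scan are hypothetical, chosen only for illustration. The idea is to flag the elements to keep, run an exclusive prefix sum over the flags to get each kept element's output position, then scatter. On a GPU, each of the three steps can run in parallel, which is why the scan is the critical building block.

```cpp
#include <cstddef>
#include <vector>

// Exclusive prefix sum: out[i] = in[0] + ... + in[i-1], with out[0] = 0.
// Sequential stand-in for a parallel scan (hypothetical helper).
std::vector<std::size_t> exclusive_scan_sketch(const std::vector<std::size_t>& in) {
    std::vector<std::size_t> out(in.size());
    std::size_t running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;
        running += in[i];
    }
    return out;
}

// copy_if expressed as flag -> scan -> scatter (hypothetical helper).
template <typename T, typename Pred>
std::vector<T> copy_if_by_scan(const std::vector<T>& src, Pred keep) {
    // Step 1: one flag per element, 1 if it should be kept.
    std::vector<std::size_t> flags(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        flags[i] = keep(src[i]) ? 1 : 0;
    }

    // Step 2: the scan turns flags into output positions.
    const std::vector<std::size_t> offsets = exclusive_scan_sketch(flags);

    // Total kept = last offset + last flag.
    const std::size_t kept = src.empty() ? 0 : offsets.back() + flags.back();

    // Step 3: scatter each kept element to its position. Every write goes
    // to a distinct index, so this loop has no cross-iteration dependency.
    std::vector<T> dst(kept);
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (flags[i] != 0) {
            dst[offsets[i]] = src[i];
        }
    }
    return dst;
}
```

Calling copy_if_by_scan on {5, 2, 8, 1, 9, 4} with an "is even" predicate keeps 2, 8 and 4 in their original order. A remove_if can be sketched the same way by inverting the predicate.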

As usual, you can download the latest iteration from:, and enjoy the benefits of heterogeneous parallelism.

Comments (9)

  1. Royi says:

    You really should create something comparable to Intel IPP and MKL to really drive this to the market.

  2. bdh0404 says:

    I have a simple question: is there any plan to rebuild the C++ AMP runtime using DirectX 12?

  3. Alexey says:

    Do you really have any plan to develop this project?

    I've optimized matrix multiplication, and my version gives the following results in managed code:

    1. The sequential implementation runs better than the parallel one in Random Traveler!

    2. My parallel code runs 2-5% faster than the C++ AMP WARP version.

    Does C++ AMP WARP really use SSE?

  4. Alexey says:

    Concerning Random Traveler again…

    I have tested C++ compiler vectorization and the new C# compiler in Visual Studio 2015.

    The vectorized tiled version of matrix multiplication (C++ .dll, best version without any transposition) on a 2900*2900 matrix loads only one core and runs at approximately 5 GFlops on my computer. The C# Parallel loop with a partitioner and unsafe pointers on Visual Studio 2015 runs at approximately 4 GFlops. The WARP version runs at 1.1 GFlops. The first test of naive C# ran at 0.04 GFlops, and your version of the parallel loop, where you decided to use pointers, ran at about 0.35 GFlops.

    GPU is needed only with bigger matrices.

    Seems you are giving unfair information.

  5. Mike says:


    VS 2015 and Win 10 are out.

    Do you have any plans for DirectX 12 or is c++ amp dead?


  6. Peter Hauser says:

    Is c++ amp dead?

  7. Richie says:

    Any news on C++ AMP with Windows 10 / WDDM 2.0?

  8. gpu_compute says:

    guys what's the state of C++amp?

  9. It's dead Jim says:

    Unfortunately, after 2 years without any update or news, we can now safely say that this project is dead.

