The C++ AMP FFT Library

This post follows up on the previous one, in which I introduced fast Fourier transforms (FFT) on the GPU. Here I will talk about the C++ AMP FFT Library and explain how to use it in your application.

As noted in the previous post, DirectX already contains an FFT API. All that was left for us to do was provide a simple C++ AMP wrapper on top of it, which is available as the C++ AMP FFT Library on CodePlex. To use it, follow these steps:

  1. Download the C++ AMP FFT Library from CodePlex. The library ships as a DLL, together with the headers and lib files needed to link to the DLL. The library sources are also available for download from CodePlex and can be built in the Visual Studio 2012 IDE using the provided Visual Studio project file.
  2. Include inc\amp_fft.h in your source.
  3. Compile your .cpp source files.
  4. Link the resulting object files with amp_fft.lib or amp_fftd.lib, both of which are available as part of the library.

If you download the library sources, the sample directory has an example Visual Studio project showing how to use the C++ AMP FFT Library to perform forward and inverse transforms.

Let’s go over a very simple use of the fft class in the C++ AMP FFT Library and explain it.

  1. fft<float,1> fft_transform(extent<1>(100));
  2. array<float,1> input_array(extent<1>(100), pointer_to_input_data);
  3. array<std::complex<float>,1> output_array(extent<1>(100));
  4. fft_transform.forward_transform(input_array, output_array);

The first line creates the transform object, of type fft<float, 1>. Note that, like C++ AMP arrays, the transform type captures the element type of the transformation and its dimension:

  • Element type. This must be either float or std::complex<float>. No other types are supported at this point.
  • Dimension. The dimension can be 1, 2 or 3. DirectX can actually handle higher dimensions, but the C++ AMP FFT Library currently only supports up to 3D transforms. Higher dimensions may be supported in the future.

Also, like concurrency::array, an fft object is initialized with a C++ AMP extent object. The extent associated with an fft determines the shape of the arrays that can be transformed by that fft object. Because FFTs are very sensitive to input sizes, a single fft object can only handle a single size, which is why you have to specify it at initialization time.

After the fft object is created, you’d typically cache it and reuse it to transform many different inputs, although all of them must have the same extent.

Lines 2 and 3 define one such input and output, and line 4 transforms the input into the output. As is typical with C++ AMP, the output is left on the accelerator, and you choose whether and when to copy it back to the CPU. You could, for example, use the output of an FFT in a subsequent parallel_for_each call, without ever bringing the results back to main memory.
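If you do copy the output back to the CPU, a slow reference transform can be handy for checking it. The following is a minimal sketch in standard C++ and is not part of the C++ AMP FFT Library: reference_dft and max_abs_difference are hypothetical helper names, and the unnormalized forward convention X[k] = sum over j of x[j] * exp(-2*pi*i*j*k/n) is an assumption, since this post does not state which sign or scaling convention the library uses.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive O(n^2) discrete Fourier transform of real input, computed on the CPU.
// Assumed convention: X[k] = sum_j x[j] * exp(-2*pi*i*j*k/n), with no scaling.
std::vector<std::complex<float>> reference_dft(const std::vector<float>& input)
{
    const std::size_t n = input.size();
    const double two_pi = 6.283185307179586;
    std::vector<std::complex<float>> output(n);
    for (std::size_t k = 0; k < n; ++k) {
        // Accumulate in double so the reference is more accurate than the
        // float results it is compared against.
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            // The phase is periodic in n, so reduce j*k modulo n before scaling.
            const double angle =
                -two_pi * static_cast<double>((j * k) % n) / static_cast<double>(n);
            sum += static_cast<double>(input[j]) *
                   std::complex<double>(std::cos(angle), std::sin(angle));
        }
        output[k] = std::complex<float>(static_cast<float>(sum.real()),
                                        static_cast<float>(sum.imag()));
    }
    return output;
}

// Largest absolute difference between corresponding elements of two spectra.
// Only the common prefix is compared if the lengths differ.
float max_abs_difference(const std::vector<std::complex<float>>& a,
                         const std::vector<std::complex<float>>& b)
{
    const std::size_t n = std::min(a.size(), b.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        worst = std::max(worst, std::abs(a[i] - b[i]));
    }
    return worst;
}
```

One way to use this is to copy output_array into a std::vector<std::complex<float>> and pass it to max_abs_difference together with reference_dft of the same input. If the result is consistently off by a factor of n or by the sign of the imaginary parts, that suggests a different convention rather than a wrong result.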

Some caveats

Because the FFT library is implemented using DirectCompute and not using C++ AMP, it has some disadvantages when used from C++ AMP.

  1. The FFT library only works with floats and complex numbers based on floats. It doesn’t work with doubles, and it doesn’t take advantage of the new C++ AMP high-accuracy double-precision math library for the GPU.
  2. The library only accepts arrays, not array_views. This is because we can only provide interoperation with DirectX buffers on the basis of whole arrays, rather than array views, which don’t have a good counterpart concept in DirectX.

Up next

In my next post on the topic of FFT, I will share a test program which illustrates higher-dimensional transforms and inverse transforms.

Part I: Fast Fourier Transforms (FFT) on The GPU

Part II: C++ AMP FFT Library

Part III: C++ AMP FFT Test Application


Comments (5)

  1. P.S.M. Goossens says:

    Will there be an option to do a batch of 1D FFTs like CUFFT and FFTW?

  2. Don McCrady says:

    Currently the FFT sample only does a single transform at a time.  The ability to batch transformations is something we're considering for future releases.  Thanks for the input and please keep it coming.

  3. Michael says:

    It would be nice if we could use this (or at least the DirectX FFT API) in Metro style apps.

    d3dcsx.h / d3dcsx.modified_copy.h contains

    #pragma region Desktop Family


    Do you know if there is any reason for this (most of the other DirectX stuff is available under Metro)?

  4. Asif Bahrainwala says:


    I have implemented an AMP wrapper for FFT using DX11…/Fast-Fourier-transform-using-DX

    I get incorrect values when running inverse FFT for input samples size of 70,101 etc.

    It works perfectly for input sizes of 10,20,100.

    I have verified this by implementing an inverse FT (not FFT)

  5. Steve says:

    Continually asking for small 1024-sample sets of sampled audio data to be transformed on the GPU hits a bottleneck copying data on and off the GPU. How can I speed up FFT calls by sending in more data, processing multiple FFT windows, and passing out more data from the GPU?

    For example, can I take an input of 1024 + (64*10) = 1664 sample floats as my FFT input array/vector, run the FFT on each window (1-1024, 64-1088, etc.) for each of the 10 windows, and return each result in its own 1024-element vector of complex floats?

    Current Code:

    Performs fft on three consecutive windows, returns only the last 1024 float vector of data.

    std::vector<std::complex<float>> RealFFTD1_Window1024(float* input_data, int inputSize, int windowSkipCount, int outputSize)
    {
        const int windowLength = 1024;
        const int y_FREQOUT_Resolution = windowLength;
        accelerator accl = accelerator();
        extent<1> e;
        e[0] = windowLength;

        // Create the FFT transformation object
        fft<float, 1> transform(e);

        int windowPosition = 0;
        float test[windowLength];
        std::vector<std::complex<float>> transformed_vec;

        for (int nextWindow = 0; nextWindow < inputSize; nextWindow = nextWindow + windowSkipCount)
        {
            // Copy the input to the accelerator
            for (int index = 0; index < windowLength; index++)
            {
                test[index] = input_data[index];
            }
            std::vector<float> input_vec{ test, test + windowLength };
            array<float, 1> in_array(e, input_vec.begin());

            // Apply the forward transformation
            array<std::complex<float>, 1> transformed_array(e);
            transform.forward_transform(in_array, transformed_array);

            // Copy the results back and print them
            transformed_vec = transformed_array;
            windowPosition = ++windowPosition;
        }
        return transformed_vec;
    }
