Fast Fourier Transforms (FFT) on the GPU

The Fast Fourier Transform (FFT) is a family of numerical algorithms which has a large number of uses in many fields of computational science and in particular in signal and image processing. Typically the transformation can be thought of as taking a signal which is a function of time, for example, the amplitude of an audio track as it oscillates through time, and transforming it into an equivalent representation in the frequency domain, i.e., expressing the signal in terms of the sinusoid frequencies which it is comprised of. Sometimes the input signal describes a function not of time, but rather of space, and rather than a single-dimensional signal, the signal can be two dimensional, three dimensional or even of a higher rank. For example, jpeg and mpeg encodings transform images (two-dimensional spacial signals) into the frequency domain, using the Discrete Cosine Transform, which is a close relative of the Discrete Fourier transform.

A good FFT library is not easy to implement. Usually, algorithms aren’t dependent on the exact number of elements fed into them as input. FFT is different, in an annoying manner: an efficient FFT algorithm does look quite different for different numbers of inputs. Let’s illustrate that with an example: a sorting algorithm that works well for 1024 elements, will generally also work well for 1033 elements. However, an efficient FFT algorithm for 1024 elements typically cannot even handle 1033 elements correctly! That’s because efficient FFT algorithms depend on the prime factorization of the number of elements in the input. The best case is possible when the input size is a power of two, but that’s not always the case, and padding the data to the next power of two distorts it.

Other sources of complexity in FFT implementations come from

  1. The handling of multi-dimensional transforms
  2. Handling of forward vs. inverse FFT transforms
  3. Different types of transforms for real vs. complex numbers
  4. Numerical stability

Therefore, developing an accurate, efficient and general (i.e., one that can handle arbitrary input sizes) FFT library is not an easy task! Fortunately us, the DirectX SDK already contains such a library, which originated from a long-term and grounds-breaking Microsoft research on FFT libraries and their tuning for GPUs.

Now that we have some background on FFT, I’ll stop here and in my next post I’ll share an FFT library for C++ AMP (a wrapper over the DirectX FFT API).

Part I: Fast Fourier Transforms (FFT) on The GPU

Part II: C++ AMP FFT Library

Part III: C++ AMP FFT Test Application

Comments (4)

  1. Broken link on MSR paper: says:

    Hi, it is about one month ago, when I was looking on this page(Fast Fourier Transforms (FFT) on the GPU) and downloaded paper under "grounds-breaking Microsoft research on FFT libraries and their tuning for GPUs." presented here by a link. I have looked on paper and then deleted it thinking I can get it again and it is not possible anymore :(. Thanks for answer.

    Marek Pavlu

  2. Charles Fu says:


    I just tried the link, and I still can download the pdf file "grounds-breaking Microsoft research on FFT libraries and their tuning for GPUs." from the link. Here is the URL:…/default.aspx


  3. Brandon Lloyd says:

    Actually that is not the right paper. That paper talks about FFTs on DX9 class GPUs. Here is the paper you are referring to:

    High Performance Discrete Fourier Transforms on Graphics Processors


    We had a follow up paper as well:

    Auto-tuning of Fast Fourier Transform on Graphics Processors


  4. Asif Bahrainwala says:


    We have had trouble with inverse FFT using DX11(Microsoft DirectX SDK : June 2010)

    we get incorrect values

    please refer:…/Fast-Fourier-transform-using-DX

Skip to main content