Black Scholes using C++ AMP

In the financial industry, Black Scholes is one of the methods used to value options. It is faster to compute than Binomial Options because it evaluates a closed-form formula instead of stepping through a price tree. In this post, I will walk you through a C++ AMP implementation of Black Scholes.
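For reference, the standard closed-form Black Scholes prices of a European call (C) and put (P) on a stock that pays no dividends are shown below. The symbols are the usual textbook ones and are an assumption here, not names taken from the sample: S is the stock price, K the strike price, T the time to expiry in years, r the risk-free rate, σ the volatility, and N the standard normal cumulative distribution function.

```latex
\begin{aligned}
C   &= S\,N(d_1) - K e^{-rT}\,N(d_2), \\
P   &= K e^{-rT}\,N(-d_2) - S\,N(-d_1), \\
d_1 &= \frac{\ln(S/K) + \left(r + \tfrac{1}{2}\sigma^{2}\right)T}{\sigma\sqrt{T}}, \qquad
d_2  = d_1 - \sigma\sqrt{T}.
\end{aligned}
```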

main – Program entry point

Let’s start with main(), where an instance of the blackscholes class is created, an option price is computed, and the result is then verified against the results of a CPU implementation. The “blackscholes” class implements both the C++ AMP kernel and the CPU implementation. The constructor generates random data and initializes parameters.


“blackscholes::execute” is the function where the C++ AMP kernel is used for computation. To start with, the input data is stored in a concurrency::array. The computation parameters are captured in a separate array and passed to the kernel. The kernel schedules tiles of size BSCHOLES_TILE_SIZE, with one thread per option price computation. The calculated option prices are stored in separate output arrays. This implementation takes advantage of the many cores on the GPU to run threads in parallel. After the computation completes, the results are copied out to host memory for verification.
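The "tiles of threads, one thread per option" layout can be pictured with a small CPU-only emulation. This is a sketch in plain standard C++, not C++ AMP code and not the sample's kernel: kTileSize, ThreadIndex, launch_tiled and double_each are hypothetical names, the tile size of 256 is an assumed value, and the per-option work is a placeholder (doubling each input) rather than the real pricing math.

```cpp
#include <cstddef>
#include <vector>

// Assumed tile size for illustration; the sample defines its own BSCHOLES_TILE_SIZE.
constexpr std::size_t kTileSize = 256;

// Hypothetical index record: which tile a "thread" is in, its position
// inside that tile, and the option it is responsible for.
struct ThreadIndex {
    std::size_t tile;
    std::size_t local;
    std::size_t global;
};

// Emulates a tiled launch sequentially: every option index is visited
// exactly once, grouped tile by tile. On a GPU these iterations would run
// as parallel threads instead of nested loops.
template <typename Kernel>
void launch_tiled(std::size_t option_count, Kernel kernel) {
    const std::size_t tile_count = (option_count + kTileSize - 1) / kTileSize;
    for (std::size_t tile = 0; tile < tile_count; ++tile) {
        for (std::size_t local = 0; local < kTileSize; ++local) {
            const std::size_t global = tile * kTileSize + local;
            if (global < option_count) {  // skip padding in the last tile
                kernel(ThreadIndex{tile, local, global});
            }
        }
    }
}

// Placeholder "kernel": each thread reads one input element and writes one
// output element, mirroring the one-thread-per-option pattern.
inline std::vector<float> double_each(const std::vector<float>& input) {
    std::vector<float> output(input.size());
    launch_tiled(input.size(), [&](const ThreadIndex& idx) {
        output[idx.global] = 2.0f * input[idx.global];
    });
    return output;
}
```

Because every thread touches only its own element, no synchronization between threads is needed, which is what makes this kind of computation a good fit for a GPU.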


This function validates the results computed on the GPU. The same input data is used to calculate results on the CPU (implemented in the function “blackscholes::blackscholes_CPU”), and the CPU and GPU results are then compared using “blackscholes::sequence_equal”.
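A CPU reference plus a tolerance-based comparison might look roughly like the sketch below. It is written in plain standard C++ and is not the sample's code: OptionPrices, black_scholes_reference, normal_cdf and nearly_equal are hypothetical names, the formula is the textbook closed form, and the relative tolerance is an assumed way of comparing results, since GPU and CPU floating-point results rarely match bit for bit.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Standard normal cumulative distribution, via the complementary error function.
inline double normal_cdf(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

struct OptionPrices {
    double call;
    double put;
};

// Textbook Black Scholes for a European option on a non-dividend stock.
// s: stock price, k: strike, t: years to expiry, r: risk-free rate, v: volatility.
inline OptionPrices black_scholes_reference(double s, double k, double t,
                                            double r, double v) {
    const double sqrt_t = std::sqrt(t);
    const double d1 = (std::log(s / k) + (r + 0.5 * v * v) * t) / (v * sqrt_t);
    const double d2 = d1 - v * sqrt_t;
    const double discounted_k = k * std::exp(-r * t);
    return {s * normal_cdf(d1) - discounted_k * normal_cdf(d2),
            discounted_k * normal_cdf(-d2) - s * normal_cdf(-d1)};
}

// Element-wise comparison with a relative tolerance (absolute below 1.0).
// Written as !(diff <= limit) so that NaN values are reported as mismatches.
inline bool nearly_equal(const std::vector<float>& a,
                         const std::vector<float>& b, float tolerance) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float diff = std::fabs(a[i] - b[i]);
        const float limit = tolerance * std::max(1.0f, std::fabs(a[i]));
        if (!(diff <= limit)) return false;
    }
    return true;
}
```

A verification step would fill one vector with the CPU reference prices, another with the values copied back from the GPU, and report success only if nearly_equal returns true for both the call and the put results.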


This is an interesting function to mention because of its restrict(amp, cpu) modifier, which means it can be called from a CPU function as well as from a GPU kernel. In this sample, the function is called from “blackscholes::blackscholes_CPU” as well as from the kernel in “blackscholes::execute”.
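To give an idea of the kind of small helper that benefits from being shared between the CPU path and the kernel, here is a sketch of a cumulative normal distribution function based on a well-known five-term polynomial approximation. It is plain standard C++ so that it compiles anywhere: the name cnd and the choice of approximation are assumptions, not taken from the sample. In a C++ AMP build the declaration would additionally carry restrict(amp, cpu), and the math calls would have to come from a library that is callable from amp-restricted code.

```cpp
#include <cmath>

// Polynomial approximation of the standard normal cumulative distribution.
// Hypothetical helper; in C++ AMP the signature would end in restrict(amp, cpu).
inline float cnd(float d) {
    const float a1 = 0.319381530f;
    const float a2 = -0.356563782f;
    const float a3 = 1.781477937f;
    const float a4 = -1.821255978f;
    const float a5 = 1.330274429f;
    const float inv_sqrt_2pi = 0.3989422804f;

    const float k = 1.0f / (1.0f + 0.2316419f * std::fabs(d));

    // Horner form: a chain of multiply-then-add steps.
    const float poly = k * (a1 + k * (a2 + k * (a3 + k * (a4 + k * a5))));

    // Upper-tail probability for |d|; mirrored below for positive d.
    const float tail = inv_sqrt_2pi * std::exp(-0.5f * d * d) * poly;
    return d > 0.0f ? 1.0f - tail : tail;
}
```

Keeping one definition for both targets means the CPU verification exercises exactly the same arithmetic as the kernel, so any remaining differences come from the hardware's floating-point behavior rather than from two diverging implementations.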

Download the sample

Please download the attached Black Scholes sample that we discussed here, run it on your hardware, and study the code to understand what it does. You will need, as always, Visual Studio 11.

Comments (4)

  1. Dmitri Nesteruk says:

    The CND function is a typical example where MAD (multiply and add) would be useful, but I don't see fast_math having such a function. Does the compiler optimize ordinary math, or is this instruction ignored?

  2. Zhu, Weirong says:

    Hi Dmitri,

    (1) The compiler is allowed to optimize a multiplication followed by an addition into the mad instruction if you use /fp:fast (however, this is not guaranteed).  If you use /fp:precise or /fp:strict (which are mapped to the /Gis switch of the HLSL compiler), it will not do that.

    (2) The user can explicitly require "mad". In the concurrency::direct3d namespace, we offer a "mad" function (…/direct3d-namespace-and-hlsl-intrinsics-in-c-amp.aspx) that is guaranteed to map to HLSL's intrinsic mad. Please read the "Remarks" section of …/ff471418(v=vs.85).aspx to see what HLSL mad offers.

    (3) In the precise_math namespace (…/math-library-for-c-amp.aspx), we offer:

        float fmaf(float x, float y, float z);

        float fma(float x, float y, float z);

        double fma(double x, double y, double z);

    They return (x * y) + z, rounded as one ternary operation. In our current implementation, the double version maps directly to the HLSL fma intrinsic (…/hh768893(v=vs.85).aspx). There is no float version of fma in HLSL, so the implementation is up to us, as long as the round-off error is small enough.



  3. dar says:

    The example code does not have a license.  Could you clarify how it may be used?

  4. LingliZhang says:

    Hi dar,

    This sample is released under the Apache 2.0 License. The code zip file has been updated with license headers in the source.