C++ AMP: N-Body Simulation Sample

My name is Bharath, and I am an SDET on the C++ AMP team.

I worked on the NBody demo and wanted to share this project with you. You may have seen our PM (Daniel Moth) demonstrate it at the AMD Fusion Developer Summit and at the Microsoft Build conference (for the NBody demo, watch 0:03:00 through 0:08:00).

I ported the source code from the Microsoft DirectX SDK sample (you can download the full DirectX SDK). You can find more information on N-body simulation on Wikipedia.

You can download the project with all sources from the zip file attached to this blog post. Refer to the README.txt for known issues. To build this project you need Visual Studio 11.

In later posts, I will walk you through this code and explain the different implementations used in this demo. Stay tuned.



Comments (29)

  1. Tim Glauert says:

    It looks to me as if the SSE4 implementation is broken, in that it fails to sum the individual components of the dot product before applying softening. This causes the particles to fly off to infinity rather than slowly expand. The fix is to add the shuffle code from the SSE2 implementation (lines 164-167) to the SSE4 implementation (after line 229).
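To make the point about summation order concrete, here is a minimal scalar sketch. It is not the sample's SSE code; the function name `softened_inv_dist_cube` and its parameters are assumptions made for illustration. The three per-component products are summed into one dot product first, and softening is then added once to that sum.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar helper, not taken from the sample's sources.
// Returns 1 / (|r|^2 + softening^2)^(3/2) for a separation vector r.
inline float softened_inv_dist_cube(float rx, float ry, float rz, float softening_sq)
{
    assert(softening_sq >= 0.0f);

    // Sum the per-component products first: this is the full dot product r.r.
    float dist_sq = rx * rx + ry * ry + rz * rz;

    // Softening is added once, to the summed value, so close encounters
    // do not produce an unbounded force.
    dist_sq += softening_sq;

    float inv_dist = 1.0f / std::sqrt(dist_sq);
    return inv_dist * inv_dist * inv_dist;
}
```

In a SIMD version the three products sit in separate lanes of one register, so they have to be combined (for example with shuffles) before the softening term is added.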

  2. Hi Tim,

    Thanks for correcting the SSE4 implementation. I have updated the sample with the corrected version.

  3. CharlesAvatar says:

    When I run the NBody C++ AMP sample on a DirectX 11 capable 4-core GPU card in Win 7, the GFlops and FPS performance is significantly worse in "GPU Multi Device" mode than in "AMP Tiled" mode. I stepped through the source in VS11 Developer Preview: the multi-GPU function is being called instead of the single-core one, and it goes through a simple for loop over the 4 GPU devices (_ndevices is equal to 4). However, should it be using PPL parallel_for_each instead of a simple for loop to iterate through the 4 cores? Can someone confirm this? How can the NBody sample utilize multi-core GPU capabilities?

    void nbody::amp_multi_gpu(particle *render_data, int num_bodies)
    {
        int size = ((int)((num_bodies/TILE_SIZE)/_ndevices)*TILE_SIZE);

        for (int i = 0; i < _ndevices; i++)
        {
            tiling_implementation((*_pold[i]), (*_pnew[i]), i*size, size, num_bodies);
        }

        for (int i = 0; i < _ndevices; i++)
        {
            index<1> begin(i*size);
            extent<1> end(size);
            array_view<particle, 1> wrSrc = (*_pnew[i]).section(grid<1>(begin, end));
            for (int j = 0; j < size; j++)
            {
                render_data[j+(i*size)] = (wrSrc.data())[j];
            }
        }

        for (int i = 0; i < _ndevices; i++)
        {
            copy(render_data, (*_pold[i]));
        }
    }


  4. The multi-device option is only useful when you have more than one GPU on your system. So when you say “4-core GPU card” do you really mean 4 discrete GPU cards? If not, then this option will not result in a speedup and will instead result in a slowdown.

    If you do indeed have 4 cards, are they all the same? This sample assumes that all cards have exactly the same specification, since it statically splits the data equally between all cards. Also, for that much horsepower, you need to increase the number of particles so you can saturate your system; you can do that through MAX_GPU_PARTICLES in NBodyGravityCS11.cpp.

    Regardless of the above points, you are right, I could have used parallel_for in this sample. I’ll revisit this code for the Beta release (whenever that is), thanks for the feedback.
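The static split described above can be sketched in plain C++. The size formula follows the `amp_multi_gpu` code quoted in the previous comment; the `device_range` type, the function name, and the parameter names are assumptions made for illustration and do not appear in the sample.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of the slice of bodies one device updates.
struct device_range
{
    int offset; // index of the first body owned by the device
    int size;   // number of bodies owned by the device
};

// tile_size and num_devices play the roles of TILE_SIZE and _ndevices.
inline std::vector<device_range> split_statically(int num_bodies, int tile_size, int num_devices)
{
    assert(tile_size > 0 && num_devices > 0);

    // Every device gets the same whole number of tiles (integer division).
    int size = ((num_bodies / tile_size) / num_devices) * tile_size;

    std::vector<device_range> ranges;
    for (int i = 0; i < num_devices; ++i)
    {
        ranges.push_back({ i * size, size });
    }
    return ranges;
}
```

With 20000 bodies, 4 devices, and an assumed tile size of 256, each device gets 4864 bodies and 544 bodies fall outside every slice, so a body count that is a multiple of tile size times device count avoids the leftover. Because the slices are equal, the slowest card sets the pace, which is why identical cards are assumed.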

  5. P. So says:

    I tried to compile the NBody project in blogs.msdn.com/…/c-amp-sample-projects-for-download.aspx.  VS 11 Beta reports missing d3dx11.h and some other DirectX files.  When will the projects under the URL be updated?  For now, what should be done to get the project compiled?


  6. BharathM says:

    Hi P. So

    To build the NBody demo you need to install the DirectX SDK.

    The compiler error you are seeing is due to a missing DirectX SDK header file.

    From README.txt

    -Software requirement:

    Install June 2010 DirectX SDK from MSDN http://www.microsoft.com/…/details.aspx

    Install Visual Studio 11 from http://msdn.microsoft.com

  7. P. So says:

    Thanks Bharath:

    I do have the June 2010 DirectX SDK installed on my computer.  I have to add the SDK installation path to the Nbody project file in order for the compiler to find the header files.  This step is not needed if I were to build the project using VS 11 Developer Preview.

    Also, the changes in C++ AMP break the build process.  I would appreciate it if you could post an updated NBody project for VS 11 Beta.

    Thank you again.

  8. Hi P. So,

    After installing the DirectX SDK, you should restart Visual Studio. This is necessary because the DirectX SDK defines environment variables which are used in the new project. This wasn't necessary for VS 11 Developer Preview because there weren't significant changes affecting our dependencies.

    As for your second comment, can you post the messages from the VS output window? This will give us a better picture of what you are experiencing. Since you mention C++ AMP breaking changes, I think you may have downloaded the project before I updated it. Please try downloading it again.



  9. P. So says:

    Thanks.  The updated project works fine.

  10. The sample has been updated for the multi-GPU scenario to use parallel_for instead of a serial for loop. This enables parallel kernel invocation on different GPUs and also parallel copying of data in and out.
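The difference between the serial loop and a parallel dispatch can be sketched in portable C++. This is not the sample's code: `std::async` stands in here for PPL's parallel_for, and `process_device_slice` is a hypothetical placeholder for one device's kernel launch and copy-out.

```cpp
#include <functional>
#include <future>
#include <vector>

// Hypothetical stand-in for the work done for one device: it only touches
// its own slice [offset, offset + size) of the shared buffer.
inline void process_device_slice(std::vector<float>& data, int offset, int size)
{
    for (int j = 0; j < size; ++j)
    {
        data[offset + j] *= 2.0f;
    }
}

// Launches one task per device instead of visiting the devices one by one.
inline void run_on_all_devices(std::vector<float>& data, int num_devices)
{
    int size = static_cast<int>(data.size()) / num_devices;

    std::vector<std::future<void>> pending;
    for (int i = 0; i < num_devices; ++i)
    {
        pending.push_back(std::async(std::launch::async, process_device_slice,
                                     std::ref(data), i * size, size));
    }

    // Wait for every device's task before the results are used.
    for (auto& task : pending)
    {
        task.get();
    }
}
```

The tasks write to disjoint slices, so no locking is needed; the only synchronization point is the final wait.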

  11. Steve says:

    When I run the app, it tells me it's using the "reference" driver and performance will be slow. (It's correct on that point: it's so slow it's useless.) Looking into it, it seems there is no support for DX10 drivers. The DXUT library used only has checks for DX11 and DX9 but ignores DX10. What exactly are the requirements for AMP to work? Is there any sort of app or anything to test if a system should support it?

  12. DanielMoth says:

    Steve, as you found out, Microsoft's implementation of C++ AMP runs on DirectX 11 targets. Examples of such hardware can be found on this blog post: http://www.danielmoth.com/…/What-DX-Level-Does-My-Graphics-Card-Support-Does-It-Go-To-11.aspx

  13. Paul says:

    May 26 2011 The NBody.zip file has size zero bytes.

    Perhaps it is about to be updated?

  14. N-body sample code zip file error says:

    After downloading the zip file, I cannot unzip it. Please check it.

  15. Hi Paul, the sample has just been uploaded. Thanks.

  16. I've been testing this sample both on a HD 5870 and 7970.

    Strangely, the tiled version runs 2 times slower than the simple version, whereas it should run much faster due to the use of shared memory.

    The original DX11 CS sample does not have this issue.

    Are there known performance issues that could explain this?

  17. Kevin Gao says:

    Hi Jan,

    I cannot repro your issue. Are you using the latest driver?

    Could you please tell us your environment (win7sp1 or win8rtm? driver version? win32 or x64?)

    And could you tell us the GFlops you observed on your machine (the simple and tiled versions for each of the 2 cards you mentioned)?


  18. Hi Kevin,

    I've found the issue: originally I was running in debug mode.

    In release mode things run much better:

    285 / 920 GFlops for simple / tiled with 20000 particles on HD 5870.

    I'm surprised debug performs so differently, as all runs on the GPU.

    Maybe debug does some emulation?

  19. Kevin Gao says:

    If you run the code in debug mode, VS still uses the GPU. If you debug the code with "GPU only", VS uses the REF device (a simulator). DBG and RET builds use different code paths, which could lead to your problem.

  20. Jo Blow says:

    I guess there will be no follow-up on this? I would like to modify the source to include mass-dependent size and charge-dependent color, but I'm new to C++ AMP and DX and a bit lost on what to modify (I see the force calculation, but the colors are hard-coded).

  21. Hi Jo,

    One way to change color: on lines 376, 383, 391, 397, and 404 of NBodyGravityCS11.cpp, change “g_ParticleColor = D3DXVECTOR4( 0.05f, 1.0f, 0.05f, 1.0f );” to “g_ParticleColor = D3DXVECTOR4( NewRedValue, NewGreenValue, NewBlueValue, 1.0f );”

    One way to change scale size: on line 116 of ParticleDraw.hlsl, change “output.pos = mul( float4(position,1.0), g_mWorldViewProj );” to “output.pos = mul( float4(position,NewScaleValue), g_mWorldViewProj );”

    One way to change radiation size: on line 59 of ParticleDraw.hlsl, change “static float g_fParticleRad = 10.0f;” to “static float g_fParticleRad = NewRadiationSize;”



  22. Ian says:

    "In later posts, I will walk you through this code and explain the different implementations used in this demo. Stay tuned."

    Any forward linking?  I'd love if this was released as a library without the visual representation.  I'd like to be able to call it as function(method, particles, time); where method is the S-CPU/M-CPU/AMP/AMP-T/M-AMP version, time is the length of time for the simulation, and returns the average FPS of the test.

  23. bobyg says:


    Hi Ian,

     Thanks for expressing the need for a library version of the N-Body simulation sample. Currently we do not have any committed plans to convert the sample to a library version as you describe. However, we are interested in understanding more about your project and how this library function call would help in such a case. Would you be willing to comment on that? You can contact me directly using bobyg AT Microsoft dot com if needed.

  24. Unfortunately, there are no follow-up posts yet to this blog post. The C++ AMP book (http://www.gregcons.com/cppamp) has detailed discussion of this sample though.

  25. Ade Miller says:

    I'd encourage you to look at the CPU implementation on the http://ampbook.codeplex.com site. The advanced CPU implementation there is a cache aware one and is significantly faster than the one shown here. It is also SSE2/4 enabled.


  26. Sairam Ravu says:

    I have installed VS 2013 and am trying to run this sample code, but I get the following error:

    Error 1 error C4996: 'GetVersionExW': was declared deprecated.

    I have installed DirectX SDK, what could be the problem?

  27. joud says:

    BharathM, can you give me code for the n-body problem? I need it, please.

  28. wisewolf says:

    There's a bug on line 103 of nbody.cpp.

    pos.xyz += vel.xyz*deltaTime;


    pos.xyz += vel.x*deltaTime;
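The two lines in this comment differ in whether each position component is advanced by its own velocity component or all three are advanced by `vel.x`. For reference, a minimal plain C++ sketch of the per-component update, assuming a simple explicit Euler step; the `vec3` type and the function name are hypothetical and are not taken from nbody.cpp.

```cpp
// Hypothetical 3-component vector, used only for this illustration.
struct vec3
{
    float x, y, z;
};

// Advances a position by one time step, each component by its own velocity.
inline void advance_position(vec3& pos, const vec3& vel, float delta_time)
{
    pos.x += vel.x * delta_time;
    pos.y += vel.y * delta_time;
    pos.z += vel.z * delta_time;
}
```

Using `vel.x` for all three components would instead move every particle along the (1, 1, 1) diagonal, regardless of its y and z velocity.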

  29. Saad says:

    I am trying to run the code, but it gives an error that says:

    Severity Code Description Project File Line

    Error C4996 'GetVersionExW': was declared deprecated NBodyGravityCS11 d:nbody simulationnbodydxutcoredxut.cpp 1882