In this sample we present a C++ AMP implementation of fluid simulation using the Smoothed-Particle Hydrodynamics (SPH) algorithm. The method was devised in 1970s by Gingold and Monaghan. Although originally meant for the simulation of astrophysical entities, it is widely used in many fields, including computer graphics, where it is often used to model the very terrestrial water.
Our implementation (originally started by one of our interns Venkat Varadarajan) of the SPH algorithm is encapsulated in the fluid_simulation class. Browsing the header file will show that its API is sparse. As particular member functions will be described later, we will first focus on data members:
- accelerator_view av – that’s the handle to the device where the following data structures are stored.
- array<particle> particles – data storage for basic elements building the simulated fluid, particle is a structure consisting of two 2D vectors: position and velocity.
- array<float> particles_density – in the course of the algorithm each particle gets a scalar density value.
- array<float_2> particles_forces – the second output of the algorithm is force acting upon each particle, stored as 2D vector.
If you have been reading our blog, I hope you have noticed that many design decisions behind C++ AMP API were based on the principle that program codes for typical use-cases should be as streamlined as possible, owing to sensible default values and behaviors. To that end, you will be usually fine with running the computation on the default (implicit!) accelerator, and using an array_view as a “smart container” that would follow the computation. Otherwise, you would probably be doing something more advanced, like using Direct3D interop, as we will do in this sample. You may read more about such scenario in the linked article, but long story short, to access C++ AMP data from Direct3D: it has to be stored in an array (or texture) and the array has to be located on the accelerator_view corresponding to the Direct3D device. And that’s why we have to store these accelerator_view and array objects in our class.
By the way, there is nothing inherent in the algorithm that would require the interop, it’s done just for the sake of the real-time visualization.
As a side note, a quick C++ recap. Recall that objects that are data members in C++ classes are created in the order of their declaration in the class definition. As array constructors that we will use take accelerator_view as a parameter, the accelerator_view must be created beforehand, so it is essential to declare it first. In other words, it will be a bug to place av definition after any of the arrays.
The class constructor is responsible for initializing the data structures. accelerator_view is obtained from the provided ID3D11Device object and used to create arrays. Next, particles are initialized to a predefined location in the lower left corner of the simulation space and copied to the array.
Calling the simulate member function results in a computation of a single step of the simulation with the provided parameters. The simulation itself is a system of equations which are solved in three sub-steps:
- density calculation
- forces calculation
- position calculation
Since all of these steps have to be executed sequentially (e.g. density for all particles must be calculated before we start computing any forces), they’ve been broken into separate parallel_for_each invocations.
For clarity, the three mentioned parallel_for_each calls have been broken into three member functions with quite descriptive names: apply_density, apply_forces, finalize_position.
The first two are similar in the fact that they require all-to-all – O(N2) – interaction. Their performance impact on memory bandwidth is reduced by cooperative loading – each thread in a tile loads a different data element, stores it in a tile_static memory and waits for all threads in the tile to reach the same point. Next, all threads in the tile iterate over the collectively loaded data, performing some local computation. Once all are done with it, another chunk is cooperatively loaded. Rinse and repeat. The total number of elements accessed does not change, but most loads are served from much faster tile_static memory.
Program entry point and visualization
The sample is based on the SPH sample from the DirectX SDK and it takes an advantage of the existing visualization framework. Most of the code in the DXUT utility library and FluidCS11.cpp, being not C++ AMP specific, is of no immediate interest to us. The only noteworthy things being that the fluid_simulation object is created during start, then the visualization code hooks to buffers obtained from fluid_simulation::get_*_buffer, and finally fluid_simulation::simulate is called every frame to obtain new data for rendering.
The user-interface exposes some simulation parameters, which can be tweaked to make a perfect splash.
The attached sample code is released under the MS-RSL license.