concurrency::graphics in C++ AMP

Hi there! In this blog post, I'm going to briefly introduce a new C++ AMP feature area in Visual Studio 2012. We will keep adding links to related blog posts with more details at the bottom of this post, so please check back often and scroll down to the links.

In short, a new namespace, concurrency::graphics, is introduced to help you author data parallel algorithms in the graphics domain. The namespace is delivered in a new header, amp_graphics.h, and contains two major sets of APIs: short vector types and textures.

Short Vector Types

C++ AMP defines a set of classes that emulate the short numerical vectors available in shader languages such as HLSL and widely used in computer graphics programming. In C++ AMP, we call these classes short vector types.

Short vector types follow the naming pattern ScalarType_N, where ScalarType is one of int, uint, float, double, norm, or unorm (norm and unorm are explained below), and N is 2, 3, or 4. Examples are float_2 and int_3. Each of these vector types wraps N elements internally; for example, a float_2 wraps 2 float elements.

In the concurrency::graphics namespace, norm and unorm are classes that behave like scalar types, and they in turn serve as the element types of the short vector types norm_N and unorm_N. Both are wrappers over float that provide clamping: norm clamps a floating point value into the range [-1.0, 1.0], and unorm clamps it into [0.0, 1.0]. More details about norm and unorm will be provided in a future post.
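To make the clamping rule concrete, here is a minimal standalone sketch. The class names my_norm and my_unorm are hypothetical stand-ins, not the C++ AMP classes; they model only the clamp-on-construction behavior described above, while the real norm and unorm also provide arithmetic operators and conversions.

```cpp
#include <algorithm>

// Hypothetical stand-in for concurrency::graphics::norm:
// clamps the wrapped float into [-1.0, 1.0].
class my_norm {
public:
    explicit my_norm(float v) : value_(std::max(-1.0f, std::min(v, 1.0f))) {}
    float get() const { return value_; }
private:
    float value_;
};

// Hypothetical stand-in for concurrency::graphics::unorm:
// clamps the wrapped float into [0.0, 1.0].
class my_unorm {
public:
    explicit my_unorm(float v) : value_(std::max(0.0f, std::min(v, 1.0f))) {}
    float get() const { return value_; }
private:
    float value_;
};
```

With this sketch, my_norm(1.5f).get() yields 1.0f and my_unorm(-0.25f).get() yields 0.0f, while in-range values such as 0.5f pass through unchanged.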

For each short vector type, we also define an alias (typedef) without the underscore, such as float2 or int3, which is the syntax graphics programmers typically expect. These typedefs reside in the concurrency::graphics::direct3d namespace.

You access the components of a short vector type through swizzle expressions. For example:

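   #include <amp_graphics.h>
   using namespace concurrency::graphics;

   float_4 va(1.0f, 2.0f, 3.0f, 4.0f);
   float_2 vb = va.xy;   // read 1.0 and 2.0 and assign them to vb
   va.rg = vb.yx;        // write 2.0 and 1.0 into the first two components of va;
                         // va is now (2.0, 1.0, 3.0, 4.0)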

A variety of swizzling formats are available, as well as a rich set of overloaded operators to allow common casting, unary, binary and compound assignment operations. We will dive deeper into those in a future blog post.
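The read and write semantics of the example above, plus one component-wise operator, can be modeled in plain C++. This is a simplified, hypothetical sketch: vec2 and vec4 are not C++ AMP types, and because standard C++ has no property-style member access, it uses member functions (xy(), yx(), set_rg()) in place of the va.xy / va.rg syntax.

```cpp
// Hypothetical, simplified stand-in for float_2.
struct vec2 {
    float x, y;
    vec2 yx() const { return {y, x}; }  // swizzle read with components reversed
};

// Hypothetical, simplified stand-in for float_4.
struct vec4 {
    float x, y, z, w;
    vec2 xy() const { return {x, y}; }  // swizzle read of the first two components
    // rg names the same first two components as xy; this models a swizzle write.
    void set_rg(vec2 v) { x = v.x; y = v.y; }
};

// Component-wise addition, mirroring one of the overloaded binary operators.
inline vec4 operator+(const vec4& a, const vec4& b) {
    return {a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
}
```

Running the same steps as the earlier snippet (construct (1, 2, 3, 4), read xy, write the reversed pair through rg) leaves the vector as (2, 1, 3, 4), and adding it to itself doubles every component.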

Texture

Today, GPUs are among the most important accelerators for C++ AMP. Because GPUs grew out of graphics workloads, they have optimized hardware and caches designed specifically to fetch texels and render images efficiently. Existing GPU programming models expose this graphics hardware through their API support for textures.

C++ AMP likewise introduces the concurrency::graphics::texture type to expose texture functionality. As a sneak peek, a texture<T, N> object is a data container that can be constructed simply as:

   #include <vector>
   #include <amp_graphics.h>
   using namespace concurrency::graphics;

   std::vector<float_4> v(768 * 1024);
   // initialize v
   texture<float_4, 2> tex(768, 1024, v.begin(), v.end());

A texture is a container of texels, where T is the texel type. T can be a scalar (e.g., int, float, or norm) or a short vector (e.g., int_2, float_4, or norm_4). N is the rank, which must be 1, 2, or 3. One way to initialize a texture is to pass iterators to its constructor, as shown above. A texture therefore looks like concurrency::array, except that it is backed by a Direct3D texture object instead of a buffer. Just like an array, a texture must be captured by reference in the lambda supplied to parallel_for_each.
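To see which element of v lands at which texel in the example above, here is a small hypothetical helper. It assumes that the iterator range is consumed in row-major order with the last index varying fastest, the same convention concurrency::array uses, so that the 768 is the number of rows and the 1024 the row width. texel_offset is an illustrative name, not part of C++ AMP.

```cpp
#include <cstddef>

// Hypothetical helper: offset into the source range of the texel at
// (row, col) in a 2-D texture whose rows are `width` texels wide,
// ASSUMING row-major initialization (last index fastest).
inline std::size_t texel_offset(std::size_t row, std::size_t col, std::size_t width) {
    return row * width + col;
}
```

Under that assumption, texel (1, 0) of tex comes from v[1024], and the last texel (767, 1023) comes from the last element of v.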

As you may know, we offer interoperability between concurrency::array/array_view and Direct3D ID3D11Buffer objects. Similar interop functionality is provided for textures in the concurrency::graphics::direct3d namespace.

In this release, C++ AMP support for textures enables the following scenarios:

  • Use textures as data containers for computation, to exploit the spatial locality of the texture cache and the texture memory layouts of GPU hardware.
  • Provide efficient interop with non-compute shaders, because pixel, vertex, tessellation, and hull shaders frequently consume or produce textures that you may want to use in your C++ AMP kernels.
  • Provide a way to access sub-word packed buffers, which is otherwise not possible in C++ AMP. Textures whose formats represent texels composed of 8- or 16-bit scalars give access to such packed data storage.
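As an illustration of what such a texture format spares you, the sketch below unpacks one 8-bit unorm channel from a 32-bit word by hand. unpack_unorm8 is a hypothetical helper; it assumes four 8-bit channels with channel 0 in the lowest byte, which is how DXGI_FORMAT_R8G8B8A8_UNORM data reads when loaded as a little-endian 32-bit word, and maps each byte to [0.0, 1.0] by dividing by 255.

```cpp
#include <cstdint>

// Hypothetical helper: extract 8-bit unorm channel `channel` (0..3) from a
// packed 32-bit word and convert it to a float in [0.0, 1.0].
// Assumes channel 0 occupies the lowest byte.
inline float unpack_unorm8(std::uint32_t packed, unsigned channel) {
    std::uint32_t byte = (packed >> (8u * channel)) & 0xFFu;  // isolate one byte
    return static_cast<float>(byte) / 255.0f;                 // 0 -> 0.0, 255 -> 1.0
}
```

For example, with packed = 0x00FF80FF, channel 0 yields 1.0, channel 1 yields 128/255 (about 0.502), and channel 3 yields 0.0. A texture with an 8-bit unorm format is intended to handle this extraction and conversion for you.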

This release does not provide texture sampling or filtering directly through C++ AMP APIs. If you need those features, you have to use interop and author that code in DirectCompute and HLSL.

This concludes my brief introduction to the concurrency::graphics namespace. Stay tuned for the upcoming VS11 Beta and for more blog posts that go into detail about the features in this new namespace. Your questions and feedback are always welcome in our forum.