Getting Started with Textures in C++ AMP

04/03/2012

When I introduced the concurrency::graphics namespace, I gave a light introduction to the texture type in C++ AMP and explained the motivation behind it. In this post and upcoming ones, my colleagues and I will dive deeper into the texture type. I will assume you have already read about the short vector types in C++ AMP.

Let’s first cover some basics. Then we will look at how to create textures of ints, uints, floats, and doubles with and without initialization, examine some properties of the texture class, see how to capture a texture object in the lambda passed to parallel_for_each, and finally learn how to read from a texture object in restrict(amp) code.

Texture Basics

As with the short vector types, to start using textures you need the following:

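 #include <amp_graphics.h>
 using namespace concurrency;            // extent, index, array_view, parallel_for_each, ...
 using namespace concurrency::graphics;  // texture and the short vector types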

The concurrency::graphics::texture<T, N> template class represents a multi-dimensional container of texels of type T with N dimensions, both specified at compile time. Notice that texture<T, N> looks like concurrency::array<T, N>, but the texel type T can only be one of the following types:

scalar types: int, uint, float, double, norm, and unorm;
short vector types that have two or four components, e.g., uint_2 and float_4. Among the double-based short vector types, only double_2 is allowed, because a double is represented with two ints in a texture and a texel can have at most four components.

Direct3D offers very few three-component DXGI_FORMAT values, and textures of such formats have limited functionality. Since three-component texels are rarely used, in this release of C++ AMP T cannot be a three-component short vector type.

Rank N can be 1, 2, or 3.
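The texel-type rule above can be restated as a small compile-time check. The sketch below uses hypothetical helpers that are not part of C++ AMP; it assumes, as the text describes, that int, uint, float, norm, and unorm each occupy one 32-bit texel component and a double occupies two.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of C++ AMP: number of 32-bit texel
// components used by a texel made of `count` scalars, where each scalar
// occupies `words_per_scalar` 32-bit words (1 for int/uint/float/norm/unorm,
// 2 for double).
constexpr std::size_t components_used(std::size_t words_per_scalar, std::size_t count) {
    return words_per_scalar * count;
}

// A texel shape is allowed when it is a scalar or a two- or four-component
// short vector (three-component types are excluded in this release) and it
// fits in at most four 32-bit components.
constexpr bool is_allowed_texel_shape(std::size_t words_per_scalar, std::size_t count) {
    return (count == 1 || count == 2 || count == 4) &&
           components_used(words_per_scalar, count) <= 4;
}

static_assert(is_allowed_texel_shape(1, 4), "float_4: 4 components");
static_assert(is_allowed_texel_shape(2, 1), "double: 2 components");
static_assert(is_allowed_texel_shape(2, 2), "double_2: 2 x 2 = 4 components");
static_assert(!is_allowed_texel_shape(2, 4), "double_4 would need 8 components");
static_assert(!is_allowed_texel_shape(1, 3), "three-component types are excluded");
```

The static_asserts spell out why double_2 is the largest double-based texel: two doubles already use all four 32-bit components.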

Construction

Constructing a texture is quite similar to constructing a concurrency::array. A texture can be created without being initialized, for example:

 // 1D texture of int, and of 16 texels
 texture<int, 1> tex1(16); 
 texture<int, 1> tex2(extent<1>(16));
  
 // 2D texture of float_2, and of 16 x 32 texels
 texture<float_2, 2> tex3(16, 32); 
 texture<float_2, 2> tex4(extent<2>(16, 32));
  
 // 3D texture of uint_4, and of 2 x 4 x 8 texels
 texture<uint_4, 3> tex5(2, 4, 8); 
 texture<uint_4, 3> tex6(extent<3>(2, 4, 8));

The constructors above do not specify an accelerator_view, so the default accelerator_view of the default accelerator is used. Just as with array, you can specify the accelerator_view on which to create the texture. For example:

 // create a 3D texture on the default accelerator_view of the WARP accelerator
 texture<uint_4, 3> tex7(2, 4, 8,
                         accelerator(accelerator::direct3d_warp).default_view);

Unlike array, you cannot create a texture on a cpu_accelerator.

Specific to our implementation on top of Direct3D, there are limits (in number of texels) on the size of each dimension of a texture. The inclusive per-dimension limits are:

texture<T, 1> : 16384
texture<T, 2> : 16384
texture<T, 3> : 2048

For example, if you create a texture that exceeds the limit:

 texture<int_2, 2> tex_too_large(2, 16385);

You will get a concurrency::runtime_exception with error code E_INVALIDARG and the message:

Failed to create texture: the limit for each dimension is 16384.
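If you would rather reject an oversized extent on the host before calling the constructor, a check against the limits listed above might look like the following. fits_texture_limits and max_texture_dim are hypothetical helpers, not C++ AMP APIs, and the sketch covers only the per-dimension limits stated in this post, not other resource limits such as available memory.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: inclusive per-dimension limit for a texture of the
// given rank, as listed above for the Direct3D-based implementation.
constexpr int max_texture_dim(int rank) {
    return rank == 3 ? 2048 : 16384;  // rank 1 and rank 2 share the same limit
}

// Hypothetical helper: returns true when every dimension is positive and
// no larger than the inclusive limit for rank N.
template <std::size_t N>
bool fits_texture_limits(const std::array<int, N>& dims) {
    static_assert(N >= 1 && N <= 3, "texture rank must be 1, 2, or 3");
    for (int d : dims) {
        if (d <= 0 || d > max_texture_dim(static_cast<int>(N))) {
            return false;
        }
    }
    return true;
}
```

For instance, an extent of {2, 16385} fails the check for rank 2, matching the exception case above, while {2048, 2048, 2048} passes for rank 3.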

You can also construct a texture and initialize it by providing two iterators that specify the range of the input data. For example,

 std::vector<float_2> src(16 * 32);
 // init src
 texture<float_2, 2> tex8(16, 32, src.begin(), src.end());

This creates a 16 x 32 2D texture and fills it with the contents of the src vector.
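The `// init src` step is left to you. One way to fill src is sketched below, under the assumption that the iterator constructor consumes the range in row-major order (the last dimension varies fastest, as with concurrency::array), so texel (i, j) of the 16 x 32 texture comes from src[i * 32 + j]. The float2 struct is a stand-in for float_2 so the sketch compiles without C++ AMP, and make_source is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for concurrency::graphics::float_2; only x and y matter here.
struct float2 {
    float x;
    float y;
};

// Hypothetical helper: builds a rows x cols source buffer in row-major
// order, storing each texel's own coordinates so the layout is easy to check.
std::vector<float2> make_source(std::size_t rows, std::size_t cols) {
    std::vector<float2> src(rows * cols);
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            src[i * cols + j] = float2{static_cast<float>(i), static_cast<float>(j)};
        }
    }
    return src;
}
```

With real C++ AMP types, the resulting vector would then be passed as src.begin(), src.end() to the texture constructor.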

Properties

You can query a texture object's properties, such as extent, accelerator_view, and data_length (the data size in bytes). For example:

 std::cout << "extent: (" << tex8.extent[0] << ", " << tex8.extent[1] << ")" 
           << std::endl;
 std::wcout << "accelerator: " << tex8.accelerator_view.accelerator.description
            << std::endl;
 std::cout << "data length: " << tex8.data_length << " bytes" << std::endl;

On my machine, the output is:

 extent: (16, 32)
 accelerator: ATI Radeon HD 5800 Series
 data length: 4096 bytes
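The 4096 bytes reported above follow from the extent and the texel size: 16 x 32 texels, each a float_2 made of two 4-byte floats. The hypothetical helper below reproduces that arithmetic, assuming the texels are tightly packed; it is a sanity check, not how data_length is actually implemented.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for float_2: two 32-bit floats.
struct float2 {
    float x;
    float y;
};

// Hypothetical helper: expected byte size of a texture holding
// texel_count tightly packed texels of texel_bytes bytes each.
constexpr std::size_t expected_data_length(std::size_t texel_count, std::size_t texel_bytes) {
    return texel_count * texel_bytes;
}

static_assert(sizeof(float2) == 8, "two 32-bit floats");
static_assert(expected_data_length(16 * 32, sizeof(float2)) == 4096, "512 texels x 8 bytes");
```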

Other properties of the texture class will be covered in a separate blog post.

Capture

Just like an array, a texture can only be captured by reference in the lambda supplied to parallel_for_each; this is what makes the texture object accessible to the code executing on the accelerator. For example:

 parallel_for_each(tex8.extent, [&tex8] (index<2> idx) restrict(amp) {
     // code 
 });

Read from Texture

Again, as with array, you can read from a texture object by indexing: pass an index<N> object to the [] subscript operator, the function call () operator, or the get method, or pass integer values to the function call () operator. Below is an example of reading from a texture:

 std::vector<float_2> src(16 * 32);   
 // code to initialize src elided
  
 std::vector<float_2> dst(16 * 32); 
 array_view<float_2, 2> arr(16, 32, dst); 
 const texture<float_2, 2> tex9(16, 32, src.begin(), src.end()); 
 parallel_for_each(tex9.extent, [=, &tex9] (index<2> idx) restrict(amp) {     
     arr[idx].x += tex9[idx].x; // use subscript operator 
     arr[idx].x += tex9(idx).x; // use function () operator 
     arr[idx].y += tex9.get(idx).y; // use get method 
     arr[idx].y += tex9(idx[0], idx[1]).y; // use function () operator 
 }); 
 arr.synchronize();

Unlike array, texture’s subscript operator and function call operator do not return a reference; they return a value. That is, there is no pointer or reference to the interior of a texture. This also means you cannot write to a texture via the subscript [] operator. We will explain how to write to textures in a future post.
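To check the kernel's output on the host, a plain C++ reference loop can help. In the kernel, all four reads return the same texel, so each element of dst gains src.x twice in x and src.y twice in y. The sketch assumes src and dst share the same row-major linear layout; float2 is a stand-in for float_2 and reference_kernel is a hypothetical name. The additions mirror the kernel's order, one read at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for concurrency::graphics::float_2.
struct float2 {
    float x;
    float y;
};

// Hypothetical host-side reference for the parallel_for_each above.
void reference_kernel(const std::vector<float2>& src, std::vector<float2>& dst) {
    assert(src.size() == dst.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i].x += src[i].x;  // tex9[idx].x
        dst[i].x += src[i].x;  // tex9(idx).x
        dst[i].y += src[i].y;  // tex9.get(idx).y
        dst[i].y += src[i].y;  // tex9(idx[0], idx[1]).y
    }
}
```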

This concludes the post. More blog posts about texture are on the way. Stay tuned! As usual, you are welcome to ask questions and provide feedback below or on our MSDN Forum.