Texture Sampling in C++ AMP

07/18/2013

Texture sampling reads values from a texture (its texels) at floating-point coordinates and returns a formatted value. A filtering algorithm fetches one texel, or combines a group of texels, to produce the result of each sampling operation. Many GPUs have hardware and caches designed specifically to perform sampling efficiently. This hardware has been vital for accelerating graphical and visual applications, including games, simulations, and visual modeling.

The texture APIs of C++ AMP in Visual Studio 2012 let users take advantage of several special properties of textures, such as optimization for 2D spatial locality, hardware-assisted type conversion, and special format support. However, the sampling functionality was not exposed. In Visual Studio 2013, we added new texture sampling APIs so that this specialized graphics hardware can be exploited to accelerate compute applications. In this post, I will describe the new APIs and how they can be used. Background knowledge of texture sampling, such as filtering algorithms, is out of scope for this post. If you are interested, the Wikipedia page and the MSDN article on this subject are good introductory material to get you started.

Design principles

Before we dive into the actual APIs and their usage, I would like to share the two principles that guided our design of the new texture sampling APIs in C++ AMP:

C++ AMP should expose APIs that enable and simplify the most common cases of texture sampling in compute applications.
C++ AMP should provide interop APIs with the underlying platform to enable more complicated texture sampling functionality that is not directly exposed by the C++ AMP APIs.

We believe that these two design principles helped us arrive at a set of texture sampling APIs in C++ AMP that strikes a good balance between usability and functionality.

Filtering mode

Filtering, in the context of textures, is the algorithm that specifies which texels to fetch and how to combine them to produce the interpolated value for a texture sampling operation. The two most common filtering algorithms are:

Point (nearest-neighbor): uses the value of the texel closest to the given texture coordinates.
Linear: combines the texels nearest to the sampled coordinates using a weighted average, with weights determined by distance.
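To make the difference between the two modes concrete, here is a minimal CPU-side sketch in standard C++ (not C++ AMP, and not how any particular GPU implements it). It assumes a one-dimensional texture stored in a std::vector, normalized coordinates, and clamping at the edges; the function names sample_point and sample_linear are hypothetical and exist only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Point filtering: return the texel whose area contains the coordinate u.
// Assumes u is a normalized coordinate; out-of-range values are clamped.
float sample_point(const std::vector<float>& texels, float u)
{
    assert(!texels.empty());
    const int width = static_cast<int>(texels.size());
    // Texel i covers [i / width, (i + 1) / width), so floor(u * width) selects it.
    int i = static_cast<int>(std::floor(u * width));
    i = std::min(std::max(i, 0), width - 1);
    return texels[i];
}

// Linear filtering: blend the two texels whose centers bracket u,
// weighting each by its distance to the sampled position.
float sample_linear(const std::vector<float>& texels, float u)
{
    assert(!texels.empty());
    const int width = static_cast<int>(texels.size());
    // Shift by half a texel so that integer positions land on texel centers.
    const float pos = u * width - 0.5f;
    const float base = std::floor(pos);
    const float t = pos - base; // weight of the right-hand texel
    const int i0 = std::min(std::max(static_cast<int>(base), 0), width - 1);
    const int i1 = std::min(std::max(static_cast<int>(base) + 1, 0), width - 1);
    return texels[i0] * (1.0f - t) + texels[i1] * t;
}
```

For a four-texel row {0, 10, 20, 30}, sample_point at u = 0.3 returns 10, while sample_linear at u = 0.5 returns 15, the average of the two texels on either side of that position.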

Direct3D supports other filtering algorithms, including anisotropic filtering. In addition, different filtering algorithms can be specified for different sampling contexts, such as minification, magnification, and mipmap level sampling. Currently, the C++ AMP APIs expose only the two filtering modes that we believe are most commonly used in compute scenarios, point and linear, and all sampling contexts use the same filtering algorithm. You can use the following enumeration to specify the filtering mode of a sampling operation:

 enum filter_mode 
 { 
     filter_point, 
     filter_linear, 
     filter_unknown, 
 };

filter_unknown represents filtering modes that are not exposed by the C++ AMP APIs but are adopted from the underlying platform. If you need a more advanced filtering mode, you can create it using Direct3D APIs and then adopt it into your C++ AMP code via the interop APIs. I will cover that later in the post.

Addressing mode

A texture’s normalized coordinates are always between 0.0 and 1.0 inclusive. The addressing mode of the texture determines how out-of-range coordinates are mapped back into this normalized domain, which can be used to generate special texture-mapping effects. The four most common addressing modes are:

Wrap: ignores the integer part of the specified coordinates, so the texture “wraps” around at every integer.
Mirror: discards the integer part of the specified coordinates, and complements the remaining fraction when the discarded integer is odd. The texture is therefore mirrored between 1.0 and 2.0, normal again between 2.0 and 3.0, and so on.
Clamp: clamps the coordinates to the range 0.0 to 1.0, i.e., a coordinate smaller than 0.0 is treated as 0.0 and a coordinate greater than 1.0 is treated as 1.0.
Border: uses an arbitrary color, known as the border color, for any texture coordinates outside the range 0.0 through 1.0 inclusive.
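The four mappings above can be sketched on the CPU in standard C++ for a single coordinate. This is only an illustration of the arithmetic, not the C++ AMP API and not a claim about how hardware implements it; the enum addressing and the functions apply_addressing and needs_border_color are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the remapping modes; border is handled separately
// because it substitutes a color instead of remapping the coordinate.
enum class addressing { wrap, mirror, clamp };

// Map an arbitrary coordinate u into the normalized range [0.0, 1.0].
float apply_addressing(float u, addressing mode)
{
    assert(std::isfinite(u));
    switch (mode) {
    case addressing::wrap:
        // Keep only the fractional part: 1.25 -> 0.25, -0.25 -> 0.75.
        return u - std::floor(u);
    case addressing::mirror: {
        const float whole = std::floor(u);
        const float frac = u - whole;
        // An odd integer part flips the coordinate: 1.25 -> 0.75, 2.25 -> 0.25.
        const bool odd = std::fmod(std::fabs(whole), 2.0f) == 1.0f;
        return odd ? 1.0f - frac : frac;
    }
    case addressing::clamp:
        // Pin the coordinate to the nearest edge: 1.5 -> 1.0, -0.5 -> 0.0.
        return std::fmin(std::fmax(u, 0.0f), 1.0f);
    }
    return u; // unreachable for the enumerators above
}

// Border mode: any coordinate outside [0.0, 1.0] yields the border color
// instead of a texel value.
bool needs_border_color(float u)
{
    return u < 0.0f || u > 1.0f;
}
```

With these definitions, a coordinate of 1.25 maps to 0.25 under wrap, to 0.75 under mirror, and to 1.0 under clamp, while border mode would return the border color for it.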

In C++ AMP, you can use the following enumeration to specify the addressing mode of a sampling operation. In theory, each dimension of a texture could have a different addressing mode. Following our first design principle, we chose to support only the case where all dimensions share the same addressing mode, which we believe is the most common case in compute applications. If you have a more complicated setup, you can always resort to Direct3D APIs and the C++ AMP interop APIs. As with filter_unknown, address_unknown represents addressing modes that are not exposed by the C++ AMP APIs but are adopted from the underlying platform.

 enum address_mode 
 { 
     address_wrap,   
     address_mirror, 
     address_clamp, 
     address_border, 
     address_unknown, 
 };

texture_view<const T, N>

Currently, sampling is available only on read-only texture resources. To enforce this requirement, we introduced a new texture view type, texture_view<const T, N>, which, like array_view<const T, N>, can be read from but not written to. This new class has almost the same properties and read operations as the texture<T, N> type, but no write operations. Instead, it has a new set of sampling and gathering operations. We will discuss texture_view types in more detail in another blog post; in this post, let’s focus on the sampling operations of the read-only texture_view type. Note that sampling is supported only when the texel type is based on a floating-point type (i.e., float, norm, or unorm, but not double). Invoking a sampling operation on an unsupported texture format triggers a static_assert failure at compile time.

Sampling with predefined configurations

If the filtering mode and addressing mode of your sampling operation are among the modes exposed by C++ AMP (for address_border, the border color must be (0.0f, 0.0f, 0.0f, 0.0f)), you can use the following member function of texture_view<const T, N> to sample:

 template<filter_mode _Filter_mode = filter_linear, 
          address_mode _Address_mode = address_clamp>
 value_type sample(const coordinates_type& _Coord, 
                   float _Level_of_detail = 0.0f) const restrict(amp);

For now, you can ignore the _Level_of_detail parameter, which is related to the new mipmap support that we will discuss in a separate blog post. coordinates_type is a floating point short vector type whose rank is the same as the rank of the texture view. For example, for a texture_view<const float, 2>, the coordinates_type for sampling is float_2.

The following simplified implementation of the Gaussian pyramid algorithm illustrates how to sample with predefined samplers in C++ AMP:

 #include <amp.h>
 #include <amp_graphics.h>
 
 using namespace concurrency;
 using namespace concurrency::graphics;
 
 #define KERNEL_SIZE 5
 #define HALF_KERNEL 2
 #define NORM_FACTOR 0.00390625f // 1.0 / (16.0^2)
 
 // A helper function to convert a texture index to normalized float coordinates
 float_2 coord(const index<2>& idx, const extent<2>& ext) restrict(amp)
 {
     // Note that float_2.x corresponds to idx[1] and ext[1]!
     return float_2((idx[1] + 0.5f) / (float) ext[1], (idx[0] + 0.5f) / (float) ext[0]);
 }
 
 // Generate the next level of the Gaussian image pyramid.
 // output_tex’s extent should be half of input_tex’s
 void gaussianDownSample(const texture<float_4, 2>& input_tex, texture<float_4, 2>& output_tex)
 {
     // input_tex_view is a read-only texture view of the texture object input_tex
     texture_view<const float_4, 2> input_tex_view(input_tex);
 
     // output_tex_view is a writable texture view of the texture object output_tex
     texture_view<float_4, 2> output_tex_view(output_tex);
 
     parallel_for_each(output_tex_view.extent, [=](index<2> idx) restrict(amp)
     {
         // calculate the index in input_tex of the top-left corner of the 5x5 window
         index<2> levelIdx(idx * 2 - HALF_KERNEL);
 
         float_4 buf[KERNEL_SIZE];
         for (int i = 0; i < KERNEL_SIZE; i++)
         {
             // Sample row i using the predefined filter_linear + address_clamp sampler,
             // weighting the five columns by 1, 4, 6, 4, 1.
             buf[i] = input_tex_view.sample(coord(levelIdx + index<2>(i, 0), input_tex_view.extent)) +
                 input_tex_view.sample(coord(levelIdx + index<2>(i, 4), input_tex_view.extent)) +
                 4.0f * input_tex_view.sample(coord(levelIdx + index<2>(i, 1), input_tex_view.extent)) +
                 4.0f * input_tex_view.sample(coord(levelIdx + index<2>(i, 3), input_tex_view.extent)) +
                 6.0f * input_tex_view.sample(coord(levelIdx + index<2>(i, 2), input_tex_view.extent));
         }
         // Combine the rows with the same 1, 4, 6, 4, 1 weights, then normalize by 1/256.
         output_tex_view.set(idx, (buf[0] + buf[4] + 4.0f * (buf[1] + buf[3]) + 6.0f * buf[2]) * NORM_FACTOR);
     });
 }

First of all, if you start with indices into a texture, you need to convert them to floating-point coordinates normalized by the texture’s extent. Before normalization, you also need to add an offset of 0.5 to each dimension of the index so that the coordinates represent the center of each texel. Another thing to watch out for is the dimension ordering in the coordinates: since they are represented by a float_N, its constructor takes values in the order of the x, y, z dimensions, which is the opposite of the dimension ordering of extents and indices. Finally, in this example, filter_linear and address_clamp are used for the sampling operations. Both are the default arguments for the template parameters, which is why all template arguments are omitted. If your sampling operation requires a different filtering mode or addressing mode, you can specify the template arguments accordingly, e.g., input_tex_view.sample<filter_point, address_mirror>(coords), where coords is a float_2.
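The index-to-coordinate conversion described above can be checked on the CPU with a small standard C++ sketch. The struct coord2 is a hypothetical stand-in for float_2, and texel_center is a hypothetical helper that mirrors the arithmetic of the coord function in the example; row and col play the roles of idx[0] and idx[1].

```cpp
#include <cassert>

// Hypothetical stand-in for float_2: x comes first, then y.
struct coord2 {
    float x;
    float y;
};

// Convert a texel index (row, col) in a rows-by-cols texture to the normalized
// coordinates of that texel's center. The column index maps to x and the row
// index maps to y, which reverses the ordering used by indices and extents.
coord2 texel_center(int row, int col, int rows, int cols)
{
    assert(rows > 0 && cols > 0);
    return coord2{ (col + 0.5f) / cols, (row + 0.5f) / rows };
}
```

For a texture with 4 rows and 8 columns, the texel at index (0, 0) has its center at x = 0.0625, y = 0.125, and the texel at index (3, 7) has its center at x = 0.9375, y = 0.875. Without the 0.5 offset, the coordinates would point at texel corners, and linear filtering would blend neighboring texels instead of returning the texel itself.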

Sampling with user-defined sampler objects

If you want to sample a texture with a customized border color, or with a more advanced configuration that cannot be expressed through the C++ AMP APIs, you can create or adopt a sampler object, which is an aggregate of sampling configuration information. You then invoke the sampling member function of texture_view<const T, N> that takes a sampler parameter:

 value_type sample(const sampler& _Sampler, 
                   const coordinates_type& _Coord) const restrict(amp);

Construct sampler objects

The sampler class has a set of constructors that let you specify the filtering mode, the addressing mode, and the border color of the sampler object. To avoid templatizing the sampler class, and to allow sampler objects to be shared among textures with different value types, the border color’s type is always float_4. When the sampler is used with a texture whose value type has fewer components, the extra components are ignored; for example, only the x and y components of the border color are used if the texture’s value type is float_2.

The following example shows how to construct a sampler object whose filtering mode is point and whose addressing mode is border, with a border color of (0.5f, 0.5f, 0.5f, 0.5f):
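As a rough illustration of how a four-component border color relates to textures with fewer components, the following standard C++ sketch keeps only the leading components. It is not part of the C++ AMP API; border_for_components is a hypothetical helper, and std::array<float, 4> stands in for float_4.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Return the part of a four-component border color that applies to a texel
// type with the given number of components (1 to 4). Components beyond that
// count are ignored, in x, y, z, w order.
std::vector<float> border_for_components(const std::array<float, 4>& border,
                                         std::size_t components)
{
    assert(components >= 1 && components <= 4);
    return std::vector<float>(border.begin(), border.begin() + components);
}
```

For a border color of (0.5, 0.25, 0.75, 1.0) and a two-component texel type, only 0.5 and 0.25 would be used.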

 sampler s(filter_point, address_border, float_4(0.5f, 0.5f, 0.5f, 0.5f));

Note that the constructors of the sampler class are restrict(cpu), i.e., they can only be invoked on the host. Its copy constructors, assignment operators, and accessor functions, however, are all restrict(cpu, amp).

Interop APIs for samplers

If necessary, you can adopt an ID3D11SamplerState instance via the make_sampler API:

 ID3D11SamplerState *pSampler = NULL;
 // Code to construct pSampler using Direct3D 11 APIs is omitted here.
 ...
 sampler s = make_sampler(pSampler);

On the other hand, if you want to get the ID3D11SamplerState instance corresponding to a C++ AMP sampler on a specific accelerator_view, say av, you can use the get_sampler API, e.g.:

 sampler s(filter_linear, address_border, float_4(1.0f, 0.75f, 0.50f, 0.25f));
 ID3D11SamplerState * pSamplerState = 
                    reinterpret_cast<ID3D11SamplerState*>(get_sampler(av, s));

Note that both make_sampler and get_sampler are in the concurrency::graphics::direct3d namespace.

Capture samplers

Just like array and texture objects, a sampler can only be captured by reference in the lambda supplied to parallel_for_each. In addition, a single parallel_for_each invocation can capture at most 16 sampler references. Each predefined configuration that is also used counts as one sampler object toward that limit of 16.

This concludes the brief introduction to the new C++ AMP texture sampling APIs. More blog posts are on the way to cover the various enhancements we added to the texture support in C++ AMP in Visual Studio 2013, so stay tuned! As usual, you are welcome to ask questions and provide feedback below or on our MSDN Forum.