Writing to C++ AMP textures with specific bits_per_scalar_element

In my previous post we looked at the bits_per_scalar_element property of textures. In this post, we will see how to write data to such textures. We will also look at the clamping semantics of such writes, i.e. what happens when we try to store a value that exceeds the range of the specified storage size.

Writing to 8-bit or 16-bit texture

In our post on writing data to textures, we summarized the various rules to keep in mind when writing to a texture. All the rules shown there also apply to textures with 8 or 16 bits per scalar element. One rule that will almost always apply is that we need a writeonly_texture_view to write to a texture whose bits_per_scalar_element != 32.

Here is a simple example of writing data to a texture with 8-bit elements.

const int size = 16; /* example element count (assumed; any positive size works) */
extent<1> ex(size);

/* create texture of 8 bit ints */
texture<int, 1> tex3(ex, 8U /* bits_per_scalar_element */);

/* need writeonly view to write to texture with 8 bits_per_scalar_element */
writeonly_texture_view<int, 1> wo_tv3(tex3);

parallel_for_each(extent<1>(size), [wo_tv3] (index<1> idx) restrict(amp)
{
    int value = 10;
    wo_tv3.set(idx, value); /* write */
});

Now that we have seen how to write data to a texture, the question that comes to mind is: what happens when we try to write a value that exceeds the range allowed by bits_per_scalar_element?

Let us try to answer the question by looking closely at the clamping behavior when writing to textures.

Clamping semantics

When you write a value to a texture, the texture stores the value in the bit width specified by bits_per_scalar_element. If the value doesn’t fit in the allowed number of bits, it is clamped to the range representable in those bits. The underlying type determines how the clamping is handled. Here are the rules for each data type:

  1. When storing integer data into a texture location of smaller size, values are clamped to the maximum or minimum value that fits in the storage specified by bits_per_scalar_element. Notice that this differs from the behavior on the host, where out-of-range values typically wrap around by modular arithmetic.
  2. When storing float data from a higher-range representation into a lower-range representation, the value is clamped to the maximum representable finite value, with its sign preserved. If the original value has already overflowed the higher range and is a signed infinity, the same signed infinity is stored in the lower-range representation.
  3. For norm and unorm data, the values are automatically clamped to [-1.0f, 1.0f] and [0.0f, 1.0f], respectively.
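Rule 3 is not exercised by the examples in this post, so here is a minimal host-side sketch of what the clamping amounts to. The helper names `clamp_norm` and `clamp_unorm` are made up for illustration and are not part of C++ AMP; the sketch models only the range clamp, not how the clamped value is then encoded in 8 or 16 bits.

```cpp
#include <algorithm>

// Hypothetical helper (not a C++ AMP API): clamp a value to the norm range.
inline float clamp_norm(float v)
{
    return std::min(std::max(v, -1.0f), 1.0f);   // [-1.0f, 1.0f]
}

// Hypothetical helper (not a C++ AMP API): clamp a value to the unorm range.
inline float clamp_unorm(float v)
{
    return std::min(std::max(v, 0.0f), 1.0f);    // [0.0f, 1.0f]
}
```

For instance, under this model `clamp_norm(2.5f)` yields 1.0f and `clamp_unorm(-0.25f)` yields 0.0f, while in-range values pass through unchanged.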

Now let’s test these rules using an example. In the code snippet below, we will try to exceed the limit of an 8-bit integer by writing values above the limits.

const int size = 1;

extent<1> ex(size);

/* texture of 8 bit integers */
texture<int, 1> char_tex(ex, 8U /* bits_per_scalar_element */ );
writeonly_texture_view<int, 1> char_tex_v(char_tex);

parallel_for_each(ex, [=](index<1> idx) restrict(amp)
{
    /* Try to store CHAR_MAX + 1 in 8 bits.
     * Expected result CHAR_MAX */
    int char_max = CHAR_MAX + 1;
    char_tex_v.set(idx, char_max);
});

Now let us copy back the data from the texture.

// copy back results
char clamped_result_device;
copy(char_tex, &clamped_result_device, char_tex.data_length);

Now let us perform the equivalent narrowing conversion (casting to a smaller type) on the host:

int char_max = CHAR_MAX + 1;
char clamped_result_host = char_max;

Finally, let us compare the result of the conversion on the host with the clamping that occurred in the texture.

cout << "Limits" << endl;
cout << "CHAR_MAX: " << CHAR_MAX << endl;
cout << "clamping behavior on host" << endl;
cout << "CHAR_MAX + 1: " << (int) clamped_result_host << endl;
cout << "clamping behavior in texture" << endl;
cout << "CHAR_MAX + 1: " << (int) clamped_result_device << endl;  

Here is the output on my machine:

Limits
CHAR_MAX: 127

clamping behavior on host
CHAR_MAX + 1: -128

clamping behavior in texture
CHAR_MAX + 1: 127

As mentioned before, the one subtle difference to pay attention to is the behavior for out-of-range integers: unlike on the host, the value written to the texture does not wrap around; it is clamped at the limit.
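To make the contrast concrete without an accelerator, the two behaviors can be modeled on the host. This is only a sketch of the semantics described above, not how the texture hardware implements them; `saturate_signed` and `wrap_signed8` are hypothetical helpers, not C++ AMP functions.

```cpp
#include <cassert>

// Hypothetical helper: models the texture's clamping store for a signed
// integer element of `bits` width (8 or 16 in this post).
inline int saturate_signed(int value, unsigned bits)
{
    assert(bits >= 2 && bits <= 16);
    const int max_v = (1 << (bits - 1)) - 1;   // 127 for 8 bits
    const int min_v = -max_v - 1;              // -128 for 8 bits
    if (value > max_v) return max_v;
    if (value < min_v) return min_v;
    return value;
}

// Hypothetical helper: models the wrap-around seen on the host by keeping
// the low 8 bits and reading them as a two's complement value.
inline int wrap_signed8(int value)
{
    const unsigned low = static_cast<unsigned>(value) & 0xFFu;
    return low >= 128u ? static_cast<int>(low) - 256 : static_cast<int>(low);
}
```

With these helpers, `saturate_signed(CHAR_MAX + 1, 8)` gives 127 and `wrap_signed8(CHAR_MAX + 1)` gives -128, matching the two results printed above.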

Now let us try the same test with a 16-bit floating point number. In the sample below, we try to store FLT_MAX (the maximum finite value of a 32-bit float) into a texture of 16-bit floats. We also try to store FLT_MAX * 10, which has already overflowed to infinity as a 32-bit float, into the 16-bit float.

const int size = 1;
extent<1> ex(size);

texture<float_2, 1> flt_tex(ex, 16U /* bits_per_scalar_element */ );
writeonly_texture_view<float_2, 1> flt_tex_v(flt_tex);

parallel_for_each(ex, [=](index<1> idx) restrict(amp)
{
    float_2 value(FLT_MAX, FLT_MAX*10);
    flt_tex_v.set(idx, value);
});

Since 16 bit floats are not a supported type on the host side, let’s read the value back on the accelerator using an array_view. This way we will not need to manually interpret the bits copied out.

float_2 clamped_result;

array_view<float_2> clamped_result_av(1, &clamped_result);

parallel_for_each(clamped_result_av.extent, [clamped_result_av,&flt_tex](index<1> idx) restrict(amp)
{
    clamped_result_av[idx] = flt_tex[idx];
});

clamped_result_av.synchronize();

And finally, let us print the result.

cout << "FLT_MAX: " << FLT_MAX << endl;
cout << "FLT_MAX in 16 bit texture: " << clamped_result.x << endl;

cout << "FLT_MAX * 10: " << FLT_MAX * 10 << endl;
cout << "FLT_MAX * 10 in 16 bit texture: " << clamped_result.y << endl;

Here is the output on my machine:

FLT_MAX: 3.40282e+038
FLT_MAX in 16 bit texture: 65504

FLT_MAX * 10: 1.#INF
FLT_MAX * 10 in 16 bit texture: 1.#INF

Note that FLT_MAX was clamped to 65504, the maximum finite value that a half precision float can represent. Also note that if the value is already infinite before the write, as FLT_MAX * 10 is, no clamping occurs: the corresponding signed infinity is stored in the 16-bit float.
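The float rule can also be modeled on the host. The sketch below covers only the range handling described here (finite values clamp, infinities pass through); it does not model the loss of precision for in-range values, and `clamp_to_half_range` is a hypothetical helper rather than a C++ AMP function.

```cpp
#include <cmath>

// Largest finite half precision value, as seen in the output above.
constexpr float kHalfMax = 65504.0f;

// Hypothetical helper: models how a 32-bit float is range-limited when it
// is written to a 16-bit float texture element.
inline float clamp_to_half_range(float v)
{
    if (std::isinf(v)) return v;               // signed infinity is kept as-is
    if (v >  kHalfMax) return  kHalfMax;       // clamp large positive values
    if (v < -kHalfMax) return -kHalfMax;       // clamp large negative values
    return v;                                  // in range: unchanged in this model
}
```

Under this model a large finite input such as FLT_MAX maps to 65504, while an infinite input stays infinite with its sign preserved.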

This concludes our series on bits_per_scalar_element in textures. As always, please feel free to share your feedback below or at our MSDN forum!