restrict(amp) restrictions part 10 of N – tile_static

This post assumes and requires that you have read the introductory post to this series, which also includes a table of contents. With that out of the way, let’s look at the restrictions with regard to tile_static.

C++ AMP introduces a new storage class for local variables within an amp-restricted function, called tile_static. A local variable declared with tile_static is stored in a programmable cache that is visible to all threads in a tile (a group of threads). Its lifetime begins when execution reaches the point of declaration and ends when the kernel function returns. The tiled matrix multiplication example showed how to use tile_static and other tiled model constructs to take advantage of tiling for significant performance gains. In this post, I will focus on the restrictions associated with tile_static variables.

· tile_static can only be used on local variables within amp-restricted functions, and the function must be restricted to amp alone (not restrict(amp, cpu)); you will get an error if you use tile_static outside such an amp-only scope. For example,

tile_static int globalA[16][16]; // illegal – not a local variable

void foo() restrict(amp, cpu)
{
    tile_static int locA[16][16]; // illegal – foo is also cpu-restricted.
    ...
}

· The type of a tile_static variable is not allowed to be a pointer or reference type.
As we mentioned in the post on restrictions on compound types, pointers in C++ AMP are emulated via static analysis. Storing a pointer in a tile_static variable, which may be modified by other threads, makes it impossible for the pointer-emulation algorithm to track the pointer's source accurately. Therefore, we ban pointer and reference types for tile_static variables.

· A tile_static variable is not allowed to have an initializer. In addition, if the type of a tile_static variable has a non-trivial default constructor or destructor, the compiler does not generate calls to them the way it does for ordinary C++ class objects. For example,

class A
{
public:
    A() restrict(amp) : m(0) {}
    A(int n) restrict(amp) : m(n) {}
private:
    int m;
};

void boo() restrict(amp)
{
    tile_static int count = 0; // illegal - initializer not allowed
    tile_static A a1(4);       // illegal - initializer not allowed
    tile_static A a2;          // OK but a2's default constructor is not invoked
    ...
}

The rationale behind this restriction is that a tile_static object should be initialized by only one thread. We could let the first thread do the job and block the other threads in the tile until it finishes the initialization. But sometimes it is much more efficient to let the threads cooperate on the initialization, and how that cooperation should be done is very application-dependent. Therefore, it is better to leave tile_static objects uninitialized and let you, the programmer, explicitly initialize them in the most efficient way. However, the fact that the default constructor and destructor are not invoked behind the scenes by the compiler is surprising for many people who are used to C++ semantics. Therefore, a level 4 warning is issued whenever the default constructor or destructor call is omitted for a tile_static variable.

Finally, if a tile_static variable is declared anywhere in a call graph rooted at a non-tiled parallel_for_each invocation, the following error is reported:

error C3600: use of tile_static memory detected when compiling the call graph for the non-tiling concurrency::parallel_for_each