Function overloading with restrict in C++ AMP

As you may already know, C++ AMP introduces a new language feature, restriction specifiers, which are part of the function signature. Restriction specifiers are used to restrict the content and behavior of annotated functions. For example, the restrict(amp) specifier limits functions so that they can be executed on the target device correctly and efficiently. In this blog post, I would like to discuss in more detail the new overloading capability enabled by restriction specifiers in C++ AMP. First I’ll motivate the overloading capability, and then I’ll explain the function overloading rules.

Benefits of overloading functions on restriction specifiers

Some of you might ask why restriction specifiers must be part of function types, given that restriction checking could have been performed in other ways, e.g., by reusing existing language constructs such as attributes, __declspec, calling conventions, etc.

Let’s look at a fictitious example. Say you need to provide certain functionality in a function called foo. This function is called frequently, so the faster it runs the better. You therefore decide to write specialized implementations for the host and the device so that each fully exploits the hardware capability (e.g., intrinsic functions). Another reason you may need a separate implementation for each restriction specifier is that the cpu implementation may contain code that is not supported in amp-restricted functions. Without the capability to overload functions on restriction specifiers, you would have to use a different function name for each implementation, such as:

void foo_cpu(); // restrict(cpu) by default
void foo_amp() restrict(amp);

Using different function names for different implementations may not seem too bad until you want to share source code between the cpu and amp implementations. Often the implementation of a function is generic enough that it can be executed on both host and device, and there is no need to optimize it for each target. In this case, C++ AMP allows you to mark it restrict(cpu,amp) so that you only need to write one version of it. For example,

int add(int n1, int n2) restrict(cpu,amp)
{
    return n1 + n2;
}

However, without function overloading on restriction specifiers, you wouldn’t be able to write a shared function if its body calls a function that has a different implementation for each target, because the callee has a different name on each target. You would have to duplicate the whole call tree!

void boo_cpu() // restrict(cpu) by default
{
    foo_cpu();
}
void boo_amp() restrict(amp)
{
    foo_amp();
}

Letting restriction specifiers participate in function overloading solves this problem gracefully. Now you can write two functions with the same name, identical parameter lists, and identical return types, one for restrict(cpu) and one for restrict(amp). The compiler will pick the right version based on the caller’s restriction context.

void foo(); // restrict(cpu) by default
void foo() restrict(amp);
void caller1() // restrict(cpu) by default
{
    ...
    foo(); // the cpu version is picked
    ...
}
void caller2() restrict(amp)
{
    ...
    foo(); // the amp version is picked
    ...
}
void boo() restrict(cpu,amp)
{
    // void foo() is picked in boo's cpu target code
    // void foo() restrict(amp) is picked in boo's amp target code
    foo();
}
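
For comparison, here is a rough emulation of the same selection in standard C++, which compiles without the C++ AMP toolchain. It is only a sketch: the tag types cpu_ctx and amp_ctx and the explicit context parameter are hypothetical stand-ins for the restriction context that the C++ AMP compiler tracks implicitly.

```cpp
#include <string>

// Hypothetical tags standing in for the caller's restriction context.
struct cpu_ctx {};
struct amp_ctx {};

// Two implementations of foo that share a name; the tag selects one.
inline std::string foo(cpu_ctx) { return "cpu"; }
inline std::string foo(amp_ctx) { return "amp"; }

// A shared function, analogous to boo() restrict(cpu,amp): it is written
// once and calls whichever foo matches the context it is instantiated for.
template <typename Ctx>
std::string boo(Ctx ctx)
{
    return foo(ctx);
}
```

Calling boo(cpu_ctx{}) reaches the cpu version of foo and boo(amp_ctx{}) reaches the amp version. With restriction specifiers you get this selection without the extra parameter or the template.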

The C++ AMP math library is a real-world example that exploits this new overloading capability. In this library, we provide a set of math functions similar to those in <cmath>, with the exact same function signatures except that they are marked with restrict(amp). These functions are optimized for the device. You can now write a function that calls these math functions and mark it restrict(cpu,amp) so that it can be shared between host and device. The blackscholes::cnd_calc function in our Black-Scholes sample is such an example:

float blackscholes::cnd_calc(float d) restrict(cpu,amp)
{
    ...
    // fabsf and expf are available in both cmath and C++ AMP math library
    float x = 1.0f / (1.0f + 0.2316419f * fabsf(d));
    float cnd = isqrt2pi * expf(- 0.5f * d * d) * 
            (x * (a1 + x * (a2 + x * (a3 + x * (a4 + x * a5)))));
    ...
}

Function overloading rules with restriction specifiers

The following are some rules you need to know about function overloading on restriction specifiers:

Rule 1: A call to function F is valid if and only if the applicable F covers at least all the restrictions in force in the calling function. This rule can be satisfied by a single function F that contains all the required restrictions, or by a set of overloaded functions F that each specify a subset of the restrictions in force at the call site. For example,

void X() restrict(cpu,amp) { }
void Y() restrict(amp) { } 
void Y() { } 
void Z() { } 
void caller1() restrict(amp) 
{ 
    X(); // okay; restrict(cpu,amp) covers restrict(amp) 
    Y(); // okay; there is an amp version of Y 
    Z(); // error; no amp version of Z available 
} 
void caller2() restrict(cpu,amp) 
{ 
    X(); // okay; all restrictions available in a single function 
    Y(); // okay; all restrictions available in separate functions 
    Z(); // error; not all restrictions available in Z's overload set
}
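
One way to reason about Rule 1 is as a set-coverage check. The sketch below models it in standard C++ with bitmasks; the constants kCpu and kAmp and the function call_is_valid are hypothetical names for illustration, and this is a model of the rule, not a description of how the compiler implements it.

```cpp
#include <initializer_list>

// Hypothetical bitmask encoding of the two restriction specifiers.
const unsigned kCpu = 1u;
const unsigned kAmp = 2u;

// Rule 1 as a set check: the union of the restrictions declared across the
// overload set must include every restriction in force at the call site.
inline bool call_is_valid(unsigned caller,
                          std::initializer_list<unsigned> overloads)
{
    unsigned covered = 0;
    for (unsigned r : overloads)
        covered |= r;
    return (caller & ~covered) == 0;
}
```

In terms of the example above, X is the overload set {kCpu | kAmp}, Y is {kAmp, kCpu}, and Z is {kCpu}. The check accepts X and Y from both callers and rejects Z from both, because Z never covers kAmp.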

Rule 2: The restriction specifiers of a function must not overlap with the restriction specifiers of any other function that has the same name and parameter list. For example,

int foo(int x) restrict(cpu,amp);
int foo(int x) restrict(cpu); // error, overlaps with previous declaration
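
Rule 2 can be modeled the same way, as a set-intersection test. This is a minimal sketch in standard C++; kCpu, kAmp, and declarations_conflict are hypothetical names, not part of C++ AMP.

```cpp
// Hypothetical bitmask encoding of the two restriction specifiers.
const unsigned kCpu = 1u;
const unsigned kAmp = 2u;

// Rule 2 as a set check: two declarations that are otherwise identical
// conflict when their restriction sets share at least one restriction.
inline bool declarations_conflict(unsigned first, unsigned second)
{
    return (first & second) != 0;
}
```

The pair above corresponds to declarations_conflict(kCpu | kAmp, kCpu), which reports a conflict because both declarations claim cpu. A cpu-only and an amp-only declaration do not conflict.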

Rule 3: The restriction specifiers applied to a function definition are recursively applied to all function declarators and type names defined within its body that do not have explicit restriction specifiers. For example,

void foo() restrict(amp)
{ 
    class Bar 
    { 
        void f1() {...} // f1 is amp-restricted 
        void f2() restrict(cpu) {...} // f2 is cpu-restricted 
    }; 
    auto l = [] (int y) {...}; // Lambda is amp-restricted 
    typedef int int_void_amp(); // int_void_amp is amp-restricted
    ...
}

Rule 4: Every expression that is evaluated in code that has multiple restriction specifiers must have the same type in the context of every restriction. For example,

int foo();
float foo() restrict(amp); 
void caller() restrict(cpu,amp)
{
    auto x = foo(); // error; foo()'s type is different in cpu and amp
}
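
The check behind Rule 4 can be approximated in standard C++ by comparing the type of the same call expression in each context. This is only an emulation: the tag types cpu_ctx and amp_ctx are hypothetical stand-ins for the restriction contexts, and bar is a hypothetical well-formed counterpart to foo.

```cpp
#include <type_traits>

// Hypothetical tags standing in for the cpu and amp restriction contexts.
struct cpu_ctx {};
struct amp_ctx {};

// Mirrors the mismatched pair above: int on cpu, float on amp.
int foo(cpu_ctx);
float foo(amp_ctx);

// A well-formed pair: the same return type in both contexts.
int bar(cpu_ctx);
int bar(amp_ctx);

// Rule 4 as a compile-time comparison. Declarations alone are enough
// because decltype does not evaluate the call.
constexpr bool foo_types_agree =
    std::is_same<decltype(foo(cpu_ctx{})), decltype(foo(amp_ctx{}))>::value;
constexpr bool bar_types_agree =
    std::is_same<decltype(bar(cpu_ctx{})), decltype(bar(amp_ctx{}))>::value;

static_assert(!foo_types_agree, "foo() has a different type per context");
static_assert(bar_types_agree, "bar() has the same type in both contexts");
```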

Rule 5: Constructors can have overloads that are differentiated by restriction specifiers. Destructors cannot be overloaded, even with restriction specifiers. Instead, the destructor must have a restriction specifier that covers the union of the restrictions on all the constructors. For example,

class Bar {
public: 
    Bar();
    Bar() restrict(amp); 
    ~Bar(); // error: restrict(cpu,amp) required 
};

Note that you can achieve the effect of overloading destructors by calling auxiliary cleanup functions that have different restriction specifiers. For example,

class Foo {
public: 
    Foo() {...} 
    Foo() restrict(amp) {...} 
    ~Foo() restrict(cpu,amp) { cleanup(); }
private:
    void cleanup() {...}
    void cleanup() restrict(amp) {...}
};

Rule 6: Compiler-generated constructors and destructors (and other special member functions) behave as if they were declared with as many restrictions as possible without ambiguities and errors. For example,

struct A
{ 
    int a; 
    int b; 
    // compiler-generated default constructor: A() restrict(cpu,amp); 
    ...
    // compiler-generated destructor: ~A() restrict(cpu,amp); 
}; 
struct B 
{ 
    B() restrict(amp);
    ...
    // compiler-generated destructor: ~B() restrict(amp); 
}; 
struct C 
{ 
    B b;
    // compiler-generated default constructor: C() restrict(amp); 
    ...
    // compiler-generated destructor: ~C() restrict(amp); 
}; 

How the compiler decides which versions of special member functions to generate is beyond the scope of this blog post. You can find more detail about that, and about the other syntactic and semantic rules of restriction specifiers, in section 2 of the C++ AMP open specification.

In summary, C++ AMP enables a new overloading capability using restriction specifiers. It allows you to author target-specific implementations at the source code level to maximize performance, while still enabling seamless code sharing between host and device code. I hope that this blog post helps you start taking advantage of this powerful C++ AMP feature. As always, feedback is welcome below or on our MSDN Forum.