printf, errorf, abort in C++ AMP

There is full Visual Studio debugging support for C++ AMP, and we will cover that in future blog posts. In this post, I am going to introduce three debug diagnostic functions that can be used in restrict(amp) code, including a variant of the well-known printf function.

All three functions are executed as any other device-side function: per-thread, and in the context of the calling thread.

void direct3d_abort() restrict(amp)

This function aborts the execution of a kernel. When the abort is detected by the runtime, it raises a runtime_exception on the host with the error message, “Reference Rasterizer: Shader abort instruction hit”.

void direct3d_printf(const char *_Format_string, ...) restrict(amp)

Parameters: _Format_string: the format string; ...: an optional list of parameters of variable count.

This function accepts a format string and an optional list of parameters of variable count. It prints formatted output from a kernel to the Visual Studio output window.

void direct3d_errorf(const char *_Format_string, ...) restrict(amp)

This function has the same characteristics and usage as direct3d_printf, in that a message is printed to the output window. Additionally, the C++ AMP runtime will raise a runtime_exception on the host with the same error message that was passed to the direct3d_errorf call.

Notes on usage

These functions are usable only if all of the following conditions are met and will otherwise behave as no-ops.

1) The Debug configuration in Visual Studio is selected, i.e. the code is compiled with the _DEBUG preprocessor definition.

2) The accelerator_view on which the kernel is invoked must be on an accelerator which supports the printf, errorf, and abort intrinsics. At the time of writing, only the direct3d_ref accelerator supports these intrinsics.

Also, because these debug functions are based on HLSL intrinsic functions, there are two restrictions to bear in mind.

1) The maximum number of allowed parameters is seven, and the format string counts as one of them. If you break that rule, as in the following call with eight parameters (in an amp-restricted function):

int i = 0;

direct3d_printf("printf: the value is: %d, %d, %d, %d, %d, %d, %d\n", i, i, i, i, i, i, i);

You will get compiler error C3562: intrinsic function 'direct3d_printf' is limited to have no more than 7 parameters.

2) There is no automatic widening or narrowing type conversion. For example (in an amp-restricted function):

float x = 1.0;

direct3d_printf("%lf", x);

If you ran similar code on the CPU, x would be correctly converted to double before printing. However, the code does not work correctly on the GPU because direct3d_printf and direct3d_errorf do not support automatic widening. So in this example, the printed value will not be correct.

Finally, for all the functions above, note that kernel execution is asynchronous. The output of direct3d_printf may therefore appear at any time between the dispatch of the kernel and the completion of the kernel’s execution. Likewise, errors from direct3d_errorf and direct3d_abort may be detected after the parallel_for_each call has returned and before another call that results in a GPU command being queued.

Exception to the rule

One of the restrictions of restrict(amp) code is that a trailing ellipsis (...) is not allowed in a parameter list. However, these debug diagnostic functions are implemented as compiler intrinsics, so they are the exception to the rule and may take a trailing ellipsis. They are essentially the HLSL functions abort, errorf, and printf. The same restriction is also the reason we could not wrap these intrinsics in friendlier functions.

Sample Code

Here is some sample code for you to try. Remember to run under the debug configuration. 

 #include <algorithm> // std::generate
 #include <vector>
 #include <iostream>
 #include <amp.h>
  
 using std::vector;
 using namespace concurrency;
  
 int main()
 {
     const int N = 2;
     const int M = 2;
     const int size = N * M;
  
     vector<int> A(size);
     int i = 0;
     std::generate(A.begin(), A.end(), [&i](){return i++;});
     extent<2> e(N, M);
     array_view<int, 2> av(e, A);
     //At the time of writing, only the REF accelerator supports these intrinsics.
     accelerator_view acl_v = accelerator(accelerator::direct3d_ref).default_view; 
  
     parallel_for_each(acl_v, av.extent, [=](index<2> idx) restrict(amp) {
         av[idx]++;
         direct3d_printf("printf: the value is: %d\n", av[idx]);
     });
     av.synchronize();
  
     try
     {
         parallel_for_each(acl_v, av.extent, [=](index<2> idx) restrict(amp) {
             av[idx]++;
             direct3d_errorf("errorf: The value is: %d\n", av[idx]);
         });
         av.synchronize();
     } catch (runtime_exception &e)
     {
         std::cout << "catch runtime exception: " << e.what() << std::endl;
     }
  
     try
     {
         parallel_for_each(acl_v, av.extent, [=](index<2> idx) restrict(amp) {
             av[idx]++;
             direct3d_abort();
         });
         av.synchronize();
     } catch (runtime_exception &e)
     {
         std::cout << "catch runtime exception: " << e.what() << std::endl;
     }
  
     return 0;
 }

Output in Visual Studio

To view the output of these functions in Visual Studio 11, start debugging and then enable the program output in the Output window.

image

Then go to menu->Debug->Output. You can view the output from these functions regardless of whether you have selected GPU debugging or the default CPU debugging.

With the default of CPU debugging (“Auto” or “Native Only”) you will see output like the following:

image

With the “GPU only” debugging selected, in the “GPU - Software Emulator”, you will see output like the following (abort causes the debugger to break, instead of outputting a message):

image

That is all for these three functions. I hope you find them useful for logging information in your code. Your feedback, as always, is welcome below or in our MSDN forum.