Disabling TDR on Windows 8 for your C++ AMP algorithms

The Windows Timeout Detection and Recovery (TDR) mechanism prevents a process from hogging the GPU, which would otherwise leave the display unresponsive or deny other processes a fair share of the GPU. I encourage you to read our post Handling TDRs in C++ AMP, which covers TDR and the various ways of handling TDR occurrences in C++ AMP applications in detail. In compute scenarios, however, there is often a genuine need to run commands on the GPU for longer than the TDR timeout period. Windows 8 lets you programmatically disable TDR for specific devices, so that commands on such a device can run past the timeout period as long as neither the OS nor other processes are contending for that GPU at the same time. In this post, I will show how to use this new Windows 8 feature to create a C++ AMP accelerator_view on which long-running commands can execute without causing a TDR.

Creating a Direct3D 11 device using the D3D11CreateDevice API

The D3D11CreateDevice API creates an ID3D11Device interface, which represents a logical device on a display adapter. The “Flags” parameter of this API is a bitwise combination of device creation settings from the D3D11_CREATE_DEVICE_FLAG enumeration. Windows 8 adds a new member to this enumeration, D3D11_CREATE_DEVICE_DISABLE_GPU_TIMEOUT, which specifies that commands on the device may run for longer than the usual timeout period without causing a TDR, provided there is no contention for that GPU.

Creating a C++ AMP accelerator_view from an ID3D11Device interface pointer

An accelerator_view is an isolated resource and execution context/domain on an accelerator, and it is your gateway to executing commands on a GPU accelerator in C++ AMP, as described in my previous post on accelerator_view queuing_mode. The default accelerator_view of an accelerator, and any accelerator_view created through the accelerator::create_view API, is subject to the TDR timeout limit: if a command on that accelerator_view runs longer than the limit, a TDR is initiated.

However, if your application needs to execute long-running commands on the GPU, on Windows 8 you can create a Direct3D 11 device with the GPU timeout disabled using the D3D11CreateDevice API described above, and then create a C++ AMP accelerator_view from that device using the C++ AMP DirectX interoperability function concurrency::direct3d::create_accelerator_view. On an accelerator_view created this way, commands are allowed to execute beyond the TDR timeout limit as long as neither the OS nor other processes are simultaneously contending for the same GPU accelerator.

The following code snippet illustrates the creation of a C++ AMP accelerator_view that is not subject to the TDR timeout:

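// pAdapter: IDXGIAdapter* for the target GPU, obtained earlier (for example
// via IDXGIFactory::EnumAdapters). Requires <d3d11.h>, <amp.h> and <stdio.h>.
unsigned int createDeviceFlags = D3D11_CREATE_DEVICE_DISABLE_GPU_TIMEOUT;
ID3D11Device *pDevice = nullptr;
ID3D11DeviceContext *pContext = nullptr;
D3D_FEATURE_LEVEL featureLevel;
HRESULT hr = D3D11CreateDevice(pAdapter,
                               D3D_DRIVER_TYPE_UNKNOWN, // required when an adapter is given
                               NULL,                    // no software rasterizer module
                               createDeviceFlags,
                               NULL,                    // default feature level list
                               0,
                               D3D11_SDK_VERSION,
                               &pDevice,
                               &featureLevel,
                               &pContext);

// C++ AMP needs a device at feature level 11_0 or higher.
if (FAILED(hr) ||
    ((featureLevel != D3D_FEATURE_LEVEL_11_1) &&
     (featureLevel != D3D_FEATURE_LEVEL_11_0)))
{
    fprintf(stderr, "Failed to create Direct3D 11 device\n");
    if (SUCCEEDED(hr))
    {
        // A device was created, but at too low a feature level:
        // release it and report a failure instead of returning S_OK.
        pContext->Release();
        pDevice->Release();
        hr = E_FAIL;
    }
    return hr;
}

accelerator_view noTimeoutAcclView =
    concurrency::direct3d::create_accelerator_view(pDevice);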
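
The snippet above assumes that the D3D11_CREATE_DEVICE_DISABLE_GPU_TIMEOUT flag is accepted. Since the flag is new in Windows 8, an application that also has to run on earlier systems might retry device creation without it. The following is only a minimal sketch of that idea: createDeviceWithFallback, DeviceCreation and createD3DDevice are hypothetical names that are not part of any API, and the sketch assumes that D3D11CreateDevice fails when it does not recognize the flag.

```cpp
#include <functional>

// Value of D3D11_CREATE_DEVICE_DISABLE_GPU_TIMEOUT in d3d11.h, repeated here
// only so that the fallback logic can be read without the Windows SDK.
const unsigned int kDisableGpuTimeout = 0x100;

// Hypothetical result type: the HRESULT-style status, plus whether the
// device was created with the GPU timeout disabled.
struct DeviceCreation {
    long hr;
    bool timeoutDisabled;
};

// Hypothetical helper. createDevice is any callable that takes the creation
// flags and returns an HRESULT-style code (a negative value means failure).
DeviceCreation createDeviceWithFallback(
    const std::function<long(unsigned int)>& createDevice)
{
    DeviceCreation result = { createDevice(kDisableGpuTimeout), true };
    if (result.hr >= 0)
        return result;

    // Assumption: the failure may mean the flag is not recognized, so retry
    // without it. Commands on the resulting device remain subject to TDR.
    result.hr = createDevice(0);
    result.timeoutDisabled = false;
    return result;
}

#ifdef _WIN32
#include <d3d11.h>

static_assert(kDisableGpuTimeout ==
                  static_cast<unsigned int>(D3D11_CREATE_DEVICE_DISABLE_GPU_TIMEOUT),
              "kDisableGpuTimeout must match d3d11.h");

// Wires the helper to D3D11CreateDevice for an adapter the caller already has.
DeviceCreation createD3DDevice(IDXGIAdapter* pAdapter,
                               ID3D11Device** ppDevice,
                               ID3D11DeviceContext** ppContext)
{
    return createDeviceWithFallback([=](unsigned int flags) -> long {
        D3D_FEATURE_LEVEL level;
        return D3D11CreateDevice(pAdapter, D3D_DRIVER_TYPE_UNKNOWN, NULL,
                                 flags, NULL, 0, D3D11_SDK_VERSION,
                                 ppDevice, &level, ppContext);
    });
}
#endif
```

A caller would check timeoutDisabled before submitting long-running work, and otherwise split the computation into shorter commands. Note that this sketch retries on any failure; a stricter version could retry only for the specific error code that indicates an invalid argument.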

 

Please note that a TDR can have various causes, and handling it properly in your C++ AMP application depends on the underlying cause, as discussed in detail in our post on Handling TDRs in C++ AMP. The technique of creating an accelerator_view with TDR disabled should be used only if your application has a genuine need for accelerator operations that exceed the TDR timeout limit. Also remember that:

1) This feature is only available on Windows 8.

2) Disabling the GPU timeout on a device prevents TDRs only if neither the OS nor other processes are contending for that GPU at the same time. If Windows detects contention for the GPU from the Desktop Window Manager or other processes, it will initiate a TDR to reset the accelerator_view on which the long-running command is executing, even though the GPU timeout is disabled on that device. For this technique to be effective, you must therefore pick a dedicated GPU accelerator that is neither connected to a display nor used concurrently by other processes, which eliminates any chance of contention.
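
One way to look for such a dedicated GPU is to enumerate the DXGI adapters and prefer a hardware adapter that has no display output attached. The sketch below illustrates that approach under stated assumptions: AdapterInfo, pickDedicatedAdapter and enumerateAdapters are hypothetical names introduced here, and the absence of an output only shows that no monitor is connected, not that other processes leave the GPU alone.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical summary of one adapter; the field names are illustrative.
struct AdapterInfo {
    std::wstring description;
    bool hasDisplayOutput; // at least one display output is attached
    bool isSoftware;       // software adapter rather than a hardware GPU
};

// Returns the index of the first hardware adapter with no display output,
// or -1 if there is none.
int pickDedicatedAdapter(const std::vector<AdapterInfo>& adapters)
{
    for (std::size_t i = 0; i < adapters.size(); ++i) {
        if (!adapters[i].hasDisplayOutput && !adapters[i].isSoftware)
            return static_cast<int>(i);
    }
    return -1;
}

#ifdef _WIN32
#include <dxgi.h> // link against dxgi.lib

// Collects one AdapterInfo per DXGI adapter, in enumeration order, so the
// chosen index can be passed back to IDXGIFactory1::EnumAdapters1.
std::vector<AdapterInfo> enumerateAdapters()
{
    std::vector<AdapterInfo> result;
    IDXGIFactory1* pFactory = nullptr;
    if (FAILED(CreateDXGIFactory1(__uuidof(IDXGIFactory1),
                                  reinterpret_cast<void**>(&pFactory))))
        return result;

    IDXGIAdapter1* pAdapter = nullptr;
    for (UINT i = 0; SUCCEEDED(pFactory->EnumAdapters1(i, &pAdapter)); ++i) {
        DXGI_ADAPTER_DESC1 desc = {};
        pAdapter->GetDesc1(&desc);

        // EnumOutputs fails with DXGI_ERROR_NOT_FOUND when no output exists.
        IDXGIOutput* pOutput = nullptr;
        bool hasOutput = SUCCEEDED(pAdapter->EnumOutputs(0, &pOutput));
        if (pOutput)
            pOutput->Release();

        AdapterInfo info;
        info.description = desc.Description;
        info.hasDisplayOutput = hasOutput;
        info.isSoftware = (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE) != 0;
        result.push_back(info);
        pAdapter->Release();
    }
    pFactory->Release();
    return result;
}
#endif
```

The adapter at the returned index can then be passed as pAdapter to D3D11CreateDevice. C++ AMP itself also exposes accelerator::has_display, which can serve as a similar check when you start from C++ AMP accelerators rather than DXGI adapters.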

I hope this post helps you work within the Windows TDR timeout limit when you have a genuine need for long-running C++ AMP computations on GPU accelerators. Please feel free to ask questions below or in our MSDN concurrency forum!