cpu_accelerator in C++ AMP

As I described in my accelerator blog post, Microsoft's implementation of C++ AMP exposes three well-known accelerators in addition to the hardware on your system: direct3d_warp, direct3d_ref, and cpu_accelerator. This post expands on the last one.

Properties of cpu_accelerator

Using the code snippet from my accelerator blog post, this is the output for the cpu_accelerator on my machine:

New accelerator: CPU accelerator
device_path = cpu
version = 0.1
dedicated_memory = 4175492 KB
doubles = false
limited_doubles = false
has_display = false
is_emulated = true
is_debug = false

You can create one directly with a single line of code:

accelerator my_acc(accelerator::cpu_accelerator);

Note that accelerator::cpu_accelerator is a wide string constant whose value is L"cpu", which is why the device_path above prints as cpu.

Don't try to use for computation

As bizarre as it sounds, in our first release you cannot execute anything on the cpu_accelerator: passing it to parallel_for_each results in a runtime_exception with the following message:

runtime_exception (80070057): Concurrency::parallel_for_each is not supported on the selected accelerator "CPU accelerator".

So whenever you query for accelerators to execute your kernels on, make sure you filter out the cpu_accelerator. You can identify this accelerator by string equality comparison between my_acc.device_path and accelerator::cpu_accelerator.

In future releases, or in other implementations of the C++ AMP open spec, this accelerator may work for computation too. In the meantime, if you wish to execute your C++ AMP parallel_for_each computations on the CPU, taking advantage of multiple cores and SIMD instructions, please use WARP (accelerator::direct3d_warp).

Use for allocation

The main use of cpu_accelerator in our v1 implementation of C++ AMP is an optimization technique that we have already described on our blog: staging arrays.

Besides creating staging arrays, an even more niche use is to create a (non-staging) host concurrency::array by passing an accelerator_view of the cpu_accelerator (for example, its default_view) to the array constructor. You would do that to allocate memory on the same application heap that operator new uses (e.g. the CRT heap) and access it through a multidimensional interface, i.e. concurrency::array, because you like that interface so much that you don't mind the overhead.