concurrency::array_view – array_views on staging arrays

The previous posts in this series on C++ AMP array_view covered:

  1. Introduction to array_view and some of its key semantic aspects
  2. Implicit synchronization on destruction of array_views
  3. array_view discard_data function
  4. Caching and coherence policies underlying array_view implementation

In this post we will look at using array_views with staging arrays.

array_views with a staging array as data source

As described in a previous post, C++ AMP provides staging arrays for efficient data transfers between the host and accelerators. A staging array can only be accessed on the accelerator_view where it is allocated, and it additionally has an associated accelerator_view (returned by the get_associated_accelerator_view method of concurrency::array) to and from which it can be copied efficiently. When a staging array is used as the host memory data source for an array_view, implicit data transfers between the staging array and its associated accelerator_view are faster than for an array_view on regular (non-staging) host memory, where an extra intermediate copy through a temporary staging buffer is performed.

Staging arrays have certain limitations that you must be aware of if you choose to use them as the data source for array_views. It is NOT safe to access a staging array while a copy from (or to) that staging array is in progress. Hence, for an array_view with a staging array as its data source, any operation that may transfer data from the staging array to its associated accelerator_view (or vice versa) must not execute concurrently with another operation that accesses the array_view on the CPU, or on another accelerator_view where the array_view is not already cached. Any such concurrent operations have undefined behavior (for example, they may cause an access violation).

Guidelines for using a staging array as an array_view data source

 Guideline A: Consider using a staging array as your array_view data source if the view is to be accessed only on the host and on exactly one accelerator_view.

 accelerator_view cpuAv = accelerator(accelerator::cpu_accelerator).default_view;
  
 // Guideline A: Use a staging array as the data source for an array_view
 // to be used in a parallel_for_each computation, for faster transfer of data
 // between the CPU and the accelerator
 // With regular host memory, the data source would instead be:
 //     std::vector<float> sourceVec(size);
 //     float *hostPtr = sourceVec.data();
 concurrency::array<float> sourceArray(size, cpuAv, accelerator().default_view);
 float *hostPtr = sourceArray.data();
  
 std::generate(hostPtr, hostPtr + size, rand);
  
 // Using a staging array as the data source for the array_view
 // results in faster transfer of data from the CPU to the accelerator_view
 // where the parallel_for_each kernel executes
 // (with regular host memory: array_view<float> dataView(size, sourceVec);)
 array_view<float> dataView(sourceArray);
 parallel_for_each(dataView.extent, [=](index<1> idx) restrict(amp) {
     dataView(idx) = fast_math::cos(dataView(idx));
 });
  
 // Using a staging array as the data source for the array_view
 // also results in faster transfer of data from the accelerator_view
 // to the CPU
 dataView.synchronize();

 

Guideline B: Exercise extreme caution when using array_views over staging arrays in multi-threaded CPU code that may access such array_views concurrently from multiple threads. As described earlier, such accesses have undefined behavior and may result in fatal errors.

 accelerator_view cpuAv = accelerator(accelerator::cpu_accelerator).default_view;
 concurrency::array<float> sourceArray(size, cpuAv, accelerator().default_view);
 float *hostPtr = sourceArray.data();
 std::generate(hostPtr, hostPtr + size, rand);
  
 array_view<const float> sourceView(sourceArray);
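 // A named array is needed here: an array_view cannot be constructed
 // from a temporary array
 concurrency::array<float> outputArray(size);
 array_view<float> outputView(outputArray);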
  
 std::vector<float> sourceCopy(size);
 concurrency::task<void> t([&]() {
     for (int i = 0; i < size; ++i) {
         sourceCopy[i] = sourceView[i];
     }
 });
  
 // Guideline B violation: An array_view over a staging array should
 // not be concurrently accessed on the CPU as in the concurrency::task above
 // (or another accelerator_view) with an operation that transfers data from
 // the staging array to the associated_accelerator_view of the staging array
 // (the parallel_for_each invocation results in such a transfer here)
 parallel_for_each(sourceView.extent, [=](index<1> idx) restrict(amp) {
     outputView(idx) = fast_math::cos(sourceView(idx));
 });
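
One way to stay within Guideline B is to make sure the CPU-side read has finished before starting the operation that triggers the transfer. In the example above, that would mean calling t.wait() before the parallel_for_each invocation. The ordering itself is sketched below in standard C++ rather than C++ AMP, so that it builds without amp.h. Everything in it is a stand-in introduced for illustration: std::async plays the role of concurrency::task, a std::vector plays the role of the staging array, and the hypothetical functions transfer_and_compute and copy_then_compute play the role of the kernel launch and of the calling code. It shows only the "join first, then transfer" pattern, not the behavior of the C++ AMP runtime.

```cpp
#include <cmath>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for the parallel_for_each kernel: it reads the whole
// "staging" buffer (the transfer) and computes the cosine of each element.
std::vector<float> transfer_and_compute(const std::vector<float>& staging) {
    std::vector<float> output(staging.size());
    for (std::size_t i = 0; i < staging.size(); ++i) {
        output[i] = std::cos(staging[i]);
    }
    return output;
}

// Hypothetical caller: copies the buffer on another thread, then waits for
// that copy to finish before starting the transfer, so the CPU read and the
// transfer never overlap.
std::vector<float> copy_then_compute(const std::vector<float>& staging,
                                     std::vector<float>& sourceCopy) {
    std::future<void> cpuRead = std::async(std::launch::async, [&]() {
        sourceCopy.assign(staging.begin(), staging.end());
    });

    // Join the CPU-side reader first (the analogue of t.wait()).
    cpuRead.get();

    // Only now start the operation that reads the staging buffer.
    return transfer_and_compute(staging);
}
```

Waiting on the task serializes the two operations, which gives up any overlap between the CPU copy and the kernel. If that overlap matters, an alternative is to have the CPU task read from a separate copy of the data instead of from the array_view over the staging array.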

 

In closing

In this post we looked at the key aspects of using staging arrays as the data source for array_views. Subsequent posts will dive into other functional and performance aspects of array_view - stay tuned!

I would love to hear your feedback, comments and questions below or in our MSDN forum.