Projections in C++ AMP


In this post, I’ll describe how C++ AMP projections can be used to write clear code that is easy to read and easy to maintain.  Projections are a special case of C++ AMP sections that involve a rank change.  They apply to both array and array_view objects, which we will collectively call arrays unless the distinction is important.

Before looking at projections, let’s briefly discuss indexing into arrays to get a single array element.  You can index into arrays of rank N by passing N integers to the function call operator, (), or by passing an index object of rank N to the array subscript operator, [].

Projections extend indexing by allowing a single integer value to be passed to the array subscript operator of a multidimensional array.  A projection of an array or array_view of rank N is an array_view of rank N-1 consisting of all of the elements whose most significant index equals that integer value.  Since the result is an array_view, no data is copied.  For example, the result of projecting a 2D array with integer i is the ith row (a 1D array_view).
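
To make the "no data is copied" point concrete, here is a minimal host-only sketch in standard C++.  It deliberately does not use amp.h: View1D and View2D are hypothetical stand-ins for array_view<float,1> and array_view<float,2>, and they model only row-major storage and the single-integer subscript operator, not the real C++ AMP classes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for array_view<float,1>: a non-owning view of one row.
struct View1D {
    float* data;
    std::size_t extent;
    float& operator()(std::size_t x) const { return data[x]; }
};

// Hypothetical stand-in for array_view<float,2> over row-major storage.
struct View2D {
    float* data;
    std::size_t rows, cols;

    // Element access with two integers, like Matrix(y, x).
    float& operator()(std::size_t y, std::size_t x) const { return data[y * cols + x]; }

    // Projection: fix the most significant index and return a rank-1 view of row y.
    View1D operator[](std::size_t y) const { return View1D{data + y * cols, cols}; }
};

// Writes through a projected row, then reads the value back through the 2D view.
float write_through_projection() {
    std::vector<float> storage(2 * 3, 0.0f);  // 2 rows, 3 columns
    View2D m{storage.data(), 2, 3};
    View1D row1 = m[1];                       // a view of row 1; no elements are copied
    row1(2) = 42.0f;                          // same element as m(1, 2)
    return m(1, 2);
}
```

Because the projected row aliases the original storage, the write through row1 is visible through m, which is the behavior the paragraph above describes for a real array_view.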

Let’s take a look at matrix-vector multiplication to see how we can use a projection to make the code easier to read.  The following function implements a straightforward, but simplistic, algorithm for matrix-vector multiplication in C++ AMP:

void mvm(array_view<const float,2> Matrix,    // Input matrix       
         array_view<const float,1> RowVector, // Input row vector
         array_view<float,1> ColumnVector)    // Output column vector
{
  ColumnVector.discard_data(); // Avoid copying contents to accelerator
  parallel_for_each(
    ColumnVector.extent,
    [=] (index<1> idx) restrict(amp)
    {
      int y = idx[0];

      float result = 0.0f;
      for (int x = 0; x < RowVector.extent[0]; x++)
      {
        result += Matrix(y, x) * RowVector(x);
      }
      ColumnVector[idx] = result;
    });
  ColumnVector.synchronize(); // Copy contents back to host
}

This simple code is a good candidate for using an array projection to improve clarity.  Simply replace lines 10-17 of the above code (the body of the lambda) with the following; the differences are the new MatrixRow declaration and its use inside the loop:

      int y = idx[0];
      array_view<const float,1> MatrixRow = Matrix[y];

      float result = 0.0f;
      for (int x = 0; x < RowVector.extent[0]; x++)
      {
        result += MatrixRow(x) * RowVector(x);
      }
      ColumnVector[idx] = result;

In this code, we use a projection to create the named variable MatrixRow; this makes it clear that each thread operates on only a single row of Matrix.  Then, inside the loop, we index into MatrixRow with the integer x.  By the way, if you find writing and reading the type array_view<const float,1> unfortunate, this is a good place to use C++’s new auto feature and replace the type specification with the keyword auto, i.e., rewrite the MatrixRow declaration as follows:

auto MatrixRow = Matrix[y];

Like sections in C++ AMP, projections simplify the programming of many common idioms by making the code easier to understand.
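
When experimenting with either version of the kernel, it can help to have a host-side reference to compare the output against.  The following is a minimal sketch in standard C++ (no amp.h); the name mvm_reference is hypothetical, the matrix is assumed to be stored row-major, and the pointer to the start of each row plays roughly the role that MatrixRow plays in the kernel.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for the mvm kernel.
// Assumes matrix holds rows * vec.size() floats in row-major order.
std::vector<float> mvm_reference(const std::vector<float>& matrix,
                                 const std::vector<float>& vec,
                                 std::size_t rows)
{
    const std::size_t cols = vec.size();
    std::vector<float> out(rows, 0.0f);
    for (std::size_t y = 0; y < rows; ++y)
    {
        // Host analogue of the projection Matrix[y]: a view of row y, no copy.
        const float* matrix_row = matrix.data() + y * cols;

        float result = 0.0f;
        for (std::size_t x = 0; x < cols; ++x)
        {
            result += matrix_row[x] * vec[x];
        }
        out[y] = result;
    }
    return out;
}
```

For example, with the 2x3 matrix {1, 2, 3, 4, 5, 6} and the vector {1, 0, 2}, this returns {7, 16}, which is what ColumnVector should contain after the kernel runs on the same data.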

Before I end this post, let me address a few miscellaneous points.

Cascading multiple projections

You can use multiple projections instead of multidimensional indexing, but the multidimensional indexing is stylistically preferable.  For example, you could write arr_3d[i][j][k] instead of arr_3d(i, j, k).  This amounts to projecting a 3D array to a 2D array_view, projecting the 2D array_view to a 1D array_view, and then indexing into the 1D array_view.
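
The equivalence between the two spellings can be sketched on the host in standard C++.  As before, this does not use amp.h: View1D, View2D and View3D are hypothetical stand-ins that model only the rank-reducing subscript operator and direct three-integer indexing over row-major storage.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-ins for array_view<const int, N>; not the C++ AMP types.
struct View1D {
    const int* data;
    std::size_t e0;
    // Rank 1: a single integer selects an element.
    int operator[](std::size_t k) const { return data[k]; }
};

struct View2D {
    const int* data;
    std::size_t e0, e1;
    // Rank 2 -> rank 1: projection onto row j.
    View1D operator[](std::size_t j) const { return View1D{data + j * e1, e1}; }
};

struct View3D {
    const int* data;
    std::size_t e0, e1, e2;
    // Direct multidimensional indexing, like arr_3d(i, j, k).
    int operator()(std::size_t i, std::size_t j, std::size_t k) const {
        return data[(i * e1 + j) * e2 + k];
    }
    // Rank 3 -> rank 2: projection onto slice i.
    View2D operator[](std::size_t i) const { return View2D{data + i * e1 * e2, e1, e2}; }
};

// Returns true if arr[i][j][k] and arr(i, j, k) agree for every element.
bool cascading_matches_direct() {
    std::vector<int> storage(2 * 3 * 4);
    std::iota(storage.begin(), storage.end(), 0);  // fill with 0, 1, 2, ...
    View3D arr{storage.data(), 2, 3, 4};
    for (std::size_t i = 0; i < 2; ++i)
        for (std::size_t j = 0; j < 3; ++j)
            for (std::size_t k = 0; k < 4; ++k)
                if (arr[i][j][k] != arr(i, j, k)) return false;
    return true;
}
```

The cascaded form builds two intermediate views before reaching the element, which is one reason the single call arr_3d(i, j, k) reads more directly.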

Changes from Visual Studio Developer Preview

In the Visual Studio Developer Preview version of C++ AMP, the array_view class defined a method called project with semantics identical to the above.  In the Beta, this method was removed because we felt it was unnecessary.

The syntax of indexing

When it comes to indexing, the syntactic choices and guidance above are deliberate.  Use the array subscript operator, [], when indexing with index objects and when projecting with a single integer.  Use the function call operator, (), when indexing with integers.  These choices let you distinguish between projection and indexing based only on the type of the index.  That said, the operators are completely interchangeable within the rules of the C++ language, which do not allow the array subscript operator to be used with more than one expression.

As always, your feedback is welcome below or in our MSDN forum.

Comments (2)

  1. Steve,

    This code crashes my GPU but ONLY in release mode, it works in debug. Can't attach a screenshot here. The error message is: "Display driver stopped working and has recovered." I have NVidia Quadro 5000M DX11 card. Obviously the code runs in debug in x64. It crashes in the first kernel.

    Thanks,

    Alan

    void AmpExamples::Projection()
    {
        cout << "\nProjection: matrix x vector.\n";

        const int width = 1900;
        const int height = 1080;

        accelerator_view av = accelerator().create_view();

        // 6x6 matrix
        float m[] =
        {
            1, 2, 3, 4, 5, 6,
            1, 2, 3, 4, 5, 6,
            1, 2, 3, 4, 5, 6,
            1, 2, 3, 4, 5, 6,
            1, 2, 3, 4, 5, 6,
            1, 2, 3, 4, 5, 6,
        };
        array_view<const float, 2> matrix(6, 6, m);

        // 6-Vector
        float v[] = { 3, 1, 4, 5, 8, 7 };
        array_view<const float, 1> vec(6, v);

        // Result
        vector<float> r(6);
        array_view<float, 1> result(6, r);
        result.discard_data(); // Do not copy contents to the device

        time_point<system_clock> start = system_clock::now();
        parallel_for_each(av, vec.extent, [=](index<1> idx) restrict(amp) {
            int   x = idx[0];
            float z = 0.f; // result
            for (int y = 0; y < vec.extent[0]; ++y)
            {
                z += matrix(x, y) * vec(y);
            }
            result[idx] = z;
        });
        result.synchronize();
        time_point<system_clock> stop = system_clock::now();

        long long us = duration_cast<microseconds>(stop - start).count();
        cout << "\n   finished in: " << us << "us" << endl;
        for (int i = 0; i < 6; ++i)
        {
            cout << "      r[" << i << "] = " << r[i] << endl;
        }

        //
        // Now use a projection
        //
        result.discard_data();
        start = system_clock::now();
        parallel_for_each(av, vec.extent, [=](index<1> idx) restrict(amp) {
            int   x = idx[0];
            float z = 0.f; // result
            array_view<const float, 1> row = matrix[x]; // projection
            for (int y = 0; y < row.extent[0]; ++y)
            {
                z += row(y) * vec(y);
            }
            result[idx] = z;
        });
        result.synchronize();
        stop = system_clock::now();

        us = duration_cast<microseconds>(stop - start).count();
        cout << "\n   finished projection in: " << us << "us" << endl;
        for (int i = 0; i < 6; ++i)
        {
            cout << "      r[" << i << "] = " << r[i] << endl;
        }
    }

  2. This appears to be the same issue mentioned in a previous thread on our forum (social.msdn.microsoft.com/…/4b684bcb-366b-4abe-a678-b0f86bc719c0).

    Recent NVidia drivers have addressed this issue (the last time our team tested the updated drivers, it had been resolved for desktop cards but not for the mobility cards). I would suggest installing the latest NVidia drivers to see whether that resolves your problem.