# section example in C++ AMP

In this blog post I will give a simple example of using the section member function of array and array_view, demonstrating how to offset the origin so that a computation operates on a smaller section of the data. For example, suppose your data is a matrix wrapped like this:

`array_view<float, 2> qin(height, width, data);`

Assuming height and width are both divisible by 2, you can view it as four quarters as follows:

`array_view<float, 2> q1 = qin.section(index<2>(0, 0), extent<2>(height/2, width/2));`

`array_view<float, 2> q2 = qin.section(index<2>(height/2, 0), extent<2>(height/2, width/2));`

`array_view<float, 2> q3 = qin.section(index<2>(0, width/2), extent<2>(height/2, width/2));`

`array_view<float, 2> q4 = qin.section(index<2>(height/2, width/2)); // extent inferred: the rest of qin`

Below is a complete code example that sums all elements of the array_view ‘qin’ and places the result in its first element. The algorithm views the data as a two-dimensional matrix, splits it into four quarters, and adds the quarters element-wise into a quarter-sized output array ‘qout’. Repeating this operation, with ‘qout’ becoming the next ‘qin’, eventually leaves the overall reduction result in qin(0,0).

The code demonstrates the section functionality, but it is not intended to be (and indeed isn’t) an optimal implementation of a reduction algorithm (we have one of those in the pipeline); it was written simply to demonstrate usage of the section API.

` 1: #include <amp.h>`
` 2: #include <cstdio>  // printf`
` 3: #include <vector>`
` 4: using namespace concurrency;`
` 5: using std::vector;`
` 6: int main()`
` 7: {`
` 8:   // a small data size for example`
` 9:   // sample constraint: width and height must be equal powers of 2`
` 10:   int width = 16;`
` 11:   int height = 16;`
` 12:  `
` 13:   // generate dummy data`
` 14:   vector<float> data (width * height);`
` 15:  `
` 16:   for (int x = 0; x < (width * height); x++)`
` 17:   {`
` 18:     data[x] = x * 1.0f;`
` 19:   }`
` 20:  `
` 21:   // wrap data so it is ready to copy to accelerator`
` 22:   array_view<float,2> qin(height, width, data);`
` 23:  `
` 24:   // repeat reduction`
` 25:   // till data can't be reduced`
` 26:   while(width > 1)`
` 27:   {`
` 28:     height /= 2;`
` 29:     width /= 2;`
` 30:     extent<2> quarterdim(height, width);`
` 31:     array<float,2> qout(quarterdim);`
` 32:  `
` 33:     // view the data in 4 quarters `
` 34:     // create an array_view offset to each quarter`
` 35:     const array_view<const float,2> q1 =`
` 36:             qin.section(index<2>(0, 0) /*origin*/, quarterdim /*extent*/);`
` 37:     const array_view<const float,2> q2 =`
` 38:             qin.section(index<2>(height, 0), quarterdim);`
` 39:     const array_view<const float,2> q3 =`
` 40:             qin.section(index<2>(0, width), quarterdim);`
` 41:     const array_view<const float,2> q4 =`
` 42:             qin.section(index<2>(height, width));`
` 43:  `
` 44:     // execute the kernel to accumulate all quarters into the first one`
` 45:     parallel_for_each(quarterdim, [=, &qout] (index<2> idx) restrict(amp)`
` 46:     {`
` 47:       // accumulate all quarters in output quarter`
` 48:       // using the same index in each of the four sections`
` 49:       qout[idx] = q1[idx] + q2[idx] + q3[idx] + q4[idx];`
` 50:     });`
` 51:  `
` 52:     // set output data array as input view`
` 53:     // for next loop`
` 54:     // NOTE: that doesn't sync data from GPU to host`
` 55:     qin = qout;`
` 56:  `
` 57:     // demo only: print the intermediate output data`
` 58:     // after every iteration`
` 59:     for(int y = 0; y < height; y++)`
` 60:     {`
` 61:       for (int x = 0; x < width; x++)`
` 62:       {`
` 63:         // accessing qin here forces that quarter to sync back to the host;`
` 64:         // this causes a performance hit`
` 65:         printf( "%0.1f ", qin(y, x));`
` 66:       }`
` 67:       printf("\n");`
` 68:     }`
` 69:     printf("===============================================\n");`
` 70:  `
` 71:   } // while loop`
` 72:  `
` 73:   // final summation result can be obtained from`
` 74:   // qin(0,0) here`
` 75: }`
` 76: // Sample printout`
` 77:  `
` 78: //272.0 276.0 280.0 284.0 288.0 292.0 296.0 300.0`
` 79: //336.0 340.0 344.0 348.0 352.0 356.0 360.0 364.0`
` 80: //400.0 404.0 408.0 412.0 416.0 420.0 424.0 428.0`
` 81: //464.0 468.0 472.0 476.0 480.0 484.0 488.0 492.0`
` 82: //528.0 532.0 536.0 540.0 544.0 548.0 552.0 556.0`
` 83: //592.0 596.0 600.0 604.0 608.0 612.0 616.0 620.0`
` 84: //656.0 660.0 664.0 668.0 672.0 676.0 680.0 684.0`
` 85: //720.0 724.0 728.0 732.0 736.0 740.0 744.0 748.0`
` 86: //===============================================`
` 87: //1632.0 1648.0 1664.0 1680.0`
` 88: //1888.0 1904.0 1920.0 1936.0`
` 89: //2144.0 2160.0 2176.0 2192.0`
` 90: //2400.0 2416.0 2432.0 2448.0`
` 91: //===============================================`
` 92: //7616.0 7680.0`
` 93: //8640.0 8704.0`
` 94: //===============================================`
` 95: //32640.0`
` 96: //===============================================`

Note that the array_view objects captured in the kernel only need read-only access to the data, which is why I declared them as array_view<const float,2>.

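Also notice that the creation of ‘q1’ (line 35) could use one of the other section overloads to obtain the same view:

`array_view<float,2> q1 = qin.section(quarterdim);`

Here only the extent is passed, so the origin defaults to (0, 0). (It is the index-only overload, used for ‘q4’, whose extent is inferred to cover the rest of the parent array/array_view.) Alternatively, the rank-2 overload that takes the origin and extent as separate integers can be used:

`array_view<float,2> q1 = qin.section(0, 0, height, width);`

Similarly, q2 and q3 can be created with this latter overload, for example qin.section(height, 0, height, width) for q2.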

Finally, one might look closely at ‘q1’ and ask: couldn’t ‘qin’ replace it and save a few lines of code? The answer is “yes”, but that would introduce a performance overhead: instead of copying four quarters to GPU memory, this change would copy three quarters plus the whole matrix. Copying data back to the host would likewise transfer the whole matrix instead of just one quarter of it.
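Following the accounting in the paragraph above, for a matrix of $n$ elements the host-to-accelerator traffic per pass compares as follows:

$$
\underbrace{4 \cdot \tfrac{n}{4}}_{q_1,\,q_2,\,q_3,\,q_4} = n
\qquad\text{vs.}\qquad
\underbrace{n}_{\text{qin}} + \underbrace{3 \cdot \tfrac{n}{4}}_{q_2,\,q_3,\,q_4} = \tfrac{7n}{4}
$$

For the 16×16 float matrix in the sample ($n = 256$, 4 bytes per float), that is 1024 bytes versus 1792 bytes, and on the way back one quarter is 256 bytes while the whole matrix is 1024 bytes.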

That completes my example for creating sub-sections using the section member function. Feel free to ask questions in the comments section below or in our MSDN forum.
