First look at Parallel FX and self-replicating tasks

The Parallel Computing Platform team at Microsoft has recently launched the Parallel Computing Development Centre along with our first CTP of Parallel FX. In here, I will explore some aspects of the framework. If you feel comfortable with the basics of concurrency then read on.

There is obviously PLINQ and the methods defined in the static System.Threading.Parallel class. It is implemented on top of the lower-level System.Threading.Tasks.Task class, which can be used directly to solve parallel problems with greater flexibility and control over the way work is partitioned and distributed.

Let me start by introducing the notion of replicable or self-replicating tasks. A replicable task is a task that its execution can be shared between multiple threads whilst abstracting from the dynamics of work distribution. In other words, you can potentially execute the same action using multiple threads where all executions stride towards achieving a common goal – getting closer to the result of the operation). Follow the following rules when using self-replicating tasks:

- One thread must be able to complete the operation

- It should be possible to complete the operation using multiple threads all executing the same operation at once

- The same set of activities should not be executed multiple times by different threads

Due to its nature, replicable tasks are very powerful and often very difficult to correctly construct. Parallel FX internally benefits from self replicating tasks. For instance Parallel.For and Parallel.ForEach are implemented using replicable tasks.

For example, we could create a simple parallel For in the following manner:

static void SimpleFor(int from, int to, int step, Action<int> body)

{

  if (body == null) return;

  if (to > from)

  {

    int sharedIndex = from;

    Task task = Task.Create(delegate

       {

         int i;

         while ((i = Interlocked.Add(ref sharedIndex, step)) < to)

         {

           body(i);

         }

       },

       TaskCreationOptions.SelfReplicating);

    task.Wait();

  }

}

There are a few things to mention here:

- The shared integer and the Interlocked operation are effectively enforcing isolation between threads hence allowing all idle processor cores to participate in the operation

- Task.Create is called with TaskCreationOptions.SelfReplicating option

- If the work carried out by the action is small, cache contention, and performance overhead of calling an Interlocked operation may become an issue

- Depending on the value of step, you could use other Interlocked operations such as Increment, Decrement and CompareExchange to improve efficiency

To avoid possible performance overheads, it would be advisable to process a stride of indices at once:

static void SimpleFor(int from, int to, int step, Action<int> body)

{

  if (body == null) return;

  if (to > from)

  {

    const int stride = 10; // take a stride of size 10

    int sharedIndex = from;

    int increment = stride * step;

    Task task = Task.Create(delegate

    {

      int startIndex;

      do

      {

        startIndex =

           Interlocked.Add(ref sharedIndex, increment) - increment;

        for (int i = 0; i < stride && i + startIndex < to; i += step)

        {

     body(startIndex + i);

        }

      } while (startIndex + stride < to);

    },

       TaskCreationOptions.SelfReplicating);

    task.Wait();

  }

}

If you have not already done so, download the December CTP of Parallel Extensions to .NET Framework and let the team know your thoughts through here.