Guided Tour of the Concurrency Runtime and Parallel Libraries, Part 1: Supporting Fine-Grained Tasks

In the next several blog posts I want to continue the early look at what we're considering in the Concurrency Runtime and the Parallel Libraries built on top of it. Today, I'll share the primitives for expressing fine-grained parallelism and concurrency. In the next few posts in this series, I'll look at the additional primitives for expressing concurrency that we're considering, touch on higher-level APIs like the parallel_for I showed briefly last time, and finally take a deeper look at the Concurrency Runtime and how it works.

Introducing task_handle and task_group

When we think about enabling programmers to decompose logical work into a collection of finer-grained “tasks” suitable for concurrent execution, there are two fundamental abstractions that we want to ensure are provided:

· A task is a computation that may be internally decomposed into additional tasks.

· A task group is a collection of tasks that form a logical computation or sub-computation for the purpose of either completion or exception processing.

The task_handle object we're considering is an extremely lightweight template class; effectively, it holds a reference to some “work” described by a functor or lambda, and it looks like this:

template <class Function>
class task_handle {
public:
    task_handle(const Function& f);
};
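
To make the construction concrete, here's a minimal usage sketch. Since the class template parameter can't be deduced from a constructor argument in C++0x, one way to spell the type of a lambda is with decltype:

// Minimal sketch: wrapping a C++0x lambda in a task_handle.
// decltype names the lambda's type, which otherwise can't be written out.
int x = 0;
auto work = [&x] { x = 42; };
task_handle<decltype(work)> t(work);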

The task_group object has a few more methods; in addition to construction, they allow scheduling tasks via the overloaded run methods and waiting for, or cancelling, the scheduled tasks. Also note the constructor that takes a function object parameter; it's intended to help process exceptions on the task_group.

class task_group {
public:
    task_group();

    template <class _ExceptionFunc>
    task_group(_ExceptionFunc Fn);

    template <class _Func>
    void run(const _Func& Fn);

    template <class _Func>
    void run(task_handle<_Func>& t);

    void interrupt();

    task_group_status wait();

    …
};
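
To make that exception-processing constructor a bit more concrete, here's a purely illustrative sketch. Note that the handler's signature is an assumption on my part (I'm guessing at std::exception_ptr, and the names are illustrative); I'll dig into the actual exception story in part 2:

// Illustrative sketch only: the handler signature is an assumption.
// Needs <exception>, <stdexcept>, and <cstdio>.
task_group g([](std::exception_ptr e) {
    try { std::rethrow_exception(e); }
    catch (const std::exception& ex) {
        printf("task failed: %s\n", ex.what());
    }
});
g.run([] { throw std::runtime_error("something went wrong"); });
g.wait(); // gives g a chance to route the exception to the handler above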

Dividing and conquering concurrently with tasks

So let’s take a look at a recursive example: quicksort. One thing that’s nice about divide-and-conquer algorithms is that the divide step often splits the work into two disjoint sets. When that happens, parallelizing them is significantly easier because there isn’t any data sharing. Here’s an abbreviated serial implementation of quicksort:

void quicksort(int * a, int n) {
    if (n <= 1)
        return;
    int s = partition(a, n);
    quicksort(a, s);
    quicksort(a + s, n - s);
}

We can parallelize this relatively easily with task_group: first create a task_group, then schedule the two recursive calls with C++0x lambdas via the overloaded task_group::run template method, and finally call task_group::wait, which waits for both halves to finish. I’ll leave parallelizing the partition step for a future blog post.

void quicksort(int * a, int n) {
    if (n <= 1)
        return;
    int s = partition(a, n);
    task_group g;
    g.run([&] { quicksort(a, s); });
    g.run([&] { quicksort(a + s, n - s); });
    g.wait();
}
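
One practical note before moving on: below some problem size, the cost of scheduling a task outweighs the work it does. A common refinement (just a sketch here, with an arbitrary cutoff) is to fall back to plain serial recursion for small ranges and to run one of the halves inline on the current thread:

void quicksort(int * a, int n) {
    if (n <= 1)
        return;
    int s = partition(a, n);
    if (n < 512) {                        // illustrative cutoff; tune per workload
        quicksort(a, s);                  // small ranges: plain serial recursion
        quicksort(a + s, n - s);
        return;
    }
    task_group g;
    g.run([&] { quicksort(a, s); });      // one half as a task...
    quicksort(a + s, n - s);             // ...the other inline on this thread
    g.wait();
}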

Note that you can also do this directly with task_handle objects if you’d like to manage their storage more directly. Here I’m declaring two task_handles on the stack (using decltype to name each lambda’s type) and then scheduling them by reference with the recursive calls to quicksort, but the tasks could just as easily have been allocated with a custom allocator:

void quicksort(int * a, int n) {
    if (n <= 1)
        return;
    int s = partition(a, n);
    task_group g;
    auto f1 = [&] { quicksort(a, s); };
    task_handle<decltype(f1)> t1(f1);
    g.run(t1);
    auto f2 = [&] { quicksort(a + s, n - s); };
    task_handle<decltype(f2)> t2(f2);
    g.run(t2);
    g.wait();
}
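
Finally, for anyone who wants to try the snippets above: they all assume a partition step that returns a split point s with 1 <= s <= n-1, so that both recursive calls make progress. Here's one illustrative serial version (Hoare's scheme), just enough to make the examples compile; it's not necessarily what a real implementation would use:

// Illustrative serial partition (Hoare's scheme): rearranges a[0..n-1] so
// that every element of a[0..s-1] is <= every element of a[s..n-1], and
// returns s with 1 <= s <= n-1 (guaranteed because the pivot is a[0]).
int partition(int * a, int n) {
    int pivot = a[0];
    int i = -1, j = n;
    for (;;) {
        do { ++i; } while (a[i] < pivot);
        do { --j; } while (a[j] > pivot);
        if (i >= j)
            return j + 1;
        std::swap(a[i], a[j]);  // needs <algorithm> or <utility>
    }
}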

That’s a very brief look at task_handle and task_group. In part 2, I’ll continue talking about task_group, specifically handling exceptions and cooperatively cancelling work with the task_group::interrupt method.

As always, your feedback & discussion are incredibly valuable; they’re a major reason for providing this early look into what we’re thinking about. So if you’re reading this, please take a moment to comment or ask questions.

-Rick