Guided Tour of the Concurrency Runtime and Parallel Libraries Part 1: supporting fine-grained tasks


In the next several blog posts I want to continue the early look at what we’re considering in the Concurrency Runtime and the Parallel Libraries built on top of it.  Today, I’ll share the primitives for expressing fine-grained parallelism and concurrency.  In the next few posts in this series, I’ll provide a look at the additional primitives for expressing concurrency that we are considering, touch on higher-level APIs like the parallel_for I showed briefly last time, and finally take a deeper look into the Concurrency Runtime and how it works.


Introducing task_handle and task_group


When we think about enabling programmers to decompose logical work into a collection of finer-grained “tasks” suitable for concurrent execution, there are two fundamental abstractions that we want to ensure are provided:


·       A task is a computation that may be internally decomposed into additional tasks.


·       A task group is a collection of tasks that form a logical computation or sub-computation for the purpose of either completion or exception processing.


The task object we’re considering is an extremely lightweight template object; effectively it contains a reference to some “work” as described by a functor or lambda and looks like this:


template <class Function>
class task_handle {
public:
   task_handle(const Function& f);
};


The task group object has a few more methods: in addition to construction, it allows scheduling tasks with the overloaded run methods and waiting for, or cancelling, the scheduled tasks.  Also note the constructor that takes a function object parameter; it is intended to help process exceptions on the task_group.


class task_group {
public:
   task_group();

   template <class _ExceptionFunc>
   task_group(_ExceptionFunc Fn);

   template <class _Func>
   void run(const _Func& Fn);

   template <class _Func>
   void run(task_handle<_Func>& t);

   void interrupt();

   task_group_status wait();
};


Dividing and conquering concurrently with tasks


So let’s take a look at a recursive example, quicksort.  One thing that’s nice about divide-and-conquer algorithms is that the divide step often ends up splitting the work into two disjoint sets.  When this happens, parallelizing them is significantly easier because there isn’t any data sharing.  Here’s an abbreviated serial implementation of quicksort:


void quicksort(int * a, int n) {
   if (n <= 1)
      return;    
   int s = partition(a,n);
   quicksort(a,s);
   quicksort(a+s,n-s);
}


We can parallelize this relatively easily with task_group by first creating a task_group, then scheduling the two recursive calls with C++0x lambdas via the overloaded task_group::run template method, and finally calling task_group::wait, which waits for both halves to finish.  I’ll leave parallelizing the partition step for a future blog post.


void quicksort(int * a, int n) {
   if (n <= 1)
      return;
   int s = partition(a,n);
   task_group g;
   g.run([&]{quicksort(a,s);});
   g.run([&]{quicksort(a+s,n-s);});
   g.wait();
}
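The task_group types shown here aren’t something you can compile against yet, but the same fork/join shape can be sketched today with standard C++ facilities as a stand-in: std::async plays the role of task_group::run and future::wait() the role of task_group::wait.  This is a minimal, hypothetical sketch (the partition step, which the snippets above elide, is written out inline as a Lomuto partition); a real version would fall back to a serial sort below a size cutoff rather than spawning a thread at every split:

```cpp
#include <algorithm>
#include <future>
#include <utility>

// Standard-C++ analogue of the task_group version above.
// NOTE: spawning a thread per split is wasteful; production code
// would switch to a serial sort once n drops below a threshold.
void parallel_quicksort(int* a, int n) {
    if (n <= 1)
        return;
    // Lomuto partition around the last element; afterwards the pivot
    // sits at index i, smaller elements to its left, larger to its right.
    int pivot = a[n - 1];
    int i = 0;
    for (int j = 0; j < n - 1; ++j)
        if (a[j] < pivot)
            std::swap(a[i++], a[j]);
    std::swap(a[i], a[n - 1]);
    // Sort the two disjoint halves concurrently, excluding the pivot,
    // which is already in its final position.
    auto left = std::async(std::launch::async,
                           [=] { parallel_quicksort(a, i); });
    parallel_quicksort(a + i + 1, n - i - 1);
    left.wait(); // join both halves, like task_group::wait()
}
```

Because the two halves are disjoint ranges of the same array, no synchronization beyond the final join is needed.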


Note that you can also do this directly with a task_handle if you’d like to manage the storage more directly.  Here I’m declaring two task_handles on the stack and then scheduling them by reference with the recursive calls to quicksort, but the tasks could just as easily have been allocated with a custom allocator:


void quicksort(int * a, int n) {
   if (n <= 1)
      return;
   int s = partition(a,n);
   task_group g;
   task_handle t1([&]{quicksort(a,s);});
   g.run(t1);
   task_handle t2([&]{quicksort(a+s,n-s);});
   g.run(t2);
   g.wait();
}


That’s a very brief look at task_handle and task_group.  In part 2, I’ll continue talking about task_group, specifically handling exceptions and cooperatively cancelling work with the task_group::interrupt method.


As always, your feedback and discussion are incredibly valuable; they’re a major reason for providing this early look into what we’re thinking about.  If you’re reading this, please take a moment to comment or ask questions.


-Rick

Comments (11)

  1. aaron says:

    I notice that reference semantics were used for the lambdas here.  I was curious if that offers any particular value — couldn’t value semantics also be used, allowing for less indirection?  ex:

    g.run([]{quicksort(a,s);});

    #aaron

  2. rickmolloy says:

    Value semantics could just as easily be used, but you would need to capture the variables explicitly, so it would look like this:

     g.run([a,s](){quicksort(a,s);});

  3. Ben L says:

    This blog is exactly what I’m looking for. But there is no mention of what libraries this uses. Is this an upcoming library or one I can use? All I’m finding are .net assemblies.

  4. rickmolloy says:

    I’m glad it’s helpful to you.  As I mentioned, this is technology we’re currently exploring, and I don’t have any ship or CTP dates to announce.  I will definitely make sure to post any information once it becomes public.

  5. fredo says:

    are you aware of the effort put in by the Boost library to develop such a concurrency primitives library?

    it’s already working and uses functors instead of lambdas for immediate availability…

    why not help them instead of reinventing the wheel, as their implementation has great potential to become a C++ standard?

  6. aaron says:

    fredo,

    You may want to note that C++0x lambdas are a shorthand way of implementing functors.  

    I think at this point the C++ standards folks have their hands full adding memory atomics, barriers, threads, and initialization semantics, to be supported over all ranges of computing hardware, not just desktop and server PCs.  But I agree it would be very interesting to see other concurrency primitives libraries enter the C++ standard at some point.

    #aaron

  7. rickmolloy says:

    Functors are intended to have full support.  It’s a lot more convenient, particularly in a blog, to use lambdas than it is to create a functor and construct it.

    Writing:

      tg.run([&i]{doSomething(i);});

    is a lot easier than writing the following:

    class DoSomethingFunctor {
    private:
       const int& i_;
    public:
       DoSomethingFunctor(const int& i) : i_(i) {}
       void operator()() {
          doSomething(i_);
       }
    };

    tg.run(DoSomethingFunctor(i));

  8. Art Scott says:

    I’m way interested.

    Thanks for the blog.

    Any chance of using F# for some examples?

    And more graphics examples, less PLINQ?

  9. rickmolloy says:

    Art,

    take a look at the parallel extensions team blog if you haven’t already and they have a downloadable CTP available: http://blogs.msdn.com/pfxteam/

  10. Arch Robison says:

    There’s a subset implementation of task_group at http://softwareblogs.intel.com/2008/07/02/implementing-task_group-interface-in-tbb/ .  I implemented enough functionality there to support examples like quicksort.

  11. Daniel Moth says:

    Parallel Tasks: new Visual Studio 2010 debugger window