Design: Task Parallel Library explored



As some of you may know, the .NET thread pool helps many developers use multiple threads in their applications, sometimes increasing responsiveness while taking over all the scheduling responsibility. But those of you curious enough to look at how it really works usually find that the code is not the best thread-pool implementation. The reason is mostly inherited code: if we had to write it again we would do it in a completely different way, but that is a story for another post. What I really want to focus on today is the inner workings of the Task Parallel Library, which complements an important part of the picture: organized parallelism.


I am not planning to explain what the TPL is; if you need that background, I recommend the introductory article here. The idea is to show you how the internal architecture works, along with some considerations on its use: it is not a magical library, and you can still write bad code with it.


Let’s start with the general architecture. As we can see in the first figure, the TPL sits just above the CLR; both teams have been working together to coordinate the implementation. The TPL contains the task scheduler, which is available for extensibility. The best examples of this extensibility are the Parallel framework and PLINQ.


[Figure 1: the TPL architecture, sitting above the CLR, with the Parallel framework and PLINQ built on top]


Let’s start from the top of the stack. Parallel FX publishes a set of static methods (Shared in VB) that give developers access to the library:


Parallel.For(start, end, delegate(int param) { a[param] = a[param] * a[param]; });


The method invokes the TPL, passing the range parameters and the delegate to execute (note that this can be a lambda expression as well!). The interesting bit comes when the request enters the main queue. The library evaluates how many threads the work will require, trying to find the optimal number based on the number of cores and the length of the task. It may even decide to use a single thread in sequential mode. This scales well, as the developer does not need to know how many cores the final users will have, and the application adapts if the system is scaled down.
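To make the lambda remark concrete, here is a minimal sketch of the same squaring loop written with a lambda. It uses the namespaces the library eventually shipped with in .NET 4 (`System.Threading.Tasks`), which differ slightly from the CTP described in this post; the array contents are made up for illustration:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class Squares
{
    static void Main()
    {
        int[] a = Enumerable.Range(0, 1000).ToArray();

        // Same loop as above, written as a lambda expression.
        // The scheduler decides at run time how many worker threads to
        // use, based on the core count and the amount of work; on a
        // single-core machine it can simply run sequentially.
        Parallel.For(0, a.Length, i => { a[i] = a[i] * a[i]; });

        Console.WriteLine(a[10]); // 100
    }
}
```

The caller never states a thread count; that decision stays inside the library, which is exactly what makes the code portable across machines with different numbers of cores.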


[Figure 2: the main queue distributing tasks across the per-thread queues]


To avoid a bottleneck as the threads consume work, the TPL implements a queue per thread. The main queue distributes the tasks equally across the available threads, and each thread executes the delegates waiting in its individual queue; this is a common pattern, also applied in OpenMP. In theory this works fine, but as we know, the cores are not always available to the threads, so one thread may finish its queue earlier than the others. This situation is not optimal, so a queue-stealing model is applied: when a thread has no tasks left in its own queue, it starts to query the neighbouring queues looking for work. If it finds tasks, it removes them from the busy queue, respecting their order, and so makes better use of processor time. As you may have guessed by now, these threads are not thread-pool threads, as they have core affinity.
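The distribution-then-stealing behaviour can be sketched as follows. This is a toy, single-threaded simulation, not the real TPL implementation: two hypothetical worker queues are filled round-robin from a main queue, and a "fast" worker drains its own queue before taking work, in order, from its neighbour:

```csharp
using System;
using System.Collections.Generic;

class WorkStealingSketch
{
    static void Main()
    {
        // Main queue: 10 tasks distributed round-robin over 2 workers.
        var queues = new[] { new Queue<int>(), new Queue<int>() };
        for (int task = 0; task < 10; task++)
            queues[task % 2].Enqueue(task);

        // Worker 0 finishes early: it drains its own queue first...
        var executedBy0 = new List<int>();
        while (queues[0].Count > 0)
            executedBy0.Add(queues[0].Dequeue());

        // ...then steals from the neighbour's queue, respecting order
        // (the "> 3" threshold is arbitrary, just to leave work behind).
        while (queues[1].Count > 3)
            executedBy0.Add(queues[1].Dequeue());

        Console.WriteLine(string.Join(",", executedBy0));
        // 0,2,4,6,8,1,3
    }
}
```

The real implementation does this concurrently, with proper synchronization between the owning thread and the stealing thread; the sketch only shows the policy, not the mechanics.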


This execution model does not guarantee execution order: if you have 1,000 tasks on a 4-core machine, each thread should get 250 tasks, but a thread that finishes early will consume tasks from the others. This is an important consideration if you need to respect a certain execution order; as I said at the beginning, it is important to use these extensions only when you are completely sure the tasks are independent. The same applies if the delegates use shared memory: contention is possible in those scenarios, and it will limit scalability.
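Both caveats, ordering and shared-memory contention, show up in a sketch like this one (again using the .NET 4 `Parallel.For` shape; the list and lock are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class OrderingDemo
{
    static void Main()
    {
        var seen = new List<int>();
        object gate = new object();

        // The lock keeps the shared list consistent, but it also
        // serializes the loop bodies: exactly the kind of contention
        // that limits scalability.
        Parallel.For(0, 100, i => { lock (gate) seen.Add(i); });

        // All 100 indices are present...
        Console.WriteLine(seen.Count); // 100

        // ...but their order in 'seen' is whatever order the worker
        // threads happened to run in; do not rely on it.
    }
}
```

If your algorithm needs the results in index order, write each result to `results[i]` instead of appending to a shared collection; independent writes to distinct slots need no lock at all.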


The final architectural consideration concerns exceptions. In a normal sequential “for”, if an exception is thrown the loop stops; with the parallel library this becomes a little more difficult.


[Figure 3: an exception in one worker thread propagating up and cancelling the others]


If an exception occurs in one of the threads, the TPL bubbles the exception up and informs the other threads that the execution needs to be cancelled. This happens automatically, but the TPL cannot guarantee that no further tasks execute after the exception. Now, what happens if two threads throw an exception at the same time? The parallel library throws a special exception type that contains both exceptions. This is something important to consider.
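In the versions of the library that eventually shipped in .NET 4, that special wrapping type is `System.AggregateException`. A minimal sketch of catching it (the messages and the `i % 3` condition are invented for illustration):

```csharp
using System;
using System.Threading.Tasks;

class ExceptionDemo
{
    static void Main()
    {
        try
        {
            Parallel.For(0, 10, i =>
            {
                if (i % 3 == 0)
                    throw new InvalidOperationException("task " + i);
            });
        }
        catch (AggregateException ex)
        {
            // Every exception thrown by a faulted iteration is collected
            // in InnerExceptions; as the post notes, some of the other
            // iterations may or may not have run before cancellation.
            foreach (Exception inner in ex.InnerExceptions)
                Console.WriteLine(inner.Message);
        }
    }
}
```

The practical consequence: a `catch (InvalidOperationException)` around a parallel loop will not fire; you must catch the aggregate type and inspect its inner exceptions.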


To summarize this post, I must confess that I am quite excited about the potential of this library. It is only the first version, but I can see the future of parallel computing coming together in a single place. If we think about the next step… mmm… maybe including grid computing… :-) I had better stop here.


UPDATED: New Microsoft Parallel Computing Developer Section (including CTP download: http://msdn2.microsoft.com/en-us/concurrency/default.aspx)

Comments (10)

  1. TPL is cool ….

    What I really would want is some framework to run threads/tasks on multiple machines like Alchemi.net or Digipede …

  2. Keith Patrick says:

    I would imagine the task manager can be subclassed (or the scheduler interfaced out and used as a TM property) to prioritize tasks for doling out to cores, physical chips, or other machines to enable multiple machines. I’m curious how sophisticated (and extensible) the default scheduler will be, but it hopefully won’t be much longer before the first CTP comes out to know for sure.

    MS has some grid project (I remember reading about it on an MSR or job req site, plus it’s maybe alluded to here) in the works, and I’d be surprised if the concurrency library isn’t integral in those plans.

  3. Hassan says:

    Is there any place to download the TPL (Parallel FX, PLINQ) APIs? I am excited to run some test applications, like the ray tracer etc.

    I made a simple console app running a for loop to square values on a Core 2 Duo using C#, VS2008, Vista (no parallelism) and found (in Performance Monitor) that the processing was distributed between both processors (60%:50%). It seems that Vista is managing the parallelism; if that is true, will TPL (Parallel FX) still be beneficial in that case, given that it has its own overhead in synchronization and threading?

  4. SalvaPatuel says:

    You can find the CTP in the new Microsoft Parallel Computing center website:

    http://msdn2.microsoft.com/en-us/concurrency/default.aspx

    Salvador Patuel

  5. A couple of weeks ago saw the release of the CTP of the Parallel Extensions to the .NET Framework ( download

  6. Over on his blog, Don Syme has a post about F# and Parallel Extensions : "Over the coming year I

  7. Using Parallel Extensions from F# A couple of weeks ago saw the release of the CTP of the Parallel Extensions…

  8. Herman.Wong says:

    The Common Language Runtime has always had basic support for parallel programming in the form of loc…

  9. Rednael says:

    It’s an article about basic parallel programming, with examples in C#/.NET included. It also describes a lightweight parallel framework for working with tasks. As opposed to some other frameworks, this one is very light and very easy to use.

    Regards,

    Martijn

  10. Manuel Ixtapa says:

    I am working with Parallel Extensions and PLINQ, combined with traditional threading existing in Visual C# 2008 (3.0).

    I’ve achieved excellent results in a few days.