Design: Task Parallel Library explored

As some of you may know, the .NET threadpool helps many developers use multiple threads in their applications, sometimes improving responsiveness and taking over all the context-switching responsibility. But for those curious enough to look at how it really works, the code is not the best threadpool implementation out there. The internal reason for that is mostly inherited code: if we had to write it again we would do it in a completely different way, but that is a story for another post. What I really want to focus on today is the inner workings of the Task Parallel Library, which complements an important part of it: organized parallelism.

I am not planning to explain what TPL is; if you need that background, I recommend the basic introduction to the library here. The idea is to show you how the internal architecture works, along with some considerations on using it, as it is not a magical library: you can still write bad code with it.

Let’s start with the general architecture. As we can see in the first figure, the TPL sits just above the CLR, and both teams have been working together to coordinate the implementation. The TPL contains the task scheduler, which is exposed for extensibility; the best examples of this extensibility are the Parallel framework and PLINQ.

[Figure 1: TPL architecture stack — the TPL sits above the CLR, with the Parallel framework and PLINQ layered on top]

Let’s start from the top of the stack: Parallel FX publishes a set of methods that give developers access to the library (note that they are static — Shared in VB):

Parallel.For(start, end, delegate(int param) { a[param] = a[param] * a[param]; });

The method invokes the TPL, passing the parameters and the delegate to execute (note that this can be a lambda expression as well!). The interesting bit comes at this stage, when the request enters the main queue. The library evaluates the number of threads required for the task, trying to find the optimal count based on the number of cores and the length of the task. It can even decide to use a single thread and run sequentially. This scales well, as developers do not need to know how many cores the end users will have, and the application behaves accordingly if the system is scaled down.
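To make the snippet above concrete, here is a minimal, self-contained sketch of the same squaring loop written with lambda syntax, using the namespaces as they eventually shipped in .NET 4 (the array `a` and its contents are my own illustration, not from the original snippet):

using System;
using System.Threading.Tasks;

class SquareDemo
{
    static void Main()
    {
        int[] a = { 1, 2, 3, 4, 5, 6, 7, 8 };

        // Same squaring loop as above, written as a lambda expression.
        // TPL decides at runtime how many threads to use, so this code
        // does not need to know the core count of the target machine.
        Parallel.For(0, a.Length, i => a[i] = a[i] * a[i]);

        Console.WriteLine(string.Join(", ", a));
    }
}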

[Figure 2: the main queue distributing tasks across the per-thread queues]

To avoid a bottleneck when the threads consume the main queue, the TPL implements a queue per thread. The main queue distributes the tasks equally across the available threads, and each thread starts executing the delegates waiting in its individual queue; this is a common pattern, also applied in OpenMP. In theory this works fine, but as we know the cores are not always available to the threads, so one thread may finish its queue earlier than the others. That situation is not optimal, so a work-stealing model is applied: when a thread has no tasks left in its own queue, it starts to query the neighbouring queues looking for work. If it finds more tasks, it removes them from the busy queue, respecting the order and optimizing processor time. As you will have guessed by now, these threads are not based on the threadpool, as they have core affinity.
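To make the stealing idea concrete, here is a heavily simplified sketch of per-thread queues with stealing. This is not the actual TPL implementation (which, as far as I know, steals from the opposite end of a double-ended queue to reduce contention); names like `WorkStealingDemo` are mine, and plain FIFO queues stand in for the real data structure:

using System;
using System.Collections.Concurrent;
using System.Threading;

// Simplified illustration of the work-stealing pattern: each worker
// drains its own queue first, then scans its neighbours for leftovers.
class WorkStealingDemo
{
    static ConcurrentQueue<Action>[] queues;

    static void Worker(int id)
    {
        Action task;

        // 1. Drain the local queue.
        while (queues[id].TryDequeue(out task))
            task();

        // 2. Local queue is empty: try to steal from the neighbours.
        for (int i = 0; i < queues.Length; i++)
        {
            if (i == id) continue;
            while (queues[i].TryDequeue(out task))
                task();
        }
    }

    static void Main()
    {
        int workers = Environment.ProcessorCount;
        queues = new ConcurrentQueue<Action>[workers];
        for (int i = 0; i < workers; i++)
            queues[i] = new ConcurrentQueue<Action>();

        // Distribute 1000 tasks round-robin across the per-thread
        // queues, mirroring how the main queue hands out the work.
        for (int t = 0; t < 1000; t++)
        {
            int n = t;
            queues[t % workers].Enqueue(() => Console.Write(n + " "));
        }

        var threads = new Thread[workers];
        for (int i = 0; i < workers; i++)
        {
            int id = i;
            threads[i] = new Thread(() => Worker(id));
            threads[i].Start();
        }
        foreach (var t in threads) t.Join();
    }
}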

This execution model does not guarantee execution order: if you have 1,000 tasks on a 4-core machine, each thread should initially get 250 of them, but a thread that finishes early will consume tasks belonging to the others. This is an important consideration if you need a certain execution order; as I said at the beginning, you should only use these extensions when you are completely sure the tasks are independent. The same applies if the delegates use shared memory: contention is possible in those scenarios, and it will limit scalability.
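A small sketch of the shared-memory pitfall: the unsynchronized counter below will usually lose increments under Parallel.For, while the Interlocked variant stays correct, at the cost of every iteration contending on the same memory location:

using System;
using System.Threading;
using System.Threading.Tasks;

class ContentionDemo
{
    static void Main()
    {
        int unsafeCount = 0, safeCount = 0;

        // Data race: ++ is a read-modify-write, so parallel increments
        // can overwrite each other and the final total is unpredictable.
        Parallel.For(0, 1000000, i => unsafeCount++);

        // Correct but contended: every iteration synchronizes on the
        // same memory location, which limits scalability.
        Parallel.For(0, 1000000, i => Interlocked.Increment(ref safeCount));

        Console.WriteLine("unsafe: {0}, safe: {1}", unsafeCount, safeCount);
    }
}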

The final architectural consideration is exceptions. In a normal sequential “for”, if an exception is thrown the loop stops; with the parallel library this becomes a little more difficult.

[Figure 3: exception propagation and cancellation across the worker threads]

If an exception occurs in one of the threads, the TPL bubbles the exception up and informs the other threads that execution needs to be cancelled. This happens automatically, but the TPL cannot guarantee that no tasks run after the exception. Now, what happens if two threads throw an exception at the same time? The parallel library throws a special exception type that contains both exceptions. This is something important to keep in mind.
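For reference, in the version of TPL that eventually shipped with .NET 4 this aggregating type is AggregateException; here is a minimal sketch of handling it (the failing condition is invented for illustration):

using System;
using System.Threading.Tasks;

class ExceptionDemo
{
    static void Main()
    {
        try
        {
            Parallel.For(0, 100, i =>
            {
                // Invented failure: several iterations may throw before
                // the loop observes the first exception and cancels.
                if (i % 10 == 0)
                    throw new InvalidOperationException("failed at " + i);
            });
        }
        catch (AggregateException ae)
        {
            // All exceptions thrown by the parallel iterations are
            // collected into a single AggregateException.
            foreach (Exception inner in ae.InnerExceptions)
                Console.WriteLine(inner.Message);
        }
    }
}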

To summarize this post, I must confess that I am quite excited about the potential of this library. It is only the first version, but I can see the future of parallel computing in a single place. If we think about the next step... mmm... maybe including grid computing... :) I had better stop here.

UPDATED: New Microsoft Parallel Computing Developer Section (including CTP download: https://msdn2.microsoft.com/en-us/concurrency/default.aspx)