The ThreadPool is dead

Of course, the ThreadPool isn't actually dead - it's just hiding. Or at least it should be. Long live the Task Parallel Library.

During my career I've had the chance to play with some reasonably serious parallel challenges, and as a result I've posted a number of articles on this very subject on this blog. I have to admit, I found it a fascinating and challenging area; I have very fond memories of working on some of the puzzles this stuff presented.

I cut my threading and synchronization teeth working for a travel company where the vast majority of their compute time was spent waiting on network calls (in many cases to a server on a different continent). Even in the old world, with just one or two processors, introducing multiple threads to this scenario reaped massive rewards: provided you had the bandwidth to support the increased pressure on the network, the cost of introducing multiple threads was well worth the investment. But listen up, it was a fairly serious investment. The 'tax', if you will, on developer productivity when creating heavily parallelized code is high. The potential for really-nasty-but-easily-made-mistakes-that-could-lead-to-data-corruption-and-other-bad-things would dissuade most of us from even attempting serious parallelization in all but the most mundane areas of our systems. The myriad ways you could incorrectly utilize one of the many synchronization primitives at your disposal were dizzying. And the stakes with exception handling were painfully raised.

With the increasing availability of multi-core machines and the increasing number of cores they possess, the range of applications that can benefit from parallelism has grown. Even applications that are processor bound, rather than I/O bound, are now in a position to reap the rewards of parallelism.

Historically, if we wanted our application to go faster we could just go on holiday for six months and buy a newer chip when we got home. In the words of Herb Sutter, that free lunch is over. Now the onus is on software (and therefore developers and architects) to change tack and position itself to take advantage of the multi-core age.

What a shame then, that it's all so hard to do. If only there were something out there that could make this stuff easier and more accessible to developers with a threading gamer score of less than a million? Something that allowed developers to describe what they wanted to achieve without thinking in terms of threads and locks and monitors and semaphores and reset events? Enter the Task Parallel Library, or TPL.

Every so often a library comes along that makes you nauseous with the sentiment "Why didn't I think of that?". Well, it always seems so obvious after somebody else has done all the hard work. And so, enter the TPL, my new unrequited codemance.

The P&P guys have put together some great documents on the TPL and PFX (Parallel Framework) on CodePlex: https://parallelpatterns.codeplex.com/.

Specifically, in the coding guidelines they suggest obsoleting the use of ThreadPool.QueueUserWorkItem - whoa! That's serious! I use QueueUserWorkItem at least thrice a day. Who doesn't? So why/how would I stop?
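The 'how', at least, is mechanical. Here's a minimal sketch of the swap on .NET 4 (DoWork is just a placeholder for whatever you'd normally queue):

// Before: queue a delegate directly on the ThreadPool
ThreadPool.QueueUserWorkItem(delegate
{
    DoWork();
});

// After: create and start a Task and let the TPL handle the scheduling
Task.Factory.StartNew(delegate
{
    DoWork();
});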

Of course, it transpires that there are a couple of reasons why this is a really smart thing to do. Hopefully, this explains the name of this articlette (except that the TPL itself uses the ThreadPool so I really should have entitled the post QueueUserWorkItem is dead? but that wouldn't have grabbed your attention in the same way). Here are just a couple of reasons why you should consider using the TPL instead of the ThreadPool directly.

Inlining

The TPL has a class called Task that represents an activity that might run on a different thread. Tasks are really easy to use:

Task taskB = new Task(delegate
{
    DoWork();
});

Task taskA = new Task(delegate
{
    DoSomeStuff();
    // we need to be sure that taskB has finished before continuing
    taskB.Wait();
    DoMoreStuff();
});

taskA.Start();
taskB.Start();

In this example there are two tasks. Task A can do some stuff in parallel to task B but needs to wait for it to complete before running to completion. The TPL manages how and when these tasks are actually executed on the ThreadPool.

If we were doing this the old skool way we'd probably have two delegates queued on the ThreadPool and use a ManualResetEvent to communicate, letting A know when B has finished. However, if the machine is busy it's entirely possible that B won't be picked up for execution until sometime after A has blocked on the ManualResetEvent. That ThreadPool thread would now be stuck, blocked and wasted until B eventually gets scheduled and completes. Very wasteful, and in a server scenario like IIS this would limit the load the application could handle.
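For the record, here's a minimal sketch of that old-skool arrangement, assuming the same DoWork, DoSomeStuff and DoMoreStuff placeholders as above:

ManualResetEvent bFinished = new ManualResetEvent(false);

// 'B' - does its work and signals completion
ThreadPool.QueueUserWorkItem(delegate
{
    DoWork();
    bFinished.Set();
});

// 'A' - blocks a ThreadPool thread until B signals
ThreadPool.QueueUserWorkItem(delegate
{
    DoSomeStuff();
    bFinished.WaitOne();
    DoMoreStuff();
});

If B hasn't even been scheduled when A reaches WaitOne, you're now occupying two ThreadPool threads to make one unit of progress.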

Not with the TPL. In this example, if task B hasn't commenced at the point task A tries to wait on it then task B will be inlined and executed on the same thread as A. Now that's smart.

TPL 1 - 0 QueueUserWorkItem

Multiple Task Queues

ThreadPool.QueueUserWorkItem uses a single queue to store the backlog of work items. This is probably fine in most of today's environments, with a maximum of maybe 8 threads actually executing at any one time (e.g. running on an eight-core server). However, as we enter the massively multi-core era and we have 24, 32 or even 256 cores, the cost of synchronizing access to a single queue is going to start to bite hard.

Significant processing time will be wasted waiting for another core to finish accessing the queue. And so, for Tasks, the TPL takes a different approach: per-thread local task queues.

The TPL provides a separate task queue for each thread in the ThreadPool, helping to eliminate this bottleneck. Any task created on a ThreadPool thread will automatically be placed into that thread's local queue, and these items are processed in last-in, first-out (LIFO) order. When the local queue is exhausted the thread will go back to the global queue (and if that's exhausted too, a work-stealing algorithm allows tasks to be taken from another thread's local queue).
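You don't have to do anything special to get this behaviour; creating tasks from within a task is enough. A quick sketch (DoWork is a placeholder again, and this assumes the .NET 4 defaults - i.e. you haven't opted into TaskCreationOptions.PreferFairness, which pushes work to the global queue instead):

Task.Factory.StartNew(delegate
{
    // We're now running on a ThreadPool thread...
    for (int i = 0; i < 4; i++)
    {
        // ...so these child tasks land in this thread's local queue and
        // will typically be dequeued LIFO by this same thread - or stolen
        // by an idle thread if the rest of the pool runs out of work.
        Task.Factory.StartNew(delegate
        {
            DoWork();
        });
    }
});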

TPL 2 - 0 QueueUserWorkItem

Arguably though, the most compelling reason is that your code will be easier to read and ready to benefit from future enhancements that may come to the TPL. If you still think this doesn't matter and you don't need no stinking parallelism, then think again. The number of logical cores available in the average machine is growing. Code you author today will one day run on a 32-core machine. Previously this was never a concern, as the assumption was that processors would get faster and your code would effortlessly take advantage of that benefit when it arrived. If you want to write better code then you should be thinking parallel ready. Period.


Originally posted by Josh Twist on 22nd August 2010 here: https://www.thejoyofcode.com/The_ThreadPool_is_dead.aspx