Andrew Arnott

News from my corner of the Visual Studio Connected Experience team, programming tips, and solutions to common programming issues.

Limiting concurrency for faster and more responsive apps

When you have a set of highly parallelizable work, executing it concurrently can be easy:
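A minimal sketch of that (`workItems` and `Process` are hypothetical stand-ins for your own collection and work method):

```csharp
// Fire off every item onto the ThreadPool, fire-and-forget style.
foreach (var item in workItems)
{
    Task.Run(() => Process(item));
}
```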

Of course you’d probably want to track the work at least so you know when it’s done:
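Something like this, again assuming hypothetical `workItems` and `Process` names:

```csharp
// Collect the tasks so we can await completion of the whole set.
var tasks = new List<Task>();
foreach (var item in workItems)
{
    tasks.Add(Task.Run(() => Process(item)));
}

await Task.WhenAll(tasks);
```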

Calling Task.Run schedules the work to run on the .NET ThreadPool, which is highly tuned and can likely complete the work as fast as your CPUs allow, since it tends to schedule at least as many threads as the machine has CPU cores.
The problem with the above code is that it floods the threadpool queue with all the items at once. So if you have 10 items in your list and only two cores, the queue and threads look like this:

 Queue     Active 
XXXXXXXX     XX

Now suppose after you add all this work to the queue, your app adds one more item Y to the queue:

 Queue      Active 
YXXXXXXXX     XX

Notice its placement is *behind* all the work you previously enqueued.
Even if Y is more urgent than X, it has to wait until all the X items are done before it gets any time:

 Queue      Active 
YXXXX         XX

 Queue      Active 
YX            XX

 Queue      Active 
              YX

That can be very bad for application responsiveness or perceived performance if the user is waiting for Y but doesn’t care (as much) how long X takes.

We can solve this by applying a simple throttle to the generator of X tasks, so that X items are enqueued progressively over time and Y can get attention more quickly. If we assume that X has already started, the timeline on the queue may look like this:

 Queue      Active 
YXX            XX

 Queue      Active 
XXY            XX

 Queue      Active 
XXX            XY

 Queue      Active 
XXX            XX

Do you see how the queue is never empty (guaranteeing all cores stay busy), yet Y got through much more quickly? It’s a fairness scheme: X still runs very quickly in the absence of other work, and when other work exists, it shares the threadpool much more equitably.

Encoding this is actually quite simple if we use the ConcurrentExclusiveSchedulerPair class. Suppose our goal is to keep all cores busy while maintaining a short threadpool queue, no matter how many items of work you have to complete. Just a few extra lines is all we need to achieve our fairness scheme:
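A sketch of the idea, assuming the same hypothetical `workItems` and `Process` names as before, and using twice the core count as the concurrency limit (the figure the discussion below is based on):

```csharp
// A TaskScheduler that admits at most (2 * core count) items to the
// ThreadPool at a time; the rest wait in the scheduler, not the pool's queue.
var scheduler = new ConcurrentExclusiveSchedulerPair(
    TaskScheduler.Default,
    maxConcurrencyLevel: Environment.ProcessorCount * 2).ConcurrentScheduler;

var factory = new TaskFactory(scheduler);
var tasks = new List<Task>();
foreach (var item in workItems)
{
    tasks.Add(factory.StartNew(() => Process(item)));
}

await Task.WhenAll(tasks);
```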

In the above, we’re using a concurrent TaskScheduler that ships with .NET to throttle the rate at which work is passed to the ThreadPool. By scheduling twice as many concurrent items as the machine has cores, we maximize CPU usage and keep one item on the queue for each core, ready and waiting to go as soon as each of the threadpool threads finishes its current work. As each work item completes, the TaskScheduler we’re using adds another item to the threadpool’s queue, ensuring that it stays busy. But importantly, it also keeps the threadpool queue short at any given time, ensuring that our Y task coming in from another source can be executed in a reasonably short time.

The ConcurrentExclusiveSchedulerPair approach works great if you’re executing synchronous code, or async code that does not use .ConfigureAwait(false) everywhere. Otherwise, throttling async code that uses .ConfigureAwait(false) requires another technique, such as a SemaphoreSlim, because .ConfigureAwait(false) lets continuations escape the custom TaskScheduler and resume on the default one.
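One way that might look (a sketch with hypothetical `workItems` and `ProcessAsync` names; the limit of twice the core count matches the scheduler example above):

```csharp
// Throttle async work with a SemaphoreSlim instead of a TaskScheduler.
// Awaiting the semaphore *before* scheduling keeps the number of
// in-flight items (queued + active) bounded.
var semaphore = new SemaphoreSlim(Environment.ProcessorCount * 2);
var tasks = new List<Task>();
foreach (var item in workItems)
{
    await semaphore.WaitAsync(); // wait for a free slot before scheduling more
    tasks.Add(Task.Run(async () =>
    {
        try
        {
            await ProcessAsync(item).ConfigureAwait(false);
        }
        finally
        {
            semaphore.Release(); // free the slot for the next item
        }
    }));
}

await Task.WhenAll(tasks);
```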

To drive the point home that throttling your concurrency with this technique achieves better responsiveness without sacrificing throughput of your work, I wrote a little console app that schedules the same work both with and without throttling. Then I wrote it again to demonstrate an async work variant.

The output of the synchronous flavor on my machine is:


*** WITHOUT throttling ***
Race has begun.
Threadpool responded in: 00:00:05.7263928
Threadpool responded in: 00:00:00.0000353
FINISHED in 00:00:06.2941732
*** WITH throttling ***
Race has begun.
Threadpool responded in: 00:00:00.0000364
Threadpool responded in: 00:00:00.0000309
Threadpool responded in: 00:00:00.0000722
Threadpool responded in: 00:00:00.0000335
Threadpool responded in: 00:00:00.0000313
Threadpool responded in: 00:00:00.0000262
Threadpool responded in: 00:00:00.0000980
Threadpool responded in: 00:00:00.0000386
Threadpool responded in: 00:00:00.0000255
Threadpool responded in: 00:00:00.0000780
FINISHED in 00:00:04.6851826

The output of the async flavor on my machine is:


*** WITHOUT throttling ***
Race has begun.
Threadpool responded in: 00:00:04.2789838
Threadpool responded in: 00:00:01.6836769
Threadpool responded in: 00:00:00.0000324
FINISHED in 00:00:07.0103151
*** WITH throttling ***
Race has begun.
Threadpool responded in: 00:00:00.0000171
Threadpool responded in: 00:00:00.0000080
Threadpool responded in: 00:00:00.0000342
Threadpool responded in: 00:00:00.0000222
Threadpool responded in: 00:00:00.0000346
Threadpool responded in: 00:00:00.0000361
Threadpool responded in: 00:00:00.0000295
Threadpool responded in: 00:00:00.0000226
Threadpool responded in: 00:00:00.0000397
Threadpool responded in: 00:00:00.0000306
Threadpool responded in: 00:00:00.0000670
Threadpool responded in: 00:00:00.0000309
Threadpool responded in: 00:00:00.0000299
FINISHED in 00:00:06.2486978

I find it very interesting that the throttled run is even faster in these tests. Your mileage may vary, but what is clear and will be consistent is that the threadpool is much more responsive to other work in the throttled run.

And that, my friends, is the point I’d like you to take away from this post.