ThreadPool performance you can see

We’ve spent a lot of time touting the threading improvements in .NET 4, including core enhancements to the performance of the runtime itself. Sometimes data is more powerful than words, however, and it’s useful to be able to see exactly what kind of difference such improvements can make. To that end, here is a small sample you can compile and run on your own:

using System;
using System.Linq;
using System.Threading;
using System.Diagnostics;

class Program
{
    static void Main(string[] args)
    {
        // Run the benchmark six times, discard the first (warm-up) run,
        // and print the average of the remaining five as a TimeSpan.
        Console.WriteLine(
            TimeSpan.FromMilliseconds(
                Enumerable.Range(0, 6).Select(_ =>
                {
                    var sw = Stopwatch.StartNew();
                    CreateAndWaitForWorkItems(10000000);
                    return sw.ElapsedMilliseconds;
                }).Skip(1).Average()
            )
        );
    }

    // Queues numWorkItems empty work items to the ThreadPool and blocks
    // until every one of them has run.
    static void CreateAndWaitForWorkItems(int numWorkItems)
    {
        using (ManualResetEvent mre = new ManualResetEvent(false))
        {
            int itemsRemaining = numWorkItems;
            for (int i = 0; i < numWorkItems; i++)
            {
                ThreadPool.QueueUserWorkItem(delegate
                {
                    // The work item that brings the count to zero
                    // wakes the waiting thread.
                    if (Interlocked.Decrement(
                        ref itemsRemaining) == 0) mre.Set();
                });
            }
            mre.WaitOne();
        }
    }
}

The CreateAndWaitForWorkItems method simply queues N work items using ThreadPool.QueueUserWorkItem and then waits for all N to complete. Each work item atomically decrements a shared counter, and the one that brings it to zero sets the event the calling thread is waiting on. The Main method times this method with N equal to 10 million, running it six times, discarding the first run as a warm-up, and averaging the remaining five. This microbenchmark is pure overhead (much of it synchronization overhead), as no actual work is performed in any work item. In fact, we should expect that as we add more cores (or at least more threads), the time to complete this operation will increase, because more cores will contend for the data structures employed both by the ThreadPool and by my simple test. The hope is that the work done in .NET 4 reduces that overhead, especially at higher core counts, where more and more threads contend for those shared data structures.
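The hand-rolled countdown in CreateAndWaitForWorkItems (an Interlocked counter plus a ManualResetEvent) is exactly the pattern that .NET 4’s System.Threading.CountdownEvent packages up. The sketch below is an illustrative variant, not the code timed in this post. It assumes .NET 4 or later, and the class name CountdownSketch and the returned count of executed items are hypothetical additions so the result can be checked. Because CountdownEvent uses different synchronization internally, its timings shouldn’t be assumed comparable to the original test’s.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical variant of CreateAndWaitForWorkItems built on CountdownEvent (.NET 4+).
public static class CountdownSketch
{
    // Queues numWorkItems empty work items and blocks until all have run.
    // Returns how many work items executed (a hypothetical addition for
    // checking; the original method returns nothing).
    public static int CreateAndWaitForWorkItems(int numWorkItems)
    {
        int executed = 0;
        // The event starts with a count of numWorkItems and becomes set
        // once Signal has been called that many times.
        using (var countdown = new CountdownEvent(numWorkItems))
        {
            for (int i = 0; i < numWorkItems; i++)
            {
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    Interlocked.Increment(ref executed);
                    countdown.Signal(); // decrements the count by one
                });
            }
            countdown.Wait(); // blocks until the count reaches zero
        }
        // Each Increment happens before its Signal, and Wait returns only
        // after the last Signal, so every increment is visible here.
        return executed;
    }

    static void Main()
    {
        var sw = Stopwatch.StartNew();
        int ran = CreateAndWaitForWorkItems(1000000);
        Console.WriteLine("{0} work items completed in {1}", ran, sw.Elapsed);
    }
}
```

With a count of zero the event starts out set, so the method returns immediately without queuing anything.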

The following numbers are in no way official benchmarks, but they give a sense of how much of a difference the work done in .NET 4 really makes. These are the numbers I saw when running this microbenchmark informally on .NET 3.5 and on .NET 4 on two laptops I had access to while writing this blog post. The only change I made to go from .NET 3.5 to .NET 4 was modifying the “Target framework” in the project’s properties in Visual Studio, taking advantage of Visual Studio 2010’s multitargeting support.
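Since the comparison hinges on which runtime the binary actually ran on and how many cores it had, it can help to print both next to each timing. This is a minimal sketch assuming the .NET Framework, where Environment.Version reports the CLR version; the BenchmarkInfo class and its Describe method are hypothetical names, not part of the original sample.

```csharp
using System;

// Hypothetical helper: describes the environment a timing was taken in.
public static class BenchmarkInfo
{
    public static string Describe()
    {
        // On the .NET Framework, Environment.Version is the CLR version:
        // 2.0.50727.x for apps targeting .NET 3.5, 4.0.30319.x for .NET 4.
        // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit one.
        return string.Format("CLR {0}, {1} logical processors, {2}-bit process",
            Environment.Version, Environment.ProcessorCount, IntPtr.Size * 8);
    }

    static void Main()
    {
        // Print this alongside the averaged timing from the benchmark.
        Console.WriteLine(Describe());
    }
}
```

Flipping the “Target framework” setting should change only the reported CLR version, which makes it easy to confirm that each measurement really came from the runtime you intended.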

| Machine | .NET 3.5 | .NET 4 | Improvement |
|---|---|---|---|
| A dual-core box | 5.03 seconds | 2.45 seconds | 2.05x |
| A quad-core box | 19.39 seconds | 3.42 seconds | 5.67x |
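As a quick check of the arithmetic, the Improvement column is the ratio of the two average run times, i.e. how many times faster the .NET 4 build completed the same work:

$$\text{Improvement} = \frac{T_{\text{.NET 3.5}}}{T_{\text{.NET 4}}}, \qquad \frac{5.03\ \text{s}}{2.45\ \text{s}} \approx 2.05, \qquad \frac{19.39\ \text{s}}{3.42\ \text{s}} \approx 5.67$$

Equivalently, the .NET 4 runs take roughly 49% and 18% of the .NET 3.5 time on the dual-core and quad-core machines, respectively.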

Those are some pretty awesome performance improvements, simply from upgrading to .NET 4.