Performance: Don't trust, don't make assumptions... Measure!

When it comes to performance, there's a hard lesson I had to learn: never trust anyone, measure for yourself! Don't trust the gurus, the MSDN documentation, or the technical fellow who wrote the code. Instead, run some simple performance measurements to see what is really going on.

I had to instrument my C# code with performance counters. I started reading the MSDN PerformanceCounter documentation, and here is what I found out:

The Increment, IncrementBy, and Decrement methods use interlocks to update the counter value. This helps keep the counter value accurate in multithreaded or multiprocess scenarios, but also results in a performance penalty. If you do not need the accuracy that interlocked operations provide, you can update the RawValue property directly for up to a 5 times performance improvement. However, in multithreaded scenarios, some updates to the counter value might be ignored, resulting in inaccurate data.
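To make the trade-off in that passage concrete, here is a minimal sketch of a lost update. It does not touch performance counters at all: it assumes a plain `int` field is a fair stand-in for an unsynchronized counter value, and the names `LostUpdateSketch` and `Run` are made up for this illustration.

```csharp
using System;
using System.Threading;

// Hypothetical illustration only; no PerformanceCounter is involved.
public static class LostUpdateSketch
{
    static int plainTotal;   // updated with an ordinary read-modify-write
    static int atomicTotal;  // updated with Interlocked.Increment

    // Starts threadCount threads; each one bumps both totals perThread times.
    public static (int Plain, int Atomic) Run(int threadCount, int perThread)
    {
        plainTotal = 0;
        atomicTotal = 0;

        var workers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            workers[t] = new Thread(() =>
            {
                for (int i = 0; i < perThread; i++)
                {
                    plainTotal += 1;                        // not atomic: updates can be lost
                    Interlocked.Increment(ref atomicTotal); // atomic
                }
            });
        }

        foreach (Thread w in workers) w.Start();
        foreach (Thread w in workers) w.Join();

        return (plainTotal, atomicTotal);
    }

    public static void Main()
    {
        var (plain, atomic) = Run(8, 1000000);
        Console.WriteLine("expected: {0}", 8 * 1000000);
        Console.WriteLine("plain:    {0}", plain);
        Console.WriteLine("atomic:   {0}", atomic);
    }
}
```

The atomic total always matches the expected count. The plain total can never exceed it and, on a multi-core machine, usually falls short; how far short varies from run to run.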

Because the application I instrumented was multithreaded, I didn't want to use the RawValue mechanism. Not much choice to be made, no headache. My application needed to create and clean up multiple performance counter instances based on certain object instances, and these objects were accessed from multiple threads. Then my requirements changed: I had to introduce my own locks around the perf counters. So the question became: should I use the PerformanceCounter.Increment/Decrement functions inside my lock, or just change RawValue directly? Based on the documentation, the second option should have been faster.

But, like I said: never trust the documentation. So I wrote a simple program to test the two approaches. First, I created a performance counter category and added a counter to it:

CounterCreationDataCollection dataCollection =
    new CounterCreationDataCollection();
dataCollection.Add(new CounterCreationData(
    "MyCounterName", 
    "My special test counter", 
    PerformanceCounterType.NumberOfItems64));
PerformanceCounterCategory.Create(
    "MyCategory", 
    "My special test category",
    PerformanceCounterCategoryType.SingleInstance,
    dataCollection);

Now that the counters are installed, let's start incrementing them. I tested 4 configurations: 1. using RawValue, 2. using RawValue under a lock, 3. using Increment, 4. using Increment under a lock.

enum PerfCounterOperations
{
    Increment,
    IncrementLock,
    Raw,
    RawLock
}

I created a disposable class that instantiates a performance counter in our installed category.

using System;
using System.Diagnostics;
using System.Threading;

sealed class PerfCounterWrapper : IDisposable
{
    static Stopwatch watch = new Stopwatch();
    int noRepetitions = 10000;

    readonly PerformanceCounter perfCounter;
    bool enabled;
    readonly object thisLock;

    public PerfCounterWrapper()
    {
        this.thisLock = new object();
        try
        {
            this.perfCounter = new PerformanceCounter(
                "MyCategory", 
                "MyCounterName", 
                false);
            this.perfCounter.RawValue = 0;
            this.enabled = true;
        }
        catch (InvalidOperationException ex)
        {
            // The category is not installed 
            // or there isn't enough memory to create the counter
            Console.WriteLine("Error creating counter: {0}", ex.ToString());
            this.enabled = false;
        }
    }

    public void Dispose()
    {
        if (this.enabled)
        {
            this.perfCounter.Dispose();
            this.enabled = false;
        }
    }

    // .....
}

Here is the method that simulates the 4 types of increment operations:

void Increment(object perfCounterOperation)
{
    PerfCounterOperations operation = 
        (PerfCounterOperations)perfCounterOperation;
    switch (operation)
    {
        case PerfCounterOperations.Increment:
            if (this.enabled)
                for (int i = 0; i < noRepetitions; i++)
                    this.perfCounter.Increment();
            break;
        case PerfCounterOperations.IncrementLock:
            for (int i = 0; i < noRepetitions; i++)
                // We could simplify this, but we don't want to; 
                // requirements: check "enabled" inside the lock
                lock (this.thisLock)
                    if (this.enabled)
                        this.perfCounter.Increment();
            break;
        case PerfCounterOperations.Raw:
            if (this.enabled)
                for (int i = 0; i < noRepetitions; i++)
                    this.perfCounter.RawValue += 1;
            break;
        case PerfCounterOperations.RawLock:
            for (int i = 0; i < noRepetitions; i++)
                lock (this.thisLock)
                    if (this.enabled)
                        this.perfCounter.RawValue += 1;
            break;
    }
}
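One detail of the listing above is easy to overlook: `RawValue += 1` is a compound assignment on a property, so C# expands it into a getter call followed by a setter call. The sketch below shows this with a made-up class, `CountingCell`, that merely counts how often its property is read and written. It is a stand-in for illustration and says nothing about how PerformanceCounter implements RawValue internally.

```csharp
using System;

// Hypothetical stand-in that counts property accesses; not the real PerformanceCounter.
public sealed class CountingCell
{
    long current;

    public int Reads { get; private set; }
    public int Writes { get; private set; }

    public long Value
    {
        get { Reads++; return current; }
        set { Writes++; current = value; }
    }
}

public static class CompoundAssignmentSketch
{
    public static void Main()
    {
        var cell = new CountingCell();
        for (int i = 0; i < 3; i++)
            cell.Value += 1; // one getter call, an addition, then one setter call

        // prints "reads=3 writes=3"
        Console.WriteLine("reads={0} writes={1}", cell.Reads, cell.Writes);
    }
}
```

So each `+= 1` costs two property accesses, whereas a dedicated increment method is a single call. Whether that difference shows up in the numbers is exactly the kind of thing to measure rather than assume.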

The class has methods that measure the number of ticks necessary to increment the counters in a loop. I measured the results with and without contention (by starting multiple threads in the first case and just one thread in the second).

void IncrementCounterSingleThread(PerfCounterOperations operation)
{
    watch.Reset();
    watch.Start();
    this.Increment(operation);
    watch.Stop();
    Console.WriteLine("{0}: {1} ticks", operation, watch.ElapsedTicks);
}

void IncrementCounterMultipleThreads(PerfCounterOperations operation)
{
    Thread[] t = new Thread[10];
    for (int i = 0; i < 10; i++)
        t[i] = new Thread(new ParameterizedThreadStart(this.Increment));

    watch.Reset();
    watch.Start();
    for (int i = 0; i < 10; i++)
        t[i].Start(operation);

    for (int i = 0; i < 10; i++)
        t[i].Join();

    watch.Stop();
    Console.WriteLine("{0}: {1} ticks", operation, watch.ElapsedTicks);
}
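The two methods above time a single pass and print raw Stopwatch ticks. As a possible refinement (not what I used for the numbers in this post), here is a sketch of a small helper that runs the action once untimed, so JIT compilation is excluded, then keeps the best of several timed runs and converts ticks to milliseconds. Stopwatch ticks are units of 1/Stopwatch.Frequency seconds, which is not necessarily the same as TimeSpan ticks. The names `TimingSketch` and `BestOfMilliseconds` are invented for this example.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper; the names are illustrative, not part of the test program.
public static class TimingSketch
{
    // Runs the action once untimed, then returns the fastest of `runs` timed runs in ms.
    public static double BestOfMilliseconds(Action action, int runs)
    {
        if (runs < 1) throw new ArgumentOutOfRangeException("runs");

        action(); // warm-up: triggers JIT compilation outside the timed region

        long bestTicks = long.MaxValue;
        for (int r = 0; r < runs; r++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            bestTicks = Math.Min(bestTicks, sw.ElapsedTicks);
        }

        // Convert Stopwatch ticks to milliseconds.
        return bestTicks * 1000.0 / Stopwatch.Frequency;
    }

    public static void Main()
    {
        long total = 0;
        double ms = BestOfMilliseconds(() =>
        {
            for (int i = 0; i < 1000000; i++)
                total += i;
        }, 5);
        Console.WriteLine("best of 5: {0:F3} ms (total={1})", ms, total);
    }
}
```

Taking the minimum filters out runs disturbed by other activity on the machine; reporting the median or the full spread would be just as defensible, depending on what you want to learn.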

Then we just call these methods for the 4 types of operations and get the results.

void MeasureOpSingleThread(int noRepetitions)
{
    this.noRepetitions = noRepetitions;
    Console.WriteLine("Single threaded - no contention");
    this.IncrementCounterSingleThread(PerfCounterOperations.Raw);
    this.IncrementCounterSingleThread(PerfCounterOperations.RawLock);
    this.IncrementCounterSingleThread(PerfCounterOperations.Increment);
    this.IncrementCounterSingleThread(
            PerfCounterOperations.IncrementLock);
}

void MeasureOpMultipleThread(int noRepetitions)
{
    this.noRepetitions = noRepetitions;
    Console.WriteLine("Multiple threads - with contention");
    this.IncrementCounterMultipleThreads(PerfCounterOperations.Raw);
    this.IncrementCounterMultipleThreads(PerfCounterOperations.RawLock);
    this.IncrementCounterMultipleThreads(PerfCounterOperations.Increment);
    this.IncrementCounterMultipleThreads(
            PerfCounterOperations.IncrementLock);
}

Lastly, we create a wrapper and measure the operations for different numbers of repetitions:

    static void Main(string[] args)
    {
        using (PerfCounterWrapper p = new PerfCounterWrapper())
        {
            for (int i = 5000; i < 100000; i += 5000)
            {
                p.MeasureOpSingleThread(i);
                p.MeasureOpMultipleThread(i);
            }
        }
    }

And here are the results:

[Figure: results chart (image not available)]

[Figure: results chart (image not available)]

Based on our measurements, the MSDN documentation is misleading: updating RawValue is slower than calling the Increment/Decrement functions, both with and without a lock.

Note that the code is throwaway: it is far from ship quality and has no handling for exceptions or corner cases. All I wanted was to write a simple test quickly and get some numbers. This is just one example; the general lesson is: when it comes to performance, do your own tests. Measure, measure, measure.