Locking — Isolation — Unit of Work — Performance


I’ve had a bit of a locking/threading theme in some of my recent postings so I thought I’d continue that with a little problem.  Here’s a (simplified) set of operations that is sort of an example of what you might need to do in a free threaded class.  Let’s assume there is one instance of this class that is to be shared between multiple threads any of which might call the DoIt() function at any time.


Here are two possible implementations that are like things I’ve seen in the past.  The question is:  Which of these is “better?”  Is there something that is (in some sense) better still?  What are the factors that influence this decision (Hint: see the title of this posting)


You might have to scroll down to see the code depending on your browser because the <PRE> formatting seems to come out weird with the stylesheets in use here.

    // it’s easy to explain what this does
class ExampleCoarse
{
public ExampleCoarse() {}
private Object myLock = new Object();
private int count;

public void DoIt(StreamWriter sw)
{
lock(myLock)
{
long ticks = DateTime.Now.Ticks;
int count = IncrementCount();
LogToStream(sw, ticks, count);
}
}

private int IncrementCount()
{
return ++count;
}

private void LogToSteam(StreamWriter sw, long ticks, int count)
{
sw.WriteLine(“{0} {1}”, ticks, count);
}
}

// it’s not so easy to explain this… or how to use it.. but is it better?
class ExampleFine
{
public ExampleFine() {}
private Object myLock = new Object();
private int count;

public void DoIt(StreamWriter sw)
{
long ticks = DateTime.Now.Ticks;
int count = IncrementCount();
LogToStream(sw, ticks, count);
}

private int IncrementCount()
{
int c = System.Threading.Interlocked.Increment(count);
return c;
}

private void LogToSteam(StreamWriter sw, long ticks, int count)
{
lock (myLock)
{
sw.WriteLine(“{0} {1}”, ticks, count);
}
}
}

Comments (11)

  1. Lauren Smith says:

    You incur the extra cost of a mutex in the InterlockedIncrement call, but you also risk interleaving threads between the call to IncrementCount and LogToSteam.  If you don’t mind the output to be out of order, then this may be the speed demon you’re looking for.

    Let’s take two threads A and B.  In your first example, if A enters DoIt first, B must wait until A finishes before it may enter DoIt.  If we assume that each action in DoIt requires 1 unit of worktime, then two threads hitting DoIt at the same time will require 6x worktime.  In fact, any single CPU system will require the same.  However, thread B must wait idly for 3x worktime before beginning any work.  What a waste!

    Your second example, assuming maximum interleaving and multiple CPUs, will allow the two threads to complete execution in 5x worktime.  The first call to Ticks is simultaneous, and the next two calls are blocked, so it requires the total amount of worktime: 4x.

    Unfortunately, it also means that execution may be out of order with thread A getting a Tick of 1 and count of 10 and thread B getting a Tick of 2 and a count of 9 and the output showing thread B’s output first.  That’s acceptable if you think it is.  Maybe not so acceptable for other applications.

    The real unit of work is defined by the application.  So if the output must be absolutely in order, then it may make sense to wrap the last two calls in its own mutex and do away with the InterlockedIncrement.  If you’re looking to maximizing the number of simultaneous threads which may access the DoIt method at any given time, then your fine grained locking scheme works great.

    Of course this also depends on how much using a mutex costs in relation to the amount of work done.  If it is relatively expensive, you don’t want to be acquiring and releasing mutexes all the time.  It may be faster in practice to wrap a large chunk of work in a mutex than to wrap each individual action (or it might not).  That’s what we’ve got profilers and stopwatches for.

  2. Carlos says:

    I’d be amazed if this made any measurable speed difference; the cost of the WriteLine must far exceed the cost of the increment and the DateTime.Now call.  And the second method can write the lines out of order, so overall the first looks like the "best".

    You could improve the second method (i.e. reduce the lock contention) by formatting the output before you take the lock.  My experience in other languages is that formatting is much slower than writing a few bytes to a stream, so this could make a big difference.  But then you’d have to worry about the memory used by the formatting.  e.g. a per-thread MemoryStream might be better than creating a new string on each call.  (I’m assuming that StreamWriter.WriteLine formats directly into the stream; if this isn’t true then the memory-use is probably a non-issue.)

    Of course, the bottom line is that you’d have to measure it.

  3. Scott Singer says:

    I don’t know if it was your intent with this example, but it occurs to me that trying to synchronize access to the externally provided StreamWriter instance is incorrect.  

    Since the example class is shared by many threads, it seems likely ( at least possible ) that the StreamWriter that is passed in will be a different instance for each of the many, concurrent, callers.  We can’t assume in this code that all callers will be providing the same StreamWriter instance.  With that said, causing all calling threads to block while we update a particular StreamWriter will cause unnecessary contention and reduce performance.

    It seems that it would be better to synchronize only the resources that are locally owned by this class instance, which is the ‘private int count’, and that is well served by just doing the iterlocked increment.

    In my scenario, it’s up to the caller to know if it’s StreadWriter is shared between threads.  If the streamwriter was, in fact, used by multiple threads, and it was used in any other areas of code, then the local locking in this example would be insufficient.

    Below are two modified examples.  The first follows the same code organization as the original code, the second breaks it down into a more concise example.  ( A third example could put everything in DoIt() on one line of code. )

       class ExampleWithLess

       {

           public ExampleWithLess() {}

           private int count;

           public void DoIt(StreamWriter sw)

           {

               long ticks = DateTime.Now.Ticks;

               int count = IncrementCount();

               LogToStream(sw, ticks, count);

           }

           private int IncrementCount()

           {

               int c = System.Threading.Interlocked.Increment(count);

               return c;

           }

           private void LogToSteam(StreamWriter sw, long ticks, int count)

           {

               sw.WriteLine("{0} {1}", ticks, count);

           }

       }

       class ExampleWithEvenLess

       {

           public ExampleWithLess() {}

           private int count;

           public void DoIt(StreamWriter sw)

           {

               long ticks = DateTime.Now.Ticks;

               System.Threading.Interlocked.Increment(count);

               sw.WriteLine("{0} {1}", ticks, count);

           }

       }

  4. ricom says:

    Scott writes:

    >>I don’t know if it was your intent with this example, but it occurs to me that trying to synchronize access to the externally provided StreamWriter instance is incorrect.

    Critical observation there.

    And what else do we not know along those lines?

    (Btw this is basically an example I saw "in the wild", I just simplified it)

  5. You probably have to look at the purpose behind that kind of logging.

    Do you need a strictly increasing serial number (regardless of whether the output streams are different), or would a unique ID do just as well?  In the latter case, you can have thread-private data that lets you manage a chunk of IDs per thread (thread 1 gets IDs 0-999, thread 2 gets IDs 1000-1999, etc.).  Then you only need locking when you must grab a new chunk of IDs; the version with Interlocked.Increment() freezes the memory bus for all the other processors while the counter is being incremented.

    Accessing the stream is way more expensive than bumping a counter.  Can you buffer the information to be logged (i.e. store the <counter, timestamp, stream> tuples somewhere) and write to the streams later?

  6. Carlos says:

    "And what else do we not know along those lines?"

    We don’t know who’s responsible for instantiating the classes.  If you have multiple instances the locks are useless.  They should be implemented with a singleton pattern (and with the StreamWriter moved into the class as mentioned above.)

  7. Alois Kraus says:

    The fastest locking is no locking at all. To get maximum speed we could make the data thread local with the [ThreadStatic] attribute. If the counter is meant to be AppDomain global then it should be put into a static class and not in an instance one. Thread  thrashing should be avoided if you only want to print out the current time and a number which does not serve a real purpose but degrades performance. If you eliminate the counter and leave only the time formatting then you should be done with a static function.

       public static void DoIt(StreamWriter sw)

       {

             long ticks = DateTime.Now.Ticks;

             sw.WriteLine("{0} ", ticks);

       }

    By the way my performance contest to format the time in the most performant way did lead to some interesting results: http://geekswithblogs.net/akraus1/archive/2006/04/23/76146.aspx

    The coarse and fine grained examples should not differ so much in terms of performance since the formatting takes quite some time. Although I would vote for the second fine grained example to be a little more performant since it does block for a shorter time which does reduce thread contention.

    Yours,

     Alois Kraus

  8. Alois Kraus says:

    Oh and I forgot to mention that if the writing to the stream is slow and highly concurrent you should definitely check out its asynchronous version BeginWrite to benefit from the .NET thread pool to get a scalable solution.

    Yours,

     Alois Kraus

  9. Jon Rista says:

    I think the ultimate answer to the question is, how exactly does this work in a practical scenario? Both of your examples have pros and cons, depending on how they are used and what the expectations are. Both also have flaws that could be corrected and make them better performing and more useful for different situations.

    Locking an external resource (in this case the StreamWriter) seems like a bad idea, as you don’t know where it came from or who else might already be locking it (which leads to the question how long will they hold the lock and degrade the performance of your logger). There are no guarantees that each time DoIt() is called that the stream will be the same as other calls, which was also mentioned previously.

    (Which makes me wonder, lock() is an object-specific call, is it not? If two different streams are passed in to DoIt() from different callers, lock will not block if its called on a different object the second time, if memory serves. If that is truely the case, then I think that changes the scenario quite a bit.)

  10. A few days ago I posted a concurrency problem for commentary and I got a very nice set of responses…

  11. snaveen says:

    This is just a trivial one.

    Interlocked increment takes an int as ref parameter which is missing in your sample code.

    int c = System.Threading.Interlocked.Increment(count);