What is a Race Condition?

I've been posting a bit lately on debugging multi-threaded applications.  When talking about deadlocks, I mentioned that I added a Sleep statement to ensure that I hit the deadlock.  I needed to force the deadlock because, depending on timing, the code could either work or hang the application.  The example I was using was a classic "race condition".  If Thread 2 completed before Thread 1 started, there would appear to be no bug.  However, if Thread 1 entered the lock statement first, the application would hang.

What is a race condition?
I mentioned earlier that the situation in my example was called a "race condition".  What exactly does that mean?  A race condition exists when the correctness of the code depends on the order in which its threads happen to run.  Multi-threaded applications are very similar to sporting events such as sprints, marathons and stock car races.  As with these more traditional races, any entrant can win any given event.  In a marathon, if an unexpected entrant wins, it's exciting.  In software, if an unexpected thread wins the race, the results are typically unpleasant for users (application hangs, data corruption, etc.).
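To make the "unexpected winner" idea concrete, here is a minimal sketch in Python (my choice for illustration; the earlier posts were not written in it).  Two threads each add 1 to a shared counter with no synchronization.  The function name and the use of a Barrier are assumptions of this sketch: the barrier is there only to force the unlucky interleaving on every run, where real code would hit it only occasionally.

```python
import threading

def run_unsynchronized_increments():
    """Two threads each add 1 to a shared counter without synchronization."""
    state = {"counter": 0}
    # The barrier makes both threads finish reading before either one writes,
    # so the bad interleaving happens on every run instead of once in a while.
    both_have_read = threading.Barrier(2)

    def increment():
        seen = state["counter"]        # read the shared value
        both_have_read.wait()          # both threads now hold the same stale value
        state["counter"] = seen + 1    # write: the later write overwrites the earlier one

    threads = [threading.Thread(target=increment) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"]

final_count = run_unsynchronized_increments()
print(final_count)  # prints 1: two increments ran, but one update was lost
```

Both threads "finish the race", yet one thread's work silently disappears.  That is the data corruption flavor of a race condition.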

It has been my experience that multi-threaded applications encounter race conditions more frequently when running on multiprocessor machines.  On these systems, application threads can truly run at the same time, and in such an environment race condition bugs have been, for me, much more likely to surface.

Debugging race conditions
How do you debug a race condition?  This is a difficult question to answer.  Whenever you make any change to the system, such as running on a different machine, instrumenting the binary with diagnostic logging, or running under the debugger, the timing characteristics of your application can change.  This change in timing can cause a different thread to win the race.  If we are lucky, this causes the bug to surface.  If we are not, the timing change can mask the problem.
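One way to put this timing sensitivity to work, in the spirit of the Sleep statement mentioned at the top of this post, is to inject a delay deliberately so that the race window becomes wide enough to hit.  The sketch below is in Python and rests on assumptions: the function name, the debug_delay_seconds parameter and the "check, then create" cache are all hypothetical, made up for illustration.

```python
import threading
import time

def count_initializations(debug_delay_seconds):
    """Run a check-then-create race and return how many times 'create' ran."""
    cache = {}
    creations = []  # list.append is atomic in CPython, so the count is reliable

    def get_resource():
        if "resource" not in cache:              # check
            # Hypothetical debug-only delay: widens the window between the
            # check and the assignment so a second thread can slip in.
            time.sleep(debug_delay_seconds)
            creations.append(threading.get_ident())
            cache["resource"] = object()         # act
        return cache["resource"]

    threads = [threading.Thread(target=get_resource) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(creations)

forced = count_initializations(debug_delay_seconds=0.2)
print(forced)
```

With the delay in place, both threads should pass the check before either stores the resource, so it gets created twice.  With a delay of 0, the first thread usually finishes before the second one even starts and the bug stays hidden, which is exactly the masking effect described above.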

What has worked for me has been to instrument my code (in debug builds) and run the debug build of my application (or component) under the debugger at all times while writing the code.  Whenever possible, I also recommend running on a debug build of the target platform.  By stepping through your code in the debugger and setting breakpoints on the methods where thread safety is a requirement, you may be able to find the issue during development.

I have also found reviewing my threading code with another developer to be a very useful strategy for finding race conditions.  By walking through the code and explaining how it works, I have found that many issues, not just those related to threading, can be resolved before the code ever runs.

Avoiding race conditions
By far, the best approach to handling race conditions is to code proactively.  If your library code could possibly be used in a multi-threaded application, use synchronization objects (locks, mutexes, ManualResetEvents, etc.) to make the code thread safe.  Take care to perform only the operations that are critical to thread safety while holding these synchronization objects, to avoid deadlocks such as the example shown in my earlier post (where I used a ManualResetEvent within a lock statement).
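As a rough illustration of that advice, here is a minimal Python sketch.  It is not the code from my earlier post: threading.Lock plays the role of the lock statement, threading.Event is a rough analogue of a ManualResetEvent, and the function and variable names are hypothetical.

```python
import threading

def run_synchronized_increments(thread_count=4, increments_per_thread=1000):
    """Increment a shared counter from several threads, guarded by a lock."""
    counter_lock = threading.Lock()
    start_signal = threading.Event()   # rough analogue of a ManualResetEvent
    state = {"counter": 0}

    def worker():
        # Wait for the signal BEFORE taking the lock.  Blocking on an event
        # while holding a lock can deadlock if the signalling thread ever
        # needs that same lock.
        start_signal.wait()
        for _ in range(increments_per_thread):
            # Hold the lock only around the read-modify-write itself.
            with counter_lock:
                state["counter"] += 1

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for t in threads:
        t.start()
    start_signal.set()                 # release all waiting workers
    for t in threads:
        t.join()
    return state["counter"]

total = run_synchronized_increments()
print(total)  # prints 4000
```

The lock covers a single statement and nothing that can block, so no thread ever sits on the lock while waiting for another thread to act.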

Take care,
-- DK
