Why threading is hard

Anybody who says “I can write correct multi threaded code” probably should be saying “I don’t test my multi-threaded code”.

It is very difficult to write correct multi-threaded code.

One way to appreciate this is various “find-the-bug” pop quizzes that occasionally come up.

Another approach that I’ll go after here is to look at the state space.

If single-threaded apps have O[n] execution paths where n = number of atomic operations,  just two threads have O[n!] paths or worse. (It’s not O[n*n], or even polynomial – see below). And n is usually a very large number.

Each of those execution paths is an angle to hit your program and find a bug. Maybe you can test against O[n], but you’re delusional if you think you can protect against O[n!] or worse.

15 is a small number, and 15 factorial is 1,307,674,368,000 = ~1.3 trillion. Testing a million cases is hard enough. “Trillion” may sound like an unrealistic number, but listen to some of the ship-stopper stress bugs that show up in product’s end games followed by the “what are the odds that could ever have actually happened” comments, and it’s not so unrealistic.

Practically, that means multi-threaded apps have so many states that you can’t possibly keep track of them all in your head. And even if you do, the next guy that goes and changes a line of code just ruined your model.


Counting the execution paths

Suppose you have a function that executes sequence of atomic operations:  A0, A1, A2, A3.

I want to just compare single-threaded and multi-threaded cases, so for now, let’s ignore loops, conditionals, etc. because that’s common across both and so it just complicates the comparison. (This simplification actually downplays the complexity of adding additional threads). In this model, there’s only 1 code-path through this sequence of N states: A0, A1, A2, A3. 

Now suppose you have a second function B0, B1, B2, B3 executing on another thread.

The threads can interleave these operations in any order. The convenient case would be (A0,A1,A2, A3, … B0,B1,B2,B3). But maybe it’s something goofy like: (A0,A1,B0,A2,B1,B2,A3,B3). If the bug only occurs when the pattern (A1,B0,A2, B1) is executed, good luck finding it.

It turns out for 2 sequences of lengths N and M, there are choose(N+M, N) ways to interleave them. That’s (N+M)! / ( M! N! ) execution paths.   (see quiz here ). Hence O[n!] for 2 threads.

You can see how much more complex just simple execution gets with threads. Now imagine how much more complicated threads make loops, conditionals, etc.


How many states?

Locks don’t make your code safe. They just reduce the number of states and hedge your bets.  At one extreme, 1 giant lock around everything can reduce you to 1 state per thread. If you put 4 states under 1 big lock (eg, A0…A3), and you put that same lock around every other state (eg, B0…B3), then you’ve reduced those 4 states to 1 state, thus eliminating 3 states.

You likely don’t control the number of states. An operation that appears atomic (like a memory read) may actually be multiple states. Maybe you’re looking at the source-line and the compiler split it out into multiple states. Or maybe there are windows where you communicate with the file-system, or other processes – and so they can be effectively injecting states into your state space. Or maybe the hardware has multiple states (usually around memory models).  That’s why there’s InterlockedIncrement(&i) instead of just doing i++.

Threading vs. Reentrancy

Now threading models aren’t the only way to rack up states. Single-threaded models (like UIs) that queue messages can create a ton of states too, and those tend to create nasty reentrancy bugs. And there will be times when another thread would actually produce fewer overall states then a single-threaded model. Just be wary.


What if the states independent from each other?

Granted many of the states may appear to be independent from each other. So one may claim that O[N!] is way to pessimistic. But unless you have a very thorough isolation model, it tough to prove the states can’t impact each other.

For example, states A0 and B1 may not directly interact, but maybe A0 impacts A1 which impacts B0 which impacts B1.  Maybe A0 impacts how much memory A1 allocates, and that impacts whether an allocation in B0 fails (especially if there’s a process wide memory allocator), and then the bug is in B1 handling the out-of-memory failure case.

And even if most of your states actually are isolated, factorial growth is so fast that you only need a small number before you’re doomed.

See also: Measuring how difficult a threading bug is.

Comments (8)

  1. Mike G says:

    Isn’t this a bit of a contrived, worst-case example?  The majority of threading I’ve done in my 10 year professional career has focused on one or two narrow areas of code that I need to run concurrently, and is usually or always completely separated from the main areas of execution.  The number of "code paths" is far less relevant than focusing on the areas where two threads interact.

    Blindly stating that anyone who "says they can write threaded code" should be saying "I don’t test my threaded code" is daft.

  2. Cory Nelson says:

    when you first start coding, you end up writing bad code.  over time you improve.  after a while it becomes second nature.  it’s the same thing with threading – people just need to stop being afraid and take some time to learn.

  3. Peter Ritchie says:

    How is a possible scenario "contrived"?  The difficulty that Mike’s talking about is keeping the data separated amongst the threads.  It’s when you don’t do that you get the increase in possible states.  If you’ve got more than a trivial number of states, you’re likely not testing all of them.

    But the same token, it’s a bit of a leap to assume everyone does have shared state in multi-threaded code and concluding "Anybody who says ‘I can write correct multi threaded code’ probably should be saying ‘I don’t test my multi-threaded code’"

  4. You are correct of course that there is great opportunity for problems when adding threading to an application.  However, this is considerably mitigated by well-designed code that avoids obvious pitfalls such as global variables and static state.  

    I am not a threading expert.  But I have had the opposite experience to you – I found multi-threaded code to be exceptionally easy!

    For example, late last year, I had a large Windows application, and decided to introduce some background data loading on separate threads.  Essentially, I quickly made a significant portion of the application multi-threaded.  The number of resulting bugs was minimal and easy to resolve.  The only real point of contention was dealing with collections of items – one thread adding or removing items while another was iterating over them.  

  5. jmstall says:

    I’m not talking about just putting a big lock around some collection class you wrote. Maybe you’ve got a thread-safe hashtable, but the logic above the hashtable still has to be thread-safe.

    Multi-threaded code must have some shared state: if the worker thread can’t impact the main thread, you wouldn’t even have worker thread doing any work at all.

    Now maybe the shared state is small. But even a few small shared states can lead to a ton of combinations which could be hiding some bugs.

    Many people that I’ve met who think they can write multi-threaded code either just apply it to trivial systems (a single collection class) or may be able to write code that’s correct enough to stay one step ahead of their testers, but would still break down under stress testing.

  6. jmstall says:

    I agree that sometimes thread is the way to go (it may be the only way to achieve performance, scalablity, you may need to be consumable by a multi-threaded client, it may actually reduce the overall number of states, etc)

    Let me take a softer approach. Beware that:

    – multiple threads introduce a whole new class of bugs that can be dormant for a while. It’s easy to appear that the app is good.

    – One way to appreciate the potential surface area for bugs is to count the new state space. (that’s the primary focus of this entry). It grows N!, not N or even N^2.

    – this creates a maintenance problem. Even if you actually figured out all N! paths, the next guy to own the code probably won’t.

    – good design (eliminating statics, etc) can greatly reduce N.

    – It’s much harder to prove that your multi-threaded code is correct. You maybe able to stay a step ahead of your testing, but that’s hardly the bar for correct.

  7. antosha says:

    The number of possible execution paths for two threads does not grow as n!. Obviously it is less than 2^n, because each time you are going to execute the next atomic operation, you have at most 2 options: take the next operation for the first thread or take the next operation for the second thread. Moreover, it is well known that choose(N+M, N) with the fixed sum N+M = n is at maximum when N = M = n/2 (assuming n is even, if n is odd then N and M differ by one). Using Stirling’s approximation for n!, one can see that choose(n, n/2) is asymptotically equal to 2^n/sqrt(pi*n/2).

    Therefore it would be more correct to say "grows almost as 2^n". Similarly, in case of k threads, the number of execution paths will be less than k^n.

  8. Yuhong Bao says:

    >Single-threaded models (like UIs) that queue messages can create a ton of states too, and those tend to create nasty reentrancy bugs.

    And that is why the COM STA threading model is bad. Every COM call that have to block pumps messages and create additional state. And bugs resulting from that may not be discovered until the COM object moves to another thread or process.