Why Threads Are A Bad Idea


My friend Tim Dodd found this presentation back when we worked together at ISS somewhere around ’96-’97. It’s by John Ousterhout, who worked at Sun Microsystems Laboratories – the deck is dated 9/28/95. We found it hilarious, because we worked with a lot of highly threaded code on a daily basis, and didn’t think we were doing anything really special. The really funny thing is the slide where the author claims that threads are “too hard for most programmers to use”, and “Even for experts, development is painful.” The diagram has a range from casual programmers to wizards, with thread programmers at the high end of the wizard range. So we must be wizards! Tim really is a wizard, though he’s far too modest to label himself as such.


I recently ran into a bug in my code that made me think that maybe thread programming really is for wizards. Here’s the problem: if you’re dealing with a named pipe asynchronously, one of the best ways to do it is with I/O completion ports. They’re really efficient, and I used that approach in the sample service I put into Writing Secure Code for Vista. In the piece of code I was working on, there could be at most one read pending against the pipe, but possibly a lot of writes. If the client goes away without warning, you get a message posted to the port for each outstanding operation, and you want them all to clear before you reset the pipe for a new client. I’d put in a simple reference counter class using InterlockedIncrement and InterlockedDecrement, and for sanity, I asserted that the ref count should never go below 0. Of course, I hit the assert… The code looked about like this:


if( WriteFile( … ) )
    m_WriteCount.Add();
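The counter class itself was trivial. Here’s a minimal sketch of what it might have looked like; the class and member names are my guesses, not the original code, and I’ve used std::atomic in place of InterlockedIncrement/InterlockedDecrement so it compiles anywhere (the semantics are the same):

```cpp
#include <atomic>
#include <cassert>

// Hypothetical reconstruction of the reference counter described above.
// The original used InterlockedIncrement/InterlockedDecrement; std::atomic
// with the default sequentially consistent ordering gives the same guarantees.
class RefCount
{
public:
    void Add() { ++m_count; }

    long Release()
    {
        long newCount = --m_count;
        assert( newCount >= 0 );  // the sanity check that fired
        return newCount;
    }

    long Count() const { return m_count.load(); }

private:
    std::atomic<long> m_count{ 0 };
};
```

Each operation is individually atomic, which is exactly why the bug is interesting: the counter itself is correct, and the race lives entirely in the ordering of the calls around it.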


So what happened?


As it turns out, once you go down into WriteFile, somewhere in kernel mode (likely in NtWriteFile) the write happens and causes the completion message to get posted to the port. Your other thread, which these days has a whole processor of its own, comes out of its wait and can call m_WriteCount.Release() before the first thread even gets back up the call stack to call Add(). Ouch.


So while I/O completion ports are really cool and highly efficient, the mixture of these and modern multi-core systems adds up to some serious race condition potential. It also calls out the need for test rigs: the test rig was able to force this failure reliably (though it still took well over 10k iterations), where the app I’d bolt it into might not find this easily, if ever, and getting repro steps would be a ton of fun.


 Oh – found a link to the presentation – http://home.pacbell.net/ouster/threads.pdf


 


Comments (5)

  1. Felix von Leitner says:

    A reference count is conceptually similar to a lock.  So yeah, you have to lock first, then manipulate the resource.  It’s kind of obvious if you think about it.  The same is true for non-threaded code, too, if it handles asynchronous events.

    Most people have no business writing threaded code, but not just because it is more difficult to get right, also because it is more difficult to debug and more difficult to audit.  It’s making everyone’s life harder.  If you don’t have overwhelming clear benefits from using threads, don’t.

    [dcl] Well, yes, which is why I used InterlockedIncrement and so on. All bugs are obvious in 20-20 hindsight. The problem I see is that people do have business writing threaded code, since that’s how you’re going to see perf improve. The obvious problem is that it is difficult, and prone to a whole class of subtle errors and locking design issues. Though I do think that lesser wizards can do it with some education…

     

  2. Foolhardy says:

    To me, this is a pretty obvious race condition. WriteFile, on a handle associated with an IOCP, may queue a completion packet at some undefined time after the call starts. It won’t queue anything if it returns an error code.

    In the old code, there’s a race between the function returning and the packet being queued, but the code requires the function to return first so it can add the reference. There’s a window where the packet can release a reference it doesn’t have yet, hence the negative ref count.

    WriteFile can’t possibly know when the next statement in your code is going to execute, so there’s no way it’s going to wait for it to execute before posting the completion packet. Therefore, you have to increase the ref count BEFORE the call to WriteFile, and dereference it locally iff an immediate error is returned (i.e. no completion packet to dereference it is forthcoming). That way, the reference is guaranteed to be added before the packet can be queued and executed.

    m_WriteCount.Add();
    if( WriteFile(…) != ERROR_SUCCESS )
        m_WriteCount.Release();
    // else packet executor will release

    [dcl] Yes, and that’s exactly how I fixed it. BTW, WriteFile returns BOOL, not DWORD. The thing I found interesting about this particular scenario was that the combination of multi-proc and completion ports makes the race condition a near certainty – it’s an error to have the race condition at all, even if you weren’t using completion ports, but with other designs, you might avoid it from Pure, Dumb, Blind Luck, where this design holds your feet to the fire.

  3. It can easily happen on a single-proc system, particularly if the waiting thread is running at a higher priority than the writing thread.

    Multi-threaded code can be complex, but there are plenty of cases where trying to do multiple tasks simultaneously within a single thread is far more complex and error-prone.

    [dcl] You can get into tricky situations doing things asynchronously, even with one thread. This bug didn’t rear its ugly head until I started working with multiple threads. This is why all the devs on the ISS Scanner had dual-proc systems 10 years ago – we’d find the race conditions that way. Some things were only found on the faster systems.

    I try not to muck with thread priorities for exactly that reason, but the OS will tweak them for you in some cases – pending IO, being foreground, etc. will all do it.

  4. Jon W says:

    [quote]Though I do think that lesser wizards can do it with some education.[/quote]

    That education, in effect, turning them into greater wizards.

    It really is true: It’s easy to slap controls on a form, and it’s even relatively easy to create well-formed SQL statements. Doing systems programming, however, is a totally different ball of wax, and the two kinds of programming are very different, requiring different skill sets. Thirty years ago, the industry knew this (you had “systems programmers” and “application programmers”) but these days, the distinction seems to be lost on many people.

    Maybe it’s a case of “you don’t know enough to know how little you know…”

    [dcl] Clearly the challenge we all face is that in a multi-core world, effectively using threads is going to be critical to getting good perf. We can’t count very much on getting faster because of cycles. I’m sure that initially, people with lots of thread experience will be in high demand, but over time, we need to make ways for more average people to get good results. It’s exactly the problem we have with a lot of facets of security.

    I also believe that while threading techniques may be a specialty today, lots of this stuff isn’t rocket science (and I know rocket science – MS in AE). Though my example is a counter-point – I’ve been doing threads for 12 years, have a lot of experience, and can avoid most typical mistakes, but I missed this one.

  5. Interesting presentation on the topic. I’m not an expert programmer but have been experimenting with threads in VB.NET mainly because I hate waiting … certainly makes the programming a lot more complex… but when done properly the result can be great.