If you’re going to use an interlocked operation to generate a unique value, you need to use it before it’s gone


Is the InterlockedIncrement function broken? One person seemed to think so.

We're finding that InterlockedIncrement is producing duplicate values. Are there any known bugs in InterlockedIncrement?

Because of course when something doesn't work, it's because you are the victim of a vast conspiracy. There is a fundamental flaw in the InterlockedIncrement function that only you can see. You are not a crackpot.

LONG g_lNextAvailableId = 0;

DWORD GetNextId()
{
  // Increment atomically
  InterlockedIncrement(&g_lNextAvailableId);

  // Subtract 1 from the current value to get the value
  // before the increment occurred.
  return (DWORD)g_lNextAvailableId - 1;
}

Recall that the InterlockedIncrement function increments a value atomically and returns the incremented value. If you are interested in the result of the increment, you need to use the return value directly and not try to read the variable you incremented, because that variable may have been modified by another thread in the interim.

Consider what happens when two threads call GetNextId simultaneously (or nearly so). Suppose the initial value of g_lNextAvailableId is 4.

  • First thread calls InterlockedIncrement to increment from 4 to 5. The return value is 5.

  • Second thread calls InterlockedIncrement to increment from 5 to 6. The return value is 6.

  • First thread ignores the return value and instead reads the current value of g_lNextAvailableId, which is 6. It subtracts 1, leaving 5, and returns it.

  • Second thread ignores the return value and instead reads the current value of g_lNextAvailableId, which is still 6. It subtracts 1, leaving 5, and returns it.

Result: Both calls to GetNextId return 5. Interpretation: "InterlockedIncrement is broken."

Actually, InterlockedIncrement is working just fine. What happened is that the code threw away the unique information that InterlockedIncrement returned and instead went back to the shared variable, even though the shared variable had changed its value in the meantime.

Since this code cares about the result of the increment, it needs to use the value returned by InterlockedIncrement.

DWORD GetNextId()
{
  // Increment atomically and subtract 1 from the
  // incremented value to get the value before the
  // increment occurred.
  return (DWORD)InterlockedIncrement(&g_lNextAvailableId) - 1;
}

Exercise: Criticize this implementation of IUnknown::Release:

STDMETHODIMP_(ULONG) CObject::Release()
{
  InterlockedDecrement(&m_cRef);
  if (m_cRef == 0)
  {
    delete this;
    return 0;
  }
  return m_cRef;
}
Comments (35)
  1. Mike Caron says:

    The fact that someone would make this mistake betrays a fundamental misunderstanding of how concurrent processing works.

    The Release method is incorrect because there is nothing stopping two threads from calling Release() at the same time, and both running the m_cRef == 0 branch.

  2. Simon Farnsworth says:

    There's a use after free bug in the implementation of IUnknown::Release

    Imagine that you have two references to this object, each being used by a different thread; thread 1 calls Release, runs the InterlockedDecrement, and is then scheduled away (m_cRef is now 1). Thread 2 calls Release and runs to completion; when thread 2 hits the if statement, m_cRef is 0, and it deletes the object.

    When the scheduler then resumes thread 1, this is an invalid pointer, and you have no idea what m_cRef will be. If you're lucky, the memory has been reused, m_cRef comes out by chance as non-zero, and the return value of Release() is ignored. If you're unlucky, the memory hasn't been reused, so m_cRef is 0, and the delete is run twice. If you're really unlucky, the memory has been reused, m_cRef is non-zero, and the delete corrupts the heap. If you hit the worst available luck, a new object of the same type is using this block of memory, and you've deleted someone else's object out from underneath them.

    Additionally, you're breaking the contract for Release() – it's defined as returning the reference count after this release has run, so you may return the wrong value. As it's only meant for debugging, this shouldn't cause a crash, but may confuse someone trying to track down a bug.

    The fix is the same as in the blog entry; record the result of InterlockedDecrement and use it:

    STDMETHODIMP_(ULONG) CObject::Release()
    {
      LONG cRef = InterlockedDecrement(&m_cRef);
      if (cRef == 0)
      {
        delete this;
      }
      return cRef;
    }

  3. Joshua says:

    You cannot test lock-free algorithms for correctness. You have to prove them correct. Most people can't.

  4. Brian says:

    @Joshua: I think a tool like Microsoft Chess ( research.microsoft.com/…/chess ) should be able to test such algorithms.  It works by systematically testing every interleaving.

  5. Adam Rosenfield says:

    Oh thank goodness, I was worried something unusual had happened when the blog post didn't show up at 7am PDT.

    [There was a glitch in the autoposter. Sorry for the confusion. -Raymond]
  6. Phil says:

    Isn't there also a problem with the Release implementation where two threads decrement the value and it becomes -1?  In this case, we never free the object.

  7. Ian says:

    Raymond, your GetNextId() code would not work on Windows 95

    [It also doesn't work on Win32s. But if you're targeting Win32s or Windows 95, you would have known this already. -Raymond]
  8. jim says:

    @Adam Rosenfield: Don't worry, Raymond was just busy checking out the new Ubuntu release and lost track of time :)

  9. Isn't there also a problem with the Release implementation where two threads decrement the value and it becomes -1?

    This can't happen. If two threads have a reference on the object, then the reference count was at least 2 before they started calling Release().

  10. Joshua says:

    @Brian: I did the math on it. There's not enough CPU power in the world to do the whole program. It might work on an extracted library. The basic math is something like (#threads)^(#instructions executed by longest running thread) It gets worse when the library uses the "wrong" opcodes, resulting in the chance of splitting a single CPU instruction.

  11. Joker_vD says:

    @JamesJohnston: Sometimes I am almost convinced that concurrency is just a fancy name for merging lists.

    @Maurits [MSFT] — You assume that AddRef() implementation isn't broken.

  12. Ian Boyd says:

        if (m_cRef == 0)

    Oh, damn. Be right back. i have to, uh, i forgot the thing…in the place…with the guy.

  13. Chris B says:

    @Adam: I had the same reaction. When I checked this at 10 ET, it wasn't there, and then I checked Ayende's blog, and nothing was new there either. I am so conditioned to expect those, that I had brief moment where I thought something was wrong with my internet connection or something! It was almost like one of those moments where you are in your car and the person beside you pulls out, and you think you are moving. To make it worse, I checked two sites, and it made me feel as if the cars on both sides of me pulled out simultaneously.

  14. dave says:

    re: IUnknown::Release

    I think that the programming theorist Joni Mitchell warned about this antipattern:

     Don't it always seem to go

     that you don't know what you've got till it's gone.

  15. @Joker_vD yes, the Release() implementation should assume that the AddRef() implementation is correct.

    You could, of course, do something like this:

    LONG Release() {
      LONG ref = InterlockedDecrement(&m_ref);
      if (ref < 0) { throw(…); }
      ….
    }

  16. Joshua says:

    [It also doesn't work on Win32s. But if you're targeting Win32s or Windows 95, you would have known this already. -Raymond]

    Brings back memories. I still have a working Win32s environment.

  17. Mike [MSFT] says:

    There's also a distinct lack of memory barriers.

    I'm not sure which barriers, if any, InterlockedIncrement and/or InterlockedDecrement have, or whether the x86 strong memory model means barriers are unnecessary in this case, but relying on either of those helping would be brittle at best.

    Just use std::atomic<T> if you can, the guys who designed that thought long and hard about memory barriers:

    std::atomic<int> m_refCount;

    if (--m_refCount == 0)
    {
      delete this;
    }

    [see channel9.msdn.com/…/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-1-of-2 for all the gory details]

    [The vanilla interlocked functions erect a full fence. There are special Release and Acquire versions that erect only partial fences. -Raymond]
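    For completeness, a compilable sketch of the std::atomic approach from the comment above. The class layout, the method names, and the initial refcount of 1 are assumptions for illustration, not from the comment; note that both methods use the value the atomic operation returns, never a re-read of the member.

    ```cpp
    #include <atomic>
    #include <cassert>

    class CObject {
    public:
        CObject() : m_refCount(1) {}   // creator starts with one reference
        unsigned long AddRef() {
            // fetch_add returns the pre-increment value; add 1 to
            // report the count after this AddRef.
            return m_refCount.fetch_add(1) + 1;
        }
        unsigned long Release() {
            // Capture the decremented count; never re-read m_refCount.
            unsigned long cRef = m_refCount.fetch_sub(1) - 1;
            if (cRef == 0)
                delete this;
            return cRef;
        }
    private:
        ~CObject() {}                  // only Release may destroy us
        std::atomic<unsigned long> m_refCount;
    };

    int main() {
        CObject* p = new CObject();    // refcount 1
        assert(p->AddRef() == 2);
        assert(p->Release() == 1);
        assert(p->Release() == 0);     // object deletes itself here
        return 0;
    }
    ```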
  18. A mistake this elementary indicates that someone sprinkled interlocked functions throughout their code as if it were some kind of magic pixie dust that magically makes everything work, and they didn't stop to think about what the function actually *does*.

  19. alegr1 says:

    The possibilities are: The object never gets deleted; the object gets deleted twice; *and* deleted memory gets touched.

  20. Funny, as someone who did general tech support for a domain-specific programming language (Turing), I was always careful to verify it wasn't an interpreter bug. The humiliation of having to profusely apologize to a customer, after having told him repeatedly that the problem "had to be in his code somewhere" until I realized with horror that it was a bug in *my* compiler, was enough to make me very, very careful about ever absolutely denying the possibility of an interpreter bug.

    Of course I didn't have MS's test team behind me, so it was only 99 times out of 100 that it was the user's bug.

  21. @Mike [MSFT]

    The interlocked functions are how we have been roughing it without a multithreaded memory model. But advertising the C++11 atomics is never a bad thing.

    At the very least, on x86 and x64, the interlocked functions would correspond to instructions with the LOCK prefix. So InterlockedIncrement would correspond to LOCK INC and InterlockedDecrement would correspond to LOCK DEC. The lock prefix guarantees that all outstanding loads and stores are serialised prior to the locked instruction executing. This is much the same as MFENCE.

  22. Did interlocked functions not exist on Windows 95 and Win32s…?  Win32s would make sense because it didn't support threading, but Windows 95 did support threading.

  23. AsmGuru62 says:

    @Crescent2k:

    How would LOCK INC return a value — which is what these functions do.

    I think there is more to implementation here and not just LOCK INC/DEC.

  24. @simon/Raymond,

    I am always puzzled by this:

    LONG cRef = InterlockedDecrement(&m_cRef);
    if (cRef == 0) // Let's say thread 1 is here and a context switch happens to thread 2, which tries to do QI/AddRef and increments the counter to 1. Now what?
    {
      delete this;
    }

  25. msdn.microsoft.com/…/cc839627.aspx

    ULONG CMyMAPIObject::AddRef()
    {
       InterlockedIncrement(m_cRef);
       return m_cRef; // Could return incorrect value if another thread also does AddRef.
    }

    ULONG CMyMAPIObject::Release()
    {
       // Decrement the object's internal counter.
       ULONG ulRefCount = InterlockedDecrement(m_cRef);
       if (0 == m_cRef) // If thread 2 did AddRef before this line, then it doesn't delete itself. If thread 2 does AddRef just after this line, then again the same problem as above. Am I missing anything obvious?
       {
           delete this;
       }
       return ulRefCount;
    }

  26. alegr1 says:

    @James J:

    Windows 95 and Win32s had to run on 80386 processors, which didn't have the XADD and CMPXCHG instructions. Interlocked operations that return the result require one of those.

    @Santosh Sampath:

    The rule of proper COM programming is: if you have a pointer to object, you hold a reference to it. When you give the pointer away, you call AddRef. When you're done with it, you call Release. Thread 2 is supposed to have a reference which would prevent the refcount from going to zero. Please keep that in mind when you write your programs.

  27. Thread 2 doesn't even have the pointer yet. Let's say it has called some function which tries to return the pointer by doing an AddRef. What happens if the refcount was just found to be 0 by the first thread?

  28. alegr1 says:

    @Santosh:

    The MSDN example is incorrect. It's supposed to use ulRefCount.

    Also, you may not call AddRef (or any other member function) if you don't have a reference. Just having a pointer is not enough; you need to own an outstanding reference.

  29. @alegr1, Sorry for confusion and probably offtopic. Thanks for being patient.

    I was thinking of a global ref counted single instance object shared between two threads.

    I think COM objects would need to be marshaled if two threads are in different STA.

    However, if both threads use the free-threaded apartment, wouldn't it be unsafe to access a global object without any lock?

    For example, CoCreateInstance returns the same object for two threads.

    [Ignoring the CoCreateInstance issue, if you keep a single global pointer to an object, and if one thread releases it while another tries to addref, then you have a bug in your app. Because if the release occurs before the addref, you are trying to addref a pointer to which you do not have a valid pointer. (The other thread invalidated it by releasing it.) Note that there is no concurrent release/addref race here. The race is in the app. -Raymond]
  30. alegr1 says:

    However if both threads use free threaded apartment, wouldn't it be unsafe to access global object without any lock.

    When CoCreateInstance returns an interface pointer, the pointer comes with an outstanding reference.

  31. @AsmGuru62:

    Of course, but that is the reason why I started it off with "At the very least". It was about how even the simplest implementation of a locked increment has memory fences naturally, nothing more, nothing less.

  32. alegr1 says:

    >Thread 2 doesn't even have the pointer yet. Let's say it has called some function which tries to return the pointer by doing an AddRef. What happens if the refcount was just found to be 0 by the first thread?

    Let me repeat that again:

    *Before* you give a pointer away (for example, pass it somehow to thread 2) you *must* call AddRef (QueryInterface is also an option). You *may not* call AddRef if you don't own a reference already. This way it's impossible to collide the last Release with AddRef.

    You should NOT simply pass the pointer to thread 2 and have the thread 2 call AddRef at its leisure. That would be a big error. It's like throwing a lifeline to a person in distress, assuming they have to also secure the other end of it.
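    alegr1's rule can be sketched with a minimal stand-in for a refcounted object (all names here are illustrative, not COM): the creating thread takes the extra reference *before* the handoff, so the receiving thread owns a reference from the moment it sees the pointer and the last Release can never race an AddRef.

    ```cpp
    #include <atomic>
    #include <cassert>
    #include <thread>

    // Minimal stand-in for a refcounted object.
    struct RefCounted {
        std::atomic<long> cRef{1};
        static std::atomic<bool> destroyed;
        void AddRef() { cRef.fetch_add(1); }
        void Release() {
            // Use the value fetch_sub returns, never a re-read.
            if (cRef.fetch_sub(1) - 1 == 0) { destroyed = true; delete this; }
        }
    };
    std::atomic<bool> RefCounted::destroyed{false};

    int main() {
        RefCounted* p = new RefCounted();  // creator holds one reference

        // Rule: AddRef on behalf of thread 2 *before* the handoff.
        p->AddRef();
        std::thread t2([p] {
            // Thread 2 owns a reference; the object cannot vanish
            // underneath it no matter when thread 1 releases.
            p->Release();
        });

        p->Release();  // thread 1 is done with its reference
        t2.join();
        assert(RefCounted::destroyed);  // deleted exactly once, by the last Release
        return 0;
    }
    ```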

  33. 640k says:

    @AsmGuru62: The DEC instruction updates the flags register; if the value was decremented to zero, that's detectable.

  34. foo says:

    The company I work for once shipped code with pretty much that implementation of Release. The bug was quickly exposed and fixed once dual CPU Pentium Pro machines started to appear on client sites.

  35. John Doe says:

    @Santosh_Sampath, that shouldn't happen, because you're not supposed to have non-outstanding pointers.

    However, when implementing multiple interfaces with composition, or cached throw-away/tearoff/lazy interface pointers, or even when using COM aggregation, this is exactly what may happen: the delegated interface object may be in the middle of a Release() which will delete itself, while the outer/owner object still has a non-null pointer to it which will be returned and AddRef()'ed on QueryInterface().

    Without giving much thought to it, I think the only safe option here is to use proper locks around IUnknown's methods, however inefficient that might be.

    @Raymond, ignoring the CoCreateInstance issue is ignoring the (probably) most significant place where the issue may happen. The details of the timing of COM-loaded DLL unloading are scarce.

Comments are closed.