If there’s already a bug, it’s not surprising that there’s a possibility for error


It's great to think about all the things that can go wrong but you also have to think about the situations that could lead to those bad things. In particular, you have to recognize when you are trying to avoid a bug that is ultimately outside your component and which you can't fix anyway.

For example, consider this multithreaded race condition:

Why is InterlockedDecrement used in the implementation of IUnknown::Release? The only reason I can think of is for multithread safety. But that Release function doesn't look multithread safe—what if another thread was about to increment m_cRef? Does the AddRef refcount incrementer have a special interlocked check for zero to catch this case?

What if another thread was about to increment m_cRef? In other words, what if another thread was about to call IUnknown::AddRef? In other words, you have two threads and an object with a refcount of one. One thread calls Release and the other thread calls AddRef. The concern is that the thread calling AddRef may execute after the thread that calls Release, thereby "rescuing" the reference count from zero back to one.

But this scenario you're worried about is already a bug. Suppose the second thread runs just a smidgen slower than the scenario you described, calling AddRef after the Release returns instead of while it is executing. Well, now, that's obviously a bug in the caller, isn't it? It's using a pointer after destroying it.

This happens a lot: You're worrying about not introducing a bug into a hypothetical situation that is already a bug. The answer to that is "Fix the original bug."

In this specific situation of reference counting, a useful rule of thumb is "If you're worrying about the possibility of a reference count incrementing from zero to one, then you already have a bug somewhere else."

Comments (17)
  1. Frederik Slijkerman says:

    In other words, like with any method on an interface, you may only call AddRef if you own a reference to the interface.

  2. jeffdav says:

    Considering that the vast majority of COM objects are STA anyway, it seems to me that using the Interlocked functions for those objects is a waste since you should not be calling the IUnknown methods from different threads (unless you marshal, in which case access will be serialized and you still don’t need atomic inc/dec).

  3. John says:

    Serialization through STA doesn’t help in this case.  The fundamental problem is that you have two threads "simultaneously" accessing an object with a refcount of 1, and one of the threads is calling IUnknown::Release().  Depending on which call is executed first, either nothing will go wrong or you will crash:

    TID1: p->AddRef(); // refcount = 2

    TID2: p->Release(); // refcount = 1; ok, object is still alive

    TID1: p->Release(); // refcount = 0, object is deleted

    TID2: p->AddRef(); // crash, object has been deleted

  4. Gabest says:

    If you release an object and have not addref’ed it before (and TID2 did not), it may be already deleted. Bug in the program.

  5. Ken Hagan says:

    "using the Interlocked functions for those objects is a waste"

    That’s true, but I think you’ll struggle to measure the cost. The interlocked APIs generally boil down to compiler intrinsics and on x86 that basically means exactly the same code as the naive implementation but with a carefully chosen LOCK prefix on one of the instructions.

    On the other hand, accidentally using the non-interlocked version for a non-STA object is an accident waiting to happen.

  6. orcmid says:

    This reminds me of the companion bug which is in lots of code.  Even if you use an interlocked decrement in Iunknown:Release, it sort of breaks everything if you then return the reference counter when finding out you are not the last releaser.  Because it could have been release behind you and you used an invalid this pointer.  I figure there must be just a few of those actually happening every day.

    [See all those MSDN and Inside COM examples where the Release implementation returns m_cref after it tests non-zero in the preceding decrement operation.]

  7. orcmid says:

    The only implementation code that should ever use the interface (and the related this pointer) after decrementing in IUnknown:Release is the code that witnessed the reference counter becoming zero.

    That is the only code for which the this pointer is assured to still be valid assuming all AddRef and QueryInterface invocations only use still-valid interface pointers.

    Based on prevalent examples, there may be a lot of code around that breaks that rule.

  8. Simon Buchan says:

    @Dennis: you are referring to this?

    STDMETHODIMP_(ULONG) CFoo::Release()

    {

       if (!::InterlockedDecrement(&m_cRef))

            delete this;

       return m_cRef; // Bug?

    }

    Thread 1: InterlockedDecrement(), return 1

    Thread 2: InterlockedDecrement(), return 0

    Thread 2: InterlockedDecrement(), delete this

    Thread 1: return *this->*m_cref

    This is *technicly* a bug, but is very unlikely to manifest without some fancy allocation of your object, since the this pointer is actual local to each call, and not modified by thread 2.

    I can see this causing an actual problem in two cases, thread 1 returns 0, not one, so both callers of Release() handle it’s deallocation, or if this was the last allocation in a page and the heap VirtualFree()s it.

  9. Goran says:

    Denis, Simon, isn’t it amazing how many broken examples exist? To the benefit of writers of such code, such things were less dangerous before prevalence of SMP machines. But today, it really is dangerous.

    Indeed, e.g. ATL does the right thing (goes through a temporary value, *not* through a member variable). And with correct increment/decrement primitive, too!

    COM is way too complicated for the basics to be written (over and over again) by mortals like me :-(

  10. Christian Kaiser says:

    "This is *technicly* a bug, but is very unlikely to manifest without some fancy allocation of your object, …"

    Well, in my DEBUG code, delete() will overwrite the memory area, thus m_cref becoming 0xcdcdcdcd (or whatever the code uses to overwrite the area). This is done purposely.

    Nope, this behaviour (a memory area is still valid after being freed) should never be relied on.

  11. Simon Buchan says:

    @Christian: That will still work, though.

    The only meaningful results from Release() are

    0 (You just deleted me) and non-0 (You didn’t delete me, but you can still access me), and the freed data value (0xfeeefeee, I think: 0xcdcdcdcd is un-initialised) is correctly non-0.

    And I never said you should *rely* on it :).

    [Um, no, that’s what a non-zero return value means. -Raymond]
  12. It is absolutely a bug to refer to m_anything after "delete this".

    It is absolutely not a bug to take action based on Release() returning 0.

    It is very probably a bug to take action based on Release() returning non-zero, because it is usually possible that all the other holders of references to the object could simulateously decide to release their references in between you saying "hey, look, there are outstanding references to this object" and "… so that means I can…"

  13. It is absolutely a bug to refer to m_anything after "delete this".

    Even this is a bug:

    STDMETHODIMP_(ULONG) CFoo::Release()

    {

      if (!::InterlockedDecrement(&m_cRef))

      {

           delete this;

           return 0;

      }

      return m_cRef; // Bug? Yup.

    }

    … because the object could be deleted on a different thread after the if (…) but before the "return m_cRef;".

  14. Kujo says:

    Ah, code that supports 9x.  Nowadays, you’d trust the return value of InterlockedDecrement:

    LONG ret = ::InterlockedDecrement(&m_cRef);

    if ( !ret )

    {

     delete this;

     return 0;

    }

    return ret;

  15. orcmid says:

    So now the question is, since the return value is defined to be unpredictable anyhow, why not do something as simple as

    {

     if (!::InterlockedDecrement(&m_cRef))

     {

          delete this;

          return 0;

     }

     return 1; // or ~0 or anything handy not 0

    Especially, since any returned value of the decremented m_cRef refers to a state of the world that is not assured to exist by the time the return happens.

    All it tells you now is that your thread did the final release or it did not.  

    The only way you can count on even that is if your code is based on knowledge of how the implementation of Release behind the interface was done.  

    I can imagine an use for that in custom factory code, but haven’t tried it.  I should.  It might eliminate some convolutions in COM-implementing code of mine that I tend to worry about.  The code is in need of serious refactoring anyhow.

  16. Jack says:

    At what point is the bug no longer ‘yours’ ?

    Perhaps the author of InterlockedDecrement had the same mentality as you.

    In that case, IUnknown::Release introduces the bug, not my code that uses IUnknown::Release ‘incorrectly’.

    Don’t like that scenario ?

    Ok, so , the author of InterlockedDecrement then relies upon the CPU to do it’s locking correctly, but wait, it’s multi-core, what if the other core is about to adjust the same lock.  And so on, and so on.

    You’re right, you already have a bug, and you’ll never find it, you can only prey it doesn’t kill people.

  17. Simon Buchan says:

    Yeah… resurrecting a week-old thread is bad. Sorry I’m late.

    [Um, no, that’s what a non-zero return value means. -Raymond]

    Huh? 0 means “This Release() call deleted the object”, and non-0 means it wasn’t (but don’t go poking, it could be deleted by someone else), right? So what is the ‘that’ in your reply? (I beleive my previous meaning for non-0 was misstated, sorry about the confusion).

    Oh, and I seem to have been unclear about this: yes, it *is* a bug. I just don’t think a real application could get it to manifest (ie: crash).

    [Your original description of a non-zero return value was “You didn’t delete me, but you can still access me.” You still can’t access it because it could be deleted by somebody else. -Raymond]

Comments are closed.

Skip to main content