What’s wrong with this code, part lucky 13

Today’s example is a smidge long, I’ve stripped out everything I can possibly imagine stripping out to reduce size.

This is a very real world example that we recently hit – only the names have been changed to protect the innocent.

I’ve used the built-in C++ decorations for interfaces, but that was just to get this stuff to compile in a single source file, it’s not related to the bug.

extern CLSID CLSID_FooDerived;
__interface IFooBase : IUnknown
    HRESULT FooBase();

class CFooBase: public IFooBase
    LONG _refCount;
    virtual ~CFooBase()
        ASSERT(_refCount == 0);
    CFooBase() : _refCount(1) {};
    virtual HRESULT STDMETHODCALLTYPE QueryInterface(const IID& iid, void** ppUnk)
        HRESULT hr=S_OK;
        *ppUnk = NULL;
        if (iid == IID_FooBase)
            *ppUnk = reinterpret_cast<void *>(this);
        else if (iid == IID_IUnknown)
            *ppUnk = reinterpret_cast<void *>(this);
            hr = E_NOINTERFACE;
        return hr;
    virtual ULONG STDMETHODCALLTYPE AddRef(void)
        return InterlockedIncrement(&_refCount);
    virtual ULONG STDMETHODCALLTYPE Release(void)
        LONG refCount;
        refCount = InterlockedDecrement(&_refCount);
        if (refCount == 0)
            delete this;
        return refCount;

class ATL_NO_VTABLE CFooDerived :
    public CComObjectRootEx<CComMultiThreadModel>,
    public CComCoClass<CFooDerived, &CLSID_FooDerived>,
    public CFooBase
    virtual ~CFooDerived();




As always, tomorrow I’ll post the answers along with kudos and mea culpas.

Edit: Fixed missing return value in Release() – without it it doesn’t compile.  Also added the addrefs – my stupid mistake.  mirobin gets major props for those ones.

Comments (35)

  1. Anonymous says:

    Your casts in the QI are broken. They work so long as IUnknown : IFooBase are at the beginning of the vtable, but … I wouldn’t want to be the one depending on a pointer from that code.

    The casts should be something more along the lines of "(void *)(reinterpret_cast<IFooBase *>(this))" and "(void *)reinterpret_cast<IUnknown *>(this);

    You are also not properly incrementing the refcount on the object before you return it.

  2. Anonymous says:

    Isn’t there going to be a problem with doing this:

    > *ppUnk = reinterpret_cast<void *>(this);

    for a class which is in a multiple-inheritance tree?

    I think you really want:

    > *ppUnk = static_cast<IFooBase*>(this);

  3. Anonymous says:

    Also, you should technically be checking ppUnk for NULL before assinging to *ppUnk… (but I doubt this was the answer you were seeking in relation to the core problem).

  4. Anonymous says:

    Oh – another problem. According to MSDN the InterlockedIncrement() in the AddRef() isn’t going to do what you want on Win95, WinNT3.51 and earlier.

  5. Anonymous says:

    mirobin: I’m pretty sure you want static_cast<>(), not reinterpret_cast<>().

    reinterpret_cast<>() is generally considered ‘evil’. (Or at least should be)

  6. Anonymous says:

    The first thing that strikes me is that you are not AddRefing on your QueryInterface, but I don’t think thats the bug you were looking for. Also it seems like a bad idea to implement all the IUnknown methods in CFooBase because ATL has its own internal implementation of IUnknown which you are trampling on. Also in ATL the ref count starts at 0 and its up to the ATL library to give you that initial AddRef. Here you are assuming an initial ref count of 1, which has always been problematic for me in the past.

  7. Anonymous says:

    I think Aaron hit it. With reinterpret_cast you are always going to be casting to ATL’s IUnknown impl (which will happen to occupy that slot because of the way ATL is structured), so your IUnknown impl will never be called.

  8. Anonymous says:

    Aaron: reinterpret_cast<> can certainly let you do some fun & evil things, but in this case its usage should be semantically the same as static_cast<>. Granted, I’ve never really pinned down any functional distinction between the two casts other than errors thrown by the compiler when compiling …

    Lonnie: if that is indeed happening in this case, I believe you’ll see the same behavior with static_cast<>.

    You are probably correct about the initial refcount, though as the factory code isn’t being shown here we can’t really tell for certain if its a problem.

  9. Lonnie, the reinterpret_cast isn’t the problem (although mirobin’s comment in the beginning is also valid – for this case it’s not a problem, but in general it is a good idea (especially if you’re dealing with multiple inheretance)).

    But you’re on the right track…

    Aaron, I’m only concerned with operating systems that are currently supported (which doesn’t include any of the Win9x operating systems).

  10. Anonymous says:

    Re: reinterpret_cast<>() vs static_cast<>()

    In general, for simple types (non-multiple inheritance) reinterpret_cast<>() and static_cast<>() are indeed semantically the same. However, for complex types (which we’re dealing with here) the difference is that:


    is the same as:


    which throws away any compiler known type information. static_cast<>(), however, may need to offset the pointer when doing the cast because of the difference in vtbl and local variable offsets between the various types.

    Looking back at the code, however, I think that in this case you’re right – since the reinterpret_cast<>() occurs at a class which has single inheritance there will be no difference.

    However – that makes them the same as reinterpret_cast<void*>(this).

    Maybe the problem is more with the ATL code? I’ve never used ATL, but according to the spec BEGIN_COM_MAP() is going to overload InternalQueryInterface(). Does the base class’ QueryInterface() ever get called or do we lose the IUnknown query ability?

  11. Anonymous says:

    So a quick look at MSDN seems to indicate that we want the com map to actually be something like:





    or we lose the handling of interfaces in IFooBase::QueryInterface().

  12. Aaron, that’s a good idea (I’d not even realized that the COM_INTERFACE_ENTRY_CHAIN macro existed – that’s cool) but you’ll note that IFooBase doesn’t have a COM map.

    You’re on the right path with your previous comment, the problem is specific to ATL. It’s not the BEGIN_COM_MAP that’s the issue though. It’s somewhere else in the sample.

  13. Anonymous says:

    I just noticed that the destructor for CFooBase isn’t a virtual function, but I doubt that is near the core of the problem either.

    Re: casting … Aaron, that makes a ton of sense (why can’t books describe it like that? :)). I’ve tried for a few years to try to interpret docs to figure out what that stupid operator really does under the hood, but

    nothing described it nowhere as clear as you did.

    Armed with my new knowledge, I’d agree that it would be general better practice for static_cast to be used, but (as has already been stated) in this case the usage is harmless.

    The problem that Larry is trying to focus on seems to be the interaction of a base class implementing IUnknown while using ATL. I know I’ve read about something like this before, but I’ll be damned if I can remember what it was.

    We know from Lonnie’s post that setting the initial refcount to 1 instead of zero likely results in an object that never gets destroyed … or does it? Because ATL provides its own implementation of IUnknown…

    I don’t suppose that you get different "implementations" of IUnknown depending on how you retrieve the IUnknown *? (meaning that you have two separate ref counts, leading to all sorts of fun about when the object gets destroyed).

    I’m grasping at straws at this point. Multiple inheritance isn’t my strong suit…

  14. Anonymous says:

    Mirobin: That’s a great point about the multiple reference counts. (This is what I get for always rolling my own instead of using ATL).

    Looking at the docs, CComObjectRootEx seems to imply that it has it’s own refcount. That’s not a problem as long as you never use the IFooBase reference counts.

    That would imply that you really want is something really nasty like:

    virtual ULONG STDMETHODCALLTYPE AddRef(void)


    IUnknown* p;

    if(SUCCESS(QueryInterface(IID_IUnknown, &p)) && (p != this))


    return p->AddRef();



    return InterlockedIncrement(&_refCount);


    and something similarly disgusting for Release().

  15. Aaron, you are SO close on this one, I can taste it.

    One more hint: I pared out every line I could possibly pare out of the example. There’s something that everyone has overlooked so far that’s critical to the bug.

  16. Anonymous says:

    Hm. Most of this conversation is beyond my current C++ level, however just reading msdn, does the DELCARE_PROTECT_FINAL_CONSTRUCT macro get rendered useless by your implementation of Release() ?

  17. Anonymous says:

    Three things:

    1. Multiple reference counts (as said by others)

    2. FooBase() isn’t implemented 🙂

    3. ~FooBase() is private! I’m surprised that this compiles – a private destructor should prevent inheritance, invocation of delete as well as creation of objects on the stack…

    4. I think the problem is the ASSERT in the destructor: It will always fire due to #1 and the fact that ATL does some funky things with the vtables.

    If that ain’t it I won’t find it…

  18. Anonymous says:

    The first and most obvious thing that struct me is the c’tor of CFooBase sets refcount to 1, but the d’tor expects it to already be zero.

    I’d expect that assert to always trigger, if class allows you to create an instance on the stack, doing:

    { CFooBase barf; }

    This is (perhaps falsely) assuming the private d’tor isn’t disallowing this.

    The reinterpret_cast’s are not only ugly, they are nuts. Every pointer is freely convertible to a void*. I’d even go so far as to say the argument "ppUnk" to QueryInterface is an error (though, by necessity parrotting the original design-error in COM). It shouldn’t be "void**". It misses a const. I leave it to the reader to figure out where. 🙂

    But as Larry mentioned this is about an ATL thing. I think ATL_NO_VTABLE provides a clue, if it indeed is the "__declspec(novtable)" I think it is. Inheirit from that and you’d get "d’tor of CFooDerived? What d’tor? Bah, I’ll just call the d’tor of …". Right. Of what? 🙂

    COM… 🙂

  19. Anonymous says:

    I’ll go with what Mike said, the ctor will set _refcount to 1. But the end result of that will not be the desctructos ASSERT triggering but the fact that Release will never release the object, as long as it’s called as many times it should be. It should be called as many times as AddRef is resulting in _refcount only decreasing to 1, never to 0.

  20. Mike Dunn says:

    I haven’t read thru all the other comments (yeah I’m lazy, Me.Sue()) but the _refcount is way suspicious.

    More so when you consider the usual way of instantiating an ATL COM object:


    CComObject<CFooDerived>* pObj;

    HRESULT hr;

    hr = CComObject<CFooDerived>::CreateInstance(&pObj);

    if (FAILED(hr)) bail_out;



    Uh oh, what does that AddRef() do? _refcount is now 2. Bad times.

    The caller using CFooDerived shouldn’t be bothered by impl details in CFooBase, yet here it is.

    Also, doesn’t your QI() have to cast ‘this’ to IUnknown* to ensure the caller sees the right vtbl? (Not positive on this one, this is one of those black-magic areas of C++ that I haven’t completely groked yet.)

  21. Anonymous says:

    Unlike MFC (old MFC, I haven’t looked at it in a while), ATL assumes that a newly created object’s refcount starts at zero so the initial QI for the request yields a refcount of 1. It isn’t a problem with either piece of code, it is a problem of the combination of the two.

  22. Mike Dunn is 80% of the way to figuring out the problem. You’re all right, that the issue is in the destructor (the assert firing).

    But the question is: Why?

    Cogetate on Mike’s code snippet, you should see what’s happening from the snippet.

  23. Anonymous says:

    The docs here probably have the "key" to the problem:


    "If your object is not aggregated, IUnknown is implemented by CComObject or CComPolyObject. In this case, calls to QueryInterface, AddRef, and Release are delegated to InternalQueryInterface, InternalAddRef, and InternalRelease of CComObjectRootEx to perform the actual operations."

    So we know that when we Addref/release the the CComObject<> doodad, we’re manipulating the ATL internal refcount. If that ATL refcount hits zero, the object gets deleted while the CFooBase refcount in non-zero, firing the assert.

  24. Anonymous says:

    I was just doing some other reading on the actual implementation of CComObject … the class is defined as something like:

    template <class Base>

    class CComObject : Base




    Meaning that it is derrived from CFooDerived, which means that CComObjectBase’s implementation of IUnknown "wins", explaining why the base class’s version of IUnknown isn’t called…

  25. BingBingBingBingBing!

    Say the magic word and you win a prize!

    mirobin nailed it. An object derived from CComObjectRoot doesn’t have an addref/release/QI implementation, instead those are provided by the CComObject<subobject> wrapper object.

    So the base implementation of IUnknown is totally dead code. And thus the addref/release behavior is ignored.

  26. Oh, but I forgot one more thing…

    Where’s the code that instantiates the CComObject?

    It’s NOT left out.

  27. Anonymous says:

    OBJECT_ENTRY_AUTO – References the class factory in the CComCoClass template. The CreateInstance method does the work of creating the CComObject for you.

  28. Anonymous says:

    BTW, you are a tricky *****. 🙂

    However, I see this type of bug all the time. Many times when I help people debug problems, it comes down to "Is your routine even being called?"

  29. Universalis says:

    In other words, why using frameworks is a Bad Thing. You create code with 1 bug in it instead of 21 bugs, but that 1 bug takes 50 times as long to identify.

  30. Anonymous says:

    Who said this bug took 50 times as long? A quick run of the debugger should have tracked down the bug in a few runs. Also, with the assert, the bug should manifest quickly.

  31. Anonymous says:

    I’m bothered by the fact that the destructors for both CFooBase and CFooDerived are virtual and private. Does that even make sense in C++?

    Ignore the other bugs for a moment and assume that CFooBase::Release executes delete this. The destructor is virtual, so you’d expect the derived class’s d’tor to be called and then the base class’s. But since the base class’s d’tor is private, the derived class can’t provide an implementation. So I’d expect the delete this in CFooBase::Release to call the CFooBase d’tor, which will lead to bugs if anything in CFooDerived’s d’tor is important.

    And is it a contradiction that CFooDerived has a virtual function but is declared with ATL_NO_VTABLE? What does that even mean?

    ATL seems like a great example of Joel Spolsky’s Law of Leaky Abstractions. http://www.joelonsoftware.com/articles/LeakyAbstractions.html

  32. Anonymous says:

    I loved this particular &quot;What’s wrong&quot; because it was quite subtle (and real world).

    One of the developers…

  33. Anonymous says:

    Making a destructor virtual in C++ is essential. If don’t, the proper destructors will not get called while deleting an object (think about how inheritance works in C++ and this should make sense).

    Making the destructor private is a good idea with refcounted classes, as it prevents code from manually deleting objects. As the destructor is virtual, you end up hitting the correct destructor regardless of where the delete is called from.

    ATL_NO_VTABLE tells the compiler not to write code in the constructor/destructor to create/destroy the class’s vtable. ATL generates the vtable at some point instead.

  34. Anonymous says:

    mirobin: I understand why you use virtual on a d’tor and why you make some base class methods private. My point was the *combination* of them on the d’tor is nonsensical. When you say virtual, you’re explicitly giving derived classes the ability to override the method. When you make a method private, you’re hiding it from derived classes. This is a contradiction.

    And apparently MS VC++ .NET 2003 agrees with me. When I try to compile the example, it complains on the declaration of CFooDerived::~CFooDerived.

    error C2248: ‘CFooBase::~CFooBase’ : cannot access private member declared in class ‘CFooBase’

    Thanks for the clarification on the ATL_NO_VTABLE. I always thought that it meant the object won’t have a vtable at all, as that’s what it said in an ATL tutorial I read.

  35. Anonymous says:

    Offtopic, I never got along well with the suicidal nature of "delete this" in COM…