What’s wrong with this code, part 14


Keeping up with the “theme” of COM related bad code examples, here’s another real-world example.  To avoid any ATL confusion, it’s 100% pure C++.

Our test team rolled out a new set of tests last week and immediately hit this one.  The funny thing is that the code in question had worked  without flaw for two years before this.

Obviously this has been abstracted to the point of ridiculousness, there’s just enough here to demonstrate the bug…

struct FooConfig
{
    int _Value1;
    int _Value2;
};

[
    object,
    uuid("0A0DDEDC-C422-4BB3-9869-4FED020B66C5"),
    pointer_default(unique)
]
__interface IFoo : IUnknown
{
    HRESULT GetFooConfig([out] struct FooConfig **ReturnedFooConfig);
};

FooConfig _GlobalFooConfig = { 1, 2};

class CFoo: public IFoo
{
    LONG _refCount;
public:
    CFoo() : _refCount(1) {};

    // IFoo
    HRESULT GetFooConfig(FooConfig **ReturnedFooConfig)
    {
        *ReturnedFooConfig = &_GlobalFooConfig;
        return S_OK;

    }

    // IUnknown
    virtual HRESULT STDMETHODCALLTYPE QueryInterface(const IID& iid, void** ppUnk)
    {
        HRESULT hr=S_OK;
        *ppUnk = NULL;
        if (iid == __uuidof(IFoo))
        {
            AddRef();
            *ppUnk = reinterpret_cast<void *>(static_cast<IFoo *>(this));
        }
        else if (iid == IID_IUnknown)
        {
            AddRef();
            *ppUnk = reinterpret_cast<void *>(static_cast<IUnknown *>(this));
        }
        else
        {
            hr = E_NOINTERFACE;
        }
        return hr;
    }
    virtual ULONG STDMETHODCALLTYPE AddRef(void)
    {
        return InterlockedIncrement(&_refCount);
    }
    virtual ULONG STDMETHODCALLTYPE Release(void)
    {
        LONG refCount;
        refCount = InterlockedDecrement(&_refCount);
        if (refCount == 0)
        {
            delete this;
        }
        return refCount;
    }
};

This one’s pretty straighforward, and I expect that people will see the problem right away.  So I’m going to raise the bar a bit – to get full credit, you not only have to explain not only what the problem is, but also why we’d never seen a problem with this code before.

And, as always, kudos and mea culpas on Monday.

Edit: Fixed return value from GetFooConfig.

Comments (32)

  1. bao says:

    GetFooConfig is missing a ‘return S_OK’, and I’m guessing the reason it isn’t noticed is that typically QueryInterface is called first then GetFooConfig, and since by convention return values are stored in the eax register on x86, QueryInterface’s returned value of S_OK is retained as GetFooConfig’s return value as well. Correct?

  2. LarryOsterman says:

    bao, no, that’s not it (but thanks for picking it up). For some reason the C compiler didn’t complain (although I’m not sure why).

  3. ThalesC says:

    Gee… I guess it has to do with GetFooConfig returning a pointer to that global variable. What happens if you try to marshal that call on a different process? Won’t the stub assume the memory has been dynamically allocate and try a CoTaskMemFree on it?

  4. vrk says:

    ReturnedFooConfig should be allocated using CoTaskMemAlloc and returned. It should be a copy of the global variable. The error will be observed only when the calling application retains the value returned by the function beyond the time the inproc server is loaded in memory. It will also be observed when the call is marshaled in that case COM Runtime will try to free the returned pointer using CoTaskmemFree. If the call is within the same apartment and the caller doesnot reatin the pointer beyond the lifetime of the dll, and he doesnot free the returned memory using CoTaskMemFree (which legaly he should) the problem will not be observed.

  5. LarryOsterman says:

    ThalesC got the problem, but not the answer.

    vrk also added in a reason. But why didn’t we find the problem, and what’s different about the test case?

  6. Ralf says:

    In the case somebody call GetFooConfig with a NULL we will se a access violation.

    The method should contain a check:

    if (!ReturnedFooConfig)

    return E_POINTER;

    CoTaskMemFree can fail silent if you use a wrong pointer (not allocate with CoTaskMemAlloc). Maybe the new test case contains a custom memory allocator.

  7. Orbit says:

    hey larry, post some C# code ;)

    btw when do MS devs comment their code ..after or while they are coding???

    why don’t I see comments?

  8. Aaron says:

    Is the test loading the class as a DLL? Once the DLL is unloaded globals defined in it will go away – thus the returned FooConfig pointer will no longer be valid.

    So if they do something like:

    h = LoadLibrary("MyTest");

    p = new CFoo();

    p->GetFooConfig(&r);

    FreeLibrary(h);

    r->_Value1 = 0;

    then the test will asplode.

  9. Alex says:

    Larry, perhaps I’m being pedantic, but that’s not 100% pure C++. Sure, it’s 100% pure Microsoft C++ as of 7.0.

  10. LarryOsterman says:

    Orbit, I’ve done C# code in bad code examples in the past. But the problem doesn’t show up easily with C# (since COM’s implemented via interop).

    And the reason you don’t see comments is that there’s a lot of code in the example already.

    Aaron: This code is never accessed by any mechanism other than CoCreateInstance – so nobody’s directly instantiating a CFoo

  11. Orbit says:

    are you still on the media team?

  12. LarryOsterman says:

    Alex, you’re right, but the actual code is clean as far as I know – I could have broken the interface and struct into their own .idl file instead of using the extensions but it was easier this way.

  13. LarryOsterman says:

    Oh, and Orbit, I’m still on windows core audio (which is part of the Media Technologies Group (mediatech), which is part of the Digital Media Division, which is a part of…)

  14. feroze says:

    I suspect that the old tests were using the COM object inproc. Hence they didnt hit this issue. THe new tests might be using this object out of proc?

  15. orbit says:

    which is part of Windows Media Player? one question, I’m running Windows XP home sp2, When I exit WMP the song still runs in the background, looks like it doesn’t exit so I have to kill the process? known bug or is it just me?

    plus I’m getting tired of the same 3 languages..VB,C#,C++ ..C# is basically a Java copy but you think we will see a totally new language being implemented for a niche part of developers?

    I read blogs that VB and C# have the same capability but C# gets the better sterotype because it has the "letter" C which people think of the power of the C languages.

    any ideas? (Comega and F# are just extensions to the C# language)

  16. Rob says:

    IFoo isn’t a proper COM interface because GetFooConfig uses the thiscall calling convention instead of stdcall. This will make the interface impossible to call from C code.

  17. Andreas Haeber says:

    orbit:

    F# is no extension to C#. It has its roots in Caml. Read Don Syme’s blog about it here http://blogs.msdn.com/dsyme/. It’s way different then C#.

    There are plenty of languages which targets .NET, and you may consider some of them as "niche languages". See http://www.dotnetpowered.com/languages.aspx for a list.

  18. orbit says:

    I reread the F# blog and I stand corrected..

    I meant a new .NET language that MS supports

    even if MS supports it by coding a small program like Express Edition..

  19. Mike says:

    Names that begin with an underscore then a capital or another underscore ( _[A..Z_] ) are reserved to the C++ implementation.

    You can see discussion on comp.lang.c++.moderated if you search on "underscore variable C++".

    By the way, I would also believe that the problem was revealed by changing from a non-marshalling scenario to a marshalling one. Maybe not in-proc to out-of-proc. Maybe apartments. Also a struct of int’s is not automation-compatible if I recall correctly.

  20. Tim Smith says:

    Ok, I admit I am foggy on my proxy generation rules, but…

    If this call is being marshaled, as already stated, the proxy would try to ::CoTaskMemFree the pointer. However, this would be a silent failure. The real problem would be that the caller is now working on an allocated copy of the structure and not the original which could easily fail the validation tests. Also we now have a memory leak.

  21. LarryOsterman says:

    Mike, the "reserved" names is a style definition, not a language restriction – no compile on the earth will enforce that one. And it isn’t a code defect.

    But you’re right – the original (working code) ran in-proc, so the problem wasn’t found. It’s when the tests started testing the failing method (the tests run out-of-proc) that we found the problem.

  22. Orbit says:

    hey larry, you never fully reveal what exactly your working on…? :)

    btw..Steve Ballmer is on channel9

  23. LarryOsterman says:

    Orbit, you’re right, I haven’t.

    And Steve was cool.

  24. orbit says:

    is it WMP 11? ;)

  25. Frederik Slijkerman says:

    CFoo should derive from IUnknown. The cast to IUnknown happens to work because the QueryInterface, AddRef, and Release functions are layed out exactly as required for IUnknown so the vtable layout is the same at the moment. It would break if you inserted a new function between one of these. The GetFooConfig function doesn’t matter because it isn’t virtual.

  26. Frederik Slijkerman says:

    … Of course, now I see that IFoo does indeed derive from IUnknown. Please ignore that last comment…

  27. Stewart Tootill says:

    Just to be pedantic I’m pretty sure that C++ says you mustn’t use names with leading underscores because they are reserved for the implementation and even if the compiler doesn’t enforce it, the implementation is allowed to have a _GlobalFooConfig which tramps on yours at link time so the linker enforces it. However this only applies at file scope, names with leading underscores in namespaces are allowed (the Boost.Bind and Boost.Lambda libraries use this fact for their _1 style argument placeholders), so you could resolve the problem by hiding the variable in an anonymous namespace (since it doesn’t require external linkage anyway).

  28. Centaur says:

    It seems that the expected answer was already posted (about marshalling and giving out a pointer to internal data). I would like to add a little:

    * I’m worried by the initial value of _refCount = 1. If the factory code creates a new CFoo, then QueryInterface()s it for the IID asked by the client, then Release()s the newborn, that’s ok.

    * Your _refCount is a signed LONG, while AddRef()’s and Release()’s return value is an unsigned ULONG. Ditto refCount local in Release(). This won’t usually cause any trouble, especially if Release() checks it for equality to zero, but someone debugging might see negative values.

  29. mirobin says:

    Aside from the other problems mentioned, you’re passing around a data structure that can presumably be modified by your caller without providing any mechanism to lock/"guard" the structure.

    All of the lovely dangers of modifying the same bits in memory from different threads apply.

  30. Wound says:

    My guess the reason the bug has not been seen before is to do with the client code. If you had a COM object being used from VC 6 or VB, and the object is created apartment threaded and in process this will work fine so long as:

    a) You only have one instance of CFoo

    b) no one tries to change the contents of FooConfig on one of 2 instances.

    However, with a .NET client a COM wrapper will be created, which will then cause it to operate with a marshalling layer and you’ll see the bug.

  31. Yesterday’s post was a classic example of Joel Spolsky’s Law of Leaky Abstractions.

    Why?&amp;nbsp; Well,…