Quiz: What’s wrong with this code?


What’s wrong with the following code? This is real code taken verbatim from Cordbg (see src\debug\shell\dshell.cpp in the rotor sources), except that I removed preprocessor goo to make it more readable. All the interfaces are COM-classic interfaces derived from IUnknown. You don’t need to be familiar with ICorDebug to see the problem here (that’s hint #1 – it’s not related to some subtle API misuse), though I give some background data below for the curious. Here’s a hint #2: Ever notice memory leaks in Cordbg?


//
// Strip all references off of the given value. This simply
// dereferences through references until it hits a non-reference
// value.
//
// On input *ppValue is an ICorDebugValue for some arbitrary variable.
// On output, *ppValue is update to yield an ICorDebugValue which is
// the furthest down the dereference chain.
//
// The caller requires no specific knowledge of what the *ppValue passed in is.
// The caller must have a reference to the incoming *ppValue (which this function will release)
// and must release the outgoing *ppValue (which this function would addref).
//
HRESULT DebuggerShell::StripReferences(ICorDebugValue **ppValue,
bool printAsYouGo)
{
HRESULT hr = S_OK;

while (TRUE)
{
ICorDebugReferenceValue *reference;
hr = (*ppValue)->QueryInterface(IID_ICorDebugReferenceValue,
(void **) &reference);

if (FAILED(hr))
{
hr = S_OK;
break;
}

// Check for NULL
BOOL isNull;
hr = reference->IsNull(&isNull);

if (FAILED(hr))
{
reference->Release();
(*ppValue)->Release();
*ppValue = NULL;
break;
}

if (isNull)
{
if (printAsYouGo)
Write(L“<null>”);

reference->Release();
(*ppValue)->Release();
*ppValue = NULL;
break;
}

CORDB_ADDRESS realObjectPtr;
hr = reference->GetValue(&realObjectPtr);

if (FAILED(hr))
{
reference->Release();
(*ppValue)->Release();
*ppValue = NULL;
break;
}

// Dereference the thing…
ICorDebugValue *newValue;
hr = reference->Dereference(&newValue);

if (hr != S_OK)
{
if (printAsYouGo)
if (hr == CORDBG_E_BAD_REFERENCE_VALUE)
Write(L“<invalid reference: 0x%p>”, realObjectPtr);
else if (hr == CORDBG_E_CLASS_NOT_LOADED)
Write(L“(0x%p) Note: CLR error — referenced class “
L“not loaded.”, realObjectPtr);
else if (hr == CORDBG_S_VALUE_POINTS_TO_VOID)
Write(L“0x%p”, realObjectPtr);

(reference)->Release();
((*ppValue))->Release();
*ppValue = NULL;
break;
}

if (printAsYouGo)
Write(L“(0x%08x) “, realObjectPtr);

(reference)->Release();

((*ppValue))->Release();
*ppValue = newValue;
}

return hr;
}


Here’s an explanation of the identifiers involved:



  • ICorDebugValue is the debugger’s representation of a variable.
  • If that variable is a reference value, then it also implements ICorDebugReferenceValue.
  • Here’s what the following methods on ICorDebugReferenceValue mean:

    • IsNull(out BOOL) – returns Boolean if the reference value in the debuggee is null.
    • GetValue(out CORDB_ADDRESS) – get the address in the debuggee of what the reference points to.
    • Dereference(out ICorDebugValue) – get an ICorDebugValue describing the variable that this reference points to.
       

Comments (15)

  1. Luke says:

    Without knowing the API, one potential problem is if Dereference returns a success value other than S_OK and sets newValue to something other than NULL. Then newValue would never be released.

  2. Sriram says:

    There should be a (*ppValue)->Release() after the first QueryInterface fails and you set hr=S_OK. And this seems like a hot code path in the debugger to me :)

  3. oshah says:

    What is the reference count of ppValue at the start of the function?

  4. jmstall says:

    Sriram, Luke – both valid finds. And they’re are still more.

    Oshah – generally the only thing a function knows about the ref count of its incoming parameters is that they’re some value greater than 1. This is actually related to a subtle 3rd problem in the code.

  5. Vladimir says:

    It seems that

    *ppValue = NULL;

    incorrectly assumes that the ppValue’s reference count is exactly 1?

  6. Yaytay says:

    The (hr != S_OK) check could lead to a leak of ICorDebugValue references (if it succeeded in getting one but didn’t return S_OK). Given that the documentation should list all success codes this may not be relevant.

    There is no explanation given for the extra brackets wrapping reference and (*ppValue) in the later calls to Release().

    A number of variables are declared without being initialised, which shouldn’t have any effect but I’d try to avoid it.

    There is no indication of whether realObjectPtr requires any kind of freeing. It either needs freeing or it’s risky because it could provide a pointer to the internals of a non-existant object (though the code above does not appear to access it after reference has been Release()d.

    An SEH anywhere in this code, or the code it calls, will leak references. If Dereference can return NULL (maybe in conjunction with an S_FALSE return code) this will happen when QueryInterface is called on the second run through the loop.

    Getting anywhere?

  7. Yaytay says:

    Just thinking through Sriram’s comment, the code should either Release *ppValue AND set ppValue to NULL OR it should do neither.

    The caller is under instructions to Release any non-NULL pointer returned.

    And if it did Release it then the caller would never get a successful result.

    It’s also worth noting that the function returns unknown successful HRs in two different places. It’s documentation should either list all of the successful HRs from IsNull and Dereference, or it should explicitly set hr in these cases.

  8. akraus1 says:

    First: As already mentionend ppValue can have a refcount greater than one. If you set it to NULL nobody can call release later.

    Second: This said the function design itself is flawed. As input you have an Interface with an arbitrary refcount which you replace in the last line with *ppValue = newValue; a new interface. This will make the old interface inacessible to perhaps still needed release calls. Voila memory leak. You should have instead sth like this:

    HRESULT DebuggerShell::StripReferences(ICorDebugValue **ppinValue,ICorDebugValue **ppoutValue)

  9. Yaytay says:

    Surely that’s not an issue.

    If it has a refcount > 1 then something else (some other variable) should have a reference to it.

    It is perfectly reasonable, and indeed, required, to set the variable to NULL after calling Release on it once – unless your code has caused it to increment refcount more than once.

    Isn’t it?

  10. Vladimir says:

    Yaytay: ppValue is passed to StripReferences() by _reference_. How can you possibly make _any_ assumptions about its reference count in the StripReference() code?

    An example from dshell.cpp:

    ICorDebugValue *value = …; // theoretically possible refcount > 1



    GetStaticFieldValue(…, &value); // theoretically possible refcount > 1



    StripReferences(&value, false);

  11. Yaytay says:

    My point is that you aren’t making any assumptions about it’s refcount.

    The variable that you are dealing with is responsible for precisely one increment of that refcount, if the caller has allocated more than one increment to that variable then it is their issue to resolve it.

    One variable is one reference.

    If you choose to circumvent that by explicitly calling AddRef thn you are responsible for picking up the pieces.

  12. oshah says:

    Yaytay.

    Because ppValue is a reference to the object. Any changes made in StripReferences will be directly reflected in the caller.

    If it helps you, think of the problem like this–What’s wrong with this code?

    int *x = new int;

    scanf("%p", &x);

    delete x;

    Before scanf, x is a pointer to some heap memory. After scanf is called, what is the value of x? The value of x can be anything (depending on what the user typed), and it doesn’t have to point to that heap of ram you allocated a line before.

    Is the block of ram still valid? Did scanf automagically delete x when it changed its value? No. scanf just slapped in 4 bytes (8 bytes for Win64) at x, and all the caller sees now is what the user typed. Nobody deleted the pointer. Now who’s gonna delete the pointer? It can’t be x, x holds what the user typed. Who owns the pointer now? Does anyone own the pointer?

    What you’re suggesting as a solution is to get scanf to delete your pointer for you. You could get away with altering scanf (assuming scanf was a rarely used function, which as we know it’s not). You could until someone passes a malloc’ed pointer and scanf deletes the malloc’ed pointer …

    Admittedly, this example is rather contrived. Nobody would ever scanf a pointer. However, the scanf is a simplified version.

    What has this got to do with COM interfaces? These COM interfaces are actually sophisticated "scanf"-style functions. When they change something in StripReferences, it’s reflected in dshell.cpp (the caller).

    Now when StripReferences sets ppValue to NULL, that is reflected in dshell.cpp. Setting ppValue to null DID NOT call Release(). It just wrote 4 bytes of 0 to *ppValue: the same ppValue from dshell.cpp.

    If the CorDebug team used to C++ smart-pointer classes (like com_ptr_t)… then your solution would work, and nobody would have to care about Release()/AddRef(). When StripReferences makes a change in ppValue, the operator=() would call Release on the old pointer and take ownership on the new pointer, as you said.

    Right now, CorDebug aren’t using smart pointer classes, they’re just using plain vanilla pointers: 4 bytes of memory (8 on Win64) that can’t tell the difference between a malloc and a new, or an int* from a ICorDebugValue*.

  13. oshah says:

    Vladimir.

    If StripReferences made the requirement that all callers make sure that ppValue has reference count 1, then the ppValue=NULL would not be an issue.

    That’s a very contrived situation though. In the same vein, we can make my scanf program work–if the user typed in the correct value of x each time. So all we need is to require the users to type the correct value of x; and my scanf program actually works!

    Since ref counting is not mentioned anywhere in the code sample, we know this is clearly a bug. (However, I was not completely sure, hence why I asked for the exact reference count earlier).

  14. Yaytay says:

    OK, the common consensus appears to be that I am being particularly dense, but I’m afraid I’m still there.

    StripReference does, explicitly, call Release before setting *ppValue to null.

    Now it is certainly possible that the caller called (*ppValue)->AddRef() and didn’t copy the pointer value, but that in itself would be ignoring the documentation as AddRef "should be called for every new copy of a pointer to an interface on a given object".

    So either the refcount is 1 and calling (*ppValue)->Release invalidated the pointer, or the caller should have another variable with the same value as ppValue.

    I’d certainly say that this sort of bare pointer work was dangerous, but it’s dangerous because it permits the caller to do the wrong thing, not because it itself does a wrong thing.

  15. jmstall says:

    Most of the local reference counting bugs in here that people brought up can be fixed by converting the

    function over to use smart pointers.

    FWIW, We actually converted a lot of CorDbg over to use smart pointers. But since we’re deprecating Cordbg in favor of Mdbg anyways, we’ve redirected any real effort to Mdbg

    The thing that originally bothered my about this function was using ppValue as an in-out parameter (I agree with akraus1)

    It’s too easy for a caller to get this wrong.

    I think the dispute between yaytay + others is over code like this:

    1: ICorDebugValue * value = … // ref=1

    2: value->AddRef(); // ref = 2

    3: StripReferences(&value, false); // if value is changed under us, ref goes back to 1

    4: value->Release(); // matches addref in line 2

    5: value->Release(); // matches addref in line 1

    Given the impl of StripReferences, that code could blow up. My read of the thread is that we all see that. But I gather yaytay is saying that nobody should write

    code like that in the first place, and the code should instead be written like this:

    1: ICorDebugValue * value = … // ref=1

    2: ICorDebugValue * value2= value; value2->AddRef(); // ref = 2

    3: StripReferences(&value, false); // if value is changed under us, ref goes back to 1

    4: value2->Release(); // matches addref in line 2

    5: value->Release(); // matches addref in line 1

    Thus each raw pointer is directly tied to 1 ref count.