Debugging session: Which of the many things happening in this single line of code is the one that crashed?


A crash report came in, and the offending line of code was the following:

void CDeloreanSettings::UpdateFluxModulation(bool sendNotification)
{
    ComPtr<IFluxModulator> spModulator;
    // Crash on the next line
    if (SUCCEEDED(m_spFluxCapacitor->GetFluxModulator(&spModulator)))
    {
        ...
    }
}

Someone made the initial diagnosis that

The call is to Release­And­Get­Address­Of() on a ComPtr object which is declared right above (which should be initialized to nullptr). Am I missing something?

Let's look at the disassembly. First, with no annotations. See if you can figure it out yourself.

CDeloreanSettings::UpdateFluxModulation:
mov     qword ptr [rsp+10h],rbx
mov     qword ptr [rsp+18h],rsi
mov     qword ptr [rsp+20h],rdi
push    rbp
push    r14
push    r15
mov     rbp,rsp
sub     rsp,50h
mov     rax,qword ptr [__security_cookie]
xor     rax,rsp
mov     qword ptr [rbp-8],rax
mov     rdi,qword ptr [rcx+18h]
mov     r14,rcx
lea     rcx,[rbp-10h]
xor     esi,esi
mov     r15b,dl
and     qword ptr [rbp-10h],rsi
call    Microsoft::WRL::ComPtr<IUnrelatedInterface>::InternalRelease
mov     rax,qword ptr [rdi] << crash here
mov     rbx,qword ptr [rax+38h]
mov     rcx,rbx
call    qword ptr [__guard_check_icall_fptr]
lea     rdx,[rbp-10h]
mov     rcx,rdi
call    rbx

Okay, here's the version with my annotations:

CDeloreanSettings::UpdateFluxModulation:
; Prologue: Save nonvolatile registers and build the stack frame.
mov     qword ptr [rsp+10h],rbx
mov     qword ptr [rsp+18h],rsi
mov     qword ptr [rsp+20h],rdi
push    rbp
push    r14
push    r15
mov     rbp,rsp
sub     rsp,50h
mov     rax,qword ptr [__security_cookie]
xor     rax,rsp
mov     qword ptr [rbp-8],rax

mov     rdi,qword ptr [rcx+18h] ; rdi = m_spFluxCapacitor
mov     r14,rcx                 ; save "this"
lea     rcx,[rbp-10h]           ; prepare spModulator.ReleaseAndGetAddressOf
xor     esi,esi
mov     r15b,dl                 ; save "sendNotification"
and     qword ptr [rbp-10h],rsi ; construct spModulator
; ReleaseAndGetAddressOf was inlined. Here's the Release part:
call    Microsoft::WRL::ComPtr<IUnrelatedInterface>::InternalRelease

; prepare m_spFluxCapacitor->...
; Crash here loading vtable from m_spFluxCapacitor
mov     rax,qword ptr [rdi] << crash here
mov     rbx,qword ptr [rax+38h] ; load address of GetFluxModulator
mov     rcx,rbx                 ; parameter to CFG check
call    qword ptr [__guard_check_icall_fptr] ; check the function pointer

; Here's the GetAddressOf part of ReleaseAndGetAddressOf:
lea     rdx,[rbp-10h] ; spModulator.GetAddressOf
mov     rcx,rdi                 ; "this" for GetFluxModulator
call    rbx                     ; _spFluxCapacitor->GetFluxModulator()

The compiler inlined Release­And­Get­Address­Of, and it interleaved various unrelated operations. In the second block of code, you can see it interleave the construction of the Com­Ptr with the call to Internal­Release. In the third block, you can see it peform the control flow guard test before performing the Get­Addresss­Of.

The conclusion, therefore, is not that the crash occurred in the Release­And­Get­Address­Of The Release­And­Get­Address­Of just finished releasing and is waiting for its turn to do the Get­Addresss­Of. Rather, the crash occurred because m_spFlux­Capacitor is null, and we crashes trying to read the vtable from a null pointer.

Further investigation of the issue revealed that Update­Flux­Modulation is called from an event handler that was registered to be called whenever the modulation changed. Inspection of memory showed that the event registration token was zero, indicating that the event has already been unregistered. The issue is that there was a modulation change in flight when the event handler was unregistered, so the CDelorean­Settings received its change notification after it had unregistered. The fix is to have the handler check whether it still has a m_spFlux­Capacitor, and if not, then ignore the notification, on the assumption that it was a stray notification that was late to arrive.

Comments (13)
  1. 12BitSlab says:

    m_spFluxCapacitor is null because it traveled backwards in time to a point where the pointer had yet to be set.

    Sorry! Bad joke!

    1. Brian_EE says:

      I don't think that's right. In the movies, both Marty and Doc traveled back to the 1800's, well before either of them were born. They didn't suddenly disappear (become Null).

      1. ICom says:

        That's due to a temporal issue in their reference counts, which prevented them from forfeiting their lives. Just because they travel to the past does not "release" their other selves from still existing in the future.

  2. 640k says:

    I recently investigated problems with code that changed the OS time. Very confusing. It took -20 hours btw.

  3. Antonio Rodríguez says:

    Isn't this code part of the famous time machine which Microsoft Research has been developing for at least 25 years?

    1. Erik F says:

      You'll know when they're finished when they've always had it!

  4. jimbobmcgee says:

    When/why did you start injecting spaces into your inline Pascal-cased code keywords?
    It's super-annoying...

    1. Those are soft hyphens, and they've been there for years. If they are showing up as spaces, then something is wrong with your browser.

      1. jimbobmcgee says:

        So they are - you learn something new every day.
        Good old mobile browsers - rendering everything goes wrong since 1996...

      2. Azarien says:

        One of your "ComPtr"s does have a soft-hyphen, the other one doesn't.

  5. Medinoc says:

    IUnrelatedInterface (and its link) reminded me of a question, does the PDB contain information mentioning that Identical COMDAT Folding occurred? Would it be possible for the debugger to say "Microsoft::WRL::ComPtr::InternalRelease() could actually be any of 37 folded functions"?

    1. I don't see a technical reason why not. SymEnumSymbolsForAddr will tell you. I don't know whether it actually does, though.

  6. Thomas A says:

    Excuse my ignorance, but by looking at the source line and assuming that SUCCESS doesn't do anything fancy, is there much more than m_spFluxCapacitor having a bad value that could go wrong? (I guess there is? :-))

Comments are closed.

Skip to main content