Postmortem Debugging - Better Late Than Never

If there is a consistent repro, I would definitely prefer Early Debugging. In real life, however, postmortem debugging is often unavoidable.

There are three concepts I wish to clarify before digging into the details:

  1. AeDebug is a set of registry keys which specify what happens when an unhandled exception occurs in a user mode application.

    • \\HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\AeDebug
    • \\HKEY_LOCAL_MACHINE\Software\Wow6432Node\Microsoft\Windows NT\CurrentVersion\AeDebug

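    On Windows XP and Windows Server 2003, AeDebug is configured by default to use drwtsn32.exe (Dr. Watson), which captures a dump and terminates the faulting application. Starting with Windows Vista, Dr. Watson is no longer shipped and Windows Error Reporting takes over this role.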

  2. Just-In-Time Debugging (a.k.a. JIT Debugging) is a feature provided by most debuggers (e.g. CDB, NTSD, WinDBG and Visual Studio Debugger), which allows the debugger to be launched and attached to the faulting application.

    The JIT debugger shipped with Visual Studio is vsjitdebugger.exe, which pops up a window and lets you decide the next step. Visual Studio goes a step further by also allowing JIT debugging for scripts.

    Needless to say, JIT Debugging is normally built on top of AeDebug.

  3. Postmortem Debugging is an overloaded term which could mean debugging a dump, or JIT debugging.

    Since I will cover JIT debugging in another article, I will use Postmortem Debugging here to mean debugging a dump file.
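
To see what AeDebug currently points to on a machine, the two keys listed above can be read programmatically. The following is a minimal sketch, assuming Python with the Windows-only winreg module. The value names Debugger and Auto are not mentioned above; they are the values typically found under this key (the debugger command line, and whether to launch it without prompting). The helper name read_aedebug is made up for illustration.

```python
import sys

# Registry paths quoted in the text, relative to HKEY_LOCAL_MACHINE.
AEDEBUG_KEYS = {
    "native": r"Software\Microsoft\Windows NT\CurrentVersion\AeDebug",
    "wow64": r"Software\Wow6432Node\Microsoft\Windows NT\CurrentVersion\AeDebug",
}


def read_aedebug():
    """Return {view: {value_name: data or None}}; a missing key maps to None."""
    import winreg  # standard library, available on Windows only

    # KEY_WOW64_64KEY keeps a 32-bit Python from being silently redirected.
    access = winreg.KEY_READ | winreg.KEY_WOW64_64KEY
    result = {}
    for view, path in AEDEBUG_KEYS.items():
        try:
            with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path, 0, access) as key:
                values = {}
                for name in ("Debugger", "Auto"):  # assumed value names
                    try:
                        values[name] = winreg.QueryValueEx(key, name)[0]
                    except FileNotFoundError:
                        values[name] = None  # value is not configured
                result[view] = values
        except FileNotFoundError:
            result[view] = None  # key does not exist (e.g. 32-bit Windows)
    return result


if __name__ == "__main__" and sys.platform == "win32":
    for view, values in read_aedebug().items():
        print(view, values)
```

Reading both keys matters because a 32-bit process on 64-bit Windows consults the Wow6432Node copy, so the two views can point to different debuggers.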

Okay, now let's go back to the topic: what would you do after receiving a dump file?

  1. Understand the source of the dump file: under what conditions was it generated? Once you've confirmed the dump comes from a trusted source, try to find out when and where it was taken.

    0:001> .time
    Debug session time: Mon Dec 3 17:36:58.997 2012 (UTC - 8:00)
    System Uptime: 2 days 23:31:41.638
    Process Uptime: 0 days 0:00:14.156
    Kernel time: 0 days 0:00:00.015
    User time: 0 days 0:00:00.000

    0:001> vertarget
    Windows 7 Version 7601 (Service Pack 1) MP (8 procs) Free x64 Product: LanManNt, suite: Enterprise TerminalServer SingleUserTS
    kernel32.dll version: 6.1.7601.17514 (win7sp1_rtm.101119-1850)
    Machine Name:
    Debug session time: Mon Dec 3 18:37:21.103 2012 (UTC - 8:00)
    System Uptime: 3 days 0:32:03.743
    Process Uptime: 0 days 1:00:36.261
    Kernel time: 0 days 0:00:00.015
    User time: 0 days 0:00:00.000

    0:000> .lastevent
    Last event: 14d0.1874: Break instruction exception - code 80000003 (first chance)

  2. Check the dump file type: mini dump or full dump, kernel dump or user mode dump, and whether the dump contains an exception record. WinDBG normally displays the dump type when you open a dump file; here we'll use the .dumpdebug command learned in Undocumented WinDBG.
      
    0:001> .dumpdebug
    ----- User Mini Dump Analysis
    MINIDUMP_HEADER:
    Version A793 (6804)
    NumberOfStreams 14
    Flags 9164
    0004 MiniDumpWithHandleData
    0020 MiniDumpWithUnloadedModules
    0040 MiniDumpWithIndirectlyReferencedMemory
    0100 MiniDumpWithProcessThreadData
    1000 MiniDumpWithThreadInfo
    8000 MiniDumpWithFullAuxiliaryState
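
Both steps above boil down to simple arithmetic, which can be sketched in a few lines of Python. This is only an illustration, not something the debugger requires: the flag table is a subset of the MINIDUMP_TYPE enumeration from dbghelp.h, and the helper names decode_flags and process_start are made up. The first helper splits the Flags value into the names .dumpdebug prints; the second derives the process start time from the session time and process uptime reported by .time.

```python
from datetime import datetime, timedelta

# Subset of the MINIDUMP_TYPE enumeration (dbghelp.h); bit value -> name.
MINIDUMP_TYPE = {
    0x0001: "MiniDumpWithDataSegs",
    0x0002: "MiniDumpWithFullMemory",
    0x0004: "MiniDumpWithHandleData",
    0x0008: "MiniDumpFilterMemory",
    0x0010: "MiniDumpScanMemory",
    0x0020: "MiniDumpWithUnloadedModules",
    0x0040: "MiniDumpWithIndirectlyReferencedMemory",
    0x0080: "MiniDumpFilterModulePaths",
    0x0100: "MiniDumpWithProcessThreadData",
    0x0200: "MiniDumpWithPrivateReadWriteMemory",
    0x0400: "MiniDumpWithoutOptionalData",
    0x0800: "MiniDumpWithFullMemoryInfo",
    0x1000: "MiniDumpWithThreadInfo",
    0x2000: "MiniDumpWithCodeSegs",
    0x4000: "MiniDumpWithoutAuxiliaryState",
    0x8000: "MiniDumpWithFullAuxiliaryState",
}


def decode_flags(flags):
    """Return the names of the known bits set in a MINIDUMP_HEADER Flags value."""
    names = [name for bit, name in sorted(MINIDUMP_TYPE.items()) if flags & bit]
    return names or ["MiniDumpNormal"]  # 0 means a plain mini dump


def process_start(session_time, process_uptime):
    """Process start time = debug session time minus process uptime."""
    return session_time - process_uptime


if __name__ == "__main__":
    # Flags 9164 from the .dumpdebug output above.
    for name in decode_flags(0x9164):
        print(name)

    # Session time and process uptime from the .time output above (UTC - 8:00).
    session = datetime(2012, 12, 3, 17, 36, 58, 997000)
    uptime = timedelta(seconds=14, milliseconds=156)
    print(process_start(session, uptime))  # 2012-12-03 17:36:44.841000
```

Note that bit 0002 (MiniDumpWithFullMemory) is not set in 9164, which tells us this is not a full memory dump.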

If it's a user mode dump, additional information needs to be retrieved from the dump.

  1. What the command line is, and whether the process is a generic host such as dllhost.exe, svchost.exe, taskhost.exe or w3wp.exe.
  2. Understand the bitness: whether it is a 64-bit or a 32-bit process. Debugging a 64-bit dump of a WOW64 (32-bit) process can be tricky.
  3. Whether the CLR is involved, and which CLR version (note that more than one CLR could be hosted in the same process).
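
Most of these questions can be answered from the image name and the list of loaded modules. Here is a rough sketch in Python, assuming you have already extracted the module file names from the dump (for example from the debugger's module list). The function name summarize and the sample module list are hypothetical, and the module-to-runtime mapping is a heuristic rather than a definitive check: wow64.dll and its siblings indicate a 32-bit process captured as a 64-bit dump, mscorwks.dll indicates CLR 2.0 or earlier, clr.dll indicates CLR 4.x, and coreclr.dll indicates CoreCLR.

```python
# Generic host processes named in the text.
GENERIC_HOSTS = {"dllhost.exe", "svchost.exe", "taskhost.exe", "w3wp.exe"}

# Modules of the WOW64 layer, present when a 32-bit process runs on 64-bit Windows.
WOW64_MODULES = {"wow64.dll", "wow64win.dll", "wow64cpu.dll"}

# Heuristic mapping from runtime module to CLR generation.
CLR_MODULES = {
    "mscorwks.dll": "CLR 2.0 or earlier",
    "clr.dll": "CLR 4.x",
    "coreclr.dll": "CoreCLR",
}


def summarize(image_name, module_files):
    """Answer the three questions above from the image name and module list."""
    modules = {m.lower() for m in module_files}
    return {
        "generic_host": image_name.lower() in GENERIC_HOSTS,
        "wow64": bool(modules & WOW64_MODULES),
        "clr": sorted(label for dll, label in CLR_MODULES.items() if dll in modules),
    }


if __name__ == "__main__":
    # Hypothetical module list, for illustration only.
    sample = ["ntdll.dll", "wow64.dll", "wow64cpu.dll", "mscorwks.dll", "clr.dll"]
    print(summarize("w3wp.exe", sample))
```

When the host is generic, the command line is what tells you which service, COM server or application pool the process was actually running, so it is worth recording alongside this summary.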

(to be continued...)