Midway upon the journey of our life I found myself within a forest dark, For the straightforward pathway had been lost.
[INFERNO CANTO 1]
In the world of debugging, it is easy to get lost without sufficient knowledge of the underlying mechanisms. Well-known examples include DLL (Dynamic-Link Library), FPO (Frame-Pointer Omission), LTCG (Link-Time Code Generation), PE/COFF, and SEH (Structured Exception Handling), but Microsoft uses many other technologies as well:
- BBT (Basic Block Tools) is a suite of optimization tools designed to help reduce the working-set requirements for a Win32 application by applying advanced static analysis and code layout heuristics, and integrating profile data gathered from monitoring the program execution flow. In addition, BBT rearranges static data and resources sections for additional paging reduction.
- Detours is a library for instrumenting arbitrary Win32 functions on x86, x64, and IA64 machines. Detours intercepts Win32 functions by re-writing the in-memory code for target functions. The Detours package also contains utilities to attach arbitrary DLLs and data segments (called payloads) to any Win32 binary.
- Vulcan is a single infrastructure for building a wide range of custom tools for program analysis, optimization, and testing. Through the Vulcan API, developers and testers can build custom tools with very few lines of code for basic block counting, memory tracing, memory allocation, coverage, failure insertion, optimization, compiler auditing etc. Vulcan scales to large commercial applications and has been used to improve the performance and reliability of products across Microsoft.
The following disassembly is directly related to Detours. MOV EDI, EDI is a 2-byte placeholder that can hold a short JMP instruction (EB rel8). The five NOP instructions in front of the function provide 5 bytes for a near JMP with a 32-bit displacement (E9 rel32) on x86. In short, many Windows system DLLs are built with this kind of patching in mind. The Visual C++ compiler has a command-line option called /hotpatch (Create Hotpatchable Image) that guarantees the 2-byte placeholder at the function entry; the padding in front of each function comes from the linker option /FUNCTIONPADMIN.
7541b4c1 0400 add al,0
7541b4c3 90 nop
7541b4c4 90 nop
7541b4c5 90 nop
7541b4c6 90 nop
7541b4c7 90 nop
KERNELBASE!LoadLibraryExW:
7541b4c8 8bff mov edi,edi
7541b4ca 55 push ebp
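The byte layout in the LoadLibraryExW listing can be checked and patched mechanically. The following is a minimal sketch, not a reproduction of how the Detours library itself rewrites code: it recognizes the five NOPs plus MOV EDI, EDI, then builds the two jumps a hot patch would write. The helper names and the HOOK address are hypothetical; only the entry address and the byte values come from the listing.

```python
import struct

MOV_EDI_EDI = bytes.fromhex("8bff")    # 2-byte no-op at the function entry
PAD = bytes.fromhex("9090909090")      # five NOPs directly before the entry


def is_hotpatchable(padding, entry_bytes):
    """True if the bytes match the layout shown for LoadLibraryExW."""
    return padding == PAD and entry_bytes[:2] == MOV_EDI_EDI


def build_patch(entry, target):
    """Return (bytes for the 5-byte padding, bytes for the 2-byte entry)."""
    # E9 rel32 is relative to the end of the jump. The padding ends exactly
    # where the function begins, so the displacement is target - entry.
    rel32 = (target - entry) & 0xFFFFFFFF
    long_jmp = b"\xe9" + struct.pack("<I", rel32)
    # EB F9: short jump of -7 from entry+2, which lands on entry-5,
    # the first byte of the padding.
    short_jmp = b"\xeb\xf9"
    return long_jmp, short_jmp


ENTRY = 0x7541B4C8    # KERNELBASE!LoadLibraryExW in the listing
HOOK = 0x10001000     # hypothetical address of a replacement function

if is_hotpatchable(bytes.fromhex("9090909090"), bytes.fromhex("8bff55")):
    long_jmp, short_jmp = build_patch(ENTRY, HOOK)
    print(long_jmp.hex(), short_jmp.hex())
```

Writing the 5-byte jump first and the 2-byte jump last means the function entry changes in a single 2-byte store, which is the point of reserving MOV EDI, EDI as a placeholder.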
NTDLL does not use the hot-patch approach; its NOP instructions are just padding that keeps each entry aligned.
ntdll!NtQueueApcThread:
77236278 b80d010000 mov eax,10Dh
7723627d ba0003fe7f mov edx,offset SharedUserData!SystemCallStub (7ffe0300)
77236282 ff12 call dword ptr [edx]
77236284 c21400 ret 14h
77236287 90 nop
ntdll!ZwQueueApcThreadEx:
77236288 b80e010000 mov eax,10Eh
7723628d ba0003fe7f mov edx,offset SharedUserData!SystemCallStub (7ffe0300)
77236292 ff12 call dword ptr [edx]
77236294 c21800 ret 18h
77236297 90 nop
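To see that the trailing NOP is alignment padding rather than patch space, the stub can be decoded by hand. This sketch parses the NtQueueApcThread bytes copied from the listing and rounds the end of the stub up to the next boundary. The 8-byte alignment is an assumption inferred from the two entry addresses shown, and the function names are hypothetical.

```python
import struct

# ntdll!NtQueueApcThread, bytes copied from the listing (trailing NOP included).
STUB = bytes.fromhex("b80d010000" "ba0003fe7f" "ff12" "c21400" "90")


def parse_stub(code):
    """Decode mov eax,imm32 / mov edx,imm32 / call [edx] / ret imm16."""
    if code[0] != 0xB8 or code[5] != 0xBA:
        raise ValueError("unexpected mov opcodes")
    if code[10:12] != b"\xff\x12" or code[12] != 0xC2:
        raise ValueError("unexpected call/ret opcodes")
    service, = struct.unpack_from("<I", code, 1)     # value loaded into EAX
    stub_ptr, = struct.unpack_from("<I", code, 6)    # SystemCallStub pointer
    ret_bytes, = struct.unpack_from("<H", code, 13)  # bytes popped by ret
    return service, stub_ptr, ret_bytes


def align_up(address, alignment):
    """Round address up to a multiple of alignment (a power of two)."""
    return (address + alignment - 1) & ~(alignment - 1)


service, stub_ptr, ret_bytes = parse_stub(STUB)
print(hex(service), hex(stub_ptr), ret_bytes // 4)   # EAX value, pointer, DWORD args

# The instructions occupy 15 bytes; one NOP brings the next entry to 77236288.
print(hex(align_up(0x77236278 + 15, 8)))
```

There is no MOV EDI, EDI at the entry and only a single padding byte, so neither half of the hot-patch layout is present.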
With the introduction of KERNELBASE, many functions exported by kernel32 were forwarded to it.
0:000> .call kernel32!SetErrorMode(1)
^ Symbol not a function in '.call kernel32!SetErrorMode(1)'
0:000> u kernel32!SetErrorMode L1
75ac016d ff25b41da775 jmp dword ptr [kernel32!_imp__SetErrorMode (75a71db4)]
0:001> u poi(75a71db4)
75417991 8bff mov edi,edi
75417993 55 push ebp
75417994 8bec mov ebp,esp
75417996 51 push ecx
75417997 56 push esi
75417998 e836000000 call KERNELBASE!GetErrorMode (754179d3)
7541799d 8bf0 mov esi,eax
7541799f 8b4508 mov eax,dword ptr [ebp+8]
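The two debugger steps above (disassemble the thunk, then dereference the import slot with poi) can be mirrored in a few lines. This is only a sketch of the arithmetic: the MEMORY dictionary stands in for a real memory read, and its single entry is the value the transcript shows. The function name is hypothetical.

```python
import struct


def import_slot(instr):
    """Return the absolute slot address used by `jmp dword ptr [abs32]` (FF 25)."""
    if len(instr) < 6 or instr[:2] != b"\xff\x25":
        raise ValueError("not an FF 25 indirect jump")
    return struct.unpack_from("<I", instr, 2)[0]


# kernel32!SetErrorMode thunk bytes from the listing.
slot = import_slot(bytes.fromhex("ff25b41da775"))
print(hex(slot))    # 0x75a71db4, i.e. kernel32!_imp__SetErrorMode

# Stand-in for reading process memory; the value is taken from the transcript.
MEMORY = {0x75A71DB4: 0x75417991}
print(hex(MEMORY[slot]))    # where the implementation actually starts
```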
Basic Block Tools
BBT merges duplicated blocks, rearranges binary blocks, and makes far-reaching changes to the symbol files (PDB). Your call stack will look weird because functions might get merged and overlapped, especially if C++ templates are used heavily. You can tell whether optimization was performed at the basic block level by examining the function body.
Frame-Pointer Omission
FPO was introduced with Windows NT 3.51, thanks to the 80386 making ESP available for indexing and thus allowing EBP to be used as a general-purpose register. However, FPO makes stack unwinding unreliable, which in turn makes debugging painful. You can tell whether FPO was used by examining the function prologue and epilogue.
BOOL WINAPI Foobar()
55 push ebp
8B EC mov ebp, esp
B8 01 00 00 00 mov eax, 1
5D pop ebp
C3 ret
BOOL WINAPI Foobar()
B8 01 00 00 00 mov eax, 1
C3 ret
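The prologue check described above is easy to automate as a rough heuristic. This sketch looks for push ebp / mov ebp, esp at the start of a function, optionally preceded by the mov edi, edi placeholder. It is an assumption that a frame is always set up this way; compilers can build frames with other sequences, so a negative result only suggests FPO. The function name is hypothetical.

```python
FRAME_PROLOGUE = bytes.fromhex("558bec")   # push ebp / mov ebp, esp
HOTPATCH_PREFIX = bytes.fromhex("8bff")    # optional mov edi, edi


def sets_up_frame(code):
    """Heuristic: does the function start by establishing an EBP frame?"""
    if code.startswith(HOTPATCH_PREFIX):
        code = code[len(HOTPATCH_PREFIX):]
    return code.startswith(FRAME_PROLOGUE)


# Leading bytes of the two Foobar listings.
with_frame = bytes.fromhex("558bec" "b801000000" "5d")
with_fpo = bytes.fromhex("b801000000")
# Leading bytes of the SetErrorMode implementation from the earlier transcript.
set_error_mode = bytes.fromhex("8bff" "55" "8bec" "51" "56")

for name, code in [("frame", with_frame), ("fpo", with_fpo),
                   ("SetErrorMode", set_error_mode)]:
    print(name, sets_up_frame(code))
```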
FPO information is available from both public and private PDB files. WinDBG has a command, kv, that can be used to examine this information:
0:000> kv
ChildEBP RetAddr Args to Child
002bfdac 75d9339a 7efde000 002bfdf8 76f39ed2 notepad!WinMainCRTStartup (FPO: [0,0,0])
002bfdb8 76f39ed2 7efde000 7b449f70 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [Non-Fpo])
002bfdf8 76f39ea5 005b3689 7efde000 00000000 ntdll!__RtlUserThreadStart+0x70 (FPO: [Non-Fpo])
002bfe10 00000000 005b3689 7efde000 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [Non-Fpo])
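When a saved call stack is long, the FPO annotations can be pulled out with a small script. This is a sketch that assumes the text layout shown in the kv output above; it only separates FPO frames from Non-Fpo frames and does not interpret the numbers inside the brackets. The function name is hypothetical.

```python
import re

KV_OUTPUT = """\
002bfdac 75d9339a 7efde000 002bfdf8 76f39ed2 notepad!WinMainCRTStartup (FPO: [0,0,0])
002bfdb8 76f39ed2 7efde000 7b449f70 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [Non-Fpo])
002bfdf8 76f39ea5 005b3689 7efde000 00000000 ntdll!__RtlUserThreadStart+0x70 (FPO: [Non-Fpo])
002bfe10 00000000 005b3689 7efde000 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [Non-Fpo])
"""

# module!symbol followed by the "(FPO: [...])" annotation.
FRAME_RE = re.compile(r"(\S+!\S+)\s+\(FPO: \[([^\]]*)\]\)")


def fpo_frames(text):
    """Return (symbol, uses_fpo) for every annotated frame in kv output."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group(1), match.group(2) != "Non-Fpo"))
    return frames


for symbol, uses_fpo in fpo_frames(KV_OUTPUT):
    print(symbol, "FPO" if uses_fpo else "frame pointer")
```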
Link-time Code Generation
LTCG was introduced with the first .NET release of Visual C++ (Visual Studio .NET). It can be used with or without PGO (Profile Guided Optimization). If you have debugged optimized C++ applications, you already know that local variables and inlined functions can look very different from the source. With LTCG, even cross-module inlining is possible; in addition, calling conventions and parameters can be optimized. As with BBT, functions might get merged.
Profile Guided Optimization
PGO (a.k.a. POGO) performs many optimizations, such as inlining, virtual call speculation, and conditional branch optimization. What’s more, POGO is able to perform optimizations at the extended basic block level.
Incremental Linking
The Microsoft Incremental Linker has an option, /INCREMENTAL (not to be confused with an incremental compiler, which makes use of precompiled headers), that affects debugging. In fact, native EnC (Edit and Continue) is built on top of incremental linking technology. Sometimes we get symbols like module!ILT+0(_main); the ILT (Incremental Link Table) serves the incremental linker by adding a layer of indirection, which provides the flexibility needed for binary patching. The bad news is that the incremental linker also has to generate correct symbols and patch them into the PDB, and the patching process does not discard unused symbols in a reliable manner. This is challenging for debugger authors, since the integrity of symbols is not guaranteed by the MSPDB layer.
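The extra layer of indirection is easy to follow by hand. On x86 an ILT entry is typically a 5-byte relative jump (E9 rel32) to the real function body, so the target is the thunk address plus 5 plus the signed displacement. The sketch below resolves such a thunk; the addresses and bytes are hypothetical and the function name is made up for illustration.

```python
import struct


def ilt_target(thunk_addr, thunk_bytes):
    """Resolve an `E9 rel32` thunk to the address it jumps to (32-bit)."""
    if len(thunk_bytes) < 5 or thunk_bytes[0] != 0xE9:
        raise ValueError("not an E9 rel32 jump")
    rel32, = struct.unpack_from("<i", thunk_bytes, 1)   # signed displacement
    # The displacement is relative to the instruction that follows the jump.
    return (thunk_addr + 5 + rel32) & 0xFFFFFFFF


# Hypothetical thunk: module!ILT+0(_main) at 0x00411005 containing jmp +0x3a6.
THUNK_ADDR = 0x00411005
THUNK_BYTES = bytes.fromhex("e9a6030000")

print(hex(ilt_target(THUNK_ADDR, THUNK_BYTES)))   # address of the real _main body
```

Because callers reference the thunk rather than the function body, the linker can move the body and rewrite a single displacement, which is what makes patching after an edit cheap.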
Inlining
Function inlining means there will be no actual call. The stepper and symbol binding components in a debugger might get confused.
Intrinsic Functions
Intrinsic functions are a special kind of function generated by the compiler toolchain (instead of coming from libraries or your code).