Debugging walkthrough: Access violation on nonsense instruction, episode 2


A colleague of mine asked for help debugging a strange failure. Execution halted on what appeared to be a nonsense instruction.

eax=0079f850 ebx=00000000 ecx=00000113 edx=00000030 esi=33ee06ef edi=74b9b8ad
eip=00c0ac74 esp=0079f82c ebp=0079f86c iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
00c0ac74 0000            add     byte ptr [eax],al          ds:002b:0079f850=74

If you've been debugging x86 code for a while, you immediately recognize this instruction as "executing a page of zeroes". If you haven't been debugging x86 code for a while, you can see this from the code bytes in the second column.
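To see why two zero bytes come out as that particular instruction, it helps to split the second byte into its ModR/M fields. The following is a minimal sketch, not a real disassembler: `split_modrm` and `decode_add_rm8_r8` are hypothetical helpers that handle only opcode 00 in its simplest register-indirect form.

```python
# Register names indexed by their 3-bit encoding.
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_modrm(modrm):
    """Split a ModR/M byte into its (mod, reg, rm) bit fields."""
    return modrm >> 6, (modrm >> 3) & 7, modrm & 7


def decode_add_rm8_r8(code):
    """Decode opcode 00 /r (ADD r/m8, r8), mod=00 register-indirect form only."""
    opcode, modrm = code[0], code[1]
    if opcode != 0x00:
        raise ValueError("this sketch only understands opcode 00")
    mod, reg, rm = split_modrm(modrm)
    if mod != 0 or rm in (4, 5):
        # SIB, displacement, and register-direct forms are deliberately left out.
        raise ValueError("addressing form not handled by this sketch")
    return f"add byte ptr [{REG32[rm]}],{REG8[reg]}"


print(decode_add_rm8_r8(bytes([0x00, 0x00])))  # add byte ptr [eax],al
```

Opcode 00 is ADD r/m8, r8, and a ModR/M byte of 00 selects [eax] as the destination and al as the source, so a page full of zeroes disassembles as this same instruction over and over.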

So how did we end up at this nonsense instruction?

The instruction is not near a page boundary, so we didn't fall through to it. We must have jumped to it or returned to it.
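The page-boundary claim is quick arithmetic on eip. This small check assumes 4 KB pages, the usual small-page size on x86; `PAGE_SIZE` is that assumption, not something taken from the dump.

```python
PAGE_SIZE = 0x1000  # assumption: standard 4 KB x86 pages

eip = 0x00C0AC74  # from the register dump above

offset_in_page = eip % PAGE_SIZE                   # distance from the start of the page
bytes_to_next_page = PAGE_SIZE - offset_in_page    # distance to the end of the page

print(hex(offset_in_page), hex(bytes_to_next_page))  # 0xc74 0x38c
```

The faulting address is 0xc74 bytes into its page and 0x38c bytes from the next one, so it sits well away from either edge.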

Since debugging is an exercise in optimism, let's assume that we jumped to it via a call instruction, and the return address is still on the stack.

0:000> dps esp l2
0079f82c  74b9b8b1 user32!GetMessageW+0x4
0079f830  008f108b CONTOSO!MessageLoop+0xe7
0:000> u user32!GetMessageW l3
USER32!GetMessageW:
74b9b8ad cc              int     3
74b9b8ae ff558b          call    dword ptr [ebp-75h]
74b9b8b1 ec              in      al,dx

Well, that explains it. The code bytes for the GetMessageW function were overwritten, causing us to execute garbage, and one of the garbage instructions was a call that took us to a page of zeroes.

But look more closely at the overwritten bytes.

The first byte is cc, which is a breakpoint instruction. Hm...

Since Windows functions begin with a MOV EDI, EDI instruction for hot patching purposes, the first two bytes are always 8b ff. If we unpatch the cc to 8b, we see that the rest of the code bytes are intact.

USER32!GetMessageW:
74b9b8ad 8bff            mov     edi,edi
74b9b8af 55              push    ebp
74b9b8b0 8bec            mov     ebp,esp
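One way to see how a single changed byte derails the whole prologue is to decode both byte sequences by hand. The following is a minimal sketch, not a real disassembler: `decode_one` and `disassemble` are hypothetical helpers that understand only the five instruction forms that appear in this walkthrough.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_one(code, i):
    """Decode the instruction at offset i; return (text, size). Toy subset only."""
    op = code[i]
    if op == 0xCC:
        return "int 3", 1
    if op == 0xEC:
        return "in al,dx", 1
    if 0x50 <= op <= 0x57:
        return f"push {REG32[op - 0x50]}", 1
    modrm = code[i + 1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if op == 0x8B and mod == 3:
        # 8B /r is MOV r32, r/m32: reg is the destination, rm the source.
        return f"mov {REG32[reg]},{REG32[rm]}", 2
    if op == 0xFF and reg == 2 and mod == 1 and rm != 4:
        # FF /2 is an indirect near call; mod=01 means a signed 8-bit displacement.
        disp = code[i + 2]
        if disp >= 0x80:
            disp -= 0x100
        sign = "-" if disp < 0 else "+"
        return f"call dword ptr [{REG32[rm]}{sign}{abs(disp):x}h]", 3
    raise ValueError(f"opcode {op:02x} not handled by this sketch")


def disassemble(code, base):
    """Decode code linearly from its first byte; return (address, size, text) tuples."""
    out, i = [], 0
    while i < len(code):
        text, size = decode_one(code, i)
        out.append((base + i, size, text))
        i += size
    return out


BASE = 0x74B9B8AD  # address of user32!GetMessageW in this dump
patched = bytes([0xCC, 0xFF, 0x55, 0x8B, 0xEC])   # what the debugger showed
original = bytes([0x8B, 0xFF, 0x55, 0x8B, 0xEC])  # after changing cc back to 8b

for name, code in (("patched", patched), ("original", original)):
    print(name)
    for addr, size, text in disassemble(code, BASE):
        raw = code[addr - BASE:addr - BASE + size].hex()
        print(f"  {addr:08x} {raw:<8} {text}")

# The bogus call pushes the address of the byte that follows it.
call_addr, call_size, _ = disassemble(patched, BASE)[1]
return_address = call_addr + call_size
print(f"return address: {return_address:08x}")
```

With the cc in place, the one-byte int 3 leaves the decoder one byte short of the real instruction boundary, so ff 55 8b is read as a call through [ebp-75h]. The return address that call pushes, 74b9b8b1, is the same value found at the top of the stack in the dps output.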

After a brief discussion, we were able to piece together what happened:

Somebody was trying to debug the CONTOSO application, so they connected a user-mode debugger to the application. Meanwhile, they set a breakpoint on user32!GetMessageW from the kernel debugger. Setting a breakpoint in a debugger is typically performed by patching an int 3 at the point where you want the breakpoint. When the int 3 fires, the debugger regains control and says, "Oh, thanks for stopping. Let me unpatch all the int 3's I put in the program to put things back the way they were."

When the breakpoint hit, it was caught by the user-mode debugger, but since the user-mode debugger didn't set that breakpoint, it interpreted the int 3 as a hard-coded breakpoint in the application. At this point, the developer saw a spurious breakpoint, didn't know what it meant, and simply resumed execution. This executed the second half of the MOV EDI, EDI instruction as the start of a new instruction, and havoc ensued.
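The interaction between the two debuggers can be modeled in a few lines. This is a deliberately simplified sketch: `ToyDebugger` is a hypothetical class, memory is just a shared bytearray, and a real debugger does considerably more (for example, single-stepping so it can put the breakpoint back). Only the bookkeeping that matters here is shown.

```python
INT3 = 0xCC


class ToyDebugger:
    """Toy model: each debugger remembers only the breakpoints it set itself."""

    def __init__(self, memory):
        self.memory = memory  # shared with the debuggee and with other debuggers
        self.saved = {}       # offset -> original byte that int 3 replaced

    def set_breakpoint(self, offset):
        self.saved[offset] = self.memory[offset]
        self.memory[offset] = INT3

    def on_breakpoint(self, offset):
        """Handle an int 3 at offset; return the offset where execution resumes."""
        if offset in self.saved:
            # Our breakpoint: restore the original byte and rerun the instruction.
            self.memory[offset] = self.saved.pop(offset)
            return offset
        # Not ours: assume it is hard-coded in the program and step past it.
        return offset + 1


# First five bytes of GetMessageW: mov edi,edi / push ebp / mov ebp,esp
code = bytearray([0x8B, 0xFF, 0x55, 0x8B, 0xEC])
kernel_dbg = ToyDebugger(code)
user_dbg = ToyDebugger(code)

kernel_dbg.set_breakpoint(0)

# The wrong debugger catches the int 3 and resumes in mid-instruction.
resume_wrong = user_dbg.on_breakpoint(0)
seen_wrong = bytes(code[resume_wrong:]).hex()

# The debugger that owns the breakpoint restores the byte first.
resume_right = kernel_dbg.on_breakpoint(0)
seen_right = bytes(code[resume_right:]).hex()

print(resume_wrong, seen_wrong)   # 1 ff558bec
print(resume_right, seen_right)   # 0 8bff558bec
```

The user-mode debugger has no record of the patched byte, so it resumes at offset 1 and the processor sees ff 55 8b ec. The debugger that owns the breakpoint restores 8b and resumes at offset 0, which is the corrective action the wrong debugger's user would have had to perform by hand.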

That developer then asked his friend what happened, and his friend asked me.

TL;DR: Be careful if you have more than one debugger active. Breakpoints set by one debugger will not be recognized by the other. If the breakpoint instruction is caught by the wrong debugger, things will go downhill fast unless you take corrective action. (In this case, it would be restoring the original byte.)

Comments (17)
  1. Brian_EE says:

    There's an old saying "Two wrongs don't make a right."

    This is a case of "Two debuggers make a bug."

  2. Joshua says:

    eip = unmapped

    esp = unmapped

    That sucked.

  3. Boris says:

    I can't help but read MOV EDI, EDI as "Move, Eddie-Eddie."

  4. Paul says:

    Why was the developer using 2 debuggers in the first place?  I can sort of see why they expected a BP set in one to work in the other, but it's a very odd way of doing things.

  5. @Paul: They may have been debugging a program that interacts with a kernel-mode driver.  It would probably make more sense to stick with the kernel debugger then for both programs, but maybe they wanted to utilize the user-mode debugger for additional features, like profiling or event tracing.

  6. Gabe says:

    I'm now curious as to how, out of the domain of 2-byte NOP instructions, MOV EDI,EDI was chosen.

  7. alegr1 says:

    Another kind of mysterious exception happens when a debugger remembers a breakpoint by its code offset in a function, and then the function gets modified, and the offset is now pointing to the middle of an instruction.

  8. pc says:

    @Gabe:

    Raymond discusses MOV EDI,EDI as the NOP in these posts:

    blogs.msdn.com/…/10214405.aspx

    blogs.msdn.com/…/10381672.aspx

  9. Adam Rosenfield says:

    I've had occasion to use two debuggers on the same process a few times.  One was for debugging a C++ game engine (UnrealEngine 3), and the other was for debugging the scripting language embedded in the engine (UnrealScript).  The UnrealScript debugger was a janky Visual Studio extension, which has its share of problems, but it got the job done for stepping into and through script code and viewing variable values.

    However, one instance of Visual Studio couldn't debug both native C++ and UnrealScript at the same time, so when I wanted to debug both, I had to boot up the game using the UnrealScript debugger extension, and then attach a second instance of Visual Studio to the already-running process.  Surprisingly, I didn't run into any crazy problems from doing so like the one described in this post.

  10. Myria says:

    I'm going to be emailing this to some coworkers, to help in explaining how debuggers actually work.  It's interesting how many senior-level developers don't know how debuggers actually set breakpoints and such.

    On a side note, I wish /functionpadmin worked properly in x64 builds…

  11. Killer{R} says:

    But doesn't the kernel debugger get notified about breakpoint exceptions before the user-mode debugger?

    However, breakpoints can be used to implement API hooking, and (another however) GetMessageW is a potential target for keylogging malware.

  12. Erik F says:

    @Killer: According to msdn.microsoft.com/…/ff560042(v=vs.85).aspx , the kernel debugger gets notified first. As the article mentions, the "breakpoint breaks into the user-mode debugger, even though the breakpoint was actually set from the kernel debugger."

    As an aside, I've never considered running two debuggers at the same time before, so this is a situation that's never crossed my mind before.

  13. Erkin Alp Güney says:

    [Hot-patching is not an application feature. It's an OS internal feature for servicing. Who are these people who keep trying to patch code they didn't write?! -Raymond]

    Some [redacted] software uses them. The ReactOS team has encountered such a thing. ReactOS is a clean-room FLOSS reverse-engineering of Windows XP.

  14. Henke37 says:

    I remember having to use two debuggers at the same time once. One for actionscript and one for C. And for additional irony, I was debugging a debugging plugin.

  15. Dave Bacher says:

    @Adam Rosenfield:

    I cannot speak for the Unreal Engine debugger implementation — however the Torque and Unity 3D visual studio extensions just use a network call for the debugger, and that was a common tactic around when Unreal originally came out.  I seem to recall one of the Torque IDE shells also supporting Unreal 2004, but I might be mistaken on that (and I don't remember the specific editor involved enough to do a good search for it).

    But if you have the classic VM implementation of some byte codes invoking function pointers (or a giant switch table), it's really easy to track the breakpoints.  At that point, you use the console parser for the debugger — and it already has the commands to go get the value and set the value of a variable, for example.  You just tag the messages somehow (XML, JSON, CSV, w/e) and then make sure they're machine readable.  If you need to, you might even have something like "set connection json" or w/e as a console command, and keep the network console's state separate from the in-game console's state.  (which likely is necessary depending on how you're doing it, anyway — e.g. a TCP console probably wouldn't hold onto the last N messages, and would instead allow whatever client is attached to manage that part, while the in-game console probably needs a buffer of previous messages in order to maintain its UI state)

    But anyway — point is — they wouldn't need to do anything that conflicts here, so they probably don't.

  16. Joshua says:

    [Hot-patching is not an application feature. It's an OS internal feature for servicing. Who are these people who keep trying to patch code they didn't write?! -Raymond]

    Those who fix your bugs because you will not.

  17. Adam Rosenfield says:

    > But anyway — point is — they wouldn't need to do anything that conflicts here, so they probably don't.

    Exactly.  I don't remember the exact transport mechanism (it might have been TCP, or maybe local named pipes), but the script debugger always functioned within a small set of well-defined hooks in the VM implementation.  If you were stepping through script code, the C++ code was just running in a small modal loop inside the VM, while if you were stepping through C++ code, the script debugger was patiently waiting for you to hit the next of its hooks.

