When you crash on a mov ebx, eax instruction, there aren’t too many obvious explanations, so just try what you can

A computer running some tests encountered a mysterious crash:

eax=ffffffff ebx=00000000 ecx=038ef548 edx=17b060b4 esi=00000000 edi=038ef6f0
eip=14ae1b77 esp=038ef56c ebp=038ef574 iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010202
14ae1b77 8bd8            mov     ebx,eax 

A colleague of mine quickly diagnosed the proximate cause.

*Something* marked the code page PAGE_READWRITE, instead of PAGE_EXECUTE_READ. I suspect a bug in a driver. FOO is just a victim here.

0:002> !vprot 14ae1b77 
BaseAddress:       14ae1000
AllocationBase:    14ae0000
AllocationProtect: 00000080  PAGE_EXECUTE_WRITECOPY
RegionSize:        00001000
State:             00001000  MEM_COMMIT
Protect:           00000004  PAGE_READWRITE
Type:              01000000  MEM_IMAGE
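
For readers who have not memorized the page protection constants, here is a minimal sketch of how the two values in that output compare. The constant values are the documented Win32 ones; the helper name describe_protection is made up for illustration, and the sketch assumes the low byte holds the basic protection while higher bits are modifiers such as PAGE_GUARD.

```python
# Documented Win32 page protection constants (basic protections only).
PAGE_PROTECTIONS = {
    0x01: "PAGE_NOACCESS",
    0x02: "PAGE_READONLY",
    0x04: "PAGE_READWRITE",
    0x08: "PAGE_WRITECOPY",
    0x10: "PAGE_EXECUTE",
    0x20: "PAGE_EXECUTE_READ",
    0x40: "PAGE_EXECUTE_READWRITE",
    0x80: "PAGE_EXECUTE_WRITECOPY",
}

# Only these basic protections allow instruction fetches when NX is enforced.
EXECUTABLE = {0x10, 0x20, 0x40, 0x80}


def describe_protection(value: int) -> tuple[str, bool]:
    """Return (name, is_executable) for a protection value (hypothetical helper)."""
    base = value & 0xFF  # strip modifier bits such as PAGE_GUARD (0x100)
    return PAGE_PROTECTIONS.get(base, "unknown"), base in EXECUTABLE


# Values taken from the !vprot output above.
allocation_protect = describe_protection(0x00000080)
current_protect = describe_protection(0x00000004)
print("AllocationProtect:", allocation_protect)
print("Protect:          ", current_protect)
```

The page was executable when the image was mapped (AllocationProtect) but is not executable now (Protect), which is the mismatch the colleague spotted.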

This diagnosis was met with astonishment. "Wow! What made you think to check the protection on the code page?"

Well, let's see. We're crashing on a mov ebx, eax instruction. This does not access memory; it's a register-to-register operation. There's no way a properly functioning CPU can raise an exception on this instruction.
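
If you want to convince yourself that the instruction really is register-to-register, you can decode the two bytes 8b d8 by hand. The following is a minimal sketch, assuming 32-bit mode and no instruction prefixes; the function name decode_mov_8b is made up for illustration, and it handles only the single opcode 8B (mov r32, r/m32).

```python
# Register numbering used by the ModRM byte in 32-bit mode.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_8b(code: bytes) -> dict:
    """Decode 'mov r32, r/m32' far enough to see whether it touches memory."""
    if len(code) < 2 or code[0] != 0x8B:
        raise ValueError("not an 8B /r mov")
    modrm = code[1]
    mod = modrm >> 6         # top two bits: addressing mode
    reg = (modrm >> 3) & 7   # middle three bits: destination register
    rm = modrm & 7           # low three bits: source register or memory base
    return {
        "dest": REG32[reg],
        "source": REG32[rm] if mod == 3 else "memory operand",
        "accesses_memory": mod != 3,  # mod == 3 means register-direct
    }


# The bytes from the crash dump: 8b d8.
info = decode_mov_8b(bytes.fromhex("8bd8"))
print(info)
```

The ModRM byte d8 has mod equal to 3, so both operands are registers and the instruction itself has no data memory operand. The only memory the CPU needs in order to run it is the instruction fetch.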

At this point, what possibilities remain?

  • NX, which prevents the CPU from executing data.
  • Overclocking, which will cause all sorts of "impossible" things.
  • A root kit.

(Note that the second and third options involve rejecting the assumption that the CPU is behaving properly.)

These are in increasing order of paranoia, so you naturally start with the least paranoid possibility.

Then, of course, there's the non-psychic solution: Ask the debugger for the exception record.

EXCEPTION_RECORD:  ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 14ae1b77 (FOO!CFrameWnd::GetAssociatedWidget+0x00000047)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 00000008
   Parameter[1]: 14ae1b77
Attempt to execute non-executable address 14ae1b77

That last line pretty much hands it to you on a silver platter.
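
The debugger derives that last line from the two exception parameters. Here is a minimal sketch of the same interpretation, using the documented meaning of the first parameter of an access violation (0 for read, 1 for write, 8 for a user-mode DEP violation); the function name describe_access_violation is made up for illustration.

```python
STATUS_ACCESS_VIOLATION = 0xC0000005

# Documented meaning of Parameter[0] for an access violation.
ACCESS_TYPES = {
    0: "read",
    1: "write",
    8: "execute (DEP/NX violation)",
}


def describe_access_violation(code: int, params: list[int]) -> str:
    """Summarize an access violation record (hypothetical helper)."""
    if code != STATUS_ACCESS_VIOLATION or len(params) < 2:
        raise ValueError("not an access violation with two parameters")
    kind = ACCESS_TYPES.get(params[0], "unknown access type")
    return f"{kind} at {params[1]:08x}"


# Values taken from the .exr output above.
summary = describe_access_violation(0xC0000005, [0x00000008, 0x14AE1B77])
print(summary)
```

Here the faulting address in Parameter[1] equals the ExceptionAddress, which is what you expect when the failed access is the instruction fetch itself.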

Comments (20)
  1. Marco Mastropaolo says:

    Somehow the opposite happened to me: a mov [esp+something], something that raised an access violation even though esp was well within acceptable values. And not in kernel mode, but in its near opposite, a .NET app.

    Of course, the culprit was also NX.


  2. Marco Mastropaolo says:

    Well, what I forgot to mention in the previous post is that what allowed me to catch it was years of reading your blog :) Thanks.

  3. xbox2 says:

    Bugs caused by overclocking are exceedingly rare. More common in reality are cheap cooling solutions bundled with mass-produced junk, which don't meet the thermal dissipation requirements of the components.

  4. Veltas says:

    Bugs caused by overclocking are exceedingly rare. More common in reality are cheap cooling solutions bundled with mass-produced junk, which don't meet the thermal dissipation requirements of the components.

    Surely that's more likely to occur with over-clocking, though?

    And yeah, these bugs are rare, but they still happen. And when they do happen, apparently they cause a lot of head-scratching on the bug report side!

  5. Joshua says:

    UnmapViewOfFile(GetModuleHandle(NULL)); crashes in a very interesting way.

  6. squizzle says:

    Veltas, that reminds me of a Larry Osterman post: blogs.msdn.com/…/104165.aspx

    "One in a million is next Tuesday"

  7. dave says:

    Bugs caused by overclocking are exceedingly rare.

    Because "fails to run correctly on a malfunctioning processor" does not indicate a bug in the software?  ;-)

    (Or, "bug report" != "bug")

  8. David Walker says:

    I thought that every instruction would (or could) fail to run correctly on a malfunctioning processor!

  9. Joshua says:

    @dave, But bugs caused by assuming uniform CPU rates do happen. There is no reason a priori that a quad-processor machine must have all four processors at the same speed. Somewhat more common is assuming that a CPU declared for speed X really has speed X and wondering why your timing is off or some other race condition somewhere.

  10. BJ says:

    Another great example of why psychic debugging continues to impress otherwise intelligent developers: "so you naturally start with the least paranoid possibility". Unfortunately, there is a group of developers who assume they have discovered a brand new bug in the CPU / assembler / C++ / C# / .NET Framework / library / application when the most likely explanation is something far more obvious, if you look at problems the right way.

  11. Anonymous Coward says:

    Oh, I know another potential cause! A buggy virtual machine or emulator.

  12. mpz says:

    Splitting hairs here, but surely the CPU is behaving properly in the case of a root kit; it is executing whatever the root kit wishes to execute. It's just not executing *your* code properly ;-)

    Of course all bets are off wrt anybody else's code, so your point is still mostly valid.

  13. Joshua says:

    @Yuhong Bao. Wow just wow.

    Look I know he's responsible for a lot of oddball posts, but this was a good one.

  14. spool says:

    But what changed the protection of the page from executable to PAGE_READWRITE?

    FOO!CFrameWnd::GetAssociatedWidget+0x47: <— Looks like we are well inside the function and hence it should be executable !!!

    14ae1b77 8bd8            mov     ebx,eax

    Interesting though !!!

  15. Jim says:

    @Yuhong Bao: Thanks, very interesting. I particularly like the euphemism "specification update" for bug list.

  16. @spool says:

    It's right there in the article: "I suspect a bug in a driver. FOO is just a victim here."

  17. spool says:

    I was more interested in the actual driver and hence scenario/conditions which would lead to this situation.

    I understand that this could have been due to some buggy driver changing the page protections for whatever reason and mistakenly doing so on the wrong address, which happens to be our function, hence the crash !!!

  18. Matt T says:

    Another reason I've seen is if you're using an old debugger and it's assuming one instruction set, when actually another one is in use – for example an old 32-bit debugger debugging a 64-bit process might get confused and think that those opcodes are 32-bit ones, rather than 64-bit ones, i.e. your debugger is lying to you!

  19. Matt T says:

    @spool: It's likely caused by someone doing inline hooking – you need to write a jump on that page, and lots of people change the page permission from PAGE_EXECUTE_READ to PAGE_READWRITE and then back to PAGE_EXECUTE_READ afterwards, instead of via PAGE_EXECUTE_READWRITE. This means another thread touching code on that page will explode if it touches a different function on that page during the hooking process.

    Hooks are often employed by antivirus software to watch for bad behaviour, and not all antivirus vendors are very good at it.

Comments are closed.
