Why do Windows functions all begin with a pointless MOV EDI, EDI instruction?


If you look at the disassembly of functions inside Windows DLLs, you'll find that they begin with the seemingly pointless instruction MOV EDI, EDI. This instruction copies a register to itself and updates no flags; it is completely meaningless. So why is it there?

It's a hot-patch point.

The MOV EDI, EDI instruction is a two-byte NOP, which is just enough space to patch in a jump instruction so that the function can be updated on the fly. The intention is that the MOV EDI, EDI instruction will be replaced with a two-byte JMP $-5 instruction to redirect control to five bytes of patch space that comes immediately before the start of the function. Five bytes is enough for a full jump instruction, which can send control to the replacement function installed somewhere else in the address space.
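
To make the byte-level mechanics concrete, here is a minimal sketch that simulates the patch on a Python bytearray rather than on live code. The encodings are standard x86 (8B FF for MOV EDI, EDI, 90 for NOP, EB rel8 for the short jump, E9 rel32 for the full jump). Everything else is an assumption made for illustration: the helper name hot_patch, the addresses, the buffer layout, and the order of the two writes. It is not a description of how the operating system's servicing code is actually written.

```python
import struct

MOV_EDI_EDI = bytes([0x8B, 0xFF])   # two-byte no-op at the function entry
NOP = 0x90                          # one-byte NOP used for the padding
PAD = 5                             # bytes of patch space before the entry

def short_jmp_back_to_pad() -> bytes:
    # EB rel8: rel8 is relative to the END of the 2-byte instruction,
    # so reaching (entry - 5) needs rel8 = -5 - 2 = -7 = 0xF9.
    return bytes([0xEB, (-PAD - 2) & 0xFF])

def near_jmp(from_addr: int, to_addr: int) -> bytes:
    # E9 rel32: rel32 is relative to the end of the 5-byte instruction.
    rel = (to_addr - (from_addr + 5)) & 0xFFFFFFFF
    return b"\xE9" + struct.pack("<I", rel)

def hot_patch(image: bytearray, base: int, entry: int, replacement: int) -> None:
    """Simulate patching the function at `entry` (hypothetical helper)."""
    off = entry - base
    assert image[off:off + 2] == MOV_EDI_EDI, "no hot-patch point here"
    # Assumed order: fill the never-executed padding with the full jump...
    image[off - PAD:off] = near_jmp(entry - PAD, replacement)
    # ...then flip the two-byte entry instruction to the short jump.
    image[off:off + 2] = short_jmp_back_to_pad()

# Hypothetical layout: padding, entry point, then "push ebp / mov ebp, esp".
base = 0x10001000
image = bytearray([NOP] * PAD) + bytearray(MOV_EDI_EDI) + bytearray(b"\x55\x8B\xEC")
entry = base + PAD
hot_patch(image, base, entry, replacement=0x10405000)
print(image.hex(" "))
```

Note that the assembler-style JMP $-5 encodes as EB F9, not EB FB: the displacement byte is counted from the end of the two-byte instruction, so it is -7.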

Although the five bytes of patch space before the start of the function consist of five one-byte NOP instructions, the function entry point uses a single two-byte NOP.

Why not use Detours to hot-patch the function? Then you don't need any patch space at all.

The problem with Detouring a function during live execution is that you can never be sure that at the moment you are patching in the Detour, another thread isn't in the middle of executing an instruction that overlaps the first five bytes of the function. (And you have to alter the code generation so that no instruction starting at offsets 1 through 4 of the function is ever the target of a jump.) You could work around this by suspending all the threads while you're patching, but that still won't stop somebody from doing a CreateRemoteThread after you thought you had successfully suspended all the threads.
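
The overlap problem can be sketched with a toy model. The snippet below assumes a typical x86 prologue (push ebp / mov ebp, esp / sub esp, 8) and hard-codes its instruction lengths; it is not a real disassembler, and the function names are hypothetical.

```python
# Hypothetical prologue with known instruction lengths:
#   55        push ebp        (1 byte,  offset 0)
#   8B EC     mov  ebp, esp   (2 bytes, offset 1)
#   83 EC 08  sub  esp, 8     (3 bytes, offset 3)
PROLOGUE_LENGTHS = [1, 2, 3]
PATCH_SIZE = 5  # size of an E9 rel32 jump written over the entry point

def instruction_starts(lengths):
    """Offsets at which each instruction begins."""
    starts, off = [], 0
    for n in lengths:
        starts.append(off)
        off += n
    return starts

def unsafe_resume_points(lengths, patch_size):
    """Instruction starts that end up strictly inside the patched bytes.

    A thread whose instruction pointer sits at one of these offsets when
    the patch lands will resume in the middle of the new jump.
    """
    return [s for s in instruction_starts(lengths) if 0 < s < patch_size]

print(unsafe_resume_points(PROLOGUE_LENGTHS, PATCH_SIZE))
```

With this prologue, a thread that has executed the push but not yet the mov, or the mov but not yet the sub, would wake up inside the five-byte jump.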

Why not just use two NOP instructions at the entry point?

Well, because a NOP instruction consumes one clock cycle and one pipe, so two of them would consume two clock cycles and two pipes. (The instructions will likely be paired, one in each pipe, so the combined execution will take one clock cycle.) On the other hand, the MOV EDI, EDI instruction consumes one clock cycle and one pipe. (In practice, the instruction will occupy one pipe, leaving the other available to execute another instruction in parallel. You might say that the instruction executes in half a cycle.) However you calculate it, the MOV EDI, EDI instruction executes in half the time of two NOP instructions.

On the other hand, the five NOPs inserted before the start of the function are never executed, so it doesn't matter what you use to pad them. It could've been five garbage bytes for all anybody cares.

But much more important than cycle-counting is that the use of a two-byte NOP avoids the Detours problem: If the code had used two single-byte NOP instructions, then there is the risk that you will install your patch just as a thread has finished executing the first single-byte NOP and is about to begin executing the second single-byte NOP, resulting in the thread treating the second half of your JMP $-5 as the start of a new instruction.
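
Here is a small sketch of that difference, assuming the two-byte write itself lands atomically. The table of instruction boundaries is written by hand for the two candidate entry sequences, and the helper name torn_offsets is hypothetical.

```python
JMP_BACK = bytes([0xEB, 0xF9])  # short jump from the entry to (entry - 5)

# Offsets, relative to the function entry, where an instruction begins
# inside the two-byte hot-patch point.
ENTRY_CHOICES = {
    "mov edi, edi": [0],     # 8B FF: one two-byte instruction
    "nop / nop":    [0, 1],  # 90 90: two one-byte instructions
}

def torn_offsets(boundaries):
    """Boundaries that land inside the new jump rather than at its start."""
    return [off for off in boundaries if off != 0]

for name, boundaries in ENTRY_CHOICES.items():
    torn = torn_offsets(boundaries)
    if torn:
        fetched = ", ".join(f"{JMP_BACK[off]:#04x}" for off in torn)
        print(f"{name}: a thread parked at +{torn} would fetch {fetched}")
    else:
        print(f"{name}: every thread sees either the old or the new instruction")
```

In the two-NOP case, the thread parked at offset 1 fetches the jump's displacement byte as if it were an opcode. With MOV EDI, EDI there is no instruction boundary inside the patched bytes, so that state cannot arise.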

There's a lot of patching machinery going on that most people don't even realize. Maybe at some point, I'll get around to writing about how the operating system manages patches for software that isn't installed yet, so that when you do install the software, the patch is already there, thereby closing the vulnerability window between installing the software and downloading the patches.

Comments (37)
  1. A. Skrobov says:

    If you used JMP $+2 as a two-byte NOP, then the patch would require changing just one byte instead of two. This may be essential if the hot-patch point isn't DWORD-aligned.

  2. ErikF says:

    @A. Skrobov: I'm guessing that JMP $+2 wasn't used because it messes up the cache (at least it did in 486s, which is when I stopped regularly doing assembly language); as well, a quick look could possibly miss that the function was patched (MOV looks quite different from JMP).

    I'm truly impressed by how far the operating system has improved in its ability to fix things up on the fly. IMO, this is way better than shimming in patches in the export table and orders of magnitude better than what had to be done in DOS (playing with the interrupt table or looking for magic sequences)!

    Has this affected the working sets of the DLLs any? It probably hasn't, seeing as the alignment requirements have increased over time, but it would be interesting to know.

  3. MrE says:

    From what I could tell, this behavior has changed in x64 versions of Windows. What's the best way to hotpatch a function on this architecture?

    [Hot-patching is not an application feature. It's an OS internal feature for servicing. Who are these people who keep trying to patch code they didn't write?! -Raymond]
  4. Adam Rosenfield says:

    How often do Windows DLLs actually get hot-patched?  I feel like this is one of those features that you never notice when it's working properly, but you do notice when it's not — people get mighty grumpy when they have to reboot to install a patch.

  5. mgetz says:

    Well I would imagine this changed with vista and 7 as well on both platforms, mostly because of Data Execution Prevention (on by default on server, and turned on by security nuts like me on desktop) which would require that the entire page be taken offline, marked as not executable and writable, edited, and then marked non-writable and executable again and moved back into service.

  6. Joshua says:

    @MGetz: I did not imagine the flag PAGE_EXECUTE_READWRITE.

  7. Cesar says:

    Why not a two-byte "long NOP" (0x66 0x90) instead? AFAIK, it would use even fewer resources (IIRC, the instruction decoder can recognize it as a two-byte NOP).

  8. @MGetz: good question – I wonder how they do it with DEP?  To do what you describe sounds like you'd have to suspend the threads anyway, which doesn't work according to this post…  I must be missing something obvious.

    (It's a cool-sounding feature; too bad there always seems to be some patch in the monthly patch release that makes me reboot anyway – it only takes one patch out of the group of 10 to cause a reboot…  I guess the only advantage is that I'm protected by some of the patches before I actually reboot.)

  9. Billy O'Neal says:

    @MGetz: Windows is not (in the general case) a w^x operating system — DEP merely enforces execution access control bits — there's nothing stopping a page from being marked both writable and executable.

  10. Cesar says:

    @Raymond: "Who are these people who keep trying to patch code they didn't write?"

    Well, from following the Wine project, it seems they are mostly copy protection vendors (the Wine project, which is mostly written in C, had to add assembly stubs to a few functions to make them look more like what these broken programs expect).

  11. MrE says:

    @Raymond: "Who are these people who keep trying to patch code they didn't write?"

    My question was probably inaccurate to receive such a rude answer. Here's another attempt:

    If you gave up that method on x64, is it because the whole hotpatching context is no longer relevant there, or because a better approach has been chosen? If so, which one is it?

    [Sorry. It's just that I see soooo many people trying to do crazy patching, I have no idea what they're trying to accomplish. They just seem to loooooove patching. And code injection. -Raymond]
  12. John says:

    @MrE: "My question was probably inaccurate to receive such a rude answer."

    That's a typical response from Raymond.  Part of the reason I read this blog is because he's the cynic's cynic, so I don't feel so bad about the stuff I come out with!

  13. Mmm says:

    @MGetz

    Just have a new page in the page table with write attr set, pointing at the same physical address ?

  14. Joshua says:

    [Who are these people who keep trying to patch code they didn't write?! -Raymond]

    I am one of them. Because Microsoft implemented POSIX.1 support by checkbox for each requirement without really understanding it, the brunt of dealing with it has fallen on application programmers. I disable my patching on Wine because it inherits a true POSIX.1 from the underlying architecture and so does not need to be patched.

    It's always the same things that come up. User or application needs to delete file in use: Unlocker (yes I know how dangerous) is popular because it allows it. Application needs a common storage area for all users on the PC: Raymond says use service. Setgid would be cheaper and more appropriate for most cases.

    [And patching fixes this? -Raymond]
  15. Alex Grigoriev says:

    Was hotpatching actually ever used in any update?

  16. Joshua says:

    [And patching fixes this? -Raymond]

    Sometimes. It was either that or write driver.

  17. @Joshua: "Application needs a common storage area for all users on the PC: Raymond says use service."

    I suspect Raymond actually says use the Common App Data folder, because that's what it's there for. Hacking around patching Windows functions is clearly a bad way to be writing software.

  18. Patrick Farrell says:

    Raymond: Totally OT, but I'm adding your blog back to my daily feed.  The awesome of your posts certainly hasn't diminished.  Thanks for keeping up the great blog.

  19. Bob says:

    On x64 5 bytes only allows you to jump within a +-2GB range.

  20. GrayShade says:

    @Raymond: "that you can never be sure that at the moment you are patching in the Detour, another thread isn't in the middle of executing an instruction that overlaps the first five bytes of the function"

    Is it any different with the MOV instructions?

    [You can patch two bytes atomically. -Raymond]
  21. Owen says:

    Also, and perhaps most importantly (it is possible to freeze all the threads of a process), it is not possible for a thread to be in the middle of a single instruction.

  22. Koro says:

    "Who are these people who keep trying to patch code they didn't write?!"

    That would be me, patching CreateFontIndirectExW in some processes that pass LOGFONT.lfQuality = CLEARTYPE_QUALITY, despite me having explicitly disabled ClearType, to correct their error.

  23. David Walker says:

    Some applications which aren't yet installed do seem to get security patches downloaded and added to some list of updates somewhere.  I am pretty sure that I have seen security updates for MS Works and some other, popular, non-MS programs which I didn't actually have installed, downloaded.  I figured that the updates were downloaded so that if and when I installed those programs, the patches would be there.  (Although somehow those patches have to actually get installed.  Maybe I'm confusing this with shims, which are listed in the registry whether those programs exist on the target system yet or not.)

    Updates for Office programs are (mostly) not downloaded until after you install Office.  At least, I know that once you install Office and check Microsoft Update, you'll see a bunch of updates for Office.  

    It's an interesting area, and I'm sure it's tricky to manage.

  24. JamesNT says:

    @Raymond Chen,

    Please, for the love of all things Holy, keep this train of thought going in future blogs posts.  I would love to read more about how Windows patching works.  

    I come from the old NT days so you can probably imagine why I am curious.

    JamesNT

  25. Rafael Rivera says:

    Thought: What would the savings be, in terms of perf/memory use, if those bytes were deduped system wide? :)

  26. MalcolmM says:

    @Rafael: the small savings would far outweigh the convenience they provide :)

  27. Michael Edgar says:

    This is why the multi-byte NOP instruction was invented. Unfortunately, new general-purpose instructions like multi-byte NOP are most likely to be desired by those who can't use them for backwards compatibility. A shame!

  28. kinokijuf says:

    @David Walker: that's because Office installs Works converters.

  29. dsquid says:

    @Raymond Chen,

    Great article as usual – thanks very much for the insight!

    To your question about "who are these people hooking other people's code?!" – in our case, we hook other people's software to monitor its behavior and to layer "modern" functionality on top. Our target market is a large, established install base of a number of applications developed by other ISVs for particular vertical markets in the early 90's (!!) which are still in wide use today in industry.

    We carefully study them and then do things like, for example, hooking TextOutA to observe what they're writing to the screen…combined with some relative positioning code it's pretty straightforward to know that "Oh, he's in state X because the app's hand-drawn status bar says Y"

    It's definitely not the "best" way to approach this (wouldn't it be nice if everyone re-wrote their legacy apps with APIs and then re-deployed them to the hundreds of thousands of installed sites?!) but if you want to add value to existing installations which are going nowhere fast (these setups typically cost $20-$30K each), hooking is the way.

    Another canonical example is a pokerbot – see http://www.codingthewheel.com/…/how-i-built-a-working-online-poker-bot-7

    [I was kind of jaded by seeing all the questions on stackoverflow of the form "I'm having trouble hooking this API / injecting code / patching this running application". It looks like 25% of all Windows applications exist in order to hook other Windows applications… -Raymond]
  30. Miguel says:

    Here is a walkthrough of how Windows applies such hotpatches:

    jpassing.com/…/windows-hotpatching-a-walkthrough

  31. Georg Wicherski says:

    What would be way more interesting is to know, why "mov edi, edi" and not "lea esi, [esi]" or any of the other "well known" two-byte NOPs?

  32. Joshua says:

    [Go ahead and compile your apps with the /hotpatch flag. -Raymond]

    Now that's good to know.

    [It's your app – you can compile it any way you like. Put 50 nops at the start of each function if you like. My point was that the hotpatch points in the OS are for OS servicing. If an app tries to use them to patch the OS, then when an actual hotpatch needs to be applied, it will fail (or worse, crash the app). -Raymond]
  33. f0dder says:

    "Hot-patching is not an application feature. It's an OS internal feature for servicing. Who are these people who keep trying to patch code they didn't write?!" – I agree wrt. applications trying to patch the OS, but it would've been nice if we could use it for our *own* applications.

    [Go ahead and compile your apps with the /hotpatch flag. -Raymond]
  34. Gabe says:

    How often (or under what circumstances) does hotpatching actually occur? I am wondering how likely it is for a rogue hotpatcher to be affected by a legitimate OS hotpatch.

  35. Owen says:

    Rather than MOV EDI, EDI or LEA ESI, [ESI], I would have personally picked XCHG AX, AX (66h 90h). I can only guess that Microsoft must have had their own reasons for picking MOV EDI, EDI after lots of testing; do some CPUs perhaps choke on XCHG r/m16, r16? It's easily plausible that it would cause older CPUs to jump into microcode.

  36. Good Point says:

    "Was hotpatching actually ever used in any update?"

    According to these guys, since Server 2003 SP2 it has not been used.

    jpassing.com/…/windows-hotpatching

    http://www.itwalkthru.com/…/hotpatching-great-idea-microsoft-but.html

  37. Myria says:

    In addition to the /hotpatch compiler option, you need the /functionpadmin linker option in order to insert the 5 (x86-32) or 8 (x86-64) bytes before the mov edi, edi of each function.

    My question is, how does Windows handle the case where the "mov edi, edi" crosses a quadword boundary, or even worse, a page boundary?

    [Presumably the linker knows not to do that. (See: ALIGN directive.) -Raymond]
