Runtime Code Patching – Not for the Faint of Heart


I have been involved in several conversations recently that have revolved around the joys of runtime code patching. I am always shocked to hear people say that they are OK with the idea of patching code at runtime. Moreover, it shocks me that they think it is easy to get right! I do think that code patching can have its place in a system, if it is implemented correctly and if its intent and semantics are fully disclosed to any potential users.


So what exactly is code patching? There are tons of examples of it on the web (mostly hacker sites! 😀 ) and many books (again – mostly hacker stuff) that describe it in detail – just “Live Search” for “runtime code patching”. Basically, patching is the process of replacing a set of op codes in memory with a different set of op codes – at runtime. This seems simple enough, but the devil is in the details. Let’s dig in!


Consider these op codes from a simple and mostly useless function:


   01001175 8bff             mov     edi,edi
   01001177 53               push    ebx
   01001178 50               push    eax
   01001179 53               push    ebx
   0100117a 51               push    ecx
   0100117b 52               push    edx
   0100117c 5a               pop     edx
   0100117d 59               pop     ecx
   0100117e 5b               pop     ebx
   0100117f 58               pop     eax
   01001180 5b               pop     ebx
   01001181 c3               ret


This code is kind of silly – but it has nice properties that expose the problems with patching; more on that in a bit. It is also made up of very commonly generated op codes. The format of the assembly listing is:


 [address] [op codes] [mnemonics for op codes]


 


The typical reason for installing a patch is to either bypass or modify the behavior of an existing function. Therefore the canonical patch is “op codes for a jmp to an absolute address”, written over the beginning of an existing routine like so:


 


   01001175 ea871100011b00   jmp     001b:01001187
   0100117c 5a               pop     edx
   0100117d 59               pop     ecx
   0100117e 5b               pop     ebx
   0100117f 58               pop     eax
   01001180 5b               pop     ebx
   01001181 c3               ret


Here we replaced the first few instructions of our function with an absolute jump to some other code, presumably code that we wrote and loaded into the system in some way.
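To make this concrete, here is a minimal user-mode sketch of installing such a patch (my own illustration, not code from the post). It uses a 5-byte relative jmp (E9 rel32) instead of the 7-byte far jmp shown above, and the helper name InstallJumpPatch is made up. It also deliberately has the exact problem we are about to discuss: nothing stops another thread from executing the function while the bytes are being copied.

#include <windows.h>
#include <stdint.h>
#include <string.h>

/* Sketch only: overwrite the first 5 bytes of 'target' with "jmp replacement".
   This is the naive, racy version of the canonical patch. */
static BOOL InstallJumpPatch(void *target, void *replacement)
{
    unsigned char patch[5];
    DWORD oldProtect;
    int32_t rel;

    /* E9 rel32: the displacement is relative to the end of the 5-byte jmp. */
    patch[0] = 0xE9;
    rel = (int32_t)((uintptr_t)replacement - ((uintptr_t)target + 5));
    memcpy(&patch[1], &rel, sizeof(rel));

    /* Code pages are normally not writable, so open a window for the write. */
    if (!VirtualProtect(target, sizeof(patch), PAGE_EXECUTE_READWRITE, &oldProtect))
        return FALSE;

    memcpy(target, patch, sizeof(patch));   /* the unsafe multi-byte overwrite */

    VirtualProtect(target, sizeof(patch), oldProtect, &oldProtect);
    return TRUE;
}

Note that the 5 new bytes span several of the original one-byte instructions, which is exactly where the trouble starts.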


 


So now that we know what a patch is – what is the problem? It seems like this should work. What exactly do we need to worry about? Well, the most obvious thing is another thread running the code that we are patching, while we are patching it. If we modify code while another thread is executing it, that thread will crash, or at the very least do very strange things. We need to make sure that any thread running the code we are patching executes either all of the old op codes or just the new one – never a combination of the two. You can imagine a scenario like this:


 


1. Thread T1 executes the first instruction of the old op codes, at address 0x01001175.

2. Thread T2 overwrites the old op codes with the new "jmp" op code.

3. Thread T1 executes the op code at 0x01001177, which is now an address in the middle of the jump instruction that T2 wrote over the old op codes.


 


This would be really bad! 🙂


 


So what to do? The next logical progression in analyzing the issue goes like this:


 


“Well, the problem is that we have a race with other threads running the code we are patching, so I’ll just make sure that no other threads are in that code when I do the patch!”


 


Perfect! Well, not quite. We can corral all the other running processors (using DPCs, IPIs, etc.) and make sure that none of the running threads are in the code we are going to patch. But believe it or not, this still isn’t enough to make our patch solid! To clarify, let’s restate the problem with our patching approach: we need to synchronize with all of the other threads currently running the code we are trying to patch. However, the latest incarnation of the methodology only synchronizes with the threads currently running on other processors. What about the threads that aren’t running? Is there any synchronization required there? Unfortunately – yes. The problem is that we can have threads that aren’t currently running but that have the address of one of the old op codes – where our new instruction is now residing – saved away for future use. This can occur if a thread was context swapped while executing the code we want to patch (when a thread is context swapped, the current instruction pointer is saved away so that it can be restored when the thread runs again). Other things that can cause the instruction pointer to be saved away are exceptions, interrupts, etc.
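To make the "corral the other processors" idea a bit more concrete, here is a rough kernel-mode sketch of one way it could be done, using the documented KeIpiGenericCall routine, which runs a callback on every processor at IPI level. This is my own illustration, not code from the post: the PATCH_CONTEXT structure and function names are invented, and making the target page writable (for example via an MDL mapping) is omitted. And, as just discussed, even this does nothing for threads that were context swapped out with a saved instruction pointer inside the old code.

#include <ntddk.h>

/* Hypothetical description of one pending patch. */
typedef struct _PATCH_CONTEXT {
    volatile LONG Arrived;        /* processors that have reached the corral    */
    volatile LONG Done;           /* set once the patch bytes have been written */
    LONG          ProcessorCount;
    void         *Target;         /* assumed to be writable already             */
    UCHAR         NewBytes[8];
    SIZE_T        Length;
} PATCH_CONTEXT;

/* Runs on every processor at IPI level via KeIpiGenericCall. */
static ULONG_PTR PatchBroadcast(ULONG_PTR Argument)
{
    PATCH_CONTEXT *ctx = (PATCH_CONTEXT *)Argument;

    if (InterlockedIncrement(&ctx->Arrived) == ctx->ProcessorCount) {
        /* Last processor in: every other processor is spinning below at IPI
           level, so no thread is currently executing the target code. */
        RtlCopyMemory(ctx->Target, ctx->NewBytes, ctx->Length);
        InterlockedExchange(&ctx->Done, 1);
    } else {
        /* Hold this processor in the corral until the write is finished. */
        while (!ctx->Done) {
            YieldProcessor();
        }
    }
    return 0;
}

/* Caller fills in Target/NewBytes/Length, then: */
static void ApplyPatch(PATCH_CONTEXT *ctx)
{
    ctx->Arrived = 0;
    ctx->Done = 0;
    ctx->ProcessorCount = (LONG)KeQueryActiveProcessorCount(NULL);
    KeIpiGenericCall(PatchBroadcast, (ULONG_PTR)ctx);
}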


 


So what is the moral of the story here? Don’t patch code? Well, that may be a little extreme, but the moral is at least to never patch multiple instructions. Having worked with Windows Online Crash Analysis (OCA) data for a long time now, I can tell you that I have seen many failed attempts at doing this correctly that have ended in a BSOD. 🙂


 


In fact, if you look at post-Server 2003 Windows binaries, you will notice that the generated code follows this format almost exclusively:


 


7731a321 90               nop
7731a322 90               nop
7731a323 90               nop
7731a324 90               nop
7731a325 90               nop
7731a326 8bff             mov     edi,edi
7731a328 55               push    ebp
7731a329 8bec             mov     ebp,esp
… <more op codes here>


This type of code generation allows for a patch to be installed safely at runtime. The 2-byte instruction at the start of the function (mov edi,edi) provides enough space to hold the op code for a “relative short jump”, which can be crafted to jump to the address of the 5 bytes of nop op codes that precede the function. Those bytes would have been previously overwritten with a “jmp ADDRESS” op code. This was a conscious change that was made to allow servicing of existing binaries on long-running machines, without having to reboot to replace the binaries.

The same rules apply though – the other processors must be corralled in order to make sure no thread is active in the code we want to patch at the time the patch is applied. Again – the reason this methodology works is because the op code boundaries match. In other words, a thread will execute either the “mov edi, edi” or the “short relative jump”. If it executes the former, then the routine runs normally through the code. If it executes the latter, then it will jump to the patch installed over the nop op codes. No problem here. There are issues with the instruction caches on processors as well, but those issues can be managed within the corralling mechanism. One last point: we don’t have to worry about the “normal path” code running the patch that was written over the nops, because there is no path to that code except via the “short relative jump” that we wrote over the first instruction. Nice huh?!
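As a rough illustration of the two-step install described above (again my own sketch, user mode for simplicity; the helper name InstallHotPatch is made up, and it assumes the function entry is at least 2-byte aligned, which normal function alignment provides): the long jmp to the detour is written into the padding in front of the function first, while nothing can reach it, and only then is the 2-byte mov edi,edi replaced with the short jump back to that padding, in a single 2-byte store.

#include <windows.h>
#include <stdint.h>
#include <string.h>

/* Sketch only: hotpatch-style install over a function that begins with
   "mov edi,edi" (8B FF) and is preceded by 5 bytes of padding. */
static BOOL InstallHotPatch(void *function, void *detour)
{
    unsigned char *entry = (unsigned char *)function;   /* the mov edi,edi     */
    unsigned char *pad   = entry - 5;                   /* the 5 padding bytes */
    DWORD oldProtect;
    int32_t rel;

    if (!VirtualProtect(pad, 7, PAGE_EXECUTE_READWRITE, &oldProtect))
        return FALSE;

    /* Step 1: write "jmp rel32" to the detour into the padding. Nothing can
       reach these bytes yet, so this multi-byte write is not racy. */
    pad[0] = 0xE9;
    rel = (int32_t)((uintptr_t)detour - (uintptr_t)entry);  /* detour - (pad + 5) */
    memcpy(&pad[1], &rel, sizeof(rel));

    /* Step 2: replace "mov edi,edi" (8B FF) with "jmp short -7" (EB F9),
       which lands on the jmp written in step 1.  The 2-byte store is a single
       aligned write, so a thread sees either the old or the new instruction,
       never part of each. */
    *(volatile unsigned short *)entry = 0xF9EB;          /* bytes EB F9 */

    VirtualProtect(pad, 7, oldProtect, &oldProtect);
    FlushInstructionCache(GetCurrentProcess(), pad, 7);
    return TRUE;
}

Removing such a patch can follow the same logic in reverse: restore the mov edi,edi bytes with one 2-byte store first, and only then reclaim the padding.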


 


Well, I hope this has been somewhat interesting or useful in some way. Please let me know if there are any other things that would be interesting to talk about.


Comments (11)
  1. Nice post, Jonathan – thanks for explaining in a clear, concise fashion the issues revolving around this.

  2. eranb says:

    Very interesting post.

    Can you further elaborate on the caching issues involved? Also, why do we need to corral all running threads? Do we care if a thread is now running inside the old function?

    Thanks,

    Eran.

  3. theelvez says:

    Thanks for the comments molotov and Eran!

    Eran – here is one example of a cache issue with the instruction cache (from the Intel IA32 Processors Manuals):

    "For Intel486 processors, a write to an instruction in the cache will modify it in both the cache and memory, but if the instruction was prefetched before the write, the old version of the instruction could be the one executed. To prevent the old instruction from being executed, flush the instruction prefetch unit by coding a jump instruction immediately after any write that modifies an instruction."

    There are other processor dependent issues as well, but they all revolve around the fact that entry points into the new instruction op code can exist when the caches and memory get out of sync. Thanks!

  4. theelvez says:

    Eran – also about your second point – right – we don’t care if someone was already past the patch area – we just need to manage the accesses to the area around the patch. Thanks.

  5. theelvez says:

    Daniel Pearson pointed out an error in my post that I want to share with everyone. I led you to believe (from my example disassembly at the end of the post) that the 5 bytes of nop opcode that get overwritten are at the end of the function. They are not – they are actually the 5 bytes preceding the function. It makes total sense when you think about it – because the 2-byte jump instruction can only jump 127 bytes forward or 128 bytes backward. Why is this important? Well, if you had a function that was larger than 127 bytes, you couldn’t reach your nop patch bytes if they were at the end! 🙂

    It is a subtle but very important point. Thanks Daniel!

  6. Eternal Idol says:

    The mov edi, edi is used by Microsoft Hotpatching.

  7. thenshesaid@msn.com says:

    >> So what is the moral of the story here? Don’t patch code? Well that may be a little extreme, but the moral is at least to never patch multiple instructions.

    I think that’s not enough. I saw the Detours library by M$ use CopyMemory to copy the op bytes, where CopyMemory ultimately comes down to

    rep movsd;

    So even if you just want to copy one single instruction, there is also the possibility that another thread gets the CPU before all bytes are copied. (Also consider multi-core processors.)

  8. Eternal Idol says:

    Sure, but that could easily be fixed using instructions like cmpxchg8b.

  9. awana81 says:

    How does the detours library handle the caching issue?

  10. zhzhtst says:

    Great! Can you describe the "Hot Patch" technology in more detail?

  11. Hi,

    The problem with such a technique is that we need to make it atomic, because if it fails partway through it can lead to op code corruption.

    The security concern is that such a technique can be abused by malware developers to create metamorphic/polymorphic malware, which can evade AV.

    Nevertheless, it is a good blog post.

Comments are closed.
