What does "Hot Patchability" mean and what is it for?

I got a question on my earlier ABI post about hot patchability, so I thought I’d
go into excruciating detail on it, since it’s not quite as complicated as
exception handling.

What the heck does Hot Patchable mean?

Hot patchable means, primarily, that you’re able to take a running application and,
given sufficient privileges, atomically change all calls from function A into calls
to function B. At a high level, this allows you to patch a running process without
requiring that the process be stopped [2 steps better than a reboot!]. This is really
important in the “Server Up-Time” scenarios that are becoming more and more common.
It allows a patch to be deployed without even stopping a process [for more than
the length of a standard context switch, anyway].

Why should I care?

Maybe you won’t, but if you’re deploying server code, you probably don’t want to
have to terminate the process to deploy a security patch, right? Microsoft’s customers
really don’t like it when we make ’em restart their server processes.

Okay, I care, how does it work?

First, remember that I’m a compiler guy, not a debugger guy, or a kernel guy, or
anything else, but here’s my understanding: First, you build your patched function.
This is a non-trivial amount of work. It must have a function
signature compatible with the function it’s replacing. It also needs special code to
access any globals that are needed, since it’s generally loaded as a separate DLL
[though I imagine a tricky debugger guy could do some kind of nutty code injection
where that isn’t necessary]. Once you have the code authored, you have to deploy
the patch. This is where the ABI restrictions come in.

First, all functions must start with a 2 byte (or larger) instruction – if you start
with a push, stick a size prefix on it. Second, all functions must be preceded by
6 bytes of padding. Finally, any image that needs to be hot-patchable should also
have some amount of ‘scratch space’ within 2GB of its image location. Why, you
ask? Well, here’s how hot-patching actually works:
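Before walking through the steps, here’s a sketch of the layout those three rules produce. This is Python standing in as byte-level pseudocode; the specific padding bytes and the choice of opening instruction are illustrative assumptions, not mandated values:

```python
# Sketch of the layout rules for a hot-patchable x64 image.
# The exact bytes are illustrative: compilers typically use int 3 (0xCC)
# padding and a 2-byte nop (or a size-prefixed push) as the first instruction.

PADDING = bytes([0xCC] * 6)        # rule 2: 6 bytes of padding before the function
FIRST_INSN = bytes([0x66, 0x90])   # rule 1: a 2-byte (or larger) first instruction
BODY = bytes([0xC3])               # rest of the function (here, just a ret)

function_image = PADDING + FIRST_INSN + BODY
entry = len(PADDING)               # the entry point sits just past the padding

# rule 3: a scratch slot (8 bytes, big enough for a 64-bit target address)
# that must live within +/-2GB of the image so a rip-relative jmp can reach it.
scratch_slot = bytearray(8)

assert len(FIRST_INSN) >= 2
assert entry == 6
```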

Pause the process and load your hot patch dll into the address space [again, I don’t
know all the mechanics for this, but I know it’s not too difficult]. Next,
write the address of your hot patch function into that ‘scratch space’. Now, write
the 6-byte JMP [PC-relative scratch space] into the 6 bytes of padding
before the function you’re trying to replace. Finally, write the 2-byte jmp PC-6
into the first two bytes of the function. Resume the process, and your hot patch
function is merrily running instead of the old one.
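Those three writes can be simulated over a fake address space. Everything here is a sketch under assumptions: the offsets and the scratch location are hypothetical, and a real patcher would do these writes with something like WriteProcessMemory against the paused process rather than poking a bytearray:

```python
import struct

# Simulation of the x64 hot-patch sequence on a fake address space.
SPACE = bytearray(0x100)

SCRATCH  = 0x00        # hypothetical scratch slot, within +/-2GB of the image
PAD      = 0x40        # start of the 6 bytes of padding
FUNC     = PAD + 6     # function entry: a 2-byte first instruction, then the body
NEW_FUNC = 0xE0        # hypothetical address of the replacement function

# Original state: a 2-byte first instruction (e.g. a 2-byte nop, 66 90).
SPACE[FUNC:FUNC + 2] = bytes([0x66, 0x90])

# Step 1: write the replacement function's address into the scratch slot.
SPACE[SCRATCH:SCRATCH + 8] = struct.pack("<q", NEW_FUNC)

# Step 2: write "jmp [rip+rel32]" (FF 25 rel32) into the 6 bytes of padding.
# rel32 is relative to the end of the 6-byte instruction (PAD + 6).
rel32 = SCRATCH - (PAD + 6)
SPACE[PAD:PAD + 6] = bytes([0xFF, 0x25]) + struct.pack("<i", rel32)

# Step 3: atomically overwrite the first 2 bytes with a short jmp (EB F8):
# from FUNC + 2, back 8 bytes, landing on the jmp in the padding.
SPACE[FUNC:FUNC + 2] = bytes([0xEB, 0xF8])

def resolve(pc):
    """Follow the patched control flow from a call to the old entry point."""
    if SPACE[pc] == 0xEB:                          # short jmp rel8
        rel8 = struct.unpack("<b", SPACE[pc + 1:pc + 2])[0]
        pc = pc + 2 + rel8
    if SPACE[pc:pc + 2] == bytes([0xFF, 0x25]):    # jmp [rip+rel32]
        rel = struct.unpack("<i", SPACE[pc + 2:pc + 6])[0]
        slot = pc + 6 + rel                        # address of the scratch slot
        pc = struct.unpack("<q", SPACE[slot:slot + 8])[0]
    return pc

assert resolve(FUNC) == NEW_FUNC   # calls to the old entry now reach the patch
```

Note the displacement in step 3: the short jmp’s offset is relative to the *next* instruction, so landing 6 bytes before the function means encoding -8 (0xF8), since the jmp itself is 2 bytes long.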

How does that work again?

The point of the 2 byte instruction at the start of each function is that you
never have to worry about pausing your process in the middle of the two bytes
you’re going to overwrite with the jmp PC-6 instruction. Nothing else
is really interesting. You set up a launch pad to your scratch space, which is where
the target address lives. No rocket science here, nosiree.

What if I don’t want hot-patchability?

Honestly, I don’t think there’s anything that prevents you from breaking these particular
rules, except that the cost is so minimal there’s really just no good reason not to
follow them. X86 has something like a 30% hot-patchable kernel, while x64 has a 100% hot-patchable
kernel. You tell me which one is better.

Comments (7)

  1. nksingh says:

    This is a technically really cool solution, and I’m glad you explained how this works, but what is the real application for this?  It seems like if you’re hot patching more than one function at a time (which would be necessary for a significant change in program functionality), you’d have a lot of race-condition issues or versioning issues (new func A calls old func B and so on).  You might also have races in the single function case between multiple threads.  (I buy that this is probably all soluble by debugging api magic).

    In the end, though, who runs services which may not go down on a single node Windows machine?  Okay, maybe DataCenter edition users.  Who runs application code within a single process image which cannot be recycled??  Good work, but why?

  2. Kevin Frei says:

    You won’t have races, because you pause the process, load the DLL, patch all necessary functions, then resume the process.

    Why?  Because people don’t like disconnecting their users from network services.  I don’t like rebooting my machine.  I don’t want to have to shut all my apps down, restart the machine, then load them all back up.  It’s irritating, and for a mission critical application (Data center, whatever), the downtime translates to $$ lost.  Anyone else want to give a reason to not want to restart a process or a system?

  3. sharninder says:

    Or you let the customer use something like the patchpoint proxy by bluelane (bluelane.com). And no … I don’t work for bluelane!

  4. Igor Levicki says:

    This would be ok if you haven’t advised people to use the size prefix (also known as LCP) — it slows instruction decoding on Core 2 CPUs in x64 mode.

  5. Bernard Lim says:

    Is this why I’m seeing a lot of "mov edi, edi" instructions at the start of many functions these days?

    Correct me if I’m wrong, but "mov edi, edi" is exactly 2 bytes?

    also, sorry if this sounds like a stupid question, how do you pause a process, through API calls?

  6. Kevin Frei says:

    Yes, that’s what they’re there for – x86 hot-patching.  x64 hot-patching is really cheap.  x86 hot-patching is not quite so cheap, because it’s full of 2 byte NOP’s…

    As far as the LCP on the opening PUSH, yes it slows down Core2’s by a single cycle.  But I challenge anyone to actually measure that impact in any real-world scenario…

  7. IInspectable says:

    Technically, it is not at all necessary to pause a process (unlike Detouring). The 2-byte no-op can be overwritten with a short jump instruction atomically, making this scheme completely thread-safe. You can hotpatch a process while it is running.
