Why hasn’t the API hook mechanism for x64 been standardized like it was for x86?


Joshua posted to the Suggestion Box, "Around the time of WinXP SP2 x86, the API hook mechanism was standardized. Why wasn't the same thing done for x64?"

Who said it was standardized for x86?

Hooking APIs is not supported by Windows. There may be specific interfaces that expose hooks (like CoRegisterInitializeSpy to let you monitor calls to CoInitialize and CoUninitialize, and SetWindowsHookEx to let you hook various window manager operations) but there is no supported general API hooking mechanism provided by the operating system.

So I don't know where you got that idea from.

Comments (28)
  1. parkrrrr says:

    Perhaps your commenter views the Detours library (research.microsoft.com/…/detours) as some sort of standardization. Though if that's the case, then it has been standardized for x64 as well, as of the latest version.

    (Having just worked on some API hooking of x64 code myself a couple weeks ago, I can say it's significantly more interesting than hooking x86 code. There's no 64-bit immediate JMP instruction, and there's not necessarily 64 bits worth of NOP instructions before function entrypoints. Good luck!)

  2. Justin says:

    But what about research.microsoft.com/…/detours? I thought that was the 'standard' (of sorts).

    [You seem to have confused "standard" with "de facto standard". And I'm not even sure detours is a de facto standard, seeing as how there are so many viable competing hooking mechanisms. -Raymond]
  3. John says:

    It's an informal standard (the best kind).  Windows x86 is going to "support" this until the end of time.

  4. dave says:

    "Detours Express 3.0 is available for immediate download under a no-fee, click-through license for research, non-commercial, and non-production use.  Detours Express is limited to 32-bit processes on x86 processors."

    Any bets on whether it shows up in production use?

  5. Anonymous Coward says:

    I've always found this to be a bit of a weakness in the design of the Win32 dynamic linker. There is no easy way to start a process with substituted DLLs or functions. If the interface they export is the same, it should be possible, I think, just like in COM where you can pass any old object to a function as long as it supports the interface the function wants. In other words, Windows starts with something that could have been highly modular, and turns it into what is in effect a very monolithic system.

  6. Anonymous Coward says:

    Matt, that doesn't qualify as easy. First off the ACT doesn't work on my computer (system requirements + Dotnet dependency) and secondly I've looked at that before and couldn't find much information on how to create your own shims. In the end it was easier to either patch the relevant executable modules, or hook the relevant functions in memory, depending on whether an external process or the current process must be subjected to the switcheroo.

  7. Joshua says:

    [You seem to have confused "standard" with "de facto standard".] -Raymond.

    That was also my mistake. Given the change in SP2 to the function entry points to allow atomic hooking in most cases, combined with the release of Detours around the same time, I thought it was intentional.

    [Unlikely, especially considering that Detours doesn't use that technique! -Raymond]
  8. Michael Grier [MSFT] says:

    @Joshua:

    The codegen change was to enable hot patching, not hooking.  You can choose to look at hot patching as a form of hooking but you'll find absolutely no support for taking advantage of these changes for things other than the OS patching itself.  In fact you'll find no support for the behavior of the system once you start down this path.  You should view code that has this kind of hooking behavior in a similar fashion to software packages that replace system components.

  9. Joshua says:

    [Unlikely, especially considering that Detours doesn't use that technique! -Raymond]

    I couldn't agree to the license, so I'd never actually seen the innards of Detours.

  10. Detours Express 3.0 for x64 processes please..until then it will never become a "standard".

  11. I wish they had a version of Detours priced more reasonably for smaller applications and developers.  Detours Professional is too expensive, and Detours Express is not licensed for any type of commercial use.

    Instead I set up my own API hooking mechanism based on sample source code from Jeffrey Richter's book.  But that mechanism isn't perfect; I never got it working quite right once multiple threads were involved – I had to use Sleep to prevent deadlocks (horrors! but I had to move on).

    It's hard to justify a $10,000 expense on hooking just "one little API" to fix some minor undesired behavior – especially when I achieved practically the same thing with only a few hundred lines of code – albeit with a risk of deadlock.  Would I prefer a more production-proven version?  Of course – but not for $10,000.

    Would I rather avoid hooking an API?  Of course.  In an ideal world, I'd have full source code to all of my app dependencies, to add missing features/fix bugs.  But I don't, and so sometimes the last-resort alternative has to be an API hook.

  12. Pablo says:

    You can use Deviare API Hook that supports x32 and x64:

    http://www.nektra.com/…/deviare-api-hook-windows

  13. @JamesJohnson says:

    Using a hack to fix broken components is how horrendous hacks get introduced into the codebase. When you start down this path there will be someone who curses the day you were born when they inherit your code.

    The correct thing to do is file a bug with the people who wrote the component. If they don't answer your calls, deprecate the use of the component – you can't have your app rely on components that aren't supported or updated when you find bugs.

  14. Paul Shmakov says:

    Another alternative to Detours is an open source EasyHook library http://easyhook.codeplex.com

  15. Pablo says:

    James, I understand your point but it isn't always an alternative. I work supporting legacy applications in different platforms and you can't believe the level of lock-in that enterprises have to some technologies.

    They have to keep things working and they don't migrate their systems just because they cannot make these applications work in the new environments.

  16. @James Johnson says:

    > Using a hack to fix broken components is how horrendous hacks get introduced into the codebase.

    Sometimes it's also how products get a v1.0 released six months before the competition and take all the market instead of getting the company on the long list of Chapter 11 applicants.

  17. Actually, in this case, the correct thing to do is rewrite much of the software in question.  Maybe this will happen at some point, but it would be a significant, time-consuming, and expensive undertaking.  Until then – yes, it is a hack.  Along with many other unpleasant hacks already in the codebase that I have to deal with.

    Since that hasn't happened yet, I did what I did to work around an issue in a 12-year-old legacy component used in a similar 12-year-old legacy development environment that (1) we don't have source code to, (2) the vendor of said component has been bought out twice, and would have no interest in modifying such an old component when newer components exist that do the same task.

    Someday, it will be moved to something newer, and the old component and API hooking can then be eliminated.

  18. Anonymous says:

    There's no 64-bit immediate JMP instruction,

    I assume you can still push a qword and then RET…

    and there's not necessarily 64 bits worth of NOP instructions before function entrypoints.

    In the general case (not just what a particular compiler generates), I'm not sure I see why this isn't a problem on x86 too…  In general rewriting binary code is not easy.

  19. parkrrrr says:

    @Anon: You can push a 64-bit register, but you cannot push an immediate 64-bit value (the 64-bit immediate push instruction sign-extends a 32-bit immediate operand. And no, you can't just push two 32-bit values. There's no 32-bit PUSH instruction in 64-bit mode.) So now you have to push a register, load it with a 64-bit value, push that, then RET to a function with a custom prologue that will pop that register back. This turns out to be a lot of bytes. And all this ignores the fact that you might need to rewrite an immediate CALL or JMP instruction in the code you overwrote. (USER32!SetCaretPos has one of those.)

    Obviously, though, it's not impossible. I've got working code that does it in the limited cases for which we needed it.

    In the general case, not having enough NOPs could be a problem on x86. In the specific case of hooking OS functions, it's not because of the hot patching stuff referred to above.
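The sign-extension point above can be checked with a few lines of C. This is only a sketch: the function name `fits_push_imm32` is made up for illustration, and it tests whether a 64-bit value is one that a sign-extended 32-bit immediate could produce, which is the condition the comment describes for the immediate form of PUSH in 64-bit mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if `value` equals the sign-extension
   of its own low 32 bits, i.e. if bits 31..63 are all zero or all one.
   Only such values can be produced by a sign-extended 32-bit immediate. */
bool fits_push_imm32(uint64_t value)
{
    uint64_t upper33 = value >> 31;            /* bits 31..63, 33 bits total */
    return upper33 == 0 || upper33 == 0x1FFFFFFFFULL;
}
```

A typical user-mode x64 address such as 0x00007FF612345678 fails this test, which is why the hook target generally has to be loaded through a register or read from memory instead.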

  20. Joshua says:

    @parkrrrr: 12 bytes

    mov rax, immed64

    jmp rax

    At function prolog, rax may be scribbled safely if the normal calling convention is in use.

    The 64 bit function prologs seem to be almost always hotpatchable as though there was an intended way to do it, but it changes from one version to the next (and one routine to the next).
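For reference, the 12-byte sequence above can be written out byte by byte. The following is a minimal sketch that only fills a caller-supplied buffer; it does not touch executable memory, and the names `encode_abs_jmp_rax` and `ABS_JMP_SIZE` are invented for this illustration. It also inherits the comment's assumption that RAX may be overwritten at function entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ABS_JMP_SIZE = 12 };

/* Hypothetical helper: writes `mov rax, imm64` (48 B8 + 8 bytes)
   followed by `jmp rax` (FF E0) into `out`. Returns the byte count. */
size_t encode_abs_jmp_rax(uint8_t out[ABS_JMP_SIZE], uint64_t target)
{
    assert(out != NULL);
    out[0] = 0x48;                                  /* REX.W prefix */
    out[1] = 0xB8;                                  /* MOV RAX, imm64 */
    for (int i = 0; i < 8; i++)
        out[2 + i] = (uint8_t)(target >> (8 * i));  /* little-endian immediate */
    out[10] = 0xFF;                                 /* JMP r/m64 opcode */
    out[11] = 0xE0;                                 /* ModRM: /4, register RAX */
    return ABS_JMP_SIZE;
}
```

Ten bytes go to the MOV and two to the JMP, which accounts for the 12-byte total.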

  21. Random832 says:

    @AC "If the interface they export is the same, it should be possible, I think, just like in COM where you can pass any old object to a function as long as it supports the interface the function wants."

    Functions aren't objects, they aren't interfaces, and they aren't modules. What happens if you replace HeapAlloc but not HeapFree? And you don't know how the various functions might interact with each other internally, or what undocumented data structures they might touch.

  22. @Random832:

    Well, if someone wanted to shim a function then they would have to know what they are doing and make any relevant shims too. I guess this is something that should be rather obvious, probably was to AC too, but this obvious restraint wasn't quite as obvious for you.

    I also think he was heavily hinting at third party DLLs more than hooking Windows functions. I know the theory is the same, it is hard to know how things work internally, but there could be mitigating circumstances, like an ex-employee who knows the application.

    Anyway.

    @AC:

    Yes, theoretically, if you match up calling conventions and parameters then it would be possible to hook a function that way. This is how IAT patching works. Of course, IAT patching has its own problems (like how you have to patch before you call the function once).
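The idea behind IAT patching can be modeled in portable C. This is a toy sketch, not real PE import-table code: `import_slot` stands in for a single import-table entry, and every name here is hypothetical. It also illustrates one way a timing caveat can arise: code that copied the pointer out of the slot before the patch keeps calling the original.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*add_fn)(int, int);

/* The "real" imported function in this toy model. */
int real_add(int a, int b) { return a + b; }

/* Stand-in for one import-table slot; callers go through it. */
add_fn import_slot = real_add;

/* Replacement with the same signature; adds 1000 so the hook is visible. */
int hooked_add(int a, int b) { return real_add(a, b) + 1000; }

/* A caller that always reads the slot at call time. */
int call_via_import(int a, int b) { return import_slot(a, b); }

/* Swaps the slot and returns the previous pointer so it can be restored. */
add_fn patch_import(add_fn replacement)
{
    assert(replacement != NULL);
    add_fn previous = import_slot;
    import_slot = replacement;
    return previous;
}
```

After `patch_import(hooked_add)`, `call_via_import(2, 3)` goes through the hook, while a pointer saved from `import_slot` beforehand still reaches `real_add` directly.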

  23. Myria says:

    Raymond's right on this one.  For everyone's sake, don't hook functions in production software.  Debugging the crap that comes up because some applications like hooking Windows APIs is really annoying.  I get asked to figure this stuff out on occasion, because not many programmers understand how it all works.

    The only time I've ever done API hooking in production non-diagnostic software was to hook KiUserExceptionDispatcher in Windows 2000 to support vectored exception handling until our customers could upgrade to XP or later.  The hook code wasn't run in later versions of Windows, instead calling the proper API (AddVectoredExceptionHandler).

  24. Antariy says:

    > Joshua posted to the Suggestion Box, "Around the time of WinXP SP2 x86, the API hook mechanism was standardized. Why wasn't the same thing done for x64?"

    > Who said it was standardized for x86?

    I suppose he means the hotpatching feature, i.e. the link-time padding before functions, the well-known "mov edi,edi". This takes 2 bytes; together with the 3 bytes of the standard prologue "push ebp / mov ebp,esp", there is room for a 5-byte relative jump "db 0E9h, xx, xx, xx, xx", which hooking software writes to redirect execution to its own code. With this de facto "standard" it is easy to pass control back from the hook to the original code: just execute the standard prologue and jump to "original_function_entry_point+5". Of course, this way of padding functions is not a true standard, but it does simplify hooking routines: there is no need to include an instruction-length disassembler in the code that writes the redirection.
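The 5-byte relative jump described above can be sketched as follows. This only computes the bytes into a buffer under the comment's assumptions (a 32-bit process and a 5-byte patchable region at the entry point); the names `encode_jmp_rel32_x86`, `resume_address`, and `JMP_REL32_SIZE` are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { JMP_REL32_SIZE = 5 };

/* Hypothetical helper: encodes `jmp rel32` (E9 + 32-bit displacement) for
   a jump located at address `from` that should land on address `to`.
   The displacement is measured from the end of the 5-byte instruction,
   and 32-bit address arithmetic wraps modulo 2^32. */
void encode_jmp_rel32_x86(uint8_t out[JMP_REL32_SIZE], uint32_t from, uint32_t to)
{
    assert(out != NULL);
    uint32_t disp = to - (from + (uint32_t)JMP_REL32_SIZE);
    out[0] = 0xE9;                                 /* JMP rel32 opcode */
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(disp >> (8 * i));   /* little-endian displacement */
}

/* Address at which the original code resumes after the overwritten bytes. */
uint32_t resume_address(uint32_t entry_point)
{
    return entry_point + (uint32_t)JMP_REL32_SIZE;
}
```

Because the arithmetic wraps, every target in a 32-bit address space is reachable from every source.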

    As for x64, there is no such simple way to write a redirection, because the "jmp xxxxxxxx" instruction is *still* limited to a 4GB-relative range. The hook code injected into the process could in theory sit at an address that lies outside the +2GB/-2GB bounds relative to the function entry point.
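The range limit can be expressed as a small check. This is a sketch with a made-up function name, `rel32_reachable`, and it assumes the jump is the 5-byte E9 form with a signed 32-bit displacement measured from the end of the instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if a 5-byte `jmp rel32` placed at `from`
   can reach `to` in a 64-bit address space. Uses unsigned comparisons
   to avoid signed overflow. */
bool rel32_reachable(uint64_t from, uint64_t to)
{
    assert(from <= UINT64_MAX - 5);       /* instruction must fit below the top */
    uint64_t next = from + 5;             /* address of the following instruction */
    if (to >= next)
        return to - next <= (uint64_t)INT32_MAX;        /* forward: up to 2^31 - 1 */
    return next - to <= (uint64_t)INT32_MAX + 1;         /* backward: up to 2^31 */
}
```

When the check fails, a longer sequence that carries the full 64-bit address is needed.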

    So some other code is needed, such as:

    push rax

    mov rax,1234567890ABCDEFh

    xchg rax,[rsp]

    ret

    In this example the implicit LOCK delay of XCHG will not noticeably affect performance, especially if the hooking code does "heavy" logging, I/O, etc. This code also invalidates the CPU's return-address prediction cache (it uses RET with an address that was not pushed by a call), so it will have more performance impact than "jmp xxxxxxxx" anyway. It does preserve RAX, but takes 13 bytes (with the prologue size difference), which was probably judged to be "too much to write some padding junk of this size" :)

    [The question is, does the hook function itself contain the "mov edi, edi" instruction + 5 nop bytes? And when the hook function unhooks, does it look at its own nop bytes and copy them to the original hooked function? Without that, you still have the problem that you can't have two people hooking the same function. The "mov edi, edi" and 5 nop bytes are for servicing, not for hooking. -Raymond]
  25. Joshua says:

    [And when the hook function unhooks]

    Since when does this kind of hook unhook? I understood it as being as one-way as a TSR hook from the DOS days.

    [The "mov edi, edi" and 5 nop bytes are for servicing]

    Since the only thing I'm using it for is fixing bugs that I'm waiting for a fix for, I could call this servicing.

    On a slightly different tack, Raymond is wise to not dive too deep into the debate between hook and no-hook. Any debate whose origin is idealist vs. pragmatist cannot be won by either side.

    [No, you're not servicing. You're hooking. If a hotfix comes in that wants to service the function (say to fix a security flaw), it will hotpatch the function and collide with your hook. Result: Who knows!? -Raymond]
  26. Guest says:

    IIRC some Sysinternals tools use Detours. Is that 'production use'?
