The history of calling conventions, part 5: amd64


The last architecture I'm going to cover in this series is the AMD64 architecture (also known as x86-64).

AMD64 takes the traditional x86 and expands the registers to 64 bits, naming them rax, rbx, etc. It also adds eight more general-purpose registers, named simply r8 through r15.

  • The first four parameters to a function are passed in rcx, rdx, r8 and r9. Any further parameters are passed on the stack. Furthermore, space for the register parameters is reserved on the stack, in case the called function wants to spill them; this is important if the function is variadic.

  • Parameters that are smaller than 64 bits are not zero-extended; the upper bits are garbage, so remember to zero them explicitly if you need to. Parameters that are larger than 64 bits are passed by address.

  • The return value is placed in rax. If the return value is larger than 64 bits, then a secret first parameter is passed which contains the address where the return value should be stored.

  • All registers must be preserved across the call, except for rax, rcx, rdx, r8, r9, r10, and r11, which are scratch.

  • The callee does not clean the stack. It is the caller's job to clean the stack.

  • The stack must be kept 16-byte aligned. Since the "call" instruction pushes an 8-byte return address, this means that every non-leaf function is going to adjust the stack by a value of the form 16n+8 in order to restore 16-byte alignment.
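
The parameter rules above can be sketched in C. This is only an illustration: param_register, param_stack_offset and low32 are hypothetical helpers of my own, not part of any real API, and they cover integer parameters only. Stack offsets are relative to the caller's rsp just before the "call" instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: name of the register that carries the integer
   parameter with the given zero-based index, or NULL if that parameter
   travels on the stack. */
const char *param_register(unsigned index)
{
    static const char *const regs[4] = { "rcx", "rdx", "r8", "r9" };
    return index < 4 ? regs[index] : NULL;
}

/* Hypothetical helper: stack offset of a parameter that does not fit in
   a register, or -1 if the parameter is passed in a register. The first
   0x20 bytes are the spill area for the four register parameters; each
   later parameter takes one 8-byte slot after that. */
int param_stack_offset(unsigned index)
{
    if (index < 4)
        return -1;
    return 0x20 + 8 * (int)(index - 4);
}

/* Keeps only the low 32 bits of a register that carried a 32-bit
   parameter; the upper 32 bits may hold garbage. */
uint32_t low32(uint64_t raw_register)
{
    return (uint32_t)raw_register;
}
```

For a five-parameter function, param_register(0) is "rcx" and param_stack_offset(4) is 0x20, which is where the fifth parameter lives at the moment of the call.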

Here's a sample:

void SomeFunction(int a, int b, int c, int d, int e);
void CallThatFunction()
{
    SomeFunction(1, 2, 3, 4, 5);
    SomeFunction(6, 7, 8, 9, 10);
}

On entry to CallThatFunction, the stack looks like this:

xxxxxxx0 .. rest of stack ..
xxxxxxx8 return address <- RSP

Due to the presence of the return address, the stack is misaligned. CallThatFunction sets up its stack frame, which might go like this:

    sub    rsp, 0x28

Notice that the local stack frame size is 16n+8, so that the result is a realigned stack.

xxxxxxx0 .. rest of stack ..
xxxxxxx8 return address
xxxxxxx0   (arg5)
xxxxxxx8   (arg4 spill)
xxxxxxx0   (arg3 spill)
xxxxxxx8   (arg2 spill)
xxxxxxx0   (arg1 spill) <- RSP
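
The 0x28 can be derived with a little arithmetic. Here is a minimal sketch, assuming the function pushes no registers in its prologue (so rsp is 8 bytes past a 16-byte boundary on entry) and that it calls other functions (so the spill area is needed). frame_size is a hypothetical helper for illustration; a real compiler has more to account for, such as saved registers and the alignment of individual locals.

```c
#include <stddef.h>

/* Hypothetical helper: the smallest N for "sub rsp, N" that leaves room
   for the outgoing parameters and locals and realigns the stack. */
size_t frame_size(unsigned max_outgoing_params, size_t local_bytes)
{
    /* 32 bytes of spill space for the four register parameters. */
    size_t needed = 32 + local_bytes;

    /* One 8-byte slot for each outgoing parameter beyond the fourth,
       sized for the call that passes the most parameters. */
    if (max_outgoing_params > 4)
        needed += 8 * (size_t)(max_outgoing_params - 4);

    /* Round up to the next value of the form 16n+8. */
    return (needed + 7) / 16 * 16 + 8;
}
```

For CallThatFunction, which passes at most five parameters and has no locals, frame_size(5, 0) comes out to 0x28. The rounding never wastes more than 8 bytes here, because the needed size is always a multiple of 8 when local_bytes is.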

Now we can set up for the first call:

        mov     dword ptr [rsp+0x20], 5     ; output parameter 5
        mov     r9d, 4                      ; output parameter 4
        mov     r8d, 3                      ; output parameter 3
        mov     edx, 2                      ; output parameter 2
        mov     ecx, 1                      ; output parameter 1
        call    SomeFunction                ; Go Speed Racer!

When SomeFunction returns, the stack is not cleaned, so it still looks like it did above. To issue the second call, then, we just shove the new values into the space we already reserved:

        mov     dword ptr [rsp+0x20], 10    ; output parameter 5
        mov     r9d, 9                      ; output parameter 4
        mov     r8d, 8                      ; output parameter 3
        mov     edx, 7                      ; output parameter 2
        mov     ecx, 6                      ; output parameter 1
        call    SomeFunction                ; Go Speed Racer!

CallThatFunction is now finished and can clean its stack and return.

        add     rsp, 0x28
        ret

Notice that you see very few "push" instructions in amd64 code, since the paradigm is for the caller to reserve parameter space and keep re-using it.

[Updated 11:00am: Fixed some places where I said "ecx" and "edx" instead of "rcx" and "rdx"; thanks to Mike Dimmick for catching it.]

Comments (32)
  1. Anonymous says:

    (Shouldn’t some of those references to ecx and edx be to rcx and rdx, i.e. quadword registers?)

    I assume that using a single subtraction to adjust the stack for the whole duration of the function – including function call parameters – simplifies the exception unwind procedure.

    Context: SEH exceptions on AMD64 (for 64-bit programs) are table-based, NOT based on an exception handler chain at fs:[0] as on x86. Raymond, any idea why x86 is the only architecture which uses this frame-based exception handler chain?

  2. Anonymous says:

    Actually, the ecx and edx are correct, per rule 2: "Parameters that are smaller than 64 bits are not zero-extended." Since the parameters are 32-bit integers, the values are passed in ecx, edx, r8d and r9d. (r8d and r9d are the pseudo-registers that represent the bottom 32 bits of the 64-bit r8 and r9 registers.)

    As to why x86 is the only platform that uses frame-based exception handling: I have no idea. Just further evidence that x86 is the weirdo.

  3. Anonymous says:

    Oh wait in the discussion paragraphs I used ecx and edx instead of rcx and rdx, right. Good catch, Mike.

  4. Anonymous says:

    You can only use a table-based exception handling scheme when you can reliably walk up the stack at any point. That’s not possible on x86 without breaking backward compatibility, given the profusion of private calling conventions used in code written in asm. I’ve always assumed that the original NT design team didn’t switch to a table-based scheme at the start because they needed to make it easy to port such code.

  5. Anonymous says:

    (you still use ecx/edx in the code)

    Are the XMM registers ever used for parameter passing?

    Which function (caller/callee) should save which XMM registers?

  6. Anonymous says:

    ecx/edx: I discussed this in an earlier comment: http://weblogs.asp.net/oldnewthing/archive/2004/01/14/58579.aspx#58683

    I do not believe that the XMM registers are involved in parameter passing. I don’t have the XMM rules memorized; I’ll look them up when I get back from vacation.

  7. Anonymous says:

    Oops, sorry about that, Raymond :(

  8. Anonymous says:

    In case anybody was still keeping score: I looked up the XMM rules. The XMM registers are used for passing floating point parameters. XMM0 through XMM3 receive the first four floating point parameters. They, as well as XMM4 and XMM5, are scratch. XMM6 through XMM15 are preserved.

  9. Anonymous says:

    Raymond, I believe the second diagram is incorrect, the return address should be after the arguments as such:

    xxxxxxx8 .. rest of stack (minus arg area) ..

    xxxxxxx0 (arg5)

    xxxxxxx8 (arg4 spill)

    xxxxxxx0 (arg3 spill)

    xxxxxxx8 (arg2 spill)

    xxxxxxx0 (arg1 spill) <- “.. rest of stack ..” from first diagram

    xxxxxxx8 return address <- RSP upon entry to callee

    It is implied in your comments, but worth calling out independently, that leaf frames are not required to align the stack to 16 bytes.

  10. Anonymous says:

    Actually the diagram is correct – we’re just drawing different diagrams. My diagram is the stack layout of the caller *before* it calls the child function (and what’s more, a function that requires no stack space for locals). Your diagram is the stack layout of the child function immediately *after* the "call" instruction.

    Both diagrams are correct; they are just diagrams of different things.

    Good point about the stack alignment at the leaf.

  11. Anonymous says:

    Ahh — I see, I misread the diagram as being that upon entry to the callee… My bad!

  12. Anonymous says:

    Is there any technical documentation that goes over the AMD64 calling convention described in this blog? Not that the information above isn’t good enough, just that it’d be nice to see a reference. =)

  13. Anonymous says:

    The history of calling conventions

  14. Anonymous says:

    Commenting closes after two weeks. (Okay, I was late to this one.)

    http://weblogs.asp.net/oldnewthing/archive/2004/02/21/77681.aspx

  15. Anonymous says:

    Ever since v1, corprof.idl has contained the following ominous comment above the typedefs for FunctionEnter/Leave/Tailcall….

  19. Anonymous says:

    Official (though preliminary) documentation.

  20. Anonymous says:

    I just passed a milestone: I just had my first experience where I desperately needed an answer and search…

  21. Anonymous says:

    As you know if you are a Visual Studio Team System user, we provide two types of profilers with the product;

  23. Anonymous says:

    Jeez, you expect an answer in 12 minutes?

Comments are closed.