The compiler can make up its own calling conventions, within limits

A customer was confused by what they were seeing when debugging.

It is our understanding that the Windows x86-64 calling convention passes the first four parameters in registers rcx, rdx, r8, and r9. But we're seeing the parameters being passed some other way. Given the function prototype

int LogFile::Open(wchar_t *path, LogFileInfo *info, bool verbose);

we would expect to see the parameters passed as

  • rcx = this
  • rdx = path
  • r8 = info
  • r9 = verbose

but instead we're seeing this:

rax=0000000001399020 rbx=0000000003baf238 rcx=00000000013c3260
rdx=0000000003baf158 rsi=000000000139abf0 rdi=00000000013c3260
rip=00007ffd69b71724 rsp=0000000003baf038 rbp=0000000003baf0d1
r8=0000000001377870  r9=0000000000000000 r10=000000007fffffb9
r11=00007ffd69af08e8 r12=00000000013a3b80 r13=0000000000000000
r14=0000000001399010 r15=00000000013a3b90
00007ffd`69b71724 fff3        push rbx
0:001> du @rdx          // path should be in rdx
00000000`03baf158  "`"
0:001> du @r8           // but instead it's in r8
00000000`01377870  "C:\Logs\Contoso.txt"

Is our understanding of the calling convention incomplete?

There are three parties to a calling convention.

  1. The function doing the calling.
  2. The function being called.
  3. The operating system.

The operating system needs to get involved if something unusual occurs, like an exception, and it needs to go walking up the stack looking for a handler.

The catch is that if a compiler knows that it controls all the callers of a function, then it can modify the calling convention as long as the modified convention still observes the operating system rules. After all, the operating system doesn't see your source code. As long as the object code satisfies the calling convention rules, everything is fine. (This typically means that the modification needs to respect unwind codes and stack usage.)
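As a rough source-level picture of what "controls all the callers" means, the sketch below contrasts two static functions. It is only an illustration under stated assumptions: the names compare_ints, scale, and largest_scaled are hypothetical, and whether any particular compiler actually changes the convention for scale is up to that compiler.

```c
#include <assert.h>
#include <stdlib.h>

/* The address of this function escapes: qsort calls it through a function
   pointer, from code the compiler may never see. It therefore has to keep
   the standard calling convention. */
static int compare_ints(const void *lhs, const void *rhs)
{
    int l = *(const int *)lhs;
    int r = *(const int *)rhs;
    return (l > r) - (l < r);
}

/* The address of this function is never taken, and its only caller is in
   this file. A compiler that notices this is free, in principle, to pass
   its parameters however it likes. */
static int scale(int value, int factor)
{
    return value * factor;
}

/* Hypothetical caller: sorts the array in place, then scales the largest
   element. */
int largest_scaled(int *values, size_t count, int factor)
{
    assert(values != NULL && count > 0);
    qsort(values, count, sizeof values[0], compare_ints);
    return scale(values[count - 1], factor);
}
```

The distinction is not visible in the source language at all; it only shows up in the generated code, which is why it tends to surprise people in the debugger.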

For example, suppose you had code like this:

extern void bar(int b, int a);

static void foo(int a, int b)
{
  return bar(b + 1, a);
}

int __cdecl main(int argc, char **argv)
{
 foo(10, 20);
 foo(30, 40);
 return 0;
}

A clever compiler could make the following analysis: Since foo is a static function, it can be called only from this file. And in this file, the address of the function is never taken, so the compiler knows that it controls all the callers. Therefore, it optimizes the function foo by rewriting it as

static void foo(int b, int a)
{
  return bar(b + 1, a);
}

It makes corresponding changes to main:

int __cdecl main(int argc, char **argv)
{
 foo(20, 10); // flip the parameters
 foo(40, 30); // flip the parameters
 return 0;
}

By doing this, the compiler can generate the code for foo like this:

        inc     ecx
        jmp     bar

rather than the more conventional

        mov     eax, edx
        inc     eax
        mov     edx, ecx
        mov     ecx, eax
        jmp     bar

You can look at this transformation in one of two ways. You can say, "The compiler rewrote my function prototype to be more efficient." Or you can say, "The compiler is using a custom calling convention for foo which passes the parameters in reverse order."

Both interpretations are just two ways of viewing the same thing.
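One way to convince yourself that the two views describe the same transformation is to model it in ordinary C. The sketch below is only an illustration, not compiler output: bar is replaced by a hypothetical recording stub, and foo_as_written, foo_rewritten, and same_calls_to_bar are names invented here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the external bar: it records each call. */
typedef struct { int first; int second; } BarCall;

static BarCall g_calls[8];
static size_t g_call_count = 0;

static void bar(int b, int a)
{
    assert(g_call_count < sizeof g_calls / sizeof g_calls[0]);
    g_calls[g_call_count].first = b;
    g_calls[g_call_count].second = a;
    g_call_count++;
}

/* foo as it appears in the source. */
static void foo_as_written(int a, int b)
{
    bar(b + 1, a);
}

/* foo as the compiler might rewrite it: same body, parameters swapped. */
static void foo_rewritten(int b, int a)
{
    bar(b + 1, a);
}

/* Returns 1 if both versions hand bar identical arguments. */
static int same_calls_to_bar(int a, int b)
{
    g_call_count = 0;
    foo_as_written(a, b);
    foo_rewritten(b, a);   /* call site flipped to match the rewrite */
    return g_call_count == 2
        && g_calls[0].first == g_calls[1].first
        && g_calls[0].second == g_calls[1].second;
}
```

Calling same_calls_to_bar(10, 20) or same_calls_to_bar(30, 40) returns 1: in both versions bar receives b + 1 followed by a, so only the route the values take into foo has changed.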

Comments (21)
  1. Ben says:

    I think the conventional example should read "mov edx, ecx; mov ecx, eax". I do love these compiler optimisation examples, I would never have thought of this one.

  2. Joshua says:

    Now that we are finally back on calling conventions, I noticed the calling convention requires __chkstk to be called, but it does not seem to be exported from system DLLs.

    It is exported from the C standard library, but if you're not linking against it anymore…

    [Why would a system DLL export __chkstk? Then you would have people linking to the wrong __chkstk. -Raymond]
  3. pc says:

    Was the different calling convention causing a problem, or just confusion as they debugged through the assembly and registers?

    Presumably compilers always use the standard convention when compiling a function for use by an external module.

    Do compilers ever do something weird for a function marked as callable from an external module, where the external uses get one address which uses the standard convention, and internal uses go to a different address which uses an optimized convention? Kind of like inlining?

    Actually, I would expect a clever compiler to optimize the "foo" function out completely and just inline it.

    [This was obviously a simplified example. The real foo was not inlined because it was large, or it had multiple callers, or something. -Raymond]
  4. mikeb says:

    And of course the extreme example of this is when compilers inline function calls – sometimes even for functions that are not explicitly marked as inline in the source file.

  5. CarlD says:

    @pc I believe that when PGO is used it's possible for the compiler to compile a single source function into several different binary forms depending on the caller.  It's definitely possible to get inline and out of line versions of a function even without PGO.

  6. Joshua says:

    [Why would a system DLL export __chkstk? Then you would have people linking to the wrong __chkstk. -Raymond]

    How can there be a wrong __chkstk if its effect is specified in the ABI?

    [Why would you link to some random system DLL's __chkstk (that isn't part of that system DLL's contract) when you could link to the one in the C runtime (which is part of the C runtime's contract)? Maybe a future version of the OS gets rid of that system DLL. Should system DLLs also export malloc? -Raymond]
  7. Myria says:

    The x86-64 unwind information in .pdata allocates 4 bits for specifying which registers are saved on the stack and which register is the frame pointer.  Since this can specify the full range of register numbers, does this mean that it is legal to use what is normally a volatile register as a frame pointer in a function that doesn't call any other functions?  (This in theory could be useful for a leaf function that calls _alloca/__chkstk).

  8. Myria says:

    @Raymond: (Regarding reply to Joshua) How does the stack unwind system recognize __chkstk itself?  Does it look for __chkstk's binary pattern, or does __chkstk itself follow the standard x86-64 ABI rules?  If its binary pattern is recognized specifically by RtlVirtualUnwind, that would make it impossible to properly implement alloca in a third-party compiler without infringing Microsoft's copyright, since msvcrt.dll does not export it.

    [It doesn't need to do anything special to recognize __chkstk. On x64, __chkstk is a normal function with standard stack usage, unlike x86. On x64, the caller does the "sub rsp, n". Therefore, the existing unwind codes describe it adequately. -Raymond]
  9. Joshua says:

    @Myria: __chkstk violates the /normal/ ABI pretty hard by being called inside the function prolog.

    [Maybe a future version of the OS gets rid of that system DLL]

    Can't get rid of kernel32.dll. Interesting that it is currently exported from 64 bit kernel32.dll (but not 32 bit kernel32.dll) and not documented that it is so. I suppose this means the export is allowed to disappear in the future. IE8 was discovered to be linked against it, but that won't pose a problem for future Windows versions.

  10. So … can we implement our own __chkstk if we aren't using Visual Studio's C runtime?

    [You're asking the wrong person. -Raymond]
  11. philiplu says:

    __chkstk should be statically linked even when linking to the DLL version of the CRT.  You can always just extract the chkstk.obj object from the library and link against that.

    In general, you can't completely avoid the C runtime when it's used to provide compiler support, like __chkstk or the structured exception handler support.

  12. @Raymond: oh, I wasn't expecting you to answer; just thought someone might know.  And as it turns out I was confusing __chkstk with the /GS (buffer security check) functions anyway. :-)

    @philiplu: thanks, that sounds sensible – and I wonder if I can do the same for the cookie functions?  So far I've been able to avoid needing __chkstk by virtue of using mainly static variables, and I just turn the /GS option off, but it is always good to have options.  (These are small-to-trivial applications, which don't take untrusted input, so the risk is minimal.)

  13. JM says:

    @mikeb: "sometimes even for functions that are not explicitly marked as inline in the source file"? I should *hope* my compiler isn't so dense as to require me to sprinkle "inline" around. "inline" is a relic from more innocent times — if you need to tweak inlining at all these days it's typically to explicitly mark a function as *not* to be inlined.

  14. philiplu says:

    You can probably get the /GS support working without the rest of the CRT, but it's more involved than, say, __chkstk.  I wrote most of that support code about a decade ago, but left Microsoft 7 years ago, so I'm not sure how much it's changed since my time there.  The good news is it appears that the CRT source code now includes the necessary source files (gs_*.c and a few others, probably).  We didn't use to ship all the support files, especially for things like SEH and C++ EH support.

  15. Neil says:

    So, as optimised code becomes ever harder to debug, and unoptimised code becomes ever slower (it's all very well using templates instead of macros but in full debug builds they become out-of-line function calls), is there a happy medium?

  16. not an anon says:

    @Neil — yes there is.  Modern compilers are developing the ability for their optimizer to emit special debug data that tells the debugger what the optimizer did to variables; this, along with improvements to unwinding (in particular, the ability to unwind without an explicit frame pointer), helps when debugging optimized code.

    (If you want to know more, look up the GCC -fvar-tracking and -fvar-tracking-assignments options.)

  17. Rick C says:

    "[You're asking the wrong person. -Raymond]"

    Nah.  You are able to tell him he CAN implement it himself:  the answer, trivially, is "yes".  Whether it's an ill-advised idea is another matter. :)

  18. Alex Cohn says:

    @JM one justified usage of `inline` with Microsoft compiler is to avoid "duplicate symbol" linker error.

  19. inliner says:

    Regarding the discussion of the inline keyword: it has very little to do with the inlining optimization. It means that there may be several copies of this function (usually because the function was defined in a header file), and that's OK.

    It is related to inlining this way: to inline a function in a compilation unit, the function code needs to be available, usually achieved by defining it in a header file, so inline keyword is needed to avoid build errors.

  20. Random832 says:

    Could (or, rather, does) the compiler ever do both? That is, make an "optimized" entry point that does this, and a "standard" entry point that you get for external usage of the function or if you take the address?

    [I don't see why not. You could call it "shared inlining." -Raymond]
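The two-entry-point idea raised in this thread can be pictured at the source level roughly as follows. This is a hand-written sketch, not something a particular compiler is known to emit: foo_internal, call_internally, check_entry_points_agree, and the recording version of bar are all hypothetical names made up for the illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for the external bar: remembers its last arguments. */
static int g_last_b;
static int g_last_a;

void bar(int b, int a)
{
    g_last_b = b;
    g_last_a = a;
}

/* "Optimized" entry point: parameters arrive in the order bar wants them.
   It is static and its address is never taken, so only this file calls it. */
static void foo_internal(int b, int a)
{
    bar(b + 1, a);
}

/* "Standard" entry point: keeps the published signature. External callers,
   and anyone holding a pointer to foo, land here. */
void foo(int a, int b)
{
    foo_internal(b, a);
}

/* A caller inside the file goes straight to the optimized entry point;
   in the source this call was written as foo(10, 20). */
void call_internally(void)
{
    foo_internal(20, 10);
}

/* Self-check: both routes must deliver the same arguments to bar. */
static void check_entry_points_agree(void)
{
    foo(10, 20);
    assert(g_last_b == 21 && g_last_a == 10);
    call_internally();
    assert(g_last_b == 21 && g_last_a == 10);
}
```

The standard entry point costs one extra shuffle of arguments, but only callers that cannot be rewritten pay for it.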
