Non-capturing C++ lambdas can be converted to a pointer to function, but what about the calling convention?


First, let's look at how lambdas are implemented in C++.

It is similar in flavor to the way lambdas are implemented in C#, but the details are all different.

When the C++ compiler encounters a lambda expression, it generates a new anonymous class. Each captured variable becomes a member of that anonymous class, and the member is initialized from the variable in the outer scope. Finally, the anonymous class is given an operator() implementation whose parameter list is the parameter list of the lambda, whose body is the lambda body, and whose return type is the lambda's return type.

I am simplifying here. You can read the C++ language specification for gory details. The purpose of this discussion is just to give a conceptual model for how lambdas work so we can get to answering the question. The language also provides for syntactic sugar to infer the lambda return type and capture variables implicitly. Let's assume all the sugar has been applied so that everything is explicit.

Here's a basic example:

void ContainingClass::SomeMethod()
{
 int i = 0, j = 1;
 auto f = [this, i, &j](int k) -> int
    { return this->calc(i + j + k); };
 ...
}

The compiler internally converts this to something like this:

void ContainingClass::SomeMethod()
{
 int i = 0, j = 1;

 // Autogenerated by the compiler
 class AnonymousClass$0
 {
 public:
  AnonymousClass$0(ContainingClass* this$, int i$, int& j$) :
   this$0(this$), i$0(i$), j$0(j$) { }
  int operator()(int k) const
     { return this$0->calc(i$0 + j$0 + k); }
 private:
  ContainingClass* this$0; // this captured by value
  int i$0;                 // i captured by value
  int& j$0;                // j captured by reference
 };

 auto f = AnonymousClass$0(this, i, j);
 ...
}

We are closer to answering the original question, but we're not there yet.

As a special bonus, if there are no captured variables, then there is an additional conversion operator that converts the lambda to a pointer to a nonmember function. This is possible only when there are no captured variables, because captured variables live in an AnonymousClass$0 instance, and a plain function pointer has no parameter through which to pass that instance.

Here's a lambda with no captured variables.

void ContainingClass::SomeMethod()
{
 auto f = [](int k) -> int { return calc(k + 42); };
 ...
}

The above code gets transformed into something like this:

void ContainingClass::SomeMethod()
{
 class AnonymousClass$0
 {
 public:
  AnonymousClass$0() { }
  using function_pointer = int (*)(int k);
  operator function_pointer() const { return static_function; }
  int operator()(int k) const { return calc(k + 42); }
 private:
  static int static_function(int k) { return calc(k + 42); }
 };

 auto f = AnonymousClass$0();
 ...
}

Okay, now we can get to the actual question: How can I specify the calling convention for this implicit conversion to a pointer to nonmember function?

(Note that calling conventions are not part of the C++ standard, so this question is necessarily a platform-specific question.)

The Visual C++ compiler automatically provides conversions for every calling convention. So with Visual C++, the transformed code actually looks like this:

void ContainingClass::SomeMethod()
{
 class AnonymousClass$0
 {
 public:
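  AnonymousClass$0() { }
  using cdecl_function_pointer = int (__cdecl *)(int k);
  using stdcall_function_pointer = int (__stdcall *)(int k);
  using fastcall_function_pointer = int (__fastcall *)(int k);
  operator cdecl_function_pointer() const { return cdecl_static_function; }
  operator stdcall_function_pointer() const { return stdcall_static_function; }
  operator fastcall_function_pointer() const { return fastcall_static_function; }
  int operator()(int k) const { return cdecl_static_function(k); }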
 private:
  static int __cdecl cdecl_static_function(int k) { return calc(k + 42); }
  static int __stdcall stdcall_static_function(int k) { return calc(k + 42); }
  static int __fastcall fastcall_static_function(int k) { return calc(k + 42); }
 };

 auto f = AnonymousClass$0();
 ...
}

In other words, the compiler creates all the conversions, just in case. (The versions you don't use will be removed by the linker.)

But only for noncapturing lambdas.

Comments (24)
  1. kantos says:

    I suspect there is an implicit asterisk for x86-64, where Windows uses only one calling convention? Or does the compiler create a __vectorcall overload in that case?

  2. Adam Rosenfield says:

    It's theoretically possible for the compiler to generate conversions for capturing lambdas, but it requires being able to generate and execute new machine code at runtime — easy enough on normal OSes (e.g. VirtualProtect), but not possible on platforms like Xbox 360 where you can't create new executable pages at runtime for security reasons.

    To convert a capturing lambda to a bare function pointer, you first create a static function that takes an extra context pointer, like in the non-capturing case.  Then, on each conversion attempt, you allocate a new trampoline by copying a trampoline template for your CPU and fill in a pointer-sized placeholder with the context pointer.  The trampoline is a short piece of code which loads that context pointer into the first argument's location, possibly adjusts the other arguments if necessary, and then jumps to the static function.  Then of course at some point later on in the program, you have to deallocate the memory you allocated for the trampoline when the lambda is destroyed.

    The GNU foreign function call library (http://www.gnu.org/…/libffcall) is capable of doing this on a number of systems.

    [It's more than just creating the trampoline. You also have to register the trampoline with the operating system so that it knows how to unwind the code, should an exception be taken inside the trampoline. (Also, C++ needs to support machines that have separate address spaces for code and data.) -Raymond]
  3. Rodrigo says:

    Wow, I'm just studying C++ lambdas. Very informative post, thank you.

  4. Joshua says:

    > but it requires being able to generate and execute new machine code at runtime

    Inability to do that results in me not selecting that platform as a target.

  5. Kevin says:

    @Adam Rosenfield: How does that interact with NX/DEP?

  6. Ben Voigt says:

    @Kevin: It's NX and DEP that make the VirtualProtect call necessary; previously your run-of-the-mill dynamic allocation was readable, writable, and executable.  Anyway, W^X is more interesting.

  7. Adam Rosenfield says:

    @Joshua: iOS is another such platform.  iOS is a pretty big market you're ignoring, from a business point of view.

    @Kevin: When you control all of the code, it's pretty easy to call VirtualAlloc(PAGE_READWRITE), copy in the desired machine code, and VirtualProtect(PAGE_EXECUTE_READ) on a page of memory.  No problems with NX/DEP when doing that.  For an attacker, though, it's much harder to write an exploit which performs all of those operations while copying in a piece of untrusted attacker code during the step where the memory is writeable.

  8. mikeb says:

    Awesome article.  It's nice to see the mechanisms for these features brought out from "behind the curtain".

  9. jon says:

    A very useful feature it is too. Being able to use a lambda with APIs like EnumWindows is really nice.

  10. Myria says:

    What about __vectorcall?  Since now there are 6 calling conventions, 4 of which are relevant here…

    @Joshua: Beyond the platforms already mentioned, Windows in Metro applications also prohibits marking pages executable…at least for the approval process.  Nothing actually stops it at the moment.  The NT kernel in 8.1 has a flag to prohibit a process from marking pages executable, but it's not used in anything I know of, because it breaks DLL relocation.

  11. Joshua says:

    @Myria, et al.: Don't care about mobile phones as a target. They get fed webpages anyway. Don't care about Metro either, for the same reason.

  12. Michael says:

    On __vectorcall: it looks like the conversion operator is there (at least on 2015), but possibly not entirely supported on x86. At the very least, converting the introductory __vectorcall example so that AddParticles is a lambda results in these compile errors:

    error C2719: 'p1': formal parameter with __declspec(align('16')) won't be aligned

    error C2719: 'p2': formal parameter with __declspec(align('16')) won't be aligned

    error C2719: 'unnamed-parameter': formal parameter with __declspec(align('16')) won't be aligned

    On x64, the codegen for calling a normal __vectorcall function through a function pointer and calling a lambda converted to a __vectorcall function pointer through a function pointer seem similar.

    (This is what I'm referring to as the "introductory __vectorcall example": blogs.msdn.com/…/introducing-vector-calling-convention.aspx )

  13. Evan says:

    @Adam Rosenfield: "It's theoretically possible…"

    I actually wrote a nifty little library that does exactly that: It uses libffi to dynamically allocate some memory for a thunk, generate code into it that pushes the extra parameter then calls the function, and then marks it as executable.

    (I haven't actually *used* it for anything — I'm not sure how good of an idea I think it is — I just wanted to make sure I could do it. :-) If anyone actually wants to use it let me know…)

  14. not an anon says:

    Apologies in advance, but

    @Raymond — considering that Suggestion Box 4 has been closed for a while, how should we get a hold of you regarding topics to cover?  One of my fellow TDWTFers stumbled upon an utterly bizarre error message in Windows 7 x64, apparently generated by explorer.exe:

    "Too many other files are currently in use by 16-bit programs.  Exit one or more 16-bit programs, or increase the value of the FILES command in your Config.sys file."

  15. Azarien says:

    I noticed that lambdas are calling-convention agnostic when I discovered it's possible to use a lambda for a WndProc (which requires stdcall). It worked in VS2012, but not in GCC, at least back then.

  16. Mark says:

    not an anon: that message is string 0x2103 from shell32.dll.mui. A quick scan of shell32!_ExecErrorMsgBox shows this is displayed when one of the arguments is 4 (ERROR_TOO_MANY_OPEN_FILES). That function also displays "Windows cannot run this program because it is not in a valid format." when that argument is 11 (ERROR_BAD_FORMAT), so my guess is that for some reason ShellExecute is returning ERROR_TOO_MANY_OPEN_FILES, and the shell's message for that situation hasn't been updated in a long time.

  17. Mike Dimmick says:

    @not an anon, @Mark: The number of handles in a Win32 process is limited to about 16.7 million (very nearly 2^24 – 1). That's due to handles being 32-bit values with 8 bits reserved for a handle re-use count. (This is the current architecture: 64-bit code should never assume that the upper 32 bits are unused.) The actual handle tables are allocated from paged pool.

    The machine in question either has some Explorer plug-in installed which is leaking handles, or something is leaking paged pool.

    In normal use, the likelihood of running out of handles is so low that I'm not surprised that the message hasn't been updated. This all assumes that it isn't some rogue plug-in changing the last error code before Explorer itself actually sees it.

    Information from blogs.technet.com/…/3283844.aspx .

  18. Mark says:

    Mike Dimmick: yeah, I assumed that this would be a plugin, given the improbability of running out of handles and Explorer not dying shortly afterwards. A quick look at some of the custom columns in task manager would be a worthwhile sanity check, though.

  19. Peter says:

    I might be wrong, but I think converting to a nonmember function pointer also requires the calc function to be a static member function or a nonmember function.

    I think this is also a requirement, besides having no captured variables.

  20. Evan says:

    @Peter: "I think this is also a requirement besides having no captured variables"

    It's a consequence of 'this' basically being a hidden variable; I'm pretty sure Raymond intended that interpretation (indicated by the change in invocation from this->calc(…) to just calc(…)). The standard wouldn't use that term of course, but I think that's why it wasn't mentioned.

    ["The language also provides for syntactic sugar to infer the lambda return type and capture variables implicitly. Let's assume all the sugar has been applied so that everything is explicit." If "calc" were a nonstatic member function, then "this" would have been captured. -Raymond]
  21. Adam Rosenfield says:

    [It's more than just creating the trampoline. You also have to register the trampoline with the operating system so that it knows how to unwind the code, should an exception be taken inside the trampoline. (Also, C++ needs to support machines that have separate address spaces for code and data.) -Raymond]

    Good point about the OS unwind info.  I'm unfamiliar with the implementation details of exactly how those work, but I'd presume that it's not too difficult to generate the proper machine code which does that correctly for the trampoline (after all, compilers do it all the time).  It just makes the machine code a little more complicated than a push+jmp or equivalent.

    Re: supporting machines with separate address spaces for code and data, I think that's pretty much equivalent to my statement that this requires being able to generate and execute new machine code at runtime.  You need OS support to copy or move dynamically generated code from the data address space to the code address space, which OSes like Win32/Mac OS X/Linux provide, but some OSes like Xbox 360 and iOS do not provide.  If C++ made converting capturing lambdas into bare function pointers a mandatory feature of the spec, then those platforms would not be able to claim 100% support for the spec.

    It makes perfect sense that ISO/ANSI decided not to make that a mandatory (or optional) feature of C++11.  But even though it's not part of the spec, it's possible to implement in a reasonable way on a subset of platforms.

  22. jingyu9575 says:

    I can't compile AnonymousClass; the closest code I can get is

     (*operator int())(int k) { return static_function; }

     int operator () (int k) const { return calc(k + 42); }

    which compiles on g++ but fails on clang++ and cl.

    [The code was intended to be illustrative, not to be actually compilable. -Raymond]
  23. Joshua says:

    [… should an exception be taken inside the trampoline.]

    We could avoid this problem by writing a trampoline that can't raise any recoverable exceptions. (RIP pointing into free RAM is not recoverable.) This requires the receiving function to use a calling convention that takes this in a register (doable if it's not a COM class), and it doesn't break the rollback code as long as _chkstk doesn't (we note that _chkstk uses a register calling convention).

    trampoline:
       mov rax, this  ;this = constant
       jmp [rip]      ;points just past jmp instruction
       dq member_function_pointer

    Here we take full advantage of the compiler being allowed to alter the calling convention within limits.

    [You can't prevent in-page errors. An app could try to recover them by retrying the I/O operation. -Raymond]
  24. Joshua says:

    [You can't prevent in-page errors. An app could try to recover them by retrying the I/O operation. -Raymond]

    No point. If the data pages are paged out and can't be paged in, then the swap file is in trouble. Better to fall on the sword now and take load off the swap. KERNEL_DATA_INPAGE_ERROR is coming.

    [I'm not saying it's a good idea. Just that it's something an app might try. And how about breakpoint exceptions? -Raymond]
