How to make a no-inline inline function (and why you might want to)


The Windows folks had a problem with the AMD64 compiler last week.  Its interference package is not quite as good as the x86 one, so when it sees something like this:

 

// header.h
void bar(int *a, int *b, int *c);  // defined elsewhere; signature assumed from the call below

inline void foo(int a, int b, int c)
{
    bar(&a, &b, &c);
}

// function.cpp
#include "header.h"

void func()
{
    foo(1, 2, 3);
    foo(4, 5, 6);
}

 

we inline foo into func twice and, because their addresses are taken, generate temporary storage for 1,2,3 and 4,5,6.  What we don't realize is that we should be able to pack the two sets of temps together: they're scope-local, address-taken variables, so any function that is still using these addresses after the end of the foo scope is already broken.  This resulted in a kernel stack overflow (very very very bad).
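To make the stack cost concrete, here is roughly what func looks like once both calls are inlined, written out by hand. It is only an illustrative sketch: func_after_inlining, g_total, and the body of bar are made up for the example (the post never shows bar), and a real compiler does this on its internal representation, not on source.

```cpp
// Hypothetical stand-in for the post's bar(); all that matters is that it
// takes addresses. The running total exists only to make the calls observable.
int g_total = 0;

void bar(int* a, int* b, int* c)
{
    g_total += *a + *b + *c;
}

// Hand-written approximation of func() after foo() has been inlined twice.
void func_after_inlining()
{
    {
        // Copy of foo(1, 2, 3): the arguments become address-taken locals.
        int a = 1, b = 2, c = 3;
        bar(&a, &b, &c);
    }
    {
        // Copy of foo(4, 5, 6): a second, independent set of locals.
        int a = 4, b = 5, c = 6;
        bar(&a, &b, &c);
    }
}
```

The two scopes never overlap, so a compiler that packs the stack well can give both sets of locals the same three slots. One that does not reserves six, and the frame grows with every additional inlined call.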

 

The solution is to prevent the compiler from ever inlining foo, so that the temps are never allocated in func's stack frame.  The trouble is that if you remove the inline and add a __declspec(noinline) to the foo implementation, each .cpp file that includes header.h gets its own copy of foo, and when you try to link, you get a multiply defined symbol error.  For data, the proper way to handle this is with __declspec(selectany), which tells the linker that if it sees multiple symbols all claiming to define the same name, it should just pick any one of them.  Except you can't use selectany on a function definition, only on a variable definition.

 

So, instead, you use __declspec(noinline) inline void foo(...).  It's not intuitive (and kind of funny), but here's what's going on: for inline functions to be defined in a header, they need the selectany behavior, so when the compiler sees the C++ 'inline' keyword, it tags the function as a 'selectany' section for the linker.  Problem solved!
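Put together, the header might look something like the sketch below. The NOINLINE macro is my own addition so the example also builds outside MSVC (the __attribute__((noinline)) branch is the GCC/Clang spelling, not something from the Windows fix), and g_total plus the body of bar are hypothetical stand-ins so the snippet is complete on its own.

```cpp
// ---- header.h (sketch) ----
// Hypothetical portability macro; on MSVC it expands to the spelling used above.
#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE __attribute__((noinline))  // assumed GCC/Clang equivalent
#endif

void bar(int* a, int* b, int* c);  // defined in one source file

// 'inline' lets every includer carry a definition without a link error;
// NOINLINE keeps the body (and its temps) out of each caller's stack frame.
NOINLINE inline void foo(int a, int b, int c)
{
    bar(&a, &b, &c);
}

// ---- function.cpp (sketch) ----
int g_total = 0;  // hypothetical, only here to make the calls observable

void bar(int* a, int* b, int* c)
{
    g_total += *a + *b + *c;
}

void func()
{
    foo(1, 2, 3);  // a real call, so a, b, c live in foo's frame
    foo(4, 5, 6);  // the same frame is reused for the second call
}
```

Because each call to foo gets its own short-lived frame, func's frame no longer has to hold either set of temps.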

 

The primary reasons I can see to do this (besides the reason the Windows kernel folks have) are that you want to have some mechanism in place that is completely handled in headers, but you don't want to bloat your code, or you want to be able to see, from disassembly, where the function is being called.  Can someone come up with other reasons?

Comments (2)
  1. Will says:

    I don't think I understand why this example caused a stack overflow. Is it because the types were actually MUCH bigger than int, and so two copies was sufficient to overflow, or is it something else?

    I know the KM stack is smaller and less growable.

  2. Kevin Frei says:

    The function was being called a lot, and I think the function where it was called was also a partially recursive function. And, yes, the Kernel stack is quite small (24K, IIRC)

