I like FORCEINLINE


For kernel-mode code, if I have a choice between using a #define or a FORCEINLINE function, the FORCEINLINE function wins every time.  #defines have their place, especially for stringizing (the # operator) or token pasting (the ## operator), but they have no place in my heart for constants or pseudo-functions.


FORCEINLINE functions have type checking.  On a debug build, they show up in the symbol information.  They do not have weird side effects due to textual replacement (as invoking a #define'd max(a++, b++) would, since an argument can end up evaluated twice).  In KMDF, we use FORCEINLINE functions for all DDI calls and all of our structure init functions. The type checking alone is enough for me; it allows for more maintainable and correct code.
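To make the side-effect problem concrete, here is a minimal user-mode sketch. The names MY_FORCEINLINE, MAX_MACRO, and MaxInline are made up for illustration and are not taken from any WDK header; the portable macro definition is an assumption about how one might spell a force-inline hint on MSVC and GCC/Clang.

```cpp
#include <cassert>

// Hypothetical portable spelling of a force-inline hint (assumption, not the WDK definition).
#if defined(_MSC_VER)
#define MY_FORCEINLINE __forceinline
#else
#define MY_FORCEINLINE inline __attribute__((always_inline))
#endif

// Classic macro: each argument is pasted textually into the expression.
#define MAX_MACRO(a, b) (((a) > (b)) ? (a) : (b))

// Typed inline equivalent: each argument is evaluated exactly once, and
// passing a non-integer type is diagnosed by the compiler.
MY_FORCEINLINE int MaxInline(int a, int b)
{
    return (a > b) ? a : b;
}

// Returns the final value of b after using the macro with a = 1, b = 5.
int FinalBWithMacro()
{
    int a = 1, b = 5;
    int m = MAX_MACRO(a++, b++);  // expands to ((a++) > (b++)) ? (a++) : (b++)
    (void)m;
    return b;                     // b++ ran twice: once in the test, once as the result
}

// Returns the final value of b after using the inline function with a = 1, b = 5.
int FinalBWithInline()
{
    int a = 1, b = 5;
    int m = MaxInline(a++, b++);  // arguments are evaluated once, before the call
    (void)m;
    return b;                     // b++ ran once
}
```

With the macro, the winning argument is incremented twice; with the inline function, once, which is what the caller almost certainly meant.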


I am also intentionally leaving templates out of this for a reason.  C++ is not supported in the kernel.  If you use C++ anyway, you are cautioned to use a safe subset of it.  Templates are far from a safe subset because there is no way to deterministically control which section the generated code will be linked into.  This is important because template-generated code can accidentally be placed into a pageable section even though the code must not be pageable (and you BSOD very quickly 🙁 ).
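As a rough illustration of the section-placement issue, the sketch below uses the Microsoft compiler's alloc_text pragma, which names the section a function should be linked into. The pragma applies only to functions with C linkage, so a template instantiation cannot be named in it. PagedHelper and AddOne are hypothetical names, and the pragma is guarded so the snippet still compiles on other compilers, where it simply has no effect.

```cpp
#include <cassert>

// A plain function with C linkage: its section can be named explicitly.
extern "C" int PagedHelper(int x);

#if defined(_MSC_VER)
// Must appear after the declaration and before the definition.
#pragma alloc_text("PAGE", PagedHelper)
#endif

extern "C" int PagedHelper(int x)
{
    return x + 1;
}

// A template: the compiler generates AddOne<int>, AddOne<long>, ... on demand.
// These instantiations do not have C linkage, so alloc_text cannot name them,
// and the author has no direct say here in which section they land.
template <typename T>
T AddOne(T x)
{
    return x + 1;
}
```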

Comments (14)

  1. Dr Pizza says:

    It seems to me that it would be (a) relatively easy (b) extremely useful if the compiler team made the required updates to the compiler to control the location of *all* generated code.  C++ is a much better language than C for writing code that must be reliable.  Even if you don’t use any other C++ constructs, the pair of stricter typechecking and RAII make it worthwhile.

  2. Using C++ strictly as a better C in terms of typechecking is supported.  I would say that using a constructor/destructor pattern for RAII is also supported because in this case you must explicitly create both functions; the danger lies in the compiler autogenerating the code for you and then that code getting placed in a random spot by the linker.  I have safely used this pattern in a driver before.

    The compiler team did make a lot of changes based on our feedback in the latest Visual Studio release.  But the changes by themselves are not enough; they need to be verified, validated, and regression tested before we can tell folks to use them.  That takes time, and right now we are trying to get a product (Vista, I think a few folks have heard about it 😉 ) out the door. This is not a Vista feature, but it will see the light of day in the future.

  3. Ray Trent says:

    As much as I understand that there’s no good way to prove where the linker might choose to put template functions (or default copy constructors, or anything else), I will say that we’ve been using templates and virtual multiply-inherited classes in the kernel for almost 10 years on 150,000,000 shipped products and I’ve never seen an OCA or other indication that the linker has ever *actually* made that faux pas. And that’s all the way back to the VC++ 4.2 days.

    Of course, we don’t mark *anything* (other than DriverEntry, which goes in INIT) as pageable in our driver, so I don’t think there’s much opportunity for the linker to get confused.

    Microsoft probably can’t do that…

    On the other hand, on the scale of many large classes of Windows kernel mode drivers, memory is free and processors are fast. I suspect you’re trading footprint for significant performance degradation… All well maintained machines eventually run a virus check, which in my experience generally will force the working set well above the available memory and page out everything you’ve marked as pageable… for no good reason, I might add. And hail Eris is it painful the first time I use my machine after that happens…

  4. sayler says:

    I found my way over here from Raymond C’s blog.  Looks very interesting… My kernel hacking experience does not lie in Microsoft-land at all and I’m very curious to hear more details like this.

    I was curious what the semantics of FORCEINLINE are, so I googled for it, and came up with a page full of source code listings, most of which were of this variety:

    #define FORCEINLINE   __inline

    🙂  So much for my google-fu.

    Also, there has been talk recently in Linux-land about excessive inlining, and I was wondering about thoughts on this issue from your perspective.  (See http://lwn.net/Articles/166172/ for some discussion on the issue.)

  5. sayler:  that’s a great article, it was a good read.  On current versions of the Microsoft compiler, FORCEINLINE evaluates to __forceinline, which is basically a stronger hint at inlining than __inline is.  In KMDF we don’t have 22,000 functions though ;).

    I don’t think large/complex functions should be inlined.  I think 1-3 line accessor functions (like a Get() for a protected field) are great candidates.  For KMDF, we use FORCEINLINE for our DDIs.  Since our DDIs are in a jump table, there is no real additional overhead outside of the compile time type checking.  As an example:

    VOID
    FORCEINLINE
    WdfDeviceSetDeviceState(
       WDFDEVICE Device,
       PWDF_DEVICE_STATE DeviceState
       )
    {
       ((PFN_WDFDEVICESETDEVICESTATE) WdfFunctions[WdfDeviceSetDeviceStateTableIndex])(WdfDriverGlobals, Device, DeviceState);
    }
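The same pattern can be sketched outside of KMDF with a toy function table. Everything here (GenericFn, PFN_SETSTATE, g_Functions, SetState, and the table index) is a hypothetical stand-in rather than a real framework declaration; the point is only that the inline wrapper gives callers a typed signature while the call itself goes through an untyped table slot.

```cpp
#include <cassert>

// Hypothetical portable spelling of a force-inline hint (assumption).
#if defined(_MSC_VER)
#define MY_FORCEINLINE __forceinline
#else
#define MY_FORCEINLINE inline __attribute__((always_inline))
#endif

// Untyped slot type for the table, plus the real signature of one entry.
typedef void (*GenericFn)(void);
typedef int (*PFN_SETSTATE)(void* globals, int device, int state);

enum { SetStateTableIndex = 0, TableSize = 1 };

static int g_lastState = -1;

// The implementation that would live in the framework binary.
static int SetStateImpl(void* /*globals*/, int /*device*/, int state)
{
    g_lastState = state;
    return 0;
}

// The table the client receives; here it is filled in statically, whereas a
// real framework would populate it at runtime.
static GenericFn g_Functions[TableSize] = {
    reinterpret_cast<GenericFn>(&SetStateImpl)
};
static void* g_Globals = nullptr;

// Typed wrapper: callers get compile-time argument checking, and after
// inlining the only cost left is the indirect call through the table.
MY_FORCEINLINE int SetState(int device, int state)
{
    return reinterpret_cast<PFN_SETSTATE>(g_Functions[SetStateTableIndex])(
        g_Globals, device, state);
}
```

A caller writes `SetState(device, 42)` and never touches the table or the cast directly.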

  6. DrunkCod says:

    I really don’t like FORCEINLINE, simply because in reality it doesn’t force inlining 🙁 I discovered this the hard way while trying (in C++) to create a type safe _alloca wrapper.

    Sadly enough, since forceinline really doesn’t force the call inline, headaches really did ensue. So it’s really not that much better than __inline or the traditional C++ const/inline constructs.

  7. DrunkCod: FORCEINLINE is not an absolute. From talking to the compiler team, it is a much stronger suggestion to inline than __inline is, but still not an absolute.   On a debug build, it is never honored.  For a retail build, there are situations and usages in which inlining will not occur.  For instance, if the candidate function has an SEH block or C++ exception handling, the inlining will not occur.  Also, if there are varargs, inlining will not occur.  There are other limitations as well; I don’t remember them all offhand.

    In your case of wrapping _alloca, I can totally see why the compiler would not let you do this.  _alloca grows the stack for the allocation, so by the time your inline function returned, the _alloca memory is gone.  In this case a #define might be better.

  8. DrunkCod says:

    Yes, I’m well aware of how alloca works, and the behaviour you describe (growing the calling function’s stack) is why I wanted to force-inline it.

    And interestingly enough, VC++ does indeed honor the inline for it during release builds; that’s what caused my headache to begin with.

    So yeah, sure, a macro would solve the problem, but it would kill the whole intent of creating the typesafe wrapper to begin with.

    (the relevant code)

    #ifdef _MSC_VER
    #include <malloc.h>
    #define alloca _alloca
    #define MUSTINLINE __forceinline
    #else // assume GCC
    #include <alloca.h>
    #define MUSTINLINE __attribute__((always_inline))
    #endif

    namespace beer
    {
    MUSTINLINE void* stack_alloc(size_t n){ return alloca( n);}

    template <typename T>
    MUSTINLINE T* stack_alloc(size_t n){ return reinterpret_cast<T*>( stack_alloc( n));}
    }

    I so wanted it to work 🙁

    Does the compiler team have any intent of actually giving us the power to really, really force inlining? I tried to get there by fiddling around with intrinsics, but to no real avail…

  9. I would assume that the compiler folks will try their hardest, but I think getting an absolute inlining attribute might be near impossible given the NP completeness of the problem.

    Even if you got your wrapper above to be inlined, there is no guarantee that the stack space will not be reused.  Remember that the stack space used by the _alloca is gone as soon as the function "returns", even if inlined.  Instead of a templatized wrapper, you can define a placement operator new(), e.g.

    class Foo {
    public:
        // Placement form: hand back the caller-supplied storage unchanged.
        PVOID operator new(size_t N, PVOID p) { return p; }
    };

    Foo* pFoo = new (_alloca(sizeof(Foo))) Foo();

    This way the _alloca is at least scoped properly, even though the abstraction leaks a bit.
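A user-mode sketch of that suggestion might look like the following. It swaps in the standard library's placement new from `<new>` (not available in kernel mode) in place of a class-specific operator, and STACK_ALLOC, Foo, and UseFooOnStack are hypothetical names. Because STACK_ALLOC is a macro, the allocation happens in the frame of the function that uses it, which is the scoping the comment above is after.

```cpp
#include <cassert>
#include <cstddef>
#include <new>      // standard placement operator new (user mode only)

#if defined(_MSC_VER)
#include <malloc.h>
#define STACK_ALLOC(n) _alloca(n)
#else
#include <alloca.h>
#define STACK_ALLOC(n) alloca(n)
#endif

// Hypothetical type, for illustration only.
struct Foo {
    int value;
    explicit Foo(int v) : value(v) {}
};

int UseFooOnStack(int v)
{
    // The macro expands here, so the memory belongs to this function's
    // frame and stays valid until this function returns.
    void* raw = STACK_ALLOC(sizeof(Foo));

    // Typed construction in the stack storage; no heap allocation occurs.
    Foo* pFoo = new (raw) Foo(v);

    int result = pFoo->value * 2;

    // Objects built with placement new need an explicit destructor call;
    // the storage itself is reclaimed automatically on return.
    pFoo->~Foo();
    return result;
}
```

The pointer must not escape the function, since the storage disappears when the frame does.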

  10. carmencr says:

    Nice to see someone is doing some kernel blogging.  I wish I had time to keep writing for my blog, but that has not been in the cards for a long time now.

  11. mattd says:

    doron,

    This might be a stupid question, but can you explain why the DDI function pointers are stored in a jump table? What does this indirection provide?

  12. matt:  we made it a jump table so that we could resolve dependencies at runtime vs having hard coded imports at load time.  There is not a single import to wdf01000.sys in a KMDF client driver and this is an explicit design decision.  It could be that we need to service KMDF with a new binary (with a different name) and the lack of direct explicit imports lets us do this.

  13. I think that the C preprocessor is a very powerful tool, but I like to limit my use of #defines.  I…