Why can’t I use the linker to delay-load a function from kernel32?


For some time (I am too lazy to look up when it was introduced), the Visual Studio linker has supported a feature known as delay-loading. But why can't you use this feature to delay-load a function from kernel32? It would be very handy: If you write

if (CurrentWindowsVersionSupportsKernelFunctionXyz())
{
  Xyz(...);
}

the program fails to load on versions of Windows which do not support the function Xyz because the Win32 loader rejects loading a module that contains unresolved references. On the other hand, if you could mark kernel32 as delay-loaded, then the code above would work, since the call to Xyz would be redirected to a stub that calls GetProcAddress. Since the GetProcAddress is performed only when the code path is hit, the loader won't complain at load time. But if you try to delay-load kernel32, the linker gets upset at you. Why won't it let me delay-load kernel32?

The linker delay-load feature operates on the DLL level, not on the function level. When you put a DLL on the /DELAYLOAD list, the linker changes all calls to functions in that DLL into calls to linker-generated stubs. These stubs load the target DLL, call GetProcAddress, then resume execution at the target function.

Since the delay-load feature operates on the DLL level, if you put kernel32 on the delay-load list, then all calls to functions in kernel32 turn into calls to stubs.

And then you are trapped in this Catch-22.

When a function from kernel32 gets called, control transfers to the stub function, which loads the target DLL (kernel32) to get the target function. Except that loading the target DLL means calling LoadLibrary, and finding the target function means calling GetProcAddress, and these functions themselves reside in kernel32.

Now you're trapped. To load kernel32, we need to call LoadLibrary, but our call to LoadLibrary was redirected to a stub which... calls LoadLibrary.
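The circularity can be sketched with a toy model in portable C. This is not the linker's actual helper code: the names (ToyProcess, toy_stub, toy_load_library, toy_call_delay_loaded) and the depth limit are hypothetical, and the limit exists only so that the sketch terminates instead of overflowing the stack.

```c
#include <stddef.h>

/* Toy model of delay-load stubs. All names here are hypothetical. */

enum { MAX_DEPTH = 8 };  /* arbitrary cutoff so the sketch terminates */

typedef struct {
    int load_library_is_delayed; /* 1 if LoadLibrary itself goes through a stub */
    int depth;                   /* nested stub invocations so far */
} ToyProcess;

static int toy_load_library(ToyProcess *p);

/* A stub must load the target DLL before it can forward the call. */
static int toy_stub(ToyProcess *p)
{
    if (++p->depth > MAX_DEPTH)
        return 0;                /* give up; a real process would keep recursing */
    return toy_load_library(p);
}

static int toy_load_library(ToyProcess *p)
{
    if (p->load_library_is_delayed)
        return toy_stub(p);      /* LoadLibrary is itself a delay-loaded import */
    return 1;                    /* ordinary import: the loader does its job */
}

/* Returns 1 if a delay-loaded call can be resolved, 0 if it never bottoms out. */
int toy_call_delay_loaded(int load_library_is_delayed)
{
    ToyProcess p = { load_library_is_delayed, 0 };
    return toy_stub(&p);
}
```

With an ordinary DLL on the delay-load list, LoadLibrary is a normal import and the stub succeeds on its first try. With kernel32 on the list, the stub keeps calling itself.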

Sure, the linker folks could have added special casing for kernel32, say, having a list of core functions like InitializeCriticalSection which are never delay-loaded and always go directly into kernel32. But that's really out of scope for the /DELAYLOAD feature, whose purpose is not to make it easier to call functions which might not be there, but rather to assist in application startup performance by avoiding the cost of loading the target DLL until a function from it is called. If there were functions that went directly into kernel32, then the stated purpose of delay-loading fails: that import of InitializeCriticalSection forces kernel32 to be loaded when the module is loaded, completely contrary to the aim of delay-loading to avoid loading kernel32 at module load time.

Now, it's certainly a nice feature to be able to perform delay-loading on a per-function level, in order to make it easier to write code which changes behavior based on the current version of Windows, but that's a different problem from what the /DELAYLOAD switch was created to solve.
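For comparison, here is a minimal sketch of doing the per-function lookup by hand, which is the approach several commenters below describe. It makes some assumptions the article does not: GetTickCount64 (exported by kernel32 on Windows Vista and later) stands in for the hypothetical Xyz, the helper names are invented for illustration, and the non-Windows branch is only a stand-in so that the fallback path compiles elsewhere. Because kernel32 is already loaded in a Win32 process, GetModuleHandle is enough and no LoadLibrary call is needed.

```c
#include <stddef.h>

typedef void (*GenericFn)(void);

#ifdef _WIN32
#include <windows.h>
#define KERNEL_API WINAPI

/* Look up a kernel32 export by name; NULL if this Windows version lacks it. */
static GenericFn find_kernel32_export(const char *name)
{
    HMODULE kernel32 = GetModuleHandleW(L"kernel32.dll");
    return kernel32 ? (GenericFn)GetProcAddress(kernel32, name) : NULL;
}
#else
#define KERNEL_API

/* Stand-in for non-Windows builds: behaves as if no export exists. */
static GenericFn find_kernel32_export(const char *name)
{
    (void)name;
    return NULL;
}
#endif

/* Matches ULONGLONG WINAPI GetTickCount64(void). */
typedef unsigned long long (KERNEL_API *GetTickCount64Fn)(void);

/* Returns 1 if kernel32 exports the named function, 0 otherwise. */
int kernel32_has(const char *name)
{
    return find_kernel32_export(name) != NULL;
}

/* Returns 1 and stores the tick count if GetTickCount64 exists, else returns 0. */
int try_get_tick_count64(unsigned long long *ticks)
{
    GetTickCount64Fn fn =
        (GetTickCount64Fn)find_kernel32_export("GetTickCount64");
    if (fn == NULL)
        return 0;   /* caller falls back, for example to GetTickCount */
    *ticks = fn();
    return 1;
}
```

A caller would check the return value of try_get_tick_count64 and fall back to an older function when it is 0, which plays the same role as the CurrentWindowsVersionSupportsKernelFunctionXyz test at the top of the article.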

Comments (20)
  1. Shaka says:

    Isn’t it also true that the whole idea of delay-loading kernel32.dll is a bit futile, since kernel32.dll is always loaded in a Win32 process even before the program code (and any delay-load code) runs? For example, user-mode threads (including the main thread) start their lives in a function in kernel32.dll.

  2. Robert says:

    Why can’t those people use weak symbols?

    [Those who do not understand history are doomed to repeat it. -Raymond]
  3. Crescens2k says:

    Well, this is just using a feature to do something it wasn’t designed to do.

    Of course, the biggest stupidity here is trying to delay-load the library which is the home of the delay-load mechanism.

    The most obvious problem is what actually happens if you manage to call a function which doesn’t exist, i.e., you manage to call one of the transacted NTFS functions on Windows XP. Even though this is a bug (not checking the version, or forgetting to put an if block in the right place), how does delay loading handle the failure? It can’t be as nice as doing it using the correct means.

    What this looks like is that someone is too lazy to set up the function pointer and GetProcAddress (with maybe some LoadLibrary for libraries not loaded with the process). I just wish people would not be so lazy.

  4. Arno says:

    Raymond, do you have any idea whether it is worthwhile to delay-load a DLL that is already loaded, just to save on resolving the imports? Basically, not saving the LoadLibrary but at least saving the GetProcAddress?

    [That’s a different problem from what the /DELAYLOAD switch was created to solve. -Raymond]
  5. Crescens2k says:

    You would seriously initialise your function pointers statically? -_-;

    I may be a weird one but I stay away from any kind of static initialisation unless it is absolutely necessary. On top of that I also see even less need for cross compilation unit initialisation.

    I can’t think of any situation off of the top of my head in which you couldn’t change static initialisers into a single init call from main/WinMain.

    As for weak symbols. Win16 had those and they didn’t work well.

  6. Mike Dimmick says:

    Robert, I think you might have to qualify your performance comment with ‘on this set of processors’ or under such and such conditions. Yes, if you call the function immediately after GetProcAddress and the generated code is (for example) CALL EAX, I can see that the processor might not correctly predict the location of the jump and therefore stall for a bit, but if you’ve stored the address – which I would hope you would – the instruction is likely

    CALL [some location]

    and is no different from calling through the import address table.

    If you don’t leave the library loaded and store the result of GetProcAddress, I’m sure searching for the function’s address is a heck of a lot more costly than calling through a function pointer.

    Using LoadLibrary/GetProcAddress adds a lot of overhead to a function for error checking and the like; using /DELAYLOAD allows you to program as if the function is always there and handle the case where it’s not out-of-line. I’ve considered implementing delay-load for Windows CE where it is, alas, not available.

  7. Robert says:

    @Mike Dimmick

    Just checking: you don’t need the delay in delay-loading; just on-demand loading of functions?

    It seems somewhat strange to me that many people are using the delay loading in the same way in which they would use weak symbols and it doesn’t cause the problems which weak symbols caused.

    [Instead, it causes other problems (like security vulnerabilities). More on that on November 11 of this year. Stay tuned! -Raymond]
  8. Philip says:

    Of course, you actually can delay load a function from kernel32 — you just have to do some magic with the PE after it’s been built. The loader handles this case perfectly when you have one set of static imports under the name "kernel32.dll" and one set of delay-load imports under the name "kernel32.dll". The static imports are processed first, and the others are processed whenever a function is hit.

    The trick to do this with your garden variety compiler isn’t available on the internet at the time of this writing, but can be deduced from a careful reading of the PE spec and a nifty post-build tool of your devising.

  9. Robert says:

    """

    [Instead, it causes other problems (like security vulnerabilities). More on that on November 11 of this year. Stay tuned! -Raymond]

    """

    Actually, I wanted to know how does using delay loading instead of weak symbols help. All the problems with late recognition of a lack of a required symbol still exist, and that was the biggest problem with weak refs, wasn’t it?

  10. Crescens2k says:

    The point here is it doesn’t.

    Delay loading was added to help avoid possibly costly initialisations, not to be used as weak references.

    The thought for delay loading was instead of modules eating up a lot of time and processing power slowing down the application start, it would only be when the library is needed that it would be loaded into the process. The real benefit is if the delay loaded stuff is not used, in this case it wouldn’t even have to load the library in and initialise it. Also, because of the fact that delay loading is used to help speed things up, symbols are not resolved.

    This is where the problem lies. People have started to abuse this as a way of getting around the symbols needing to be resolved. So it isn’t what delay loading was designed for; it is an abuse of a feature of delay loading.

    The only supported method of conditional symbols in Win32 is LoadLibrary and GetProcAddress. In fact, the thing which is funny is with the addition of the delay load to skip the symbol resolution, you will still have to check versions for every call.

  11. Robert says:

    @Crescens2k

    But even if we use delay loading for its intended purpose, it causes the same problems as those described in the post about weak refs. It does the same thing as making all refs from a given library weak (the situation in 16-bit Windows).

    Am I correct that the time-consuming part of linking is looking up all symbols and not generating PLTs/whatnot?

  12. Dean Harding says:

    GLee[1] is a library for use with OpenGL that does all the dynamic lookup of extensions and so on. You just go if (GLEE_ARB_multitexture) { glMultiTexCoord2fARB(…) } and it does the dynamic loading of glMultiTexCoord2fARB and so on.

    There’s no reason somebody couldn’t come up with a similar library for kernel32 (or any other library, for that matter).

    Perhaps Microsoft could stop making additions to kernel32.dll and add all new functionality into a new DLL. That way, you COULD delay load that DLL… though maybe that’s more trouble than it’s worth.

    [1] http://www.elf-stone.com/glee.php

  13. Crescens2k says:

    This is why delay loading isn’t something seen as normal. The linker documentation itself states

    You should consider delay loading a DLL if:

    Your program may not call a function in the DLL.

    A function in the DLL may not get called until late in your program’s execution.

    Which means that for standard usage you shouldn’t use delay loading.

    If you look around then you will notice that delay loading is not actually used that often. Mostly it is down to the fact that a delay loaded dll doesn’t offer much when it is loaded and used right away, or often. So while it is true the weakness of delay loaded symbols is an issue, it isn’t an issue which would normally affect people.

    It is only the whole "I just want to shut it up" attitude which has caused this problem. It is exactly the same as people using casts or #pragma warning to get rid of errors or warnings in a compiler because it requires doing less work.

    So no matter what, delay loading to get around the symbol resolution is bad. But instead of setting up LoadLibrary/GetProcAddress and stubs, people would rather use the quickest method even if in the long run it would cause problems.

    For linking.

    The time-consuming part depends, really. For more recent versions of the linker, LTCG is the most time-consuming part. In fact, generating the export table should be pretty easy since all it needs is the symbol name (already known) and its RVA (worked out during linking).

  14. Allen Bauer says:

    "Now, it’s certainly a nice feature to be able to perform delay-loading on a per-function level, in order to make it easier to write code which changes behavior based on the current version of Windows, but that’s a different problem from what the /DELAYLOAD switch was created to solve."

    FWIW, the most recent version of Delphi (Delphi 2010 http://www.embardacero.com/delphi) added the ability to specify delay loading of individual APIs for that exact purpose. You can mark individual APIs to be delay-loaded, including ones in Kernel32.

  15. Robert says:

    @Crescens2k

    Keeping a global function pointer statically initialized to getprocaddr(something) invites all static initialization ordering-related bugs (you can’t use the function in any static initialization, because it might get called before the pointer is set up). Also, if you are *very* concerned about performance, calling a function pointer is noticeably slower than a call to a constant location and a jump to a constant location. I agree, delay loading doesn’t help with it at all but weak symbols would.

  16. Marquess says:

    "More on that on November 11 of this year. Stay tuned! -Raymond"

    Oh, that’s just cruel …

  17. Anonymous Coward says:

    Since all the delay loading does is call LoadLibrary and GetProcAddress, I’m sure that people who want to do that for a single function but are offended by seeing those extra calls in their code, can macro their way out of it.

  18. Nicholas says:

    I already have my comment in the queue and it will be posted November 12 of this year (@7am)

  19. Leo Davidson says:

    @Anonymous Coward: The problem isn’t the LoadLibrary/GetProcAddress calls being in the code. Of course you put those in a stub somewhere out of the way.

    The problem is all the manual stuff you have to do to produce those stubs. You have to find the function declarations in the SDK headers and convert them into the appropriate function pointer type declaration (which often requires a bit of working out and looking up weird API-specific #defines), put a variable somewhere, put in the calls to LoadLibrary/FreeLibrary and GetProcAddress, and finally write the actual stub itself.

    None of it is difficult but it’s quite tedious, especially when doing several functions at once.

    I don’t use delay loading myself; I do the tedious stuff instead, but I’m not surprised other people use delay loading to avoid it.

    It’d be great if VS or some other tool provided an automated way to create stubs for a given API. Just tell it the API name and it does the rest.

    (In a dream world, it’d also be nice to have some way to detect, from static analysis, where you have accidentally used APIs, structure versions, flags, etc. for later versions of Windows outside of code-blocks you’ve explicitly designated as being allowed to use them.)

  20. Maxime Labelle says:

    "More on that on November 11 of this year. Stay tuned! -Raymond"

    That is called ‘delayed posting’…

Comments are closed.
