Why doesn’t Win32 give you the option of ignoring failures in DLL import resolution?


Yuhong Bao asked, via the Suggestion Box, "Why not implement delay-loading by having a flag in the import entry specifying that Windows should mimic the Windows 3.1 behavior for resolving that import?"

Okay, first we have to clear up the false assumptions in the question.

The question assumes that Windows 3.1 had delay-loading functionality in the first place (functionality that Yuhong Bao would like added to Win32). Actually, Windows 3.1 did not have any delay-load functionality. If your module imported from another DLL in its import table, the target DLL was loaded when your module was loaded. There was no delay. The target DLL loaded at the same time your module did.

So there is no Windows 3.1 delay-load behavior to mimic in the first place.

Okay, maybe the question really was, "Instead of failing to load the module, why not just let the module load, but set the imported function pointers to a stub function that raises an error if you try to call it, just like Windows 3.1 did?"

Because it turns out that the Windows 3.1 behavior resulted in data loss and mystery crashes. The Win32 design solved this problem by making failed imports fatal up front (a design principle known as fail fast), so you knew ahead of time that your program was not going to work rather than letting you run along and then watch it stop working at the worst possible time, and probably in a situation where the root cause is much harder to identify. (Mind you, it may stop working at the worst possible time for reasons the loader could not predict, but at least it stopped what it could.)
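To make the contrast concrete, here is a rough sketch in portable C of the two resolution policies. It is a toy model, not the actual loader code of either system: the export table, the names `DrawLine` and `DrawCurve`, and every function below are hypothetical, and the stub merely counts calls and returns -1 where a real system would raise an error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*import_fn)(void);

/* Hypothetical export offered by the "target DLL" in this model. */
static int draw_line(void) { return 1; }

/* Model of an export lookup: returns NULL when the name is not exported. */
static import_fn lookup_export(const char *name) {
    if (strcmp(name, "DrawLine") == 0) return draw_line;
    return NULL;
}

/* Stand-in for the error-raising stub; here it only records the call. */
static int stub_calls = 0;
static int unresolved_stub(void) { stub_calls++; return -1; }

/* Stub policy (modeled on the Windows 3.1 behavior described above):
   the load always succeeds, and missing imports quietly become stubs. */
int resolve_with_stubs(const char *const *names, import_fn *table, size_t n) {
    assert(names != NULL && table != NULL);
    for (size_t i = 0; i < n; i++) {
        import_fn f = lookup_export(names[i]);
        table[i] = f ? f : unresolved_stub;
    }
    return 1; /* load "succeeded" */
}

/* Fail-fast policy (modeled on Win32): one missing import fails the load. */
int resolve_fail_fast(const char *const *names, import_fn *table, size_t n) {
    assert(names != NULL && table != NULL);
    for (size_t i = 0; i < n; i++) {
        import_fn f = lookup_export(names[i]);
        if (f == NULL) return 0; /* refuse to load the module */
        table[i] = f;
    }
    return 1;
}
```

With the stub policy, a module importing both `DrawLine` and `DrawCurve` loads fine, and the problem surfaces only when something finally calls the second entry, possibly long after the user has unsaved work. With the fail-fast policy, the same import list is rejected before any of the program's code runs.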

In other words, the Win32 designers thought about this situation and made an explicit design decision not to support it.

Okay, but when Visual Studio was looking at how to add delay-load functionality, why didn't they implement it by changing the Win32 loader so that failed imports could be optionally marked as non-fatal?

Well, um, because the Visual Studio team doesn't work on Windows?

There's this feature you want to add. You can either add it to the linker so that all programs can take advantage of the feature on all versions of Windows, or you can add it to the operating system loader, so that it works only on newer versions of Windows. If the feature had been added to the loader rather than the linker, application vendors would say, "Stupid Microsoft. I can't take advantage of this new feature because a large percentage of my customer base is still running the older operating system. Why couldn't they have added this feature to the linker, so it would work on all operating systems?" (You hear this complaint a lot. Any time a new version of Windows adds a feature, everybody demands that it be ported downlevel.)

Another way of looking at this is realizing that you're adding a feature to the operating system which applications can already do for themselves. Suppose you say, "Okay, when you call a function whose import could not be resolved, we will display a fatal application error." The response is going to be "But I don't want my application to display a fatal application error. I want you to call this error handler function instead, and the error handler will decide what to do about the error." Great, now you have to design an extensibility mechanism. And what if two DLLs each try to install different failed-imported-function handlers?

When you start at minus 100 points, saying, "Oh, this is not essential functionality. Applications can simulate it on their own just as easily, and with greater flexibility" does nothing to get you out of the hole. If anything, it digs you deeper into it.
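For what it's worth, the do-it-yourself version might look roughly like the sketch below. It assumes the optional function has the signature `int f(int)`; the names `feature_fn`, `feature_unavailable`, `find_export`, and `call_optional_feature` are all made up for illustration. On Windows it uses `LoadLibraryA` and `GetProcAddress`; on other platforms the lookup always fails, just so the sketch still compiles and the fallback path can be exercised.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#endif

typedef int (*feature_fn)(int);

/* The application's own policy for a missing import (hypothetical):
   report failure to the caller instead of crashing. */
static int feature_unavailable(int value) { (void)value; return -1; }

/* Look up an export at run time; returns NULL on any failure. */
static feature_fn find_export(const char *dll, const char *name) {
#ifdef _WIN32
    HMODULE module = LoadLibraryA(dll);
    if (module != NULL) {
        /* The module is deliberately left loaded for the process lifetime. */
        return (feature_fn)GetProcAddress(module, name);
    }
#else
    (void)dll; (void)name; /* no Win32 loader here: always "not found" */
#endif
    return NULL;
}

/* Resolved on first use, then reused. Not thread-safe as written. */
static feature_fn cached_feature = NULL;

int call_optional_feature(const char *dll, const char *name, int value) {
    assert(dll != NULL && name != NULL);
    if (cached_feature == NULL) {
        feature_fn found = find_export(dll, name);
        cached_feature = found ? found : feature_unavailable;
    }
    return cached_feature(value);
}
```

The point of the sketch is that the failure policy lives entirely in the application. Swapping `feature_unavailable` for something that logs, falls back to an older API, or shows a message requires no cooperation from the operating system and no agreement with any other DLL in the process.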

Comments (14)
  1. Anonymous says:

    And to implement this, use: LoadLibraryEx & GetProcAddress.

  2. Anonymous says:

    What, you never heard of an application using a custom loader? <ducks>

  3. Anonymous says:

    If it had been added to the operating system when delay-loading was first thought of, sure, it would not have been usable by those who want to be compatible with older OSes, but as time passed, by now it would be usable on all "downlevel" operating systems.

  4. Anonymous says:

    Having done delay loading of dlls before Visual Studio added support for it, I have no sympathy for this guy. A base class to handle loading a DLL at first use is almost trivial and you get complete control over what happens when an import isn't found.

  5. Anonymous says:

    @Rob: You're right that it is simple if you're allowed to make source code modifications.  With C++0x decltype it becomes almost trivial.  The Visual Studio loader is, of course, much more complicated because it has to load only the functions actually called and do all its magic at link time or later so it works with third-party static libraries.

  6. Anonymous says:

    I just wish that the system would give a more direct diagnostic of what the problem is when it fails to start a process due to a missing DLL import. Sometimes you do get a decent enough error message, but my recollection (dusty as it is, since this doesn't really happen too often) is that I'd usually have to jump through some hoops to figure out which DLL and/or import was causing the problem – or that the problem was because of a missing DLL or import.

  7. Anonymous says:

    Interestingly .net is rather fault tolerant, similar to the Win 3.1 behavior. But I'm not sure if this is by contract, or just an implementation detail.

  8. Anonymous says:

    BTW, I later found out about weak-linking on the Mac, which is pretty much exactly this, but as an option that is not enabled by default. Programs would test weak-linked procedure addresses to ensure they are not zero before calling them.

  9. Dean Harding says:

    GLEW for OpenGL is a nice example of a mechanism for handling "soft failure" dynamic loading. No special features needed in the Win32 loader or VC++ linker at all…

  10. Anonymous says:

    @W .Net does not load assemblies (.DLLs) unless they are actually used. (And it does not JIT a method until it's used either). That's a very good optimization on both loading time and memory used.

  11. Anonymous says:

    @Sukru – not if they are strongly-named (and not in the assembly cache) as far as I remember..

  12. Anonymous says:

    Why was Fred App even allowed to start on Windows 3.0? Surely if it needed functions only available on Windows 3.1 it should have been marked as requiring Windows 3.1 and then the problem would never have arisen.

    [What an odd question. "Why not stop making bugs?" -Raymond]
  13. Anonymous says:

    "(You hear this complaint a lot. Any time a new version of Windows adds a feature, everybody demands that it be ported downlevel.)"

    This one is recent and funny:

    tech.slashdot.org/…/IE9-Team-Says-Our-GPU-Acceleration-Is-Better-Than-Yours

    If you read the comments on Slashdot and on the blog, the complaints range from "why isn't this in XP?" to "Microsoft is destroying the web which should be platform-independent but this DirectX-acceleration doesn't work on Linux". As if it makes a difference for the web-designer if someday his pages will be rendered through DirectX.

  14. Anonymous says:

    Uhm… I thought failed delay-loads *were* non-fatal in Win32… They raise a Win32 exception that can be caught and handled, or one can even use a callback function instead…

    IIRC, only immediate loads are fail-fast.

    [I think you missed the point of the article: Delay-loads are implemented by the linker, not the operating system. -Raymond]
