Link-time code generation invalidates a lot of classical assumptions about linking


The discussion of DLL imports and exports from a few years back relied heavily on what I called the classical model of linking. Even though modern linkers can be quite non-classical, they do what they can to preserve certain aspects of classical behavior because many projects still rely on them.

Recall that the classical division of labor between the compiler and the linker is that the compiler takes source code and generates machine code with a few blank spaces that say, "Hey, linker, I'm not sure what number goes here, but I want it to be the address of X. Please patch the correct number in when you figure out where X is." The linker's job is to go find X, assign it an address, and patch the address of X into all the places that the compiler asked for.
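As a minimal sketch (hypothetical file names), that hand-off looks something like this:

    // b.cpp: defines X, wherever the linker eventually decides to put it
    int X = 42;

    // a.cpp: uses X without knowing where it will end up
    extern int X;           // "somebody, somewhere, defines this"
    int ReadX()
    {
        return X;           // the compiler emits a load with a blank address
                            // plus a fixup record; the linker patches in the
                            // real address of X once it has placed it
    }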

In the Visual Studio system, one of the ways of activating a large set of non-classical behaviors is to enable Whole program optimization, also known as Link-time code generation. When this is enabled, the division of labor between the compiler and linker shifts. The compiler still takes source code, but instead of generating machine code, it merely generates a parse tree (perhaps partially-digested). The linker then takes all these parse trees, combines them together (according to classical rules), and generates machine code.
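In practice you opt in with the /GL compiler switch and the /LTCG linker switch; a rough command-line sketch (project files normally set these for you):

    cl /c /GL a.cpp b.cpp
    link /LTCG a.obj b.obj /OUT:program.exe

The .obj files produced under /GL hold the compiler's intermediate representation rather than finished machine code, which is why the linker has to finish the job.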

Since the code generation is done with the full parse tree of the entire program, the code generator can perform very advanced operations, like observing that a method is never overridden and can therefore be inlined. In particular, starting with Visual Studio 2012, the link-time code generator can even see that you got dllimport wrong, and it goes back in time and redeclares the function correctly before proceeding with code generation.
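For example (hypothetical files; the actual decisions are up to the optimizer), consider a small function that is not declared in any header:

    // helper.cpp
    int Twice(int x) { return x * 2; }     // no inline keyword, not in a header

    // caller.cpp
    int Twice(int x);                      // classical model: call through a
    int Use(int y) { return Twice(y); }    // linker-patched address.  Under
                                           // LTCG the code generator sees both
                                           // translation units at once and may
                                           // inline Twice directly into Use.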

Non-classical linking is even more advanced than non-classical physics. Whereas special relativity lets you stretch and slow down time, non-classical linking even gives you a limited form of time travel.

Comments (31)
  1. Joshua says:

    Poor man's link-time code generation:

    #include "main.cpp"

    #include "frame.cpp"

    #include "func.cpp"

    #include "account.cpp"

    #include "advancer.cpp"

    #include "audit.cpp"

    #include "cpio.cpp"

    g++ -fwhole-program -O3 -o mybooks.exe includeall.cpp

  2. Henke37 says:

    How does the linker know which format the number should be in? The compiler told it.

  3. SimonRev says:

    Good to know after all these years that the fabled Microsoft Temporal Mechanics department has started producing some results.  That time machine must be just around the corner.

  4. Ace says:

    What do we want?

    Time travel!

    When do we want it?

    It's irrelevant!

  5. JM says:

    Or, as we could also call "non-classical linking" and "whole program optimization": compilation.

    It's only magic if you still think that what they did in the 70s was pretty heavy stuff. One man's separation of concerns is another man's stone knives and bearskins.

  6. 12BitSlab says:

    @ JM

    IBM's mainframe implementation of PL/I in the 60's and 70's did some very heady memory optimizations.  You could declare variables of type flag (i.e., a bit) and the PL/I optimizer would combine them into bytes along with the proper masking for read and write access.  Along with some ASM, OS/360 and its follow-ons were written in PL/I.

  7. 640k says:

    A "whole program optimizer" could easily be developed by running a few copy commands as a pre-build task to concatenate all source before the compiler is invoked. Why was a whole new toolchain developed instead? Crazy. Even crazier is that this feature is advertised as something advanced when it's unnecessary. I guess some clueless compiler developer wanted to increase the technical debt tenfold.

  8. SimonRev says:

    Not quite, 640k.  Just concatenating the sources would cause all sorts of compiling issues: macros that would normally die at the end of one module would now live into the next.  Headers with conditional compilation directives that may be defined differently in different files could no longer be used.  static and anonymous-namespace durations would be unduly extended and would undoubtedly conflict.  I am sure there are a hundred other things that would also go wrong (anywhere the standard mentions "compilation unit", or whatever the current equivalent is, would probably have issues).

    There would probably be some upsides as well: if the ordering of the files is defined, the global variable initialization order problem becomes solved.
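    A minimal sketch of the macro-leakage problem (hypothetical file names):

    // func.cpp
    #define ok(x) ((x) != 0)        // helper macro, meant to die with this file

    // audit.cpp
    bool ok(int x);                 // an ordinary function on its own, but if
    int Audit(int v)                // func.cpp is concatenated in front of it,
    {                               // the function-like macro captures the name
        return ok(v);               // and the declaration above no longer even
    }                               // parses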

  9. JM says:

    @12BitSlab: to be fair, I was specifically thinking of the ludicrously primitive approach to modularity in C (and by extension C++). As in, there isn't any beyond the use of "static"; just make sure you combine source files in appropriate ways. It's certainly true that people could do better even in the 70s… but it's also true that "worse is better" is a real thing for a reason.

    I'm not sure PL/I is the best of counterarguments, though. It went a little too far in the opposite direction when compared to C… as in, never mind the cost of implementation, just throw it all in and see what sticks. This didn't exactly have a good influence on its design either… So nice bitflag optimization, shame about the rest, I suppose. :-)

  10. Myria says:

    As I understand it and heard from others, the Xbox 360 had poor backward compatibility precisely because of Link-Time Code Generation.  The original Xbox was a prime candidate for emulating games at a high level, since the NV2A calls were abstracted via DirectX.  The 360 system software could have emulated the x86, intercepted the kernel calls, and found the DirectX routines within the executable, then simulated their effects, but LTCG made this difficult.

    In release builds, the DirectX libraries for the original Xbox were compiled with /GL, as were the games by default, and it resulted in blurring of the boundaries between game code and DirectX library code.  It made the games slightly faster on the Xbox, but later, when an emulator was desired for 360, it wasn't possible to perfectly distinguish where the DirectX API calls begin.  The DirectX code got mixed into game code.

    Later XDK builds supposedly had externally-visible tables of pointers to critical functions in the XDK to avoid this problem, but it was too late for the majority of games.

    This is all from possibly unreliable sources, but it seems logically consistent.

  11. Billy O'Neal says:

    Myria: The Xbox was an x86 machine, and the 360 is a PowerPC machine. The Xbox used an NVidia graphics chip, the 360 used an ATI graphics chip. Etc. Linking is the least of the problems of backcompat for the 360.

  12. smf says:

    @myria

    The rumour I heard was that linking DirectX into the games in that way was done on purpose to make it hard to run the games on a standard Windows PC. If so, it worked really well: people tried and never succeeded. OTOH the emulator for the 360 is surprisingly good; they only whitelisted games that they actually tested, and they didn't test all games. The developer boxes will run all games, and it seems a lot of games work quite well. Sony have a similar issue, and they just put more effort into testing (and in a lot of cases patching) games (they even have game-specific patches for PS1 games running on a PS2, and they sold that as hardware backward compatibility, which it isn't completely).

  13. 12BitSlab says:

    @ JM

    Largely, I agree with you.  One thing that one has to keep in mind is that the optimizations associated with PL/I were more concerned about memory usage than about fast code.  It did have optimizations for loop-invariant code and things like that, but in the context of the time, memory was far more precious than processor speed, given the incredibly slow nature of I/O at the time.  Remember, the first shipment of S/360's did not have disk drives.  It was a card/batch system.

    I wrote some PL/I back in the day.  It was one of my least favorite tools to use.  I felt more comfortable with ASM than PL/I and, given the choice, would go with ASM every time.

    BTW, if I sound old, I just might be.  I voted for Lincoln — when he ran for the Senate.  He was such a nice young man.

  14. foo says:

    If it invalidates it hopefully it just reports it as a diagnostic and dies (failure or justified).

  15. loreb says:

    @Joshua

    SQLite's amalgamation does more or less what you say; see http://www.sqlite.org/amalgamation.html

  16. alexcohn says:

    @SimonRev: the static vars are easily handled by a few curly brackets:

    {

    #include "main.cpp"

    }{

    #include "frame.cpp"

    }{

    #include "func.cpp"

    }{

    #include "account.cpp"

    }{

    #include "advancer.cpp"

    }{

    #include "audit.cpp"

    }{

    #include "cpio.cpp"

    }

    But silently redefining #defines across compilation units is evil, and to the extent that Joshua's approach can help expose it (unfortunately, not completely), that is a strong reason to prefer it over LTCG.

    But it is still a "poor man's" palliative: the real thing allows working with static libraries, which is essential for large and/or distributed projects.

  17. Michael Quinlan says:

    IBM used PL/S (en.wikipedia.org/…/S), not PL/I, for its systems-level programming.

  18. alegr1 says:

    @Alex Cohn:

    I don't think C/C++ allows that.

  19. Joshua says:

    It would have if he did namespace { instead of {.

    I agree cross-defined macros are evil.

    [I just tried it. "Unresolved symbol _main" because main is now in an anonymous namespace. Also, nobody in main.cpp would be able to call any function in frame.cpp. -Raymond]
  20. Joker_vD says:

    One question: if I have modules written in different languages, will LTCG make cross-modular optimization possible?

  21. icabod says:

    On a large project, I often make small, single-file changes.  LTCG would re-compile that one file and link it in.  Using the "#include everything into one unit" approach would be slooooow, to the point where an actual time-machine would be quite handy.

    Pre-emptive something: I've had to do single-file changes on Release builds too.

  22. icabod says:

    Regarding the evilness of "redefining #defines across compilation units", ordinarily I would agree, but in the case of macros such as NDEBUG, the C++ standard spells out how it is meant to be used.  This means defining it in one unit would have an unintentional knock-on effect when using the single-unit approach… unless you remember to #undef, the use of which I would argue is even eviller (!?).
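    A rough sketch of that knock-on effect (hypothetical file names):

    // trace.cpp: wants its asserts compiled out
    #define NDEBUG
    #include <cassert>
    void Trace(int n) { assert(n >= 0); }   // a no-op here, as intended

    // money.cpp: relies on its asserts staying live
    #include <cassert>
    void Audit(int n) { assert(n >= 0); }   // fine as its own unit, but if
                                            // trace.cpp is pasted in front of
                                            // it, NDEBUG leaks in and this
                                            // assert silently vanishes too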

  23. SimonRev says:

    Agreed that changing #defines across files is evil; it is nonetheless done.  You still don't want #defines that are created in a .c or .cpp file (even if #defines are evil in C++, they still are used) to outlive that .cpp file.   What if some .cpp file did a #define Test(a) ASSERT(a != 0), but some other .cpp file had bool Test(int const &x)?  Bad practice?  Yes.  But something that the compiler team has to deal with, as it is valid C++.

    More problematic are different compile options.  What if I have one file that needs to be compiled with /clr?  What if there are a few evil legacy files that still need /EHca?

  24. Evan says:

    @Joshua: "It would have if he did namespace { instead of {."

    It wouldn't be illegal C++ on its face, but it also doesn't solve the problem: you still can't repeat definitions of entities in the same file even if they are in different namespace{…} blocks. Nor does it help at all with C.

    @Alex Cohn: "But silently redefining #defines across compilation units is evil"

    Sometimes files have "local" macro definitions, because you want to use them for one specific purpose. I'd say it's not just fine but even BETTER to define those at the point of use rather than globally across projects. And while I'd prefer to see an #undef of them after, if you're near enough the bottom of the compilation unit, it's also pretty reasonable to omit the #undef.
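    A minimal sketch of that pattern (hypothetical names):

    #define FIELD(name, width) unsigned name : width;

    struct Flags {
        FIELD(ready, 1)
        FIELD(error, 1)
        FIELD(retries, 4)
    };

    #undef FIELD    // keep the helper from leaking past this point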

  25. Joshua says:

    @Evan: I never noticed any of these as I by habit keep local names unique.

  26. Rick C says:

    "Also, nobody in main.cpp would be able to call any function in frame.cpp."

    To be fair, that specific issue is trivially fixable.

    [You would have to find everything with external linkage and declare it outside the anonymous namespace. Whether this is trivial or not depends on how fancy your tools are. -Raymond]
  27. Evan says:

    @Joshua: "I never noticed any of these as I by habit keep local names unique."

    Why would you put in the effort to even think about it? IMO 80% of the benefit of keeping names local is so that you don't have to.

  28. Joshua says:

    @Evan: If they're not unique they're harder to access via debugger.

  29. 12BitSlab says:

    @ Mr. Quinlan

    You are correct that PL/I was an application language and that PL/S was the systems language.  All of the shops I worked at used the term PL/I for all flavors of PL/*.  

    Contrary to popular belief, it was possible to get the PL/S compiler.  A place I worked at needed to modify the OS so we could bring up a very early ATM system that went on to dominate the market in the Midwest.  

    The original OS/360 had a lot of PL/I (F compiler) code in it for things that did not have to deal with registers and the like.  Later, that code was migrated to PL/S by IBM.

  30. Joshua says:

    OK the anonymous namespace thing doesn't really work too well if the object is to be able to build the same code either way. It has something to do with being declared in one anon namespace and called in another.

    I have a working scheme using named namespaces but it depends on specific project order.

    It's probably easier to fix conflicting names via the preprocessor. This isn't as hard as it sounds, since you can find all of them on the first pass if your error-count limit is high enough: they raise duplicate-definition errors.

    [So the way you compile your project is to compile it once to find all the name conflicts, then preprocess the source to rename the conflicts, then compile it a second time. I wonder if at this point 640k would admit that maybe this is better-fixed in the toolchain. -Raymond]
  31. Sven P says:

    Instead of #include'ing all C(++) files immediately, why not include their preprocessed versions?

    This solves the changing #defines across files and all other kinds of preprocessor related problems.
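    One way that might look from the command line (a rough sketch; hypothetical file names):

    rem preprocess each file on its own, so macros cannot leak across files
    cl /nologo /P main.cpp frame.cpp func.cpp
    rem then concatenate the expanded output and compile it as a single unit
    copy /b main.i + frame.i + func.i all.i
    cl /nologo /Tp all.i /Fe:mybooks.exe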

Comments are closed.