In another post, Steve Loughran asks “Why does the PDB format change every release? What is wrong with having one standard layout for debug symbols and leaving it alone?”
A good question, and one that I wondered about many years ago. The answer is that a symbol table has to be able to reproduce as much information about the program as reasonably possible. As languages adapt and add new functionality, it’s very likely that the additions to the language or the generated code can’t be faithfully reproduced by a symbol table from the preceding version’s compiler.
Here’s an example from a long time ago. One of Borland’s symbol table formats was loosely based on an industry-standard format. Subsequently, the compiler introduced advanced optimizations that let a local variable be stored in different locations at different points in a function’s code. For instance, in the first 7 bytes of the function, ‘foo’ might exist at [EBP-4], for the next 12 bytes in ESI, and for the next 10 bytes in EDI. There was simply no way to represent this information in the existing symbol table format. So why extend the format? Without doing so, the debugger would have no clue where to find the correct value of the variable, and would end up giving the user incorrect information. This breaks one of the cardinal rules of debuggers. (Incidentally, these need to be written down somewhere, if they’re not already. Russ?)
Another example: Consider the sea change from Visual C++ 6.0 to Visual Studio.NET. Existing debug formats had almost no support for anything like the managed world where functions are compiled at runtime.
One could argue that symbol tables should be extensible without breaking the existing format. Some might point to an XML representation as one such means. However, I would contend that the example above is a sufficient counter-argument. It’s just the nature of the game that compilers, debuggers, and symbol formats need to stay in lockstep for anything other than trivial debugging purposes.