Why does the PDB format change every release?


In another post, Steve Loughran asks “Why does the PDB format change every release? What is wrong with having one standard layout for debug symbols and leaving it alone?”

A good question, and one that I wondered about many years ago. The answer is that a symbol table has to be able to reproduce as much information about the program as reasonably possible. As languages evolve and add new functionality, it's very likely that the additions to the language or to the generated code can't be faithfully represented in the symbol table format used by the preceding version's compiler.

Here's an example from a long time ago. One of Borland's symbol table formats was based on what was, more or less, an industry-standard format. Later, the compiler introduced advanced optimizations that let a local variable be stored in different locations at different points in a function's code. For instance, for the first 7 bytes of the function, 'foo' might live at [EBP-4], for the next 12 bytes in ESI, and for the next 10 bytes in EDI. There was simply no way to represent this information in the existing symbol table format. So why extend the format? Without doing so, the debugger would have no clue where to find the correct value of the variable, and would end up giving the user incorrect information. This breaks one of the cardinal rules of debuggers. (Incidentally, these rules need to be written down somewhere, if they're not already. Russ?)
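
To make the problem concrete, here is a minimal Python sketch of the kind of record the older format had no room for: a per-variable list of code ranges, each paired with where the variable lives during that range. The names (`LocationRange`, `build_ranges`, `locate`) and the layout are hypothetical illustrations, not Borland's or Microsoft's actual on-disk format; the idea is similar in spirit to DWARF location lists.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LocationRange:
    """One entry in a hypothetical per-variable location list."""
    start: int     # offset from the start of the function, inclusive
    end: int       # offset from the start of the function, exclusive
    location: str  # a stack slot such as "[EBP-4]" or a register name


def build_ranges(spans: List[Tuple[int, str]]) -> List[LocationRange]:
    """Turn (length_in_bytes, location) pairs laid end to end into ranges."""
    ranges = []
    offset = 0
    for length, location in spans:
        ranges.append(LocationRange(offset, offset + length, location))
        offset += length
    return ranges


def locate(ranges: List[LocationRange], offset: int) -> Optional[str]:
    """Return where the variable lives at a code offset, or None if unknown."""
    for r in ranges:
        if r.start <= offset < r.end:
            return r.location
    return None


# 'foo' from the example: 7 bytes at [EBP-4], then 12 in ESI, then 10 in EDI.
foo = build_ranges([(7, "[EBP-4]"), (12, "ESI"), (10, "EDI")])

print(locate(foo, 3))   # [EBP-4]
print(locate(foo, 7))   # ESI
print(locate(foo, 25))  # EDI
print(locate(foo, 40))  # None (past the end of the described code)
```

A format that stores a single fixed location per variable could only ever answer `[EBP-4]` here, which is wrong for most of the function. Representing the extra ranges means changing the format, and every debugger that reads it has to change too.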

Another example: consider the sea change from Visual C++ 6.0 to Visual Studio .NET. Existing debug formats had almost no support for anything like the managed world, where functions are compiled at runtime.

One could argue that symbol tables should be extensible without breaking the existing format. Some might point to an XML representation as one such means. However, I would contend that the example above is a sufficient counter-argument: even if an older reader could parse past the new data, it still wouldn't know where to find the variable. It's just the nature of the game that compilers, debuggers, and symbol formats need to stay in lockstep for anything other than trivial debugging purposes.


Comments (6)

  1. I was writing a symbolic debugger for a college project some time back. The Debug Interface Access (DIA) SDK is supposed to shield a developer from constant revisions of the .pdb files. I was trying to use DIA, but in all honesty it seemed like brain surgery to me. So I scrapped that and went back to using DBGHELP.DLL (version 5.1).

    I haven't used the Whidbey beta yet. But at some point I would worry about using DBGHELP.DLL version 5.1 with the newer .pdb files generated by Whidbey. Now, with the benefit of hindsight, it seems I might have been better off using that damn DIA. sigh 🙁

  2. The "stabs" format is claimed to be extensible, as it uses strings to describe all the interesting parts of the debug info. Of course, you still have the problem that debug info analysers won't understand the extensions, so you'll need a newer version of the analyser to interpret newer versions of the debug info.

    Having written debug information analysis code for stabs, dwarf and VAX format executables, all I can say is that I wish they all provided an API like MS do now…

  3. Sven says:

    I'm still using VC6, but unfortunately the new debug symbols from SP2 can't be read by VC6 anymore. My dream is that someone could extend the VC6 PDB DLLs so that VC6 can work with the compiler/linker from VC7/8.

  4. Somebody asked here on the forums whether you can use VS 2003 to debug .NET 2.0 (Whidbey) apps. Unfortunately,