Hello, this is Jim Springfield again. I want to start explaining our plan to fundamentally change how IntelliSense and other code browsing features work for C/C++. The recent GDR for VS2005 and the changes that went into VS2008 were significant, but they don’t really change how these features are implemented. This post covers the history of these features and helps set the stage for explaining what we are trying to accomplish in VC10 (the next release after VS2008).
Much of this summary comes from my own memory of the events, refreshed by installing each of these older versions of Visual C++ and experimenting with them.
Microsoft's products have captured information about the structure of C and C++ programs for a very long time. Even before Visual C++ 1.0, the compiler supported generating program information through .SBR and .BSC files. (Note: The compiler in Visual C++ 1.0 was already version 8, so the command line tools had already been around for a while.) Each SBR file, which the compiler generates as it compiles, contains reference and definition information for a single translation unit. In a later step, the BSCMAKE tool combines these SBR files into a BSC file. This file can then be used to examine many different aspects of a program: references, definitions, caller-callee graphs, macros, etc.
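To make the SBR-to-BSC flow concrete, here is a minimal C++ sketch of the idea: each translation unit contributes its own list of reference and definition records, and a merge step folds them into one queryable database. The record layout and the names `SbrRecord` and `BrowseDatabase` are hypothetical and chosen for illustration only; they do not reflect the actual SBR or BSC file formats or how BSCMAKE is implemented.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical per-translation-unit record; not the real SBR layout.
struct SbrRecord {
    std::string symbol;
    std::string file;
    int line;           // 1-based line number
    bool isDefinition;  // true for a definition, false for a reference

    // Ordering lets identical records from different units collapse in a std::set.
    bool operator<(const SbrRecord& other) const {
        return std::tie(symbol, file, line, isDefinition) <
               std::tie(other.symbol, other.file, other.line, other.isDefinition);
    }
};

// Hypothetical merged database, loosely analogous to a BSC file.
class BrowseDatabase {
public:
    // Fold one translation unit's records into the database.
    void merge(const std::vector<SbrRecord>& unit) {
        for (const SbrRecord& r : unit) {
            assert(r.line > 0 && "line numbers are 1-based");
            records_[r.symbol].insert(r);
        }
    }

    std::vector<SbrRecord> definitions(const std::string& symbol) const {
        return select(symbol, true);
    }

    std::vector<SbrRecord> references(const std::string& symbol) const {
        return select(symbol, false);
    }

private:
    std::vector<SbrRecord> select(const std::string& symbol, bool wantDefinition) const {
        std::vector<SbrRecord> out;
        auto it = records_.find(symbol);
        if (it == records_.end()) return out;
        for (const SbrRecord& r : it->second) {
            if (r.isDefinition == wantDefinition) out.push_back(r);
        }
        return out;
    }

    // symbol -> every distinct record that mentions it
    std::map<std::string, std::set<SbrRecord>> records_;
};
```

Storing the records in a set means a definition seen through a shared header in several translation units is kept only once, which is the kind of de-duplication any merge step of this sort needs.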
Since the inception of the Visual C++ product, we have been parsing C++ code and storing information about it in some form for use by the IDE. This parser has been separate from the command line compiler because many IDE features require an understanding of the code, and requiring a build for them would be an onerous burden. For instance, at many stages of editing the code is simply not in a compilable state, so relying on a compile would not work. The earliest IDE used CLW (Class Wizard) files to store this information. These were structured as INI files, which were common in 16-bit Windows before the registry was introduced. They provided minimal information about where classes were located and some information about resources. CLW files were generated by a very simple parser that didn't have to deal with templates or ANSI/Unicode issues. Also, special locations in source files were marked with comments that couldn't be edited. This approach was effective at the time for supporting the minimal requirements of Class Wizard, but it didn't provide much information about a program.
Visual C++ 4.0 saw the arrival of a new feature: ClassView, which displayed information about all classes, class members, and global variables. The parser used for CLW files was not sufficient for this, so a new parser was written, and its information was stored in a new file called the NCB file. NCB was an abbreviation for "no compile browse". It provided some, but not all, of the information that building a BSC would provide.
Visual C++ 6.0 introduced a new parser (FEACP) for generating NCB files. Internally, it was called "YCB" for "yes compile browse", even though it still generated NCB files, because a modified version of the actual compiler was used to parse the code and generate the NCB. The C++ language had been growing with namespaces, templates, and exceptions, and maintaining multiple parsers was undesirable. The CLW parser, however, was still used to generate CLW files. VC 6.0 also introduced the first IntelliSense features, such as autocomplete and parameter info.
The NCB file is very similar to a BSC file and is based on a multi-stream format developed for PDB files. The contents of the NCB file are loaded into memory, changes are made in memory, and the results are persisted to the NCB file at shutdown. The data structures in memory and on disk are highly hierarchical, and most lookups require walking through them. An element is represented by a 32-bit handle that uses 16 bits to specify the module (i.e. file) the element came from and 16 bits to identify the element within that file. This limits an NCB to 64K files and each file to 64K elements. That may seem like a lot, but some customers are hitting these limits. (Note: prior to Whidbey, there was a 16K limit on the number of files because two bits were used for housekeeping.)
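The handle scheme is easy to picture as bit packing. This sketch assumes the module index occupies the high 16 bits and the element index the low 16 bits; the post only says that 16 bits go to each, so the ordering, along with the names `NcbHandle`, `makeHandle`, `moduleOf`, and `elementOf`, is an assumption for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: [ module : 16 bits ][ element : 16 bits ].
using NcbHandle = std::uint32_t;

constexpr std::uint32_t kIndexBits = 16;
constexpr std::uint32_t kMaxModules = 1u << kIndexBits;   // 65,536 files
constexpr std::uint32_t kMaxElements = 1u << kIndexBits;  // 65,536 elements per file

// Pre-Whidbey variant: two housekeeping bits leave 14 bits for the module.
constexpr std::uint32_t kLegacyModuleBits = 14;
constexpr std::uint32_t kLegacyMaxModules = 1u << kLegacyModuleBits;  // 16,384 files

NcbHandle makeHandle(std::uint32_t module, std::uint32_t element) {
    assert(module < kMaxModules && "only 64K files fit in 16 bits");
    assert(element < kMaxElements && "only 64K elements per file fit in 16 bits");
    return (module << kIndexBits) | element;
}

// Recover the module (file) index from the high half.
std::uint32_t moduleOf(NcbHandle h) { return h >> kIndexBits; }

// Recover the element index from the low half.
std::uint32_t elementOf(NcbHandle h) { return h & (kMaxElements - 1); }
```

With 16 bits per field the ceilings are 2^16 = 65,536 files and 65,536 elements per file; reserving two of the module bits, as before Whidbey, leaves 14 bits and therefore 16,384 files.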
In Visual C++ .NET (i.e. 7.0), the CLW file and its associated parser were finally removed, and Class Wizard features were implemented using information from the NCB file. In 7.1, 8.0 (Whidbey), and 9.0 (Orcas), not much has changed. Whidbey saw the biggest change: we eliminated prebuilt NCBs for libraries and the SDK, provided better support for macros, and allowed 64K files in an NCB. Despite these incremental improvements, the overall architecture has remained the same.
As the NCB was used for more and more features, it became a core piece of the IDE's technology; if it didn't function correctly, many IDE features would not work. FEACP needed to deal with large, complex dependencies between files and with potentially incorrect code. When a common header file was changed in a project, every file that depended on it would be reparsed in order to generate correct information.
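Finding everything that must be reparsed after a header changes amounts to walking the include graph in reverse. The following is a minimal sketch that assumes a simple map from each file to the files it includes directly; `IncludeGraph` and `filesToReparse` are hypothetical names, and this is not a description of how FEACP actually tracked dependencies.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical include graph: each file maps to the files it #includes directly.
using IncludeGraph = std::map<std::string, std::vector<std::string>>;

// Returns every file (headers and .cpp files alike) that directly or
// transitively includes `changed`, using a breadth-first search.
std::set<std::string> filesToReparse(const IncludeGraph& includes, const std::string& changed) {
    assert(!changed.empty() && "expected the name of the changed file");

    // Reverse the edges: file -> files that include it directly.
    std::map<std::string, std::vector<std::string>> includedBy;
    for (const auto& entry : includes) {
        for (const std::string& dep : entry.second) {
            includedBy[dep].push_back(entry.first);
        }
    }

    std::set<std::string> affected;
    std::queue<std::string> pending;
    pending.push(changed);
    while (!pending.empty()) {
        std::string current = pending.front();
        pending.pop();
        auto it = includedBy.find(current);
        if (it == includedBy.end()) continue;
        for (const std::string& parent : it->second) {
            // insert().second is false for already-visited files, which also stops include cycles.
            if (affected.insert(parent).second) pending.push(parent);
        }
    }
    return affected;
}
```

Because the walk is transitive, a single edit to a widely included header can pull in nearly every .cpp file in a project.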
Note: FEACP would parse the header file itself only once, in the context of one translation unit, and all dependent .cpp files would then be reparsed using the information gathered during that single parse of the header. The problems this causes are collectively called the "multi-mod" problem, because they occur when a header is used by multiple modules.
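Why can a single parse of a header be wrong for other modules? A header's contents can depend on what precedes its #include in each translation unit, macros in particular. The toy model below stands in for a header that declares different names depending on whether `UNICODE` is defined, and shows a parse-once cache handing the first unit's view to a second unit that did define `UNICODE`. The names `parseHeader`, `ParseOnceCache`, `GetNameA`, and `GetNameW` are hypothetical; this illustrates the failure mode, not FEACP's actual design.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using MacroSet = std::set<std::string>;
using SymbolSet = std::set<std::string>;

// Toy "parser" for a hypothetical header that behaves like:
//   #ifdef UNICODE
//   void GetNameW();
//   #else
//   void GetNameA();
//   #endif
//   void Common();
SymbolSet parseHeader(const MacroSet& macrosInEffect) {
    SymbolSet symbols{"Common"};
    if (macrosInEffect.count("UNICODE") != 0) {
        symbols.insert("GetNameW");
    } else {
        symbols.insert("GetNameA");
    }
    return symbols;
}

// Parse-once cache: a header is parsed in the context of the first unit
// that asks for it, and that result is reused for every later unit.
class ParseOnceCache {
public:
    const SymbolSet& symbolsFor(const std::string& header, const MacroSet& macrosOfIncludingUnit) {
        assert(!header.empty());
        auto it = cache_.find(header);
        if (it == cache_.end()) {
            it = cache_.emplace(header, parseHeader(macrosOfIncludingUnit)).first;
        }
        // Later units get the first unit's view, even if their macros differ.
        return it->second;
    }

private:
    std::map<std::string, SymbolSet> cache_;
};
```

A fresh parse in the second unit's context would report `GetNameW`, but the cached result still reports `GetNameA`, so any feature relying on it would show the wrong declaration for that module.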
For large projects, this reparsing could take a while. Initially, it froze the IDE because the parse happened on the foreground UI thread. Later versions addressed this by parsing on a background thread. However, in some scenarios the foreground UI needed the results and would have to block anyway. Frequent reparsing could also consume a lot of CPU and memory, causing problems by using too many resources and still degrading UI responsiveness. This was eventually tuned to some degree by running the parser at a lower priority and delaying reparsing until a perceived idle time. Another solution has been to add three prioritized work queues, which allow more important work to get done first. Other problems arose from corruption of the NCB file, or from compiler errors that caused a parse to fail early in a file, leaving no information available from that file. There have also been issues with concurrency and locking of the NCB data in memory. Adding the ability to quickly find information with even a simple new query is very difficult and requires code changes. Extending the NCB format to support templates, C++/CLI, and other language features has also proven difficult.
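The prioritized queues mentioned above can be sketched as three FIFO queues that are always drained most urgent first. This is a minimal, single-threaded sketch under stated assumptions: `Priority`, `ParseWorkQueue`, and the three level names are hypothetical, and a real background parser would also need locking between the UI and parser threads, which this example deliberately omits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical priority levels; a lower value means more urgent.
enum class Priority : std::size_t { High = 0, Normal = 1, Low = 2 };

// Single-threaded sketch of three prioritized FIFO work queues.
class ParseWorkQueue {
public:
    void push(Priority p, std::string file) {
        std::size_t level = static_cast<std::size_t>(p);
        assert(level < queues_.size());
        queues_[level].push_back(std::move(file));
    }

    // Pops the oldest item from the most urgent non-empty queue.
    // Returns false when all three queues are empty.
    bool tryPop(std::string& out) {
        for (std::deque<std::string>& q : queues_) {
            if (!q.empty()) {
                out = std::move(q.front());
                q.pop_front();
                return true;
            }
        }
        return false;
    }

    bool empty() const {
        for (const std::deque<std::string>& q : queues_) {
            if (!q.empty()) return false;
        }
        return true;
    }

private:
    std::array<std::deque<std::string>, 3> queues_;
};
```

In this sketch, a file the user is actively editing could be queued as `High` so that it is parsed before a bulk reparse queued as `Low`; within a level, work is processed in arrival order.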
All of these issues are exacerbated by larger, more complex projects. The number of files that may need to be reparsed can become quite large, and reparsing can happen frequently. "Intermittent" failures are also simply more likely as projects grow. All of these problems have been examined over time, and some fixes and incremental improvements have been made, but the fundamental issues remain.
Next time, I will cover our approach to tackling these problems in VC10, which we are working on right now.