Hello, I’m Mark Hall, an architect in the Visual C++ group. I wanted to follow up on Jim Springfield’s previous blogs about the history of C++ intellisense, and some of the changes we’re making in the upcoming Visual Studio 10 release. It’s been almost a year since Jim’s posts, which can be found here:
Many thanks to Jim for his archeological dig through our old products and explanation of where we’re going in “Dev10”. I’d like to add a few more details, track our progress over the last year, and offer my perspective on our investment in C++.
When we first implemented intellisense for C++, it was easy to exceed expectations, mostly because customers weren’t expecting very much. It had to run fast, but if it didn’t work for templates or a complicated overload scenario people didn’t complain. Naturally we wanted to leverage the “front-end” of our C++ command line compiler for intellisense, but it wasn’t designed with intellisense in mind. It was designed to run in 256K of RAM on 16-bit DOS. It would parse and analyze a statement at a time and immediately lower everything to an assembly-level intermediate language. Parsing and semantic analysis were completely intertwined. A good architecture for intellisense separates parsing from semantic analysis. Semantic analysis is expensive, so for intellisense you only want to do it for the one item the user is asking about.
We considered writing a new C++ front-end for intellisense. Thankfully we came to our senses – C++ is far too rich and complex. Writing a new front-end would have taken much longer than one product cycle, and we needed something more expedient. So we “derived” our intellisense compiler from our command line C++ compiler through the liberal use of ifdefs in the code. We called the intellisense compiler FEACP, for “Front End Auto Complete Parser”. Our command line compiler is called C1XX, for “C++ phase 1” (turn the X’s sideways). The ifdefs removed all the C1XX code that did any “lowering” to the intermediate language. They also removed any semantic analysis we didn’t think was essential for intellisense (initialization, member offset computation, vtable production, etc.).
CPU speeds at the time were only 100 MHz, so we had to shortcut a lot of C1XX code to populate dropdown windows within 100 milliseconds. We also removed all non-essential memory usage, storing far less information about each symbol we encountered. This was especially important because FEACP had to run in the IDE process, which already consumed copious amounts of memory.
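To make the ifdef approach concrete, here is a minimal sketch of how one source file can yield two front-ends. Everything in it is made up for illustration: `FEACP_BUILD`, `Member`, `ClassDecl`, `declare_member` and `complete_members` are hypothetical names, not the real C1XX code or its real build switches. It only shows the general shape of stripping member offset computation from the intellisense variant.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// FEACP_BUILD is a hypothetical macro, not the switch the product really used.
// Defining it on the command line selects the intellisense variant.

struct Member {
    std::string name;
#ifndef FEACP_BUILD
    std::size_t offset = 0;   // layout data that only the code-generating build keeps
#endif
};

struct ClassDecl {
    std::string name;
    std::vector<Member> members;
#ifndef FEACP_BUILD
    std::size_t next_offset = 0;  // running size used for member offset computation
#endif
};

// Shared front-end routine: both variants record the member name,
// but only the full compiler computes where the member lives.
void declare_member(ClassDecl& cls, const std::string& name, std::size_t size) {
    Member m;
    m.name = name;
#ifndef FEACP_BUILD
    m.offset = cls.next_offset;
    cls.next_offset += size;
#else
    (void)size;  // the intellisense variant skips layout entirely
#endif
    cls.members.push_back(m);
}

// What a member dropdown needs: names only, available in either variant.
std::vector<std::string> complete_members(const ClassDecl& cls) {
    std::vector<std::string> names;
    for (const Member& m : cls.members) {
        names.push_back(m.name);
    }
    return names;
}
```

Even in a toy like this, any shared code that later reads `offset` has to be guarded by the same macro, and each guard creates a second program that needs its own testing.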
FEACP was a product success, but a testing and maintenance nightmare. There were so many strange dependencies on code and data that we had ifdef’d out – crashes were common, along with corruption of the NCB file. We paid a hefty toll in maintenance. Since FEACP and C1XX were so different, all our testing of C1XX (millions of lines of code daily) had little effect on the quality of FEACP. It had to be tested more or less independently. The lesson here is that supporting two different compilers in the same source base is only slightly smarter than supporting two completely separate compilers, and neither is a good choice for long-term maintainability (actually it was three compilers, since ANSI-C was another set of ifdefs – and truth be told it was actually four if we include the /analyze compiler).
In the years since we first released FEACP we came to realize that we’d ventured as far into “ifdef hell” as we could go. At the same time, intellisense for simpler languages like Java and C# raised user expectations for intellisense far beyond what we could support in FEACP. With 1000x faster CPU speeds and memory capacity, the (valid) assumptions we made when we created FEACP no longer held. Moreover, the multitude of ifdefs (and resulting compiler models) severely diminished our ability to add language features to C1XX, our bread and butter. Our bug counts were climbing, and the “language gap” was growing. At the same time, the number of people qualified and willing to work on a C++ compiler was shrinking. Something had to give.
With the speed and capacity of modern machines we knew one compiler could service both code generation and intellisense for C++. What we lacked was a high-level internal representation of the source code. If we had that, we could query the representation to service intellisense requests, or lower it to produce machine code. We wouldn’t have thousands of #ifdefs polluting the front-end code, and there would be just one model of compilation. Testing costs would be slashed dramatically. The barrier to entry for new compiler developers would be significantly reduced. But it wouldn’t be free – even with GHz clock speeds, you can’t run the full front-end over all the code for every intellisense request. We would have to create a new intellisense framework that could run the full front-end only on the regions of code that were necessary to produce a desired result. It would still be a lot of work, but we knew we could do it in a single product cycle. I’m happy (and relieved) to say that we did, and Dev10 is the result.
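As a rough illustration of “one representation, two consumers”, the sketch below builds a tiny expression tree once, then either queries it (as an intellisense request might) or lowers it to instructions. The `Expr` type, the `tooltip` and `lower` functions, and the stack-machine instruction strings are all assumptions made up for this example; they do not describe the actual Dev10 data structures.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified high-level representation of an expression.
struct Expr {
    enum Kind { IntLiteral, Add };
    Kind kind = IntLiteral;
    int value = 0;                   // used by IntLiteral
    std::unique_ptr<Expr> lhs, rhs;  // used by Add
};

std::unique_ptr<Expr> literal(int v) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::IntLiteral;
    e->value = v;
    return e;
}

std::unique_ptr<Expr> add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Add;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Consumer 1: answer a question about the tree without generating any code.
std::string tooltip(const Expr& e) {
    if (e.kind == Expr::IntLiteral) {
        return std::to_string(e.value);
    }
    return "(" + tooltip(*e.lhs) + " + " + tooltip(*e.rhs) + ")";
}

// Consumer 2: lower the same tree to instructions for an imaginary stack machine.
void lower(const Expr& e, std::vector<std::string>& out) {
    if (e.kind == Expr::IntLiteral) {
        out.push_back("push " + std::to_string(e.value));
        return;
    }
    lower(*e.lhs, out);
    lower(*e.rhs, out);
    out.push_back("add");
}
```

Because both consumers walk the same tree, a fix to how the tree is built benefits both, and no preprocessor switch is needed to choose between them.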
Having read this far you’re probably asking yourself, “OK, so you’ve lowered your cost of ownership. Good for Microsoft. But what’s in it for me?” The most compelling feature this brings to intellisense is accuracy. Visual C++ compiles millions of lines of C++ code daily in build labs and on desktops all over the world, and does so with “five nines” of accuracy. Harnessing the command line compiler for intellisense means intellisense will have the same level of accuracy. This is our biggest differentiator in the intellisense market for C++.
Being accurate means more than just getting the right set of members in an auto-complete dropdown – it enables other features that would be impossible or undesirable without it. For example, accuracy means that any errors encountered during the intellisense parse are real errors, and we can expose them to the user as “squiggles” in the editor window as they edit or browse code. They can fix the errors without leaving the editor. This cuts out saving files and kicking off builds to get the command line compiler to provide such diagnostics.
A future benefit of accuracy is that our data stores will be reliable enough to use for code analysis and transformations, such as assisted refactoring of source code. There wasn’t time in Dev10 to provide access to our expression-level data, but users will be able to browse the symbol store to extract symbol-level information about their source bases. In a future release we will provide user-facing APIs that give access to accurate information about their C++ source bases. This will open up a whole new ecosystem for analysis, understanding, and productivity tools for C++ on Windows.
Dev10 is just the beginning.