I’m Bruce Forstall, a developer on the Visual C++ IDE team. I currently work on the implementation of the CLR CodeDOM API for C++/CLI, which supports the WinForms designer as well as a number of other scenarios. Previously, I’ve worked on the Itanium code generator, the Java virtual machine team, and the Cairo OS project (anyone remember that?).
I thought I’d write about an internal tool named Gauntlet that we depend on for our development work.
Before Gauntlet, we had a number of problems. People often wouldn’t do all the pre-check-in validation builds or test runs they should have, or they would do them in an environment that wasn’t up-to-date. Conflicting changes from different people made on the same day would cause builds or tests to break because of integration problems. And in Visual Studio 2005, we started supporting three different architectures (x86, Itanium, x64). This meant we needed to do a lot more per-architecture testing, and not everybody had every type of machine in their office. Gauntlet solved these problems and more.
Gauntlet is a tool designed to improve the everyday quality of the product by performing a large amount of testing of each and every change before those changes are committed to the source control system, and made visible to other developers, testers, and our daily process. Each change, from every developer, is serialized. Thus, all testing happens on each change in sequence, detecting integration problems immediately. If a change passes all testing, it gets checked in. If not, it gets rejected and no damage is done to the “live” source base. Everyone can always assume the “live” source tree is good, because Gauntlet has validated every check-in. Gauntlet also tries to simplify life for developers by automatically sending email with information about the change and updating our bug database.
The system consists of a website user-interface, a SQL server for storing pending and completed check-in information and other data such as usage metrics, a Gauntlet server process, and a large number of “slave” machines for performing builds and tests. This is a key to the system: as much as possible is run in parallel, to maximize throughput as well as increase the amount of testing that is performed. We have a number of Gauntlet systems on the team, for different groups. The largest uses about 60 machines, comprising some of each of the three processor architectures. Each check-in takes between an hour and two hours (depending on the type) and runs hundreds of thousands of test cases, not only for each of the three already-stated processor architectures, but for code generator check-ins, the x86-hosted cross-compilers for Itanium and x64 are tested, as are compilers built from the same code base for several Windows CE processor architectures.
Gauntlet originated in the DHTML team. I rolled Gauntlet out to the Visual C++ team over five years ago. Initially, the change in process was resisted, but now the system is viewed as indispensable. In that time, our original Gauntlet has processed almost 15,000 submissions. On average, about 20% of the submissions fail due to various reasons, such as a developer not testing a configuration or variation that Gauntlet tests. That means that about 3,000 faulty check-ins have been prevented from entering the “live” sources. That represents at least 3,000 bugs that didn’t have to be detected sometime later, the faulty check-in found and backed out or fixed. And that means our everyday quality has been higher.
There are several simple lessons from our Gauntlet experience. First, testing a change as much as possible before committing the change is incredibly valuable. Enforcing that testing regimen, and enriching it by greatly expanding the testing performed, makes it even more valuable. Secondly, setting up and maintaining a system this large, simply from a hardware perspective, is costly and time-consuming, but we deem it to be absolutely worth the cost.