Performance Tests

Hello everyone, I’m Asmaa Taha; I’m a test developer in the Visual C++ compiler optimization test team.

Our team has various testing systems that automate the testing and reporting processes to convey the state of the product to our feature teams. Some of them automate the feature testing we have for the compiler (and make it go faster!). Others automate building real-world code, such as Windows, in our labs using the latest versions of our compiler. Last but not least, we have automation to measure the performance of our compiler.

In this blog post I will mainly talk about the system that measures the performance of our compiler. We call this system the VCBench system, short for Visual C++ Compiler Benchmark system (see footnote 1 below).

The purpose of VCBench is to measure and tune the performance of our compiler and compare it with other compilers. Two of the major benchmarks measured in VCBench are the Spec2000 and Spec2006 benchmarks (see footnote 2 below). They report code size and execution time for a number of primarily integer algorithms as well as a number of floating-point algorithms; for this reason we call them “Code Quality” benchmarks. Code Quality benchmarks measure the speed and size of the binaries generated by our compiler. We also have “Code Throughput” benchmarks, which measure the time it takes our compiler to generate binaries. These are the two major performance concerns of the Visual C++ compiler.

VCBench is run on daily builds of the VC++ compiler to monitor performance changes. In the case of a performance regression, a bug is opened and our test and developer teams coordinate to find and fix the performance loss. In addition to regression prevention, VCBench is used by our developers working on new features. The developers iterate compiler prototype builds several times through the VCBench system in order to tune the heuristics of new optimizations. Before they can add their feature to the product, they must present their performance deltas to their peers and management for sign-off. Often there are trade-offs between compiler throughput, generated code speed, and generated code size that are debated, sometimes heatedly.
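As a rough illustration of the regression check described above, the sketch below flags any benchmark whose result today is slower than a baseline by more than a tolerance. The function name, the 2% tolerance, and the benchmark names and timings are all hypothetical; this post does not describe how VCBench actually decides that a regression has occurred.

```python
def find_regressions(baseline, today, tolerance=0.02):
    """Return {benchmark: relative_change} for benchmarks that got slower.

    baseline, today: dicts mapping a benchmark name to its execution time
    in seconds (lower is better). tolerance is a hypothetical noise
    allowance, expressed as a fraction of the baseline time.
    """
    regressions = {}
    for name, old_time in baseline.items():
        new_time = today.get(name)
        if new_time is None:
            continue  # benchmark did not run today; nothing to compare
        change = (new_time - old_time) / old_time
        if change > tolerance:
            regressions[name] = change
    return regressions


# Made-up numbers, purely for illustration.
baseline = {"bench_a": 10.0, "bench_b": 20.0}
today = {"bench_a": 10.1, "bench_b": 21.0}
flagged = find_regressions(baseline, today)
```

With these made-up numbers, only `bench_b` (5% slower) exceeds the tolerance; `bench_a` (1% slower) is treated as noise.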

Each test is run for multiple iterations to minimize noise (see footnote 3 below) while still maintaining acceptable throughput, and the system then reports the median. We report the median rather than the mean because performance results generally do not follow a normal distribution: they are skewed! Tests can run with different optimization switches; in general we run a matrix to exercise as many of the optimization code paths as possible with the machine resources that we have. The outputs of these tests are saved to a SQL database, and the results are available to developers through an internal website, which makes them easy to track.
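To see why the median is the more robust choice for skewed timings, here is a small sketch with invented numbers: five iterations of one benchmark, one of which was disturbed by something unrelated to the compiler.

```python
import statistics

# Hypothetical execution times in seconds for one benchmark. The last
# iteration was disturbed, for example by a background process.
times = [10.1, 10.0, 10.2, 10.1, 14.9]

mean_time = statistics.mean(times)      # pulled upward by the outlier
median_time = statistics.median(times)  # middle value of the sorted samples
```

Here the median stays at 10.1 seconds, while the single slow iteration drags the mean up to roughly 11.06 seconds.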

To reduce noise on the benchmarking machines, we take several steps:

1. Stop as many services and processes as possible.

2. Disable the network driver, which turns off the interrupts from the NIC caused by broadcast packets.

3. Set the test’s processor affinity so that it runs on one processor/core only.

4. Set the run to high priority, which decreases the number of context switches.

5. Run the test for several iterations.
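Steps 3 and 5 can be sketched in a few lines. This is not how VCBench does it; it is a minimal Python illustration that assumes an operating system where `os.sched_setaffinity` is available (it is not on Windows or macOS, in which case the sketch simply skips pinning). Raising the priority (step 4) is left out because it usually requires elevated privileges. The function names and the sample workload are hypothetical.

```python
import os
import statistics
import time


def pin_to_one_core():
    """Restrict this process to a single core, where the OS supports it."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # not exposed by the os module on this platform
    try:
        core = min(os.sched_getaffinity(0))  # pick one core we may already use
        os.sched_setaffinity(0, {core})
        return True
    except OSError:
        return False


def time_workload(workload, iterations=5):
    """Run the workload several times and return the median wall-clock time."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def sample_workload():
    # Stand-in for a real benchmark binary.
    return sum(i * i for i in range(100_000))


pinned = pin_to_one_core()
median_seconds = time_workload(sample_workload)
```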

VCBench allows submitting private runs, so any developer can submit a custom-built compiler with any configuration. The set of configurations available for private submissions is a superset of the configurations that we run on the daily builds of our product. The results of these runs are inserted in the database and can be compared to any other run, whether private or a daily automated run. VCBench notifies developers when their runs finish and whether they succeeded or failed. At that point it gives the developers the option to see the performance impact of their changes.
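Comparing a private run against a daily run boils down to joining two sets of results on the benchmark name. The sketch below uses an in-memory SQLite database as a stand-in; the table name, column names, run identifiers, and numbers are all invented for illustration and say nothing about the real VCBench schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (run_id TEXT, benchmark TEXT, median_seconds REAL)"
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [
        ("daily", "bench_a", 10.0),
        ("daily", "bench_b", 20.0),
        ("private", "bench_a", 9.5),
        ("private", "bench_b", 20.4),
    ],
)


def compare_runs(conn, base_run, new_run):
    """Return [(benchmark, percent_change)] of new_run relative to base_run."""
    return conn.execute(
        """
        SELECT b.benchmark,
               100.0 * (n.median_seconds - b.median_seconds) / b.median_seconds
        FROM results AS b
        JOIN results AS n ON n.benchmark = b.benchmark
        WHERE b.run_id = ? AND n.run_id = ?
        ORDER BY b.benchmark
        """,
        (base_run, new_run),
    ).fetchall()


deltas = compare_runs(conn, "daily", "private")
```

A negative percentage means the private build was faster on that benchmark; with these made-up numbers, `bench_a` improves by about 5% and `bench_b` regresses by about 2%.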

I hope this post gave you an idea of how performance testing for the compiler back end is done. A future post will compare Dev10 and VC6.

1. Benchmark: “A standard of measurement or evaluation.” A computer benchmark is typically a computer program that performs a strictly defined set of operations (a workload) and returns some form of result (a metric) describing how the tested computer performed. Computer benchmark metrics usually measure speed (how fast was the workload completed) or throughput (how many workloads per unit time were measured). Running the same computer benchmark on multiple computers allows a comparison to be made.

2. SPEC is an acronym for the Standard Performance Evaluation Corporation. SPEC is a non-profit organization composed of computer vendors, systems integrators, universities, research organizations, publishers and consultants whose goal is to establish, maintain and endorse a standardized set of relevant benchmarks for computer systems. Although no one set of tests can fully characterize overall system performance, SPEC believes that the user community will benefit from an objective series of tests which can serve as a common reference point.

3. Noise is variation in the output that is not accounted for by changes in the compiler.