Hi, my name is Li Shao. I am a Senior Software Design Engineer in Test in the Visual C++ group. In this blog post, our team reviews Visual Studio 11 (VS11) desktop application build throughput compared to Visual Studio 2010 SP1 (VS10). Jim Hogg, Mark Hall, Bill Bailey, Mohamed Magdy Mohamed, and Valentin Isac also contributed to this post. If you are interested in build throughput for the new Metro style applications, you can refer to the blog post by Tim Wagner.
Build throughput is one of the most important productivity factors for C++ developers, so build throughput testing is a vital component of our overall performance testing for Visual C++. We have targeted build throughput tests for the compiler and linker, as well as for the MSBuild build engine. For every release, we run performance tests that check overall end-to-end build performance, including the time spent in the compiler front-end, the back-end, and the linker. For those of you who may not know, the compiler front-end is the phase that parses and analyzes the source code to build the intermediate representation (IR) of the program. The compiler back-end is the phase that takes the IR as input, performs optimizations, and generates code in the form of object files. The linker takes these object files as input and assembles them into an executable file. When building with /GL (compile for link-time code generation, or LTCG), code generation and optimization happen during the link phase, where the IR for the entire executable can be analyzed.
Every new release of Visual C++ contains a large amount of new technology and features, delivered to customers as source code libraries and header files. For a given compiler, the more source code you have, the longer it takes to compile. Fortunately, processor advances over the years have offset this increase in functionality. In recent releases, as processor clock speeds began to top out, we invested in multi-core features such as multi-proc MSBuild and the /MP compiler switch. In VS11, we multi-threaded the compiler back-end. For builds that spend a lot of time in optimization, this has netted some great wins. For example, Microsoft’s SQL Server build is 30% faster thanks to reduced back-end times.
The increase in functionality in VS11 is among the largest we’ve ever shipped. As we integrated all this new technology into the product, our performance tests showed how it affected overall build performance compared with previous releases. Even though the amount of source code being compiled rose significantly, in most cases build time increases were held to an acceptable level.
However, based on our test results, if you have an application that uses the Standard Template Library (STL) heavily, you may notice slower builds due to the increased functionality mandated by the C++ Standard. In this post, we analyze the build throughput of a representative application to demonstrate the improvements we made and the possible slowdowns you may experience.
Build Throughput Data from “real world” Applications
Our end-to-end throughput tests use production-level, “real world” applications, which we refer to as RWC (Real World Code). Below is build throughput data from an internal, post-Beta build of VS11 on one of our RWC projects across different machine configurations. This desktop application has about 50 C++ projects and 2.8 million lines of code (LOC), and it makes heavy use of the STL. “RWC” below refers specifically to this application.
|Configuration|Processor Type|Processor Count|RAM (GB)|OS (Windows 7 SP1)|
|---|---|---|---|---|
|Spec A|Intel Core i7|4 cores (8 threads)|16|x64|
|Spec B|Intel Core 2 Quad|4 cores (4 threads)|8|x64|
|Spec C|Intel Pentium|2 cores (2 threads)|3|x86|
Figure 1: RWC Build Throughput (MSBuild 2 Proc (/m:2) and Full Optimization (/Ox))
Figure 2: RWC Build throughput with link /LTCG (LTCG: Link Time Code Generation, MSBuild 2 Proc (/m:2), Compile with /GL, link with /LTCG)
Figure 3: RWC Compiler Back-end throughput (MSBuild 2 Proc (/m:2))
Figure 4: RWC Link time throughput (MSBuild 2 Proc (/m:2))
“Full Build Time” is measured from when the build starts until it finishes; it is the wall-clock time the build takes. Compiler front-end (FE), compiler back-end (BE), and link times are the total (accumulated) time spent in C1.dll and C1xx.dll (FE), C2.dll (BE), and the linker across all processes. Because this is a multi-proc build, and some of the projects enable /MP for the compiler, there is generally more than one compiler instance running at a time, which is why the total time spent in the tools is much larger than the “Full Build Time”.
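To make the difference between the two measurements concrete, here is a small C++ sketch that computes both from a list of tool runs. The `ToolRun` type and the timings are hypothetical; this is not how our test harness records data.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One timed run of a tool instance (for example c1xx.dll, c2.dll, or the linker),
// with hypothetical start and end times in seconds from the start of the build.
struct ToolRun {
    double start;
    double end;
};

// Accumulated time, as plotted for FE, BE, and link: the sum of every run's duration.
double accumulatedTime(const std::vector<ToolRun>& runs) {
    double total = 0.0;
    for (const ToolRun& r : runs) {
        assert(r.end >= r.start && "a run cannot end before it starts");
        total += r.end - r.start;
    }
    return total;
}

// Wall-clock span of the runs: earliest start to latest end.
// (A real "Full Build Time" also includes MSBuild and other overhead.)
double wallClockTime(const std::vector<ToolRun>& runs) {
    if (runs.empty()) return 0.0;
    double first = runs.front().start;
    double last = runs.front().end;
    for (const ToolRun& r : runs) {
        first = std::min(first, r.start);
        last = std::max(last, r.end);
    }
    return last - first;
}
```

With two processes each compiling two 10-second files back to back, the runs are `{0, 10}, {0, 10}, {10, 20}, {10, 20}`: `accumulatedTime` returns 40 seconds while `wallClockTime` returns 20, the same effect that makes the tool times larger than the “Full Build Time”.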
In the non-LTCG build (Figure 1), the linker does minimal work compared with the other components, so the link time is hardly visible in the graph. In the LTCG build (Figure 2), by contrast, all code generation is deferred to link time. Since the linker eliminates all redundant template instantiations across the object files before the back-end runs, the back-end time is drastically reduced in this scenario.
Based on our data from running multi-proc MSBuild (MSBuild /m:n) on this application, building with 4 to 8 processes gives a slight improvement in overall build time over 2 processes. However, because of resource contention, and because project references limit how many projects can actually build in parallel, a 2-process build is representative of this application’s multi-proc build characteristics in terms of overall throughput. We therefore present the 2-process (/m:2) data here.
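Why might /m:4 barely beat /m:2? The following sketch is a simplified, hypothetical model, not how MSBuild actually schedules projects: it greedily assigns projects to a fixed number of workers and lets a project start only after every project it references has finished. The `Project` type and the durations in the example are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical model of one project in a solution.
struct Project {
    double duration;                // seconds to build this project on its own
    std::vector<std::size_t> refs;  // indices of the projects it references
};

// Greedy list-scheduling sketch of a multi-proc build such as /m:n.
// At most `workers` projects build at once, and a project cannot start until
// every project it references has finished. Assumes the reference graph is acyclic.
// Returns the simulated wall-clock build time.
double simulateBuild(const std::vector<Project>& projects, std::size_t workers) {
    assert(workers > 0);
    const std::size_t n = projects.size();
    std::vector<double> finish(n, 0.0);
    std::vector<bool> done(n, false);
    std::vector<double> workerFree(workers, 0.0);
    double makespan = 0.0;

    for (std::size_t scheduled = 0; scheduled < n; ++scheduled) {
        // Among projects whose references are all scheduled, pick the one
        // whose inputs are ready earliest (ties go to the lower index).
        std::size_t next = n;
        double nextReady = std::numeric_limits<double>::infinity();
        for (std::size_t i = 0; i < n; ++i) {
            if (done[i]) continue;
            bool ready = true;
            double readyAt = 0.0;
            for (std::size_t r : projects[i].refs) {
                if (!done[r]) { ready = false; break; }
                readyAt = std::max(readyAt, finish[r]);
            }
            if (ready && readyAt < nextReady) { next = i; nextReady = readyAt; }
        }
        assert(next < n && "cycle in project references");

        // Give it to the worker that frees up first.
        auto w = std::min_element(workerFree.begin(), workerFree.end());
        double start = std::max(*w, nextReady);
        finish[next] = start + projects[next].duration;
        *w = finish[next];
        done[next] = true;
        makespan = std::max(makespan, finish[next]);
    }
    return makespan;
}
```

For a hypothetical solution with one base library (10 s), two projects that reference it (10 s each), and an application that references both (10 s), `simulateBuild` returns 40 s with one worker, 30 s with two, and still 30 s with four: the reference chain, not the worker count, bounds the build.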
Here are some key observations based on the data:
Faster Compiler Back-end phase
In VS11, we introduced a new feature that creates multiple compiler back-end threads to improve build performance. Because the back-end performs optimization and code generation on one function at a time, it can process multiple functions in parallel. This is especially beneficial when optimizations are enabled, because optimizing requires the back-end to spend more time on each function.
This throughput win can be seen in Figure 1 and Figure 3. While these are impressive improvements in compiler back-end throughput, note that in this scenario the back-end time is a small fraction of the total build time; we highlight it here for illustration purposes only.
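The per-function parallelism described above can be sketched as follows. This is an illustration under simplifying assumptions, not the Visual C++ back-end: `FunctionIR` and `generateCode` are hypothetical stand-ins, each function is assumed to be processed independently, and the sketch launches one task per function rather than using a bounded thread pool.

```cpp
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for one function's intermediate representation (IR).
struct FunctionIR {
    std::string name;
    int instructionCount;
};

// Hypothetical stand-in for optimizing and generating code for one function.
// It only reads its own FunctionIR, so calls for different functions can run concurrently.
std::string generateCode(const FunctionIR& f) {
    return f.name + ": " + std::to_string(f.instructionCount) + " instructions";
}

// Sketch of a multi-threaded back-end: one asynchronous task per function,
// with results collected in the original function order.
std::vector<std::string> generateAll(const std::vector<FunctionIR>& functions) {
    std::vector<std::future<std::string>> tasks;
    tasks.reserve(functions.size());
    for (const FunctionIR& f : functions)
        tasks.push_back(std::async(std::launch::async, generateCode, std::cref(f)));

    std::vector<std::string> code;
    code.reserve(tasks.size());
    for (auto& t : tasks) code.push_back(t.get());  // waits for each task in order
    return code;
}
```

Collecting results in input order keeps the output deterministic even though the functions finish in whatever order the threads complete them.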
Slower Compiler Front-end phase
In this real-world example, there is a 15% to 20% performance degradation in the VS11 front-end compared to VS10 (Figure 1 and Figure 2). Our investigation revealed that the degradation is not in the compiler itself; rather, it is a function of the increased number of template instantiations the front-end has to process. New functionality added to the STL significantly increases the number of template instantiations in a given compilation. This major increase in functionality is mandated by the new C++ Standard and has also been widely requested by our customers. The additional template types force the front-end to instantiate many more templates for a given input file, which degrades overall build performance. Of course, we haven’t shipped VS11 yet, so we continue to analyze this issue and look for ways to improve throughput.
As mentioned earlier, the linker takes the object files generated by the compiler and produces the final executable. Because the compiler now generates larger object files, you may also see a slight slowdown in link time for both non-LTCG and LTCG builds, especially on lower-end machines (Spec C above), because there are more symbols to process and merge.
Faster LTCG Build
On high-end machines (Spec A and Spec B above), our tests show faster link times in LTCG builds (Figure 4).
For a number of releases, the compiler has supported Link Time Code Generation (LTCG), in which code generation is deferred to link time so that information about all modules in the entire image (EXE or DLL) is available for optimization. Code generation can therefore be done in the context of the entire image, enabling inlining across modules, custom calling conventions, and so on. Non-LTCG builds generally limit optimizations to the context of a single object file.
With VS11, link-time code generation can run on multiple threads, compiling more than one function at a time, subject to the dependencies determined by the function call tree. This is a particularly good win for projects that compile into fewer, larger images. Our data shows that the build time of a “big EXE” Microsoft product, SQL Server, improved by about 30% due to our work on LTCG build throughput. Note that the SQL Server sources do not use template code, which is why its build does not suffer the throughput degradation caused by the new STL templates.
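The call-tree constraint mentioned above can be illustrated by ordering functions in “waves”. The sketch below assumes, hypothetically, that a caller must wait until all of its callees have been compiled and that the call graph has no recursion; it does not describe the actual VS11 scheduling policy. Each function’s level is one more than the deepest level among its callees, so all functions in the same wave could be compiled in parallel.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// calls[f] lists the functions that f calls (indices into the same vector).
// Returns waves of function indices: every function in a wave depends only on
// functions in earlier waves, so each wave could be code-generated in parallel.
// Assumes an acyclic call graph (no direct or mutual recursion).
std::vector<std::vector<std::size_t>> codegenWaves(
        const std::vector<std::vector<std::size_t>>& calls) {
    const std::size_t n = calls.size();
    std::vector<int> level(n, -1);  // -1 means "not computed yet"

    // Level of a function = 1 + deepest level among its callees; leaves are level 0.
    auto computeLevel = [&](auto& self, std::size_t f) -> int {
        if (level[f] >= 0) return level[f];
        int deepest = -1;
        for (std::size_t callee : calls[f])
            deepest = std::max(deepest, self(self, callee));
        return level[f] = deepest + 1;
    };

    std::vector<std::vector<std::size_t>> waves;
    for (std::size_t f = 0; f < n; ++f) {
        std::size_t l = static_cast<std::size_t>(computeLevel(computeLevel, f));
        if (waves.size() <= l) waves.resize(l + 1);
        waves[l].push_back(f);
    }
    return waves;
}
```

For a call graph where `main` (0) calls `parse` (1) and `print` (2), and both call `log` (3), the waves are `{3}`, `{1, 2}`, `{0}`. A real scheduler could start a caller as soon as its own callees finish instead of waiting for a whole wave.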
How can you improve your application’s build performance?
Since build throughput is very important to C++ developer productivity, here are a few suggestions that can help you improve build throughput.
Take advantage of the CPU cores on your machine
From Figure 1 and Figure 2, you can see that although the total FE time has regressed, the regression in overall build time is minimal for this application, especially on high-end machines (Spec A and Spec B). This is the effect of multi-proc builds. When building in the IDE, the build system uses all the cores on your machine by default. You can change this under Tools -> Options -> Projects and Solutions -> Build and Run by changing the setting for “Maximum number of Parallel Project Builds”. If you are building on the command line, pass /m:n, where n is the number of processes you would like to use; the default when building with MSBuild on the command line is 1. As mentioned earlier, although building with 4 or more processes might give you better performance, for this particular application a 2-process build (/m:2) performs close to a 4- or 8-process build. In addition, you can set /MP per project or per file to take advantage of multi-process compilation within the compiler. For additional suggestions on how to further tune your build, take a look at this blog post.
Use Pre-Compiled Headers (PCH)
C++ library headers represent a collection of living, evolving libraries, and they tend to grow from release to release. In VS11, for example, the raw size of the Windows headers increased by 13% because of the new API functions and types related to Windows 8. If the Windows headers are in your PCH, the larger headers add compile time when the PCH is created, not when it is used: once the PCH is built, the many compilations that use it are affected only slightly, if at all. Proper use of precompiled headers continues to be the most effective way to reduce overall build times. If your application uses templates extensively, you may also consider pre-instantiating commonly used template types in the PCH.
A small experiment shows the difference a PCH makes for a single project of around 200 files, each with around a hundred lines of code, with various library headers included in the PCH.
Figure 5: Impact of a PCH in minimizing the effect of larger library headers
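One way to approach the pre-instantiation suggestion above is standard C++11 explicit instantiation: an explicit instantiation declaration (`extern template`) in the shared header tells every file that includes it not to instantiate that specialization itself, and a single explicit instantiation definition in one .cpp file provides it. This is a related technique, not necessarily the exact approach the VC++ team had in mind. The sketch uses a hypothetical `RunningTotal` template, and both parts appear in one listing so it compiles on its own; in a real project the declaration would live in the PCH header and the definition in exactly one source file. Measure before and after, since the benefit depends on how your code uses templates.

```cpp
#include <cstddef>

// ---- Part 1: would live in the precompiled header (e.g. a hypothetical pch.h) ----

// Hypothetical template used by many source files in the project.
template <typename T>
class RunningTotal {
public:
    void add(const T& value);
    T total() const;
    std::size_t count() const;

private:
    T total_{};
    std::size_t count_ = 0;
};

// Member definitions kept out of line so that they are not implicitly inline.
template <typename T>
void RunningTotal<T>::add(const T& value) {
    total_ += value;
    ++count_;
}

template <typename T>
T RunningTotal<T>::total() const { return total_; }

template <typename T>
std::size_t RunningTotal<T>::count() const { return count_; }

// Explicit instantiation declaration: files that include the header use
// RunningTotal<double> without instantiating its members themselves.
extern template class RunningTotal<double>;

// ---- Part 2: would live in exactly one .cpp file ----

// Explicit instantiation definition: the members of RunningTotal<double> are
// instantiated once, here, and other object files link against them.
template class RunningTotal<double>;
```

For your own templates this is straightforward; note that the C++ Standard restricts explicit instantiation of standard library templates to specializations that depend on a program-defined type.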
Use Managed Incremental Build
If you have a managed application, you can take advantage of managed incremental build to avoid a full build when the changes to referenced assemblies are insignificant. For more information, see this blog post.
Summary
We have made significant throughput improvements to the compiler back-end and linker build phases. If your application is not a heavy user of the STL and spends a lot of time in optimization, you should see build time improvements.
The increase in template types plays a dominant role in the overall build throughput of applications that use templates extensively. Applications that use the STL heavily may see longer build times due to changes mandated by the C++11 Standard. You may also see slightly longer build times due to the growth of the header files in VS11 and Windows 8.
To improve overall build time, take advantage of the multiple cores on your machine with multi-proc builds, and make sure to use precompiled headers (PCH). For managed C++ applications, make sure managed incremental build is turned on to improve incremental build performance.
Let Us Know!
We are interested in your build throughput data if, when migrating your applications to VS11, you see any improvement, or any slowdown beyond what you might expect from growth in header files or use of STL templates. You can reply to this post or send email to lishao at Microsoft dot com.
In addition to capturing the overall build time, you can get the compiler and linker time for each compiler and linker instance by passing /Bt to the compiler and /Time to the linker.
· When building in the IDE, add /Bt to the compiler’s additional options and /Time to the linker’s additional options. Make sure you build the application with “Detailed” verbosity: go to Tools -> Options -> Projects and Solutions -> Build and Run and set “MSBuild Project Build Verbosity” to “Detailed”.
· For command-line builds, set _CL_=/Bt and _LINK_=/Time in the build environment and build with the MSBuild /v:d option.
In the build log, you will see the time spent in C1.dll (FE), C1xx.dll (FE), C2.dll (BE), and the linker for each compiler and linker instance. You may need to write a simple script to add up those numbers; please let us know if you would like us to post one. Alternatively, you can enable MSBuild “Diagnostic” logging, which reports the time spent in the compiler and linker tasks; this is close to the compiler and linker time.
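A minimal C++ sketch of such a script follows. It assumes that each /Bt timing line in the log contains text of the form `time(<path to the DLL>)=<seconds>s`; verify that against your own build log before relying on it. The function name `sumCompilerTimes` is hypothetical, and the sketch does not parse the linker’s /Time output, whose format differs.

```cpp
#include <cctype>
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Sums per-DLL compiler times from a build log produced with /Bt.
// ASSUMPTION: each timing line contains  time(<path to c1.dll, c1xx.dll or c2.dll>)=<seconds>s
// The greedy (.*) tolerates parentheses in the path, such as "Program Files (x86)".
std::map<std::string, double> sumCompilerTimes(const std::string& log) {
    static const std::regex timing(R"(time\((.*)\)=([0-9]+(?:\.[0-9]+)?)s)");
    std::map<std::string, double> totals;
    std::istringstream lines(log);
    std::string line;
    std::smatch m;
    while (std::getline(lines, line)) {
        if (!std::regex_search(line, m, timing)) continue;  // not a timing line
        std::string path = m[1].str();
        // Key by file name only (e.g. "c2.dll"), lower-cased so C2.DLL and c2.dll merge.
        std::string dll = path.substr(path.find_last_of("\\/") + 1);
        for (char& c : dll)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        totals[dll] += std::stod(m[2].str());
    }
    return totals;
}
```

Read the whole log file into a string and pass it in; the result maps each compiler DLL to its total seconds across all compiler instances.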
We hope this post helps you understand more about C++ desktop application build throughput in VS11. If you are interested in the build throughput of C++ Metro style applications, which are new in VS11, take a look at this blog post for an overview. Note that we continue to improve C++ Metro style application build throughput; you should see about a 20% overall build throughput improvement compared to Beta in the upcoming release.
Please let us know if you have any feedback. We appreciate your input to help us improve build performance.