Speeding up the Incremental Developer Build Scenario

Overview

One of the major focus areas for Visual C++ lately has been improving developer productivity. With this in mind, a plethora of improvements aimed at the incremental developer scenario have been introduced in the Visual Studio 2015 Preview (download here). The incremental developer scenario is one where a developer changes one or more source files (while fixing bugs) and builds; for Visual C++, the time spent in this scenario is roughly equivalent to the time spent linking the portable executable (.dll or .exe). Naturally, the features discussed in this blog are mostly in the linker space.

On average, with the feature additions in this space we have seen roughly a 2X improvement in clean link times, and we have added more scenarios that can now be incrementally linked or incrementally code-generated (when using Link Time Code Generation (LTCG)). A summary of this feature set and how to enable each feature is given in the table below; if you are interested in learning more, keep reading in the deep-dive sections.

 

| Feature | Description | Configuration | Usage |
| --- | --- | --- | --- |
| Incremental linking for static libraries | Incrementally link when making edits to static libraries consumed by other portable executables (.dll or .exe). | Affects all but LTCG-enabled builds. | Enabled by default when the /incremental linker switch is thrown. |
| /Debug:FASTLINK | Generate the PDB in a new way to obtain faster link throughput; the PDB serves as an indexing database while debug information remains in the object and library files. | Affects all but LTCG-enabled builds. | /Debug:FASTLINK (linker switch) |
| /Zc:inline and algorithmic improvements | The compiler no longer generates symbol information for dead code. | Affects all but LTCG-enabled builds. | /Zc:inline (compiler switch) |
| Incremental LTCG (x86 targets only) | Enable incremental code generation when working with LTCG-enabled builds. | Affects only LTCG builds. | /LTCG:incremental (linker switch) |

Table 1: Features introduced in VS2015 Preview to speed up the incremental developer scenario
 

Incremental Linking for Static Libraries
(/incremental linker switch)

One of the top reasons for incremental linking to fail, as reported by our data analytics, is a developer making an edit to a static library and then building the binary that consumes it. As we started this effort we took a look at a variety of games being developed in-house for the Xbox One platform, and it became clear why we needed to support this scenario.

To give you an idea: if you were to open the Xbox One 'Kinect Sports Rivals' (KSR) first-party title solution in Visual Studio, you would notice roughly seventy static library projects eventually consumed by the massive Kinect Sports Rivals executable, whose PDB is 1.8 GB when building the debug configuration. With Visual Studio 2015 Preview, developers at KSR can finally take advantage of incremental linking, now that it supports edits made within static libraries.
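As a minimal sketch of the scenario, the commands below (with hypothetical file names, run from a VS2015 developer command prompt) build a static library and an executable that consumes it; with /INCREMENTAL thrown, re-running the final link after editing a file in the library should now link incrementally rather than falling back to a full link:

```shell
rem Hypothetical example: a static library consumed by an .exe.
cl /c /Zi physics.cpp render.cpp
lib /OUT:engine.lib physics.obj render.obj

rem Build the consuming executable with incremental linking enabled.
cl /c /Zi game.cpp
link /INCREMENTAL /DEBUG /OUT:game.exe game.obj engine.lib

rem Edit physics.cpp, rebuild physics.obj and engine.lib, then relink:
rem with VS2015 Preview the relink is incremental instead of full.
```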

/Zc:inline and Algorithmic improvements (/Zc:inline compiler switch, 2X Faster Links)

/Zc:inline has been covered in previous blogs, but to recap: throwing this switch instructs the compiler to no longer generate symbols for unreferenced data and functions. This results not only in smaller object files but also in a smaller input set to the linker, which reduces link times. With the /Zc:inline switch and other algorithmic improvements at play, notice the drop in clean link times for Kinect Sports Rivals illustrated in the figure below. Similar gains were measured on other popular codebases such as Chrome and Xbox One games, and on others that cannot be discussed here for legal reasons. As a cautionary note, remember that the /Zc:inline switch only impacts optimized (non-/Od, non-LTCG) builds.
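A hypothetical invocation might look like the following; note that /Zc:inline is added to an optimized build (no /Od, no /GL), which is the configuration where it has an effect:

```shell
rem Hypothetical example: optimized compile with /Zc:inline so the compiler
rem emits no symbols for unreferenced COMDAT data and functions, shrinking
rem the .obj files handed to the linker.
cl /c /O2 /Zc:inline widget.cpp
link /DLL /OUT:widget.dll widget.obj
```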


                                                                           Figure 1: Clean link times with /Zc:inline

 

Fast Program Database (PDB) generation (/debug:FASTLINK linker switch, 2X Faster Links)

The Visual C++ linker for non-LTCG builds spends the majority of its time generating the program database (PDB) file. Merging type information, fixing up private symbols' type indexes, and generating global symbols are the major time components of PDB generation. With /DEBUG:FASTLINK, the linker-produced PDB does not contain any private symbols; debug information remains distributed among the input object and library files, and the linker-generated PDB serves only as an indexing database. The DIA APIs have been modified to provide a seamless experience for debugging (only), so using this option provides much faster link times with little or no impact on the overall debugging experience. To illustrate this further, notice the drop in full link times with the /DEBUG:FASTLINK switch thrown for a couple of benchmarks we have in our labs.
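Enabling the option is a one-switch change on the link line; a minimal sketch (hypothetical file names):

```shell
rem Hypothetical example: compile with debug info, then link with
rem /DEBUG:FASTLINK so the linker emits only a small indexing PDB and
rem leaves private symbols in the .obj files, instead of merging
rem everything into one full PDB.
cl /c /Zi app.cpp
link /DEBUG:FASTLINK /OUT:app.exe app.obj
```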
 

                                                                  
                                                  Figure 2: Clean link times with /Debug:fastlink 

Incremental Link Time Code Generation (iLTCG) (/LTCG:incremental linker switch, 4x faster links)

Link Time Code Generation (AKA Whole Program Optimization) produces better code quality because additional whole-program optimizations can leverage the bigger picture of the entire program, which is only available during LTCG. LTCG is great for code quality, and it is the foundation for Profile-Guided Optimization (PGO); however, throughput is its downside: today developers must wait for a full, clean LTCG build even after making trivial edits. This hurts the feature's popularity, and developers are often forced to throw the extra performance away in favor of productivity.

Because LTCG uses whole-program analysis results for optimization, a change in any function of the program can affect code generation or optimization of a distant function in a different module. Thus the entire program, including modules that were not edited, needs to be recompiled whenever anything in it changes. To improve the throughput of LTCG while maintaining its code-quality benefit, we introduced Incremental LTCG. With Incremental LTCG, we capture the exact effect of whole-program optimization for an edit and recompile only the affected functions. For the unaffected functions, we copy their code directly from the output of the previous compilation, reducing build time without sacrificing code quality. When the edit is trivial, the throughput improvement from iLTCG can be as high as 4X.
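A minimal sketch of an iLTCG build (hypothetical file names): sources are compiled with /GL as usual, and the link line adds /LTCG:incremental so the linker can reuse code generated by the previous link for unaffected functions:

```shell
rem Hypothetical example: whole-program compile plus incremental LTCG link.
cl /c /O2 /GL module1.cpp module2.cpp
link /LTCG:incremental /OUT:app.exe module1.obj module2.obj

rem Edit module1.cpp, recompile only it, then relink: only the functions
rem affected by the edit are re-code-generated.
cl /c /O2 /GL module1.cpp
link /LTCG:incremental /OUT:app.exe module1.obj module2.obj
```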

To illustrate this point, the figure below shows the build throughput gains measured across 84 real check-ins made by our own compiler back-end developers while building the compiler back end (c2.dll). On average, a speedup of ~3.6X was observed in this scenario. To summarize, we have seen minimal impact on the quality of the generated code (<0.5% CQ loss on our benchmarks) but a multi-x improvement in LTCG build times. We are therefore aiming for this feature to be always enabled, even when shipping the product bits externally :).
 
        
                               Figure 3: Throughput gains for the compiler back end (c2.dll) using Incremental LTCG

 

What’s Next!

While the incremental developer build scenario remains a critical scenario for us, we have also done work on improving clean build times, where the majority of the time is typically spent in the compiler front end. As a result of this work, template-heavy C++ code should now compile faster; as an example, the template-heavy Unreal game engine compiles ~30% faster with VS2015 Preview. Sadly, we have also seen some regressions introduced on the way to Preview, mostly due to the newer conformance features. These regressions are being tracked and will be fixed in the next developer bits.

 

Wrap Up

This blog should give you an overview of the work we have done in VS2015 Preview to improve the developer incremental scenario. Our current focus has been on slightly larger projects, so these wins should be most noticeable for large codebases such as Chrome. Please give the features a shot and let us know how they work out for your application. It would be great if you could post before/after numbers for linker throughput when trying out these features. If your link times are still painfully slow, or if you have more feedback, please email me, Ankit, at aasthan@microsoft.com. We would love to know more!

Thanks to Chromium developers and the Kinect Sports Rivals team for validating that our changes had a positive impact in real-world scenarios.