Better Parallelism in MSBuild 4 with YieldDuringToolExecution

Introduction

In MSBuild 4 we introduced several performance improvements, particularly for large interdependent builds.  By and large they are automatic, and you receive their benefit without making any changes to the way your build process is authored.  However, there are still some cases where we are unable to make the best decision.  One such case is when an external tool invoked as part of the build takes a significant amount of time to run.  An example of such a tool would be cl.exe, the C++ compiler.  This article discusses how to use the new yield mechanism for external tools to improve the performance of your builds.

Tool Tasks

There are a few ways MSBuild can be made to execute external, command-line tools:

  1. Write a task which derives from ToolTask.
  2. Use the Exec task to call your command.
  3. Use the XamlTaskFactory.

All of these methods ultimately use the ToolTask class in Microsoft.Build.Utilities.v4.0.dll to execute the command-line tool and handle its output in the MSBuild way.  Like all tasks, however, they block any other work from happening on the MSBuild node while they are executing.  In cases where the task is very short, such as touching a log file or copying a file from one place to another, this is perfectly acceptable.  But in the original example of invoking the C++ compiler, the amount of time MSBuild itself sits idle can be lengthy, and in some cases it may be a significant impediment to good parallelization of your build.

The problem has to do with the way MSBuild utilizes its worker nodes.  Whenever a project is scheduled to be built, it is assigned to one of the worker nodes.  This node will then execute that project from start to finish, and will not accept more work until the project either finishes or makes an MSBuild call (for instance, to satisfy a project-to-project reference).  This is in large part because a node can only execute one task at a time, as tasks must be guaranteed that their environment and current directory will not be modified during execution.

However, command-line tools do not execute in-process, and therefore their environment cannot be polluted by the running of additional tasks in parallel on the same node.  We can take advantage of this behavior to let the MSBuild node execute tasks in other projects while our long-running tool completes its work.  This is done using the YieldDuringToolExecution parameter.

YieldDuringToolExecution

Allowing MSBuild to continue building other projects while a command-line tool in one project is running is simple.  Just set the YieldDuringToolExecution parameter to ‘True’ on your long-running command-line tool.  This is a boolean parameter, so any valid MSBuild expression which resolves to a boolean value will work.  Here’s an example:

    <PropertyGroup>
        <YieldDuringToolExecution>true</YieldDuringToolExecution>
    </PropertyGroup>

    <Exec Command="Sleep 10000" YieldDuringToolExecution="$(YieldDuringToolExecution)" />

When the Exec task executes, normally it would sleep for 10000 seconds during which no other work on the node can proceed.  However, with yielding enabled, the Sleep command will still run but the MSBuild node will be free to do other work.  Once the Sleep command is finished, the node will resume building the project which launched it as soon as the node is free to do so.
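For reference, the fragment above can be placed into a complete project file along the following lines.  This is only a minimal sketch: the target name RunSlowTool is hypothetical, the delay is shortened to 10 for convenience, and it assumes a Sleep command-line tool is available on the PATH.  The condition on the property is an addition that lets a value passed on the command line take precedence.

```xml
<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" DefaultTargets="RunSlowTool"
         xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <PropertyGroup>
    <!-- Default to yielding only when no value has been supplied already,
         for example by /p:YieldDuringToolExecution=false -->
    <YieldDuringToolExecution Condition="'$(YieldDuringToolExecution)' == ''">true</YieldDuringToolExecution>
  </PropertyGroup>

  <!-- RunSlowTool is a hypothetical target name used for illustration -->
  <Target Name="RunSlowTool">
    <!-- Assumes a Sleep tool exists on the PATH; any long-running command
         could stand in for it -->
    <Exec Command="Sleep 10"
          YieldDuringToolExecution="$(YieldDuringToolExecution)" />
  </Target>

</Project>
```

The benefit only shows up when this project is built alongside others in a multi-node build, since a node that yields needs other project work available to pick up.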

Whether or not you should enable yielding for your ToolTasks depends on what they do.  Generally speaking, if the task runs for less than one second, it’s probably not worth enabling, since there is a small cost to giving up the MSBuild node.  However, for longer-running tools you may see some wins, and the wins will likely be larger the more complex your build is and the more long-running tasks you have in it.  Again, large interdependent C++ builds are a great example of this, and they benefit tremendously from yielding being applied to the compiler.  You can investigate your build’s performance using the Detailed Summary feature of MSBuild 4 (the /detailedsummary switch).

Yielding also interacts well with the /m switch in MSBuild.  For instance, if you have specified /m:4 to enable parallelization, MSBuild will ensure that no more than four parallel things are going on at once, whether they be regularly building projects or yielding tools.  So enabling yielding will not cause your machine to become more overloaded.  Instead, your builds are likely to improve their parallelization and make better use of available CPU and I/O cycles than they would otherwise.

We have already enabled yield semantics for several tool tasks.  These include:

  • CL, the C++ compiler
  • MIDL, the IDL compiler
  • Link, the native linker – Only when the LinkTimeCodeGeneration metadata is set to UseLinkTimeCodeGeneration

It could also be enabled for the Vbc and Csc tasks since they are ToolTasks as well, but this support is not in the Microsoft.CSharp.targets and Microsoft.VisualBasic.targets shipped with .NET 4.0.  You could easily add it yourself if you wished.  More generally, if you include Microsoft.Common.targets, the YieldDuringToolExecution property will be set to true unless it is overridden by passing /p:YieldDuringToolExecution=false to MSBuild.  We will continue to use this property as the basis for selecting the tool parameter value of the same name.
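The same pattern can be applied to a task of your own that derives from ToolTask.  The sketch below is illustrative only: the task name MyLongRunningTool, the assembly MyTasks.dll, its Sources parameter and the Generate target are all hypothetical.  The only part taken from this article is the YieldDuringToolExecution parameter, which such a task would inherit from ToolTask, being driven by the property of the same name.

```xml
<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" DefaultTargets="Generate"
         xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <!-- Hypothetical task and assembly; assumed to derive from ToolTask -->
  <UsingTask TaskName="MyLongRunningTool" AssemblyFile="MyTasks.dll" />

  <PropertyGroup>
    <!-- This standalone sketch does not import Microsoft.Common.targets,
         so it supplies its own default when no value has been set -->
    <YieldDuringToolExecution Condition="'$(YieldDuringToolExecution)' == ''">true</YieldDuringToolExecution>
  </PropertyGroup>

  <ItemGroup>
    <!-- Hypothetical inputs for the hypothetical tool -->
    <ToolInput Include="*.idl" />
  </ItemGroup>

  <Target Name="Generate">
    <!-- Sources is a hypothetical parameter of the hypothetical task;
         YieldDuringToolExecution comes from the ToolTask base class -->
    <MyLongRunningTool Sources="@(ToolInput)"
                       YieldDuringToolExecution="$(YieldDuringToolExecution)" />
  </Target>

</Project>
```

Routing the parameter through the property rather than hard-coding it keeps a single switch for the whole build, so passing /p:YieldDuringToolExecution=false turns yielding off for this task as well.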

Why isn’t it automatic?

Unfortunately, for MSBuild 4 we didn’t get the opportunity to make this system as automatic as we would like.  In future versions we would like to yield automatically when ToolTasks are executing if they look like they will last longer than a certain threshold.  This would also work together with additional automatic improvements in build analysis and scheduling that we have planned.

Cliff Hudson - MSBuild Developer