Better Parallelism in MSBuild 4 with YieldDuringToolExecution


Introduction

In MSBuild 4 we introduced several performance improvements, particularly for large interdependent builds.  By and large they are automatic and you receive their benefit without making any changes to the way your build process is authored.  However, there are still some cases where we are unable to make the best decision.  One such case is when a particular external tool is invoked as part of the build and takes a significant amount of time.  An example of such a tool would be cl.exe, the C++ compiler.  This article discusses how to use the new yield mechanism for external tools to improve the performance of your builds.

Tool Tasks

There are a few ways MSBuild can be made to execute external, command-line tools:

  1. Write a task which derives from ToolTask.
  2. Use the Exec task to call your command.
  3. Use the XamlTaskFactory.

All of these methods ultimately use the ToolTask class in Microsoft.Build.Utilities.v4.0.dll to handle executing a command-line tool and deal with the output in the MSBuild way.  Like all tasks, however, they block any other work from happening on their MSBuild node while they are executing.  In cases where the task is very short, such as touching a log file or copying a file from one place to another, this is perfectly acceptable.  But in the original example of invoking the C++ compiler, the amount of time MSBuild itself sits idle can be lengthy, and in some cases it may be a significant impediment to good parallelization of your build.

The problem has to do with the way MSBuild utilizes its worker nodes.  Whenever a project is scheduled to be built, it is assigned to one of the worker nodes.  This node will then execute that project from start to finish, and will not accept more work until the project is either finished or the project makes an MSBuild call (for instance, to satisfy a project-to-project reference).  This is in large part because a node can only execute one task at a time, as tasks must be guaranteed that their environment and current directory will not be modified during execution.

However, command-line tools do not execute in-process, and therefore their environment cannot be polluted by the running of additional tasks in parallel on the same node.  We can take advantage of this behavior to let the MSBuild node execute tasks in other projects while our long-running tool completes its work.  This is done using the YieldDuringToolExecution parameter.

YieldDuringToolExecution

Allowing MSBuild to continue building other projects while a command-line tool in one project is running is simple.  Just set the YieldDuringToolExecution parameter to ‘true’ on the task that runs your long-running command-line tool.  This is a boolean parameter, so any valid MSBuild expression which resolves to a boolean value will work.  Here’s an example:

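<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
    <PropertyGroup>
        <YieldDuringToolExecution>true</YieldDuringToolExecution>
    </PropertyGroup>
    <Target Name="Build">
        <Exec Command="Sleep 10000" YieldDuringToolExecution="$(YieldDuringToolExecution)" />
    </Target>
</Project>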

When the Exec task executes, normally it would sleep for 10000 seconds during which no other work on the node can proceed.  However, with yielding enabled, the Sleep command will still run but the MSBuild node will be free to do other work.  Once the Sleep command is finished, the node will resume building the project which launched it as soon as the node is free to do so.

Whether or not you should enable yielding for your ToolTasks depends on what they do.  Generally speaking, if the task runs for less than one second, it’s probably not worth enabling, since there is a small cost to giving up the MSBuild node.  However, for longer-running tools you may see some wins, and the wins will likely be larger the more complex your build is and the more long-running tasks you have in it.  Again, large interdependent C++ builds are a great example of this, and they benefit tremendously from yielding being applied to the compiler.  You can investigate your build’s performance using the Detailed Summary feature of MSBuild 4.

Yielding also interacts well with the /m switch in MSBuild.  For instance, if you have specified /m:4 to enable parallelization, MSBuild will ensure that no more than four things are going on in parallel at once, whether they are regularly building projects or yielding tools.  So enabling yielding will not cause your machine to become more overloaded.  Instead your builds are likely to improve their parallelization and make better use of available CPU and I/O cycles than they would otherwise.
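
To make the /m interaction concrete, here is a minimal sketch of a traversal project that asks MSBuild to build several projects in parallel.  The file name (dirs.proj), the ProjectToBuild item name, and the project paths are all hypothetical placeholders, not something shipped with MSBuild; substitute your own.

```xml
<!-- dirs.proj: hypothetical traversal project; the paths below are placeholders. -->
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003" DefaultTargets="Build" ToolsVersion="4.0">
    <ItemGroup>
        <ProjectToBuild Include="LibA\LibA.vcxproj" />
        <ProjectToBuild Include="LibB\LibB.vcxproj" />
        <ProjectToBuild Include="App\App.vcxproj" />
    </ItemGroup>
    <Target Name="Build">
        <!-- BuildInParallel allows these projects to be scheduled on different nodes when /m is used. -->
        <MSBuild Projects="@(ProjectToBuild)" BuildInParallel="true" />
    </Target>
</Project>
```

Assuming that layout, running msbuild dirs.proj /m:4 /detailedsummary would cap the build at four nodes and print the Detailed Summary at the end, which is one way to compare node utilization with yielding turned on and off.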

We have already enabled yield semantics for several tool tasks.  These include:

  • CL, the C++ compiler
  • MIDL, the IDL compiler
  • Link, the native linker – Only when the LinkTimeCodeGeneration metadata is set to UseLinkTimeCodeGeneration

It could also be enabled for the Vbc and Csc tasks since they are ToolTasks as well, but this support is not in the Microsoft.CSharp.targets and Microsoft.VisualBasic.targets shipped with .NET 4.0.  You could easily add it yourself if you wished.  More generally, if you include Microsoft.Common.targets, the YieldDuringToolExecution property will be set to true unless it is overridden by passing /p:YieldDuringToolExecution=false to MSBuild.  We will continue to use this property as the basis for selecting the tool parameter value of the same name.
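
If your own targets files do not import Microsoft.Common.targets, you can get similar behavior with the usual "default unless already set" idiom.  The fragment below is a sketch of that idiom, meant to sit inside a Project element; it is not necessarily the exact text used in the shipped targets.

```xml
<PropertyGroup>
    <!-- Default to yielding only when no value has been supplied yet
         (for example, earlier in the project or as a global property via /p:). -->
    <YieldDuringToolExecution Condition="'$(YieldDuringToolExecution)' == ''">true</YieldDuringToolExecution>
</PropertyGroup>
```

Tool tasks can then be passed YieldDuringToolExecution="$(YieldDuringToolExecution)", and a single /p:YieldDuringToolExecution=false on the command line turns yielding off for all of them.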

Why isn’t it automatic?

Unfortunately, for MSBuild 4 we didn’t get the opportunity to make this system as automatic as we would like.  In future versions we would like to yield automatically when ToolTasks are executing if they look like they will last longer than a certain threshold.  This will also work together with additional automatic improvements in build analysis and scheduling we have planned.

Cliff Hudson – MSBuild Developer

Comments (2)

  1. Example from XamlTaskFactory? says:

    I have a custom command line tool converted from vs 2008:

    .targets file:

    <UsingTask
       TaskName="VQPASSCOMP"
       TaskFactory="XamlTaskFactory"
       AssemblyName="Microsoft.Build.Tasks.v4.0">
       <Task>$(MSBuildThisFileDirectory)$(MSBuildThisFileName).xml</Task>
    </UsingTask>

    I can't get the yield to work with this.  I tried adding the YieldDuringToolExecution to the .targets file like so:

    .props file:

    <PropertyGroup>
       <YieldDuringToolExecution>true</YieldDuringToolExecution>
    </PropertyGroup>

    .targets file:

    <Target>
    ….
     <VQPASSCOMP
         Condition="'@(VQPASSCOMP)' != '' and '%(VQPASSCOMP.ExcludedFromBuild)' != 'true'"
         CommandLineTemplate="%(VQPASSCOMP.CommandLineTemplate)"
         InputsPP="%(VQPASSCOMP.InputsPP)"
         OutputShaderFile="%(VQPASSCOMP.OutputShaderFile)"
         OutputCmpXml="%(VQPASSCOMP.OutputCmpXml)"
         VQShaderCmpDir="%(VQPASSCOMP.VQShaderCmpDir)"
         AdditionalOptions="%(VQPASSCOMP.AdditionalOptions)"
         YieldDuringToolExecution="$(YieldDuringToolExecution)"
         Inputs="@(VQPASSCOMP)" />
    </Target>

    but that doesn't seem to work.  

  2. Roger says:

    Hi Michael,

    At HP within our lab we use the BuildProjectsInParallel capability of msbuild (4.0) to attempt to utilize our 24 core machines more fully and reduce a 2000 project 45 minute build to 15 minutes.

    However, we've encountered what I would label a defect in the msbuild implementation.  Specifically in the method Microsoft.Build.Backend.SchedulableRequest.DetectIndirectCircularDependency is a poor implementation that causes msbuild parallelism to not work well on large real world builds.

    MSBuild works great for 75 projects in parallel, however when 100 or more projects are thrown at msbuild (each with their internal DependsUponTargets which invoke msbuild on any of the 100 they depend on) msbuild begins to choke.

    In the sweet spot in terms of project count the build is able to complete an order of magnitude faster due to the parallelism, however with 150 projects it can take more than an order of magnitude longer than a non parallel build.

    Here's some data working with a set of 150 projects that are already built, leaving nothing to do but validate timestamps on each project output:

    – Time to build without /m on the command line: 27 seconds

    – Time to build with /m on the command line: over 10 minutes

    In fact analysis shows that almost all of the time is spent spinning inside the method mentioned above.  As the number of projects grows that method bogs down even further turning a 30 minute build into an overnight endeavor.

    MSBuild appears to have the potential to increase our lab's productivity drastically if only it could deliver on the parallelism its feature set advertises.