Stop piling on when the build breaks: Build checkin policy for Continuous Integration in Orcas

Last fall, Clark Sell wrote a blog post called Stop, the build is broken!! that introduced a checkin policy that reported errors when the build was broken.  If you are using continuous integration where every checkin starts a build, you want folks to stop and fix build breaks when they occur, rather than pile on more checkins and perhaps make the problem worse (or at least harder to sort out).

Since we’ve added support for continuous integration in Team Build for Orcas (screencasts, demo), we thought that his policy was a really great idea, and we’ve added a simple checkin policy in Orcas Team Build that does this (it will be in beta 1, but it is not in the March Orcas CTP).  It works differently than his does (he was constrained to what v1 had to offer), and it only works with Orcas clients (not with TFS 2005 clients, which would see an error message about not finding the checkin policy).

Here’s what the policy does.

  1. Request from the server a list of build definitions affected by this checkin.

  2. For each build definition returned where the last build was not “good,” create a checkin policy error message containing the build definition’s name and the user that triggered the build.

If the policy detects a broken CI build, you’ll get a message like the following when you attempt to check in.

The last build of definition WebProjects_SimpleWebService, triggered by user buck, failed.

A “good” build is one where compilation and testing were successful.  If something goes wrong after the test phase, it’s still considered a good build.  This notion of a good build is the same as it was in v1, and it has some shortcomings.  We’re going to refine it and make it more flexible in the release after Orcas.

There’s nothing to configure for this checkin policy, so you aren’t stuck with maintaining a list of build definitions for the checkin policy to monitor.  The first step calls the same code on the server that is used by the continuous integration feature.  Based on the list of pending changes’ server paths involved in the checkin and the workspace mappings for each of the build definitions, the server is able to quickly determine which build definitions are affected by your changes.  It’s all automatic!
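To make the matching idea concrete, here is a minimal sketch of it.  This is not the server's actual implementation.  It assumes each build definition maps a set of server folders and treats a definition as affected when any changed path falls under one of them.  The DefinitionMapping and AffectedDefinitions types are made up for illustration, and real workspace mappings have details (such as cloaked folders) that this ignores.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a build definition and its workspace mappings.
// These are not the Team Build API types.
public sealed class DefinitionMapping
{
    public DefinitionMapping(string name, params string[] mappedServerFolders)
    {
        Name = name;
        MappedServerFolders = mappedServerFolders;
    }

    public string Name { get; }
    public IReadOnlyList<string> MappedServerFolders { get; }
}

public static class AffectedDefinitions
{
    // True when serverPath is the mapped folder itself or lies beneath it.
    public static bool IsUnder(string serverPath, string mappedFolder)
    {
        string folder = mappedFolder.TrimEnd('/');
        return serverPath.Equals(folder, StringComparison.OrdinalIgnoreCase)
            || serverPath.StartsWith(folder + "/", StringComparison.OrdinalIgnoreCase);
    }

    // Returns the names of definitions with at least one mapping that covers a changed path.
    public static List<string> Find(IEnumerable<DefinitionMapping> definitions,
                                    IEnumerable<string> changedServerPaths)
    {
        List<string> changed = changedServerPaths.ToList();
        return definitions
            .Where(d => d.MappedServerFolders.Any(f => changed.Any(p => IsUnder(p, f))))
            .Select(d => d.Name)
            .ToList();
    }
}
```

With a definition mapped to $/WebProjects/SimpleWebService, a pending change to a file under that folder would report the definition as affected, while a change under an unrelated folder would not.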

See Walkthrough: Customizing Check-in Policies and Notes for how to enable a checkin policy for a team project.

We’re interested in your feedback, so post a comment and let us know what you think.

Low-level details

If you want to see how it works and get a look at a little of the new Orcas Build API, read on.  If you aren’t interested in the low-level details, feel free to skip this.

Here is all of the code that isn’t just “boilerplate” checkin policy code.

To avoid calling the server repeatedly in a short time span, the policy uses a timer to ensure that at least 10 seconds elapse between calls.  There’s nothing special about 10 seconds, and we may even lengthen it to a minute.  The important part is that since this policy makes at least one web service call, it needs to make sure that being evaluated often doesn’t cause too many web service calls and present a performance problem.
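The throttling idea can be isolated into a small helper.  The following is only a sketch: the CallThrottle class and its TryEnter method are hypothetical names, not part of the policy's source, and the interval is passed in rather than fixed at 10 seconds.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: allows an expensive call at most once per minimum interval.
public sealed class CallThrottle
{
    private readonly Stopwatch m_timer = new Stopwatch();
    private readonly TimeSpan m_minInterval;

    public CallThrottle(TimeSpan minInterval)
    {
        m_minInterval = minInterval;
    }

    // Returns true if the caller may make the call now, and restarts the clock.
    // Returns false if the previous permitted call was too recent.
    public bool TryEnter()
    {
        if (m_timer.IsRunning && m_timer.Elapsed < m_minInterval)
        {
            return false;
        }

        m_timer.Reset();
        m_timer.Start();
        return true;
    }
}
```

A caller would create one instance, for example with TimeSpan.FromSeconds(10), and make the web service call only when TryEnter() returns true.  The first call is always allowed because the stopwatch has not been started yet.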

The first thing that the policy’s Evaluate() method does is get a reference to the central object in the Orcas Team Build API, IBuildServer.  Next it gets the list of pending changes that are going to be checked in.

Then it calls GetAffectedBuildDefinitions(), which does what I described in step 1 earlier.  It’s a new web service method on the Orcas server that determines which build definitions are affected by changes to a list of server paths.  Having the workspace mappings for the build definitions stored in the Orcas database, rather than in the old WorkspaceMapping.xml file, is what makes this and continuous integration efficient and automatic.  Otherwise, you’d have to manually specify what paths affect each build definition, which would be a maintenance headache.

After getting the affected build definitions, it checks to see if the artifact URI for the last build is the same as the artifact URI for the last good build.  If those are set to the same URI, the last build was good.  Otherwise, the most recent build was not a good build.  Here we also check to see whether the build type is a continuous integration build, either every checkin (Individual) or a set of checkins over some time period (Batch).
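Stripped of the API types, the check comes down to a small predicate.  Here is a sketch using stand-in types: the CiTrigger enum and the BrokenBuildCheck class are invented for illustration and only mirror the values the policy reads from a build definition.

```csharp
using System;

// Hypothetical stand-in for the kind of trigger a build definition has.
public enum CiTrigger
{
    None,
    Individual, // build every checkin
    Batch       // build the checkins accumulated over some time period
}

public static class BrokenBuildCheck
{
    // A definition counts as broken for this policy when it has a CI trigger
    // and its last build is not the same build as its last good build.
    public static bool IsBrokenCiBuild(Uri lastBuildUri, Uri lastGoodBuildUri, CiTrigger trigger)
    {
        bool isCi = trigger == CiTrigger.Individual || trigger == CiTrigger.Batch;
        bool lastBuildWasGood = object.Equals(lastBuildUri, lastGoodBuildUri);
        return isCi && !lastBuildWasGood;
    }
}
```

One consequence of comparing the two URIs in this sketch is that a definition that has never been built (both URIs null) is not reported as broken, while a definition whose only build failed (no last good build) is.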

If we have any broken builds, we need to get the details for the build so that we can report who may have broken the build.  For CI builds where it is building each checkin individually, it is really the person that broke the build (assuming this is the first broken build).  For CI builds where it’s building the checkins from a period of time, such as the last 30 minutes, it might be the person who broke the build or it may not, since more than one person may have checked in.  Regardless, that’s a good person to start with when investigating the broken build.

        public override void Initialize(IPendingCheckin pendingCheckin)
        {
            // Let the base class store the pending checkin before using it.
            base.Initialize(pendingCheckin);

            m_timer = new Stopwatch();
        }

        public override PolicyFailure[] Evaluate()
        {
            if (Disposed)
            {
                throw new ObjectDisposedException(null);
            }

            IBuildServer buildServer = (IBuildServer) PendingCheckin.GetService(typeof(IBuildServer));

            // If there are any pending changes, determine whether there are build definitions that are
            // affected for which the last build was not a good build. Make sure that we don’t call
            // this rapidly in succession.
            List<PolicyFailure> failures = new List<PolicyFailure>();
            PendingChange[] pendingChanges = PendingCheckin.PendingChanges.CheckedPendingChanges;
            if (pendingChanges.Length > 0 &&
                (!m_timer.IsRunning || m_timer.ElapsedMilliseconds >= 10000))
            {
                IBuildDefinition[] definitions = buildServer.GetAffectedBuildDefinitions(
                    /* ... */);

                List<Uri> brokenBuilds = new List<Uri>();
                List<IBuildDefinition> brokenBuildDefs = new List<IBuildDefinition>();
                foreach (IBuildDefinition definition in definitions)
                {
                    // Since this policy is geared toward folks using continuous integration, only fail for build
                    // definitions that have a CI trigger.
                    if (definition.LastBuildUri != definition.LastGoodBuildUri &&
                        (definition.ContinuousIntegrationType == ContinuousIntegrationType.Batch ||
                         definition.ContinuousIntegrationType == ContinuousIntegrationType.Individual))
                    {
                        brokenBuilds.Add(definition.LastBuildUri);
                        brokenBuildDefs.Add(definition);
                    }
                }

                if (brokenBuilds.Count > 0)
                {
                    // Look up the broken builds to see who triggered them.
                    IBuildDetail[] buildDetails = buildServer.QueryBuildsByUri(brokenBuilds.ToArray(), null,
                        /* ... */);

                    // Create a failure for each broken build, skipping any build that wasn’t returned due to
                    // insufficient permissions or being deleted.
                    for (int i = 0; i < buildDetails.Length; i++)
                    {
                        if (buildDetails[i] != null)
                        {
                            String requestedFor = UserNameUtil.MakePartial(buildDetails[i].RequestedFor,
                                /* ... */);
                            failures.Add(new PolicyFailure(ResourceStrings.Format(ResourceStrings.BuildPolicyBuildBroken,
                                requestedFor), this));
                        }
                    }
                }

                // Restart the timer so that at least 10 seconds pass before the next
                // server call (placement assumed from the description above).
                m_timer.Reset();
                m_timer.Start();
            }

            return failures.ToArray();
        }

        private Stopwatch m_timer;

Comments (9)

  1. Another approach is to make build breakage rare using TeamCity’s "delayed commit" feature:

    A VS add-in fires off "personal builds" for build configurations affected by your changes; if all builds (and tests) pass, your changeset gets checked in. In the meantime, you get to work on the next thing on your list, and your own machine is not bogged down by the extra verification.

    This is optional, of course, though one could devise a policy to enforce its use, e.g. for changes in specific TFS subtrees that are heavily re-used.

  2. I liked this idea when Clark discussed it last year, and I like it more now that TeamBuild is better able to support this type of working.  

    The thing that makes this nice is that check-in policies can be deliberately over-ridden when required.  It’s like the difference between an "Are you sure?" dialog and an "Error: Invalid" dialog.  People very often forget this.

    The key is that it makes people think about what they are doing before they check-in on top of a broke build, but it doesn’t completely prevent it (i.e. when they want to fix that build).

    Few comments (as usual feel free to ignore)

    1) You should def keep the result cache timer small – I think 1 minute may be pushing it – definitely not more than that.  Around about 30 seconds would be fine with me.  When the build is broke there is always a sense of urgency to get it fixed so that you are no longer blocking the team.  When the team hears the Homer Simpson-like "woo-hoo" when the build has been fixed you can guarantee that sometimes someone will be poised with their finger over the "Check-in" button waiting for the build to go green.  Getting a policy failure after you know the build has gone green would be annoying.  The reverse is also true – I would prefer to be warned about attempting a check-in over a broke build than to find out afterwards that it was broke before I made my check-in – especially as I got used to this policy being in existence and began to rely on it to help me prevent this.

    2) There are times when I would be fine with a "dumber" version of the check-in policy that you can configure to point at a specific build.  Often there is only 1 CI build for a Team Project, therefore only a simple check to be made – reducing the server calls and workload during a policy evaluation.  Obviously there are times when things are more complicated than that and your policy covers both scenarios.

    3)  Would be good to adopt a convention when over-riding the policy to use a standard(ish) comment when attempting to fix the broken build.  Just to make it easier to figure out what is going on when looking at your policy violations.

    Getting very excited about Orcas TeamBuild :-)


  3. buckh says:

    Gunnlaugur, that is certainly the approach you would want to take if you wanted to do everything possible to prevent build breaks from occurring.  The tradeoff hinges on an organization’s view and handling of build breaks.

    We’ve certainly discussed implementing this type of approach, and there are internal teams that use this approach.  It’s on our list to consider for the next release.


  4. buckh says:

    Martin, thanks for the feedback!


  5. Buck Hodges says:

    The documentation for the team build 2008 object model is now available as a CHM file. Later this year,

  6. The documentation for the team build 2008 object model is now available as a CHM file. Later this year

  7. At code camp i mentioned build as one of the areas in TeamSystem with the most enhancements was Build-

  8. Buck Hodges says:

    A year ago I wrote a blog post about a new check-in policy that we added in build for TFS 2008 to prevent

  9. Zazzar KIMO says:

    Hi Buck,
    For now, there are no policy warnings if the last vNext build failed. The build checkin policy has been very useful for reminding people that the last CI build failed. Do you know of any plan to make it available for vNext builds? As far as I know, the TFS client API usually does not work with vNext builds. Could we use this API for vNext builds by creating a custom checkin policy?

    Is there any method you would suggest to achieve this? Thanks!