Dogfooding Team Foundation Build: Infrastructure

One of the questions I received about my last post was: Why so many controllers? That question leads nicely into a description of the different pieces of hardware involved in our build process.

All of our build infrastructure runs Windows Server 2008 R2 (x64).

When a build is queued using Team Build, it's queued against a build controller. Each of our build controllers falls into one of these categories:

  • Official – used for running nightly official builds for branches.
  • Drop – used for dropping outputs from official builds.
  • Unofficial – used for running gated check-in and unofficial nightly builds.
  • Staging – used for integrating and testing changes to our build process.
  • Development – used for developing and testing changes to our build process.
  • Backup – pre-configured controllers that can be swapped in to replace a failed controller.
  • Unassigned – contains agents that haven’t yet been assigned to another controller. This allows us to run processes against the agents (such as onboarding, patching, etc.).
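
To make these categories concrete, here's a minimal Python sketch of how the roles might be modeled and queried. The enum, the `role` attribute, and `controllers_for` are hypothetical names used purely for illustration; they are not part of Team Build.

```python
from enum import Enum

class ControllerRole(Enum):
    """The controller categories described above (hypothetical model)."""
    OFFICIAL = "official"        # nightly official builds per branch
    DROP = "drop"                # dropping outputs from official builds
    UNOFFICIAL = "unofficial"    # gated check-in and unofficial nightly builds
    STAGING = "staging"          # integrating/testing build process changes
    DEVELOPMENT = "development"  # developing/testing build process changes
    BACKUP = "backup"            # pre-configured spares for failed controllers
    UNASSIGNED = "unassigned"    # agents not yet assigned to another controller

def controllers_for(role, controllers):
    """Return the controllers configured for a given role.

    Assumes each controller object carries a `role` attribute (hypothetical).
    """
    return [c for c in controllers if c.role is role]
```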

Our build controllers run on modest hardware (dual processor, 4 GB RAM), and we typically find that build controllers are RAM-bound rather than I/O- or disk-bound (because their primary role is coordinating activities across the build agents).

Once a build has been queued on a build controller we allocate the resources needed for the build, including:

  • Storage Area Network (SAN) – we select a SAN based on available space in a round-robin fashion, to spread load evenly (see the sketch after this list).
  • Drop Server – drop servers have a VDisk attached to them that the build’s outputs are dropped to; we use DFS to make all of the drops available from a single location.
  • Build Machines – one per architecture/flavor.
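
As a rough illustration of the round-robin SAN selection in the first item, here's a small Python sketch. The `San` class, its `free_gb` field, and `SanPicker` are hypothetical; the real allocation logic certainly tracks more state than this.

```python
from dataclasses import dataclass

@dataclass
class San:
    name: str
    free_gb: int  # available space (hypothetical unit)

class SanPicker:
    """Rotates through SANs, skipping any without enough free space."""
    def __init__(self, sans):
        self.sans = list(sans)
        self.next_index = 0

    def pick(self, required_gb):
        # Walk at most one full cycle starting from the last position used,
        # so load spreads evenly across SANs over successive builds.
        for offset in range(len(self.sans)):
            i = (self.next_index + offset) % len(self.sans)
            if self.sans[i].free_gb >= required_gb:
                self.next_index = (i + 1) % len(self.sans)
                return self.sans[i]
        raise RuntimeError("no SAN has enough free space")
```

A build that needed, say, 200 GB of drop space would call `picker.pick(200)` and get the next SAN in rotation with at least that much room.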

These resources are then prepared for the build, which means:

  • Carving a VDisk of the appropriate size and attaching it to the drop server.
  • Reimaging each of the build machines.
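
The preparation steps might be orchestrated roughly like this. Every object and method here (`create_vdisk`, `attach`, `reimage`, the size estimate) is a hypothetical stand-in for whatever tooling actually performs these operations.

```python
def prepare_build_resources(build, san, drop_server, build_machines):
    """Hypothetical orchestration of the two preparation steps described above."""
    # Carve a VDisk sized for this build's outputs and attach it to the drop server.
    vdisk = san.create_vdisk(size_gb=build.estimated_drop_size_gb)  # hypothetical API
    drop_server.attach(vdisk)                                       # hypothetical API

    # Reimage each allocated build machine so every build starts from a known state;
    # the source drive is excluded from reimaging (see the sync notes below).
    for machine in build_machines:
        machine.reimage()                                           # hypothetical API
```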

As mentioned above, each build consumes one machine per architecture/flavor (e.g. x86 Retail) as well as a drop server. Build machines vary greatly in hardware spec, with some being physical machines (up to about 8 processors with 8 GB of RAM) and others being virtual machines. We group machines into categories based on their hardware specifications, and this grouping is used to allocate the most appropriate machines for each branch and architecture/flavor (some architecture/flavors require far fewer resources than others).
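Here's a sketch of that grouping and allocation, assuming each machine object exposes a hardware `category` and an `idle` flag, and that each architecture/flavor maps to a preferred category; all of those names are hypothetical.

```python
from collections import defaultdict

def categorize(machines):
    """Group machines by a coarse hardware category (e.g. 'large-physical', 'small-virtual')."""
    groups = defaultdict(list)
    for m in machines:
        groups[m.category].append(m)
    return groups

def allocate(groups, requirements):
    """Pick one idle machine per architecture/flavor.

    `requirements` maps an architecture/flavor (e.g. 'x86 Retail') to the
    machine category it prefers; the mapping and attributes are hypothetical.
    """
    allocation = {}
    for flavor, category in requirements.items():
        candidates = [m for m in groups.get(category, []) if m.idle]
        if not candidates:
            raise RuntimeError(f"no idle machine in category {category!r} for {flavor}")
        machine = candidates[0]
        machine.idle = False
        allocation[flavor] = machine
    return allocation
```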

Because of the size of the source code we don’t do a full sync of it for each build (it lives on a separate drive which is not wiped by the reimaging process); instead, we scorch this drive and do an incremental sync. In situations where a build machine with a workspace for our branch isn’t available, we’ll remap an existing workspace rather than deleting the workspace and doing a full sync.
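
A sketch of that sync decision, where every helper (`has_workspace_for`, `scorch_source_drive`, `incremental_sync`, `remap_workspace`) is a hypothetical placeholder for the real tooling:

```python
def prepare_sources(machine, branch):
    """Illustrates the decision described above; all helpers are hypothetical."""
    if machine.has_workspace_for(branch):
        # The source drive survives reimaging, so scorch it (remove anything
        # not matching the server) and bring it up to date incrementally.
        machine.scorch_source_drive()
        machine.incremental_sync(branch)
    else:
        # Rather than deleting the workspace and paying for a full sync,
        # remap the machine's existing workspace to this branch and sync
        # only the differences.
        machine.remap_workspace(branch)
        machine.incremental_sync(branch)
```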