Dogfooding Team Foundation Build: Infrastructure


One of the questions I received about my last post was: why so many controllers? That question leads nicely into a description of the different pieces of hardware involved in our build process.

All of our build infrastructure runs Windows Server 2008 R2 (x64).

When a build is queued using Team Build, it’s queued against a build controller. Each of our build controllers falls into one of these categories:

  • Official – used for running nightly official builds for branches.
  • Drop – used for dropping outputs from official builds.
  • Unofficial – used for running gated check-in and unofficial nightly builds.
  • Staging – used for integrating and testing changes to our build process.
  • Development – used for developing and testing changes to our build process.
  • Backup – pre-configured controllers that can be swapped in to replace a failed controller.
  • Unassigned – contains agents that haven’t yet been assigned to another controller. This allows us to run processes against the agents (such as onboarding and patching).
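The controller categories above can be modelled as a simple enumeration. The Python sketch below also shows one way the Backup pool might be used: handing a failed controller’s role to a pre-configured spare. The controller names and the `swap_in_backup` helper are hypothetical; the post doesn’t describe how a swap is actually carried out.

```python
from enum import Enum
from typing import Dict


class ControllerCategory(Enum):
    """The controller categories listed in the post."""
    OFFICIAL = "official"
    DROP = "drop"
    UNOFFICIAL = "unofficial"
    STAGING = "staging"
    DEVELOPMENT = "development"
    BACKUP = "backup"
    UNASSIGNED = "unassigned"


def swap_in_backup(assignments: Dict[str, ControllerCategory], failed: str) -> str:
    """Hypothetical helper: give a failed controller's role to a backup.

    `assignments` maps (hypothetical) controller names to their category.
    The failed controller is removed from the map and the name of the
    promoted backup controller is returned.
    """
    role = assignments[failed]
    if role is ControllerCategory.BACKUP:
        raise ValueError(f"{failed} is itself a backup controller")
    # Iterate over a sorted copy so the choice is deterministic and the
    # dictionary can be modified safely inside the loop.
    for name, category in sorted(assignments.items()):
        if category is ControllerCategory.BACKUP:
            assignments[name] = role
            del assignments[failed]
            return name
    raise RuntimeError("no backup controller available")
```

Because the backup is already configured, promoting it is just a change of role; nothing has to be installed at the moment of failure.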

Our build controllers run on modest hardware (dual processor, 4 GB RAM), and we typically find that they are RAM-bound rather than I/O- or disk-bound, because their primary role is co-ordinating activities across the build agents.
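Since controllers are RAM-bound, a rough sizing rule follows from the measurement approach outlined in the comments below: measure Team Build’s memory use when idle and at its peak during an end-to-end build, treat the difference as the cost of one concurrent build, and add what the OS and other services need. The sketch below encodes that arithmetic; the function name and the 10% per-build overhead default are assumptions, not figures from our environment.

```python
def estimate_controller_ram_gb(idle_gb: float,
                               peak_gb: float,
                               concurrent_builds: int,
                               os_and_services_gb: float,
                               overhead_fraction: float = 0.1) -> float:
    """Rough RAM estimate for a build controller (hypothetical helper).

    idle_gb: Team Build memory use on the controller while idle.
    peak_gb: peak memory use measured during one end-to-end build.
    os_and_services_gb: RAM needed by the OS and any other services.
    overhead_fraction: assumed per-build safety margin (not from the post).
    """
    if peak_gb < idle_gb:
        raise ValueError("peak usage should not be below idle usage")
    # The idle-to-peak difference approximates the RAM one build needs.
    per_build_gb = (peak_gb - idle_gb) * (1 + overhead_fraction)
    return os_and_services_gb + idle_gb + per_build_gb * concurrent_builds
```

The per-build cost depends heavily on the build process template, so the inputs have to come from measuring your own builds rather than from any fixed table.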

Once a build has been queued on a build controller, we allocate the resources needed for the build, including:

  • Storage Area Network (SAN) – we select a SAN based on available space, in a round-robin fashion (to spread load evenly).
  • Drop Server – each drop server has a VDisk attached to it that the build’s outputs are dropped to; we use DFS to make all of the drops available from a single location.
  • Build Machines – one per architecture/flavor.
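One reading of “based on available space, in a round-robin fashion” is: rotate through the SANs in a fixed order, skipping any that can’t hold the requested VDisk. The sketch below implements that interpretation; the `San` and `SanAllocator` names are hypothetical, and the real selection logic may weigh space differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class San:
    name: str
    free_gb: int


class SanAllocator:
    """Round-robin SAN selection that skips SANs without enough free space."""

    def __init__(self, sans: List[San]) -> None:
        self._sans = sans
        self._next = 0  # position to start the next search from

    def select(self, required_gb: int) -> San:
        count = len(self._sans)
        for offset in range(count):
            index = (self._next + offset) % count
            san = self._sans[index]
            if san.free_gb >= required_gb:
                san.free_gb -= required_gb        # reserve space for the VDisk
                self._next = (index + 1) % count  # continue after this SAN next time
                return san
        raise RuntimeError(f"no SAN has {required_gb} GB free")
```

Starting each search just after the last SAN used is what spreads consecutive builds across SANs instead of filling the first one that happens to have space.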

These resources are then prepared for the build, which means:

  • Carving a VDisk of the appropriate size and attaching it to the drop server.
  • Reimaging each of the build machines.
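Assuming the preparation steps are independent of each other, they can run in parallel, with the build starting only once every step has succeeded. In the sketch below, `carve_and_attach_vdisk` and `reimage` are hypothetical hooks standing in for whatever storage and imaging tooling is actually used; the post doesn’t say whether the steps run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def prepare_resources(drop_server: str,
                      vdisk_gb: int,
                      build_machines: Sequence[str],
                      carve_and_attach_vdisk: Callable[[str, int], None],
                      reimage: Callable[[str], None]) -> None:
    """Run the preparation steps concurrently and wait for all of them."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(carve_and_attach_vdisk, drop_server, vdisk_gb)]
        futures += [pool.submit(reimage, machine) for machine in build_machines]
        # result() re-raises any exception from a step, so one failed
        # reimage fails the whole preparation instead of being ignored.
        for future in futures:
            future.result()
```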

As mentioned above, each build consumes one build machine per architecture/flavor (e.g. x86 Retail) as well as a drop server. Build machines vary greatly in hardware specification: some are physical machines (up to about 8 processors with 8 GB of RAM) and others are virtual machines. We group machines into categories based on their hardware specifications and use these categories to allocate the most appropriate machines for each branch and architecture/flavor (some architecture/flavors require far fewer resources than others).
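The grouping and allocation can be sketched as follows: classify each machine into a hardware tier, then give each architecture/flavor the free machine in the lowest tier that still meets its requirement, so the larger machines stay available for the flavors that need them. The tier names and thresholds in `TIERS` are hypothetical, as is the “lowest adequate tier” policy.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical tiers, ordered smallest first: (name, min processors, min RAM in GB).
TIERS = [("small", 1, 1), ("medium", 4, 4), ("large", 8, 8)]
TIER_ORDER = [name for name, _, _ in TIERS]


@dataclass(frozen=True)
class Machine:
    name: str
    processors: int
    ram_gb: int
    is_virtual: bool


def tier_of(machine: Machine) -> str:
    """Return the largest tier whose minimum spec the machine meets."""
    best = None
    for name, min_cpus, min_ram in TIERS:
        if machine.processors >= min_cpus and machine.ram_gb >= min_ram:
            best = name
    if best is None:
        raise ValueError(f"{machine.name} is below every tier")
    return best


def allocate(free: List[Machine], required_tier: str) -> Machine:
    """Take the free machine in the lowest tier that meets the requirement."""
    minimum = TIER_ORDER.index(required_tier)
    candidates = [m for m in free if TIER_ORDER.index(tier_of(m)) >= minimum]
    if not candidates:
        raise RuntimeError(f"no free machine in tier {required_tier!r} or above")
    chosen = min(candidates, key=lambda m: (TIER_ORDER.index(tier_of(m)), m.name))
    free.remove(chosen)
    return chosen
```

A mapping from (branch, architecture/flavor) to a required tier, for example `("main", "x86 Retail") -> "medium"`, would then drive `allocate`; that mapping is also illustrative.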

Because of the size of the source code, we don’t do a full sync for each build. The source lives on a separate drive that is not wiped by the reimaging process; instead, we scorch this drive and do an incremental sync. Where no build machine with a workspace for our branch is available, we remap an existing workspace rather than deleting it and doing a full sync.
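The decision logic can be sketched in two parts. Here “scorch” is taken to mean deleting anything on the source drive that shouldn’t be there and re-fetching anything missing or modified (our reading; the post doesn’t spell out the mechanics), and machine choice follows the preference order described above. `scorch_plan`, `BuildMachine`, and `choose_machine` are hypothetical names, and the “full” fallback for a machine with no workspace at all is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


def scorch_plan(local: Dict[str, str],
                expected: Dict[str, str]) -> Tuple[Set[str], Set[str]]:
    """Compare local file hashes with what the workspace should contain.

    Returns (to_delete, to_fetch): stray local files to remove, and files
    that are missing or differ locally and so must be downloaded again.
    """
    to_delete = set(local) - set(expected)
    to_fetch = {path for path, digest in expected.items()
                if local.get(path) != digest}
    return to_delete, to_fetch


@dataclass
class BuildMachine:
    name: str
    workspace_branch: Optional[str]  # branch mapped by the existing workspace, if any


def choose_machine(free: List[BuildMachine], branch: str) -> Tuple[BuildMachine, str]:
    """Return (machine, action), where action is "incremental", "remap" or "full"."""
    # 1. Best case: a workspace for this branch already exists.
    for machine in free:
        if machine.workspace_branch == branch:
            return machine, "incremental"
    # 2. Otherwise remap an existing workspace rather than deleting it
    #    and doing a full sync.
    for machine in free:
        if machine.workspace_branch is not None:
            machine.workspace_branch = branch
            return machine, "remap"
    # 3. Assumed fallback (not described in the post): no workspace at all.
    if free:
        free[0].workspace_branch = branch
        return free[0], "full"
    raise RuntimeError("no free build machine")
```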


Comments

  1. BHardister says:

    Hi William,

    Can you provide a controller spec guideline for some stair-step number of build agents? Like a build controller needs x RAM for up to [#] agents and x RAM for up to [>#] agents.

    Thanks!

  2. willbar says:

    Hi Bob,

    This depends heavily on the build process template you use: the size of that workflow in memory, its state, the objects it creates, and the amount of logging it does. One way to determine this would be to look at how much memory Team Build uses when idle and then measure the memory use throughout an end-to-end build. The difference between idle and peak memory usage should roughly reflect the amount of RAM needed per concurrently running build, plus a small overhead. In addition, you'd need to allow for any RAM required by the OS and any other services running on the build controller.

    Thanks,

    William

  3. Gert Christiansen says:

    Hi William,

    Can you talk about how you do the reimaging of each of the build machines, and how you have plugged that into the build process?

    It's a great blog you have going here.

    Cheers,

    /Gert

  4. Allen Feinberg says:

    Would love to learn more about how you scorch the drive and do an incremental sync. Would be cool if you could share a code snippet.