TFS Reboot Order

Here are some steps you may want to take when planning downtime for your Team Foundation Server hardware.

Proxy – Can be rebooted at any time.  Users will pass through to the App Tier for as long as the proxy is offline.

Build – Mark the build agents associated with this machine as “unavailable,” make sure no builds are in progress, then reboot.  After the reboot, mark the build agents as active again.

App/Data Tier – If either of these is offline, users will get unexpected behavior.  The error messages in TFS can be cryptic, so it’s best to schedule downtime for these boxes.  As you approach the downtime, be sure to “drain” the build machines by setting all build agents to “unavailable.”  Once you are in your scheduled downtime window, make sure you have no builds running.  Then reboot the AT and/or DT during that window.
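The drain-then-reboot check described above can be modeled in a few lines. This is only a sketch of the ordering logic, not a TFS API: the `BuildAgent` class, its `status` and `running_builds` fields, and the helper functions are all hypothetical names made up for illustration. In a real environment you would change agent status through the TFS tooling you already use.

```python
from dataclasses import dataclass


@dataclass
class BuildAgent:
    """Hypothetical stand-in for a build agent record (not a TFS type)."""
    name: str
    status: str = "active"      # "active" or "unavailable"
    running_builds: int = 0     # builds currently in progress on this agent


def drain(agents):
    """Mark every agent unavailable so no new builds are sent to it."""
    for agent in agents:
        agent.status = "unavailable"


def safe_to_reboot(agents):
    """True only when all agents are drained and no build is in progress."""
    return all(
        a.status == "unavailable" and a.running_builds == 0 for a in agents
    )


def restore(agents):
    """After the reboot, put the agents back into rotation."""
    for agent in agents:
        agent.status = "active"
```

The point of keeping `drain` and `safe_to_reboot` separate is that draining only stops new builds from starting. A build that was already running still has to finish before the reboot is safe.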

I’d recommend that the only required notification be for the AT/DT downtime.  Sending notices 1 week, 2 days, and 1 day before the reboot is good policy.  Alternatively, some teams just budget a 2-hour window every Saturday or Sunday for maintenance and advertise that the TFS hardware may be offline during this time.  You do not have to use the time, but it gets the org “trained” to know that the ops org uses that window for work.
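If you want to put the 1 week / 2 day / 1 day notices on a calendar, the arithmetic is simple enough to script. The sketch below assumes you already know the start of the downtime window. The function name `notification_times` and the constant `NOTICE_OFFSETS` are illustrative names of my own, not part of any TFS tooling.

```python
from datetime import datetime, timedelta

# Lead times for the notices: 1 week, 2 days, and 1 day before the reboot.
NOTICE_OFFSETS = (timedelta(weeks=1), timedelta(days=2), timedelta(days=1))


def notification_times(downtime_start):
    """Return when each notice should go out, earliest first."""
    return [downtime_start - offset for offset in NOTICE_OFFSETS]


if __name__ == "__main__":
    # Example: a Saturday morning maintenance window.
    window = datetime(2024, 6, 15, 8, 0)
    for when in notification_times(window):
        print(when.strftime("%a %Y-%m-%d %H:%M"))
```

For a window starting Saturday 2024-06-15 at 08:00, this gives notices on the previous Saturday, then Thursday, then Friday of that week.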

You can run TFS hardware with multiple 9’s of uptime if needed.  However, in practice, scheduling downtime makes the operational tasks less expensive, since you have a fixed time each week to do your work instead of treating downtime as a special event that requires huge planning (and expense).