TFS 2010: Faster Delivery of Notifications

In my previous blog post I described the job service, new for TFS 2010. There's overhead in queuing a job, having the job agent acquire it, creating an instance of the associated plugin and releasing the job. To make things more efficient, we delay the notification job and batch the notifications. (See the description of QueueDelayedJob in the previous post.) While your server is under heavy use, the notification job runs every two minutes (by default), sending emails and making SOAP calls. When your server goes idle, the notification job stops being delay-queued.
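
The batching behavior described above can be modeled in a few lines. The following is a simplified Python sketch, not TFS code. The class NotificationBatcher and its methods are hypothetical names, and the model ignores the job agent, plugin instantiation and delivery failures. It shows the two properties mentioned above. While events keep arriving, the job runs at most once per delay period and delivers everything pending in one pass. Once the server goes idle, nothing is re-queued until the next event.

```python
# Hypothetical, simplified model of delay-queued batching.
# NotificationBatcher is an illustrative name; this is NOT the TFS implementation.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NotificationBatcher:
    delay: float = 120.0                  # seconds; mirrors the documented default
    pending: List[str] = field(default_factory=list)
    scheduled_at: Optional[float] = None  # when the queued job will run, if any
    runs: int = 0                         # how many times the job has actually run

    def notify(self, now: float, event: str) -> None:
        """Record an event and queue the job only if one is not already waiting."""
        self.pending.append(event)
        if self.scheduled_at is None:
            self.scheduled_at = now + self.delay

    def tick(self, now: float) -> List[str]:
        """Run the job if its delay has elapsed, delivering everything pending."""
        if self.scheduled_at is None or now < self.scheduled_at:
            return []
        delivered, self.pending = self.pending, []
        self.scheduled_at = None  # idle: nothing is re-queued until the next event
        self.runs += 1
        return delivered


batcher = NotificationBatcher(delay=120)
for t in range(0, 100, 10):            # ten check-ins between t=0s and t=90s
    batcher.notify(t, f"checkin@{t}s")
print(batcher.tick(60))                # [] (the delay has not elapsed yet)
print(len(batcher.tick(120)))          # 10 (one run delivers all ten)
print(batcher.runs)                    # 1
```

The first event of a quiet period starts the clock. Later events simply join the pending batch, so the per-job overhead is paid once for the whole batch.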

This delay is new to TFS 2010; TFS 2008 sent emails and made SOAP calls immediately. So if you're simply wondering "Is something wrong with my system? Is it running slowly?", don't worry: a two-minute delay is expected. If, however, you rely on immediate notifications and find a two-minute delay unacceptable, read on.

The notification delay can be set in the TF registry. Here's how you can set it to 30 seconds using PowerShell:

[Reflection.Assembly]::Load("Microsoft.TeamFoundation.Client, Version=10.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a")

# Modify the TFS configuration server URL as necessary (a default TFS 2010 install uses http on port 8080).
$configServer = new-object Microsoft.TeamFoundation.Client.TfsConfigurationServer "http://localhost:8080/tfs/"

# Get the TF registry service.
$tfsRegService = $configServer.GetService([Microsoft.TeamFoundation.Framework.Client.ITeamFoundationRegistry])

# Set the notification delay to 30 seconds. All collections will use this delay unless they override this value in the collection hive.
$tfsRegService.SetValue("/Service/Integration/Settings/NotificationJobDelay", 30)

You can take the notification delay lower than that: 15 seconds, or even zero seconds if that's necessary. I'm not warning you against making this change, but you are making your system slightly less efficient. If you have several user actions a second (checkins, work item changes, etc.), you may want a delay of at least a few seconds. That way the notification job will find several notifications to deliver each time it runs, and there's less overhead per user action.
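
The trade-off in the previous paragraph is simple arithmetic. As a rough model, assume user actions arrive at a steady rate. One run of the notification job then delivers about rate times delay notifications, so the fixed cost of a job run is shared among that many actions. The function below is a hypothetical illustration. The 50 ms fixed cost per job run in the usage lines is a made-up number, not a measured TFS figure.

```python
def overhead_per_action(actions_per_sec: float, delay_sec: float,
                        job_overhead_ms: float) -> float:
    """Rough per-action share of the fixed cost of one notification job run.

    Assumes actions arrive steadily, so one run delivers about
    actions_per_sec * delay_sec notifications (never fewer than one).
    job_overhead_ms is a hypothetical fixed cost covering queuing the job,
    having the agent acquire it, instantiating the plugin and releasing it.
    """
    batch_size = max(1.0, actions_per_sec * delay_sec)
    return job_overhead_ms / batch_size


# Hypothetical load: 5 user actions per second, 50 ms fixed cost per job run.
for delay in (0, 2, 30):
    print(f"{delay:>3}s delay -> {overhead_per_action(5, delay, 50):.2f} ms per action")
# Prints 50.00, 5.00 and 0.33 ms per action respectively.
```

Even a delay of a couple of seconds cuts the per-action share by an order of magnitude at this rate. Going further yields diminishing returns, while notifications arrive later.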

Our server OM monitors some TF registry settings for changes, but this is not one of them. You will need to restart the TFS web application for it to reread this setting. You can run iisreset or recycle the Microsoft Team Foundation Server Application Pool in IIS Manager. (Touching the web.config also seems to work in practice.)

There's a related setting to mention: /Configuration/JobService/DefaultDelayedJobDelay. If NotificationJobDelay isn't set, DefaultDelayedJobDelay's value is used. If DefaultDelayedJobDelay isn't set either, the delay defaults to 120 seconds. (Yes, there is a default default delayed-job delay value. I know it's a horrible name, but that's what it is!)

We see DefaultDelayedJobDelay as a big dial that you tune to your environment. If you need jobs to run immediately, you crank it closer to zero. If you need to handle lots of user activity on low-end hardware, you crank it higher so that more work gets batched and jobs run less often. But we know administrators will need fine-grained control over some jobs, hence optional settings like NotificationJobDelay.

Note that you can set NotificationJobDelay or DefaultDelayedJobDelay at the team project collection level or at the configuration server level. Our code first attempts to read the setting from the collection hive and falls back to checking the configuration hive. Here's the order in which we check for the notification job delay:

  1. NotificationJobDelay at the collection level. (Assuming you're dealing with a collection-level action like a checkin or a work item modification.)
  2. NotificationJobDelay at the configuration server level.
  3. DefaultDelayedJobDelay at the collection level. (Assuming you're dealing with a collection-level action like a checkin or a work item modification.)
  4. DefaultDelayedJobDelay at the configuration server level.
  5. Default to 120 seconds.
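
The lookup order above amounts to a nested fallback. First try NotificationJobDelay in each hive (collection, then configuration server), then try DefaultDelayedJobDelay in the same order, and finally use the built-in 120 seconds. The Python sketch below models each hive as a plain dictionary keyed by the registry paths used in this post. That representation and the function name are assumptions for illustration, not the server's actual API.

```python
from typing import Mapping, Optional

# Registry paths as given in this post; the dictionary-per-hive model is hypothetical.
NOTIFICATION_KEY = "/Service/Integration/Settings/NotificationJobDelay"
DEFAULT_KEY = "/Configuration/JobService/DefaultDelayedJobDelay"
BUILT_IN_DEFAULT_SECONDS = 120


def notification_job_delay(collection_hive: Optional[Mapping[str, int]],
                           config_hive: Mapping[str, int]) -> int:
    """Resolve the notification delay following the documented lookup order.

    Pass collection_hive=None for actions that are not collection-level,
    in which case only the configuration hive is consulted.
    """
    hives = [h for h in (collection_hive, config_hive) if h is not None]
    # Outer loop over settings, inner loop over hives, gives steps 1-4 in order.
    for key in (NOTIFICATION_KEY, DEFAULT_KEY):
        for hive in hives:
            if key in hive:
                return hive[key]
    return BUILT_IN_DEFAULT_SECONDS  # step 5


config = {DEFAULT_KEY: 60}
collection = {NOTIFICATION_KEY: 30}
print(notification_job_delay(collection, config))  # 30 (step 1)
print(notification_job_delay({}, config))          # 60 (step 4)
print(notification_job_delay(None, {}))            # 120 (step 5)
```

Because the setting name is the outer loop, a NotificationJobDelay on the configuration server wins over a DefaultDelayedJobDelay on the collection, matching steps 2 and 3 of the list.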