TFS Performance. Episode 1 - The Phantom Baseline

Here are some words of wisdom from our “go-to” engineer for TFS:

My name is Brad Peterson, and I am an Escalation Engineer on the TFS support team. My office is in Issaquah, Washington. This is the first of what I hope will be many blog posts!

Team Foundation Server performance comes up frequently around the support water cooler, and the first questions are usually “What is good performance?” and then “What is bad performance?” In my mind, these questions can't really be answered with a single number, because so many factors are involved. Here are a few, just off the top of my head, for the Application Tier alone:

1. Application Tier processor speed
2. Application Tier memory
3. Other applications running on the Application Tier, including antivirus software
4. Network card performance
5. Network performance
6. Disk performance
7. Virtualization
8. Etc.

With just the factors above, it is already impossible to put a single number on what a server should be capable of. Brian Harry has posted hardware guidelines that relate a hardware configuration to a number of users, and those are a clear place to start when evaluating whether you should expect acceptable performance. So, what is acceptable performance?

Often when customers call Support about performance, they have no idea whether the server is really slow. Usually they only know that a user has complained about a particular action, or they have the nebulous feeling that “it just seems slower today.” I think a key first step is to establish a performance baseline. You can get a pretty good set of numbers for comparison by querying the TfsActivityLogging database.

(In TFS 2008, setup turns on command logging to the TfsActivityLogging database by default. In TFS 2005, command logging is turned off by default; to turn it on, see the topic “Global Web.Config File Settings in Team Foundation Server Components”.)

Note: There is a SQL job (TfsActivityLogging Administration Job) that prunes this database. By default it keeps two weeks of data. If you want to keep more or less, edit the job and change the @maxAgeDays parameter passed to prc_PruneCommands to the number of days you want to retain. This database can grow quickly, so make sure you have sufficient disk space. You do not EVER want to run out of disk space on the drive that stores your SQL databases.
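To get a rough feel for how much disk space a given retention setting needs, you can multiply your observed growth rate by the number of days retained. The Python sketch below is only back-of-the-envelope arithmetic, not a TFS tool: the helper name and both inputs (rows logged per day, average bytes per row) are hypothetical values you would measure on your own server, and it ignores indexes, tbl_Parameter rows, and transaction log growth.

```python
def estimate_retained_bytes(rows_per_day: float, bytes_per_row: float, max_age_days: int) -> float:
    """Rough steady-state size of the retained command log (hypothetical helper).

    rows_per_day and bytes_per_row are values you measure yourself; max_age_days
    mirrors the @maxAgeDays value passed to prc_PruneCommands. Indexes,
    tbl_Parameter rows, and log-file growth are not included.
    """
    if rows_per_day < 0 or bytes_per_row < 0 or max_age_days < 0:
        raise ValueError("inputs must be non-negative")
    return rows_per_day * bytes_per_row * max_age_days


# Illustrative inputs only, not measurements from a real server.
size = estimate_retained_bytes(rows_per_day=200_000, bytes_per_row=500, max_age_days=14)
print(f"{size / 1024**3:.2f} GiB")
```

Because the estimate scales linearly with max_age_days, doubling the retention roughly doubles the space you should plan for.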

Keep one thing in mind when looking at the data in this database: the times recorded are how long the web method took to run on the server. They do not include the time it takes a request to travel between the client and the Application Tier. Here are some quick notes on the fields in the tbl_Command table.

StartTime

The StartTime field holds the server's date and time in UTC. For more than you would ever want to know about UTC, check out:

<en.wikipedia.org/wiki/UTC>

If you want to know the current UTC time (to figure out the number of hours to adjust by), see:

How to convert UTC time to local time
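If you post-process StartTime values outside SQL Server, a time zone library handles the adjustment for you. The Python sketch below assumes your SQL client returns StartTime as a naive datetime (no offset attached) and uses the standard `zoneinfo` module (Python 3.9+; on Windows you may also need the `tzdata` package). The helper name is hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def utc_to_local(start_time_utc: datetime, tz_name: str) -> datetime:
    """Convert a naive StartTime value (stored in UTC) to local time in tz_name."""
    # The value carries no offset, so mark it explicitly as UTC first.
    aware_utc = start_time_utc.replace(tzinfo=timezone.utc)
    # astimezone applies the offset in effect on that date, including daylight saving time.
    return aware_utc.astimezone(ZoneInfo(tz_name))


# A command logged at 17:30 UTC on 15 July 2008, viewed from Issaquah (Pacific time).
local = utc_to_local(datetime(2008, 7, 15, 17, 30), "America/Los_Angeles")
print(local.isoformat())
```

Converting through a named time zone rather than subtracting a fixed number of hours matters because the offset changes with daylight saving time; a fixed adjustment will be off by an hour for part of the year.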

ExecutionTime

The ExecutionTime field stores the duration in microseconds, so to convert ExecutionTime to seconds, divide by 1,000,000.
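A minimal Python sketch of that conversion, for anyone working with exported rows (the helper name is hypothetical). Python's `/` is true division. In T-SQL, by contrast, dividing an integer column by the integer literal 1000000 performs integer division and drops the fractional seconds, so divide by 1000000.0 there instead.

```python
MICROSECONDS_PER_SECOND = 1_000_000


def execution_seconds(execution_time_us: int) -> float:
    """Convert an ExecutionTime value (microseconds) to seconds, keeping the fraction."""
    return execution_time_us / MICROSECONDS_PER_SECOND


print(execution_seconds(2_500_000))
```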

ExecutionCount

First of all, this field does not exist in TFS 2005. A single logged call can represent several executions of the same web method. Take ReadIdentity, for example: it runs as part of the hourly sync and is called for each user. The “Get” command generally has an ExecutionCount of 1, but there will be a corresponding set of “Download” commands whose ExecutionCount reflects the number of files downloaded. In TFS 2005, you will instead see many rows for Download (one for each file).
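To see why ExecutionCount matters when averaging, here is a small Python sketch over hypothetical in-memory rows (the `CommandRow` class and both functions are illustrative, not part of TFS). It assumes, as the baseline query later in this post does by dividing SUM(ExecutionTime) by SUM(ExecutionCount), that a row's ExecutionTime covers all of the executions that row represents.

```python
from dataclasses import dataclass


@dataclass
class CommandRow:
    """The tbl_Command columns this sketch needs (TFS 2008 schema)."""
    command: str
    execution_time_us: int  # ExecutionTime: total duration of the row, in microseconds
    execution_count: int    # ExecutionCount: how many executions the row covers


def avg_seconds_per_execution(rows: list[CommandRow]) -> float:
    """Total time divided by total executions, so batched rows are weighted correctly."""
    total_us = sum(r.execution_time_us for r in rows)
    total_execs = sum(r.execution_count for r in rows)
    if total_execs == 0:
        raise ValueError("no executions to average")
    return total_us / 1_000_000 / total_execs


def naive_avg_seconds_per_row(rows: list[CommandRow]) -> float:
    """Average per row; overstates per-call cost when rows cover many executions."""
    return sum(r.execution_time_us for r in rows) / 1_000_000 / len(rows)


# Illustrative data: one Download row covering 100 files, plus one single-file Download.
rows = [
    CommandRow("Download", execution_time_us=5_000_000, execution_count=100),
    CommandRow("Download", execution_time_us=50_000, execution_count=1),
]
print(avg_seconds_per_execution(rows))
print(naive_avg_seconds_per_row(rows))
```

The per-row average makes each download look about fifty times slower than it is, because the 100-file row is counted as a single call.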

IdentityName

Who was “running” the command.

IPAddress

This may be useful, but many companies route traffic through proxies and other devices, so the reported IP address is the same for many users. Think of a reverse proxy that handles traffic coming from outside the firewall.

UserAgent

One of my favorites (but not for performance purposes). It usually contains the name and version of the application that is accessing TFS. It is useful for questions such as “Has everybody upgraded to SP1?” or “What applications are accessing my server?”

Command

What command was executed.

Application

The TFS application area that the web method call came from.

Status

Whether the call succeeded: the value is 0 if successful and -1 if not. Don’t get too excited; a value of -1 is pretty common and usually not that interesting. To get more information about a failure, look up the related records in tbl_Parameter by CommandID.

Setting a baseline

OK, so what query should I run to get a baseline for performance?

As with any database, there are countless interesting queries you could run.

Grant Holliday has done a lot of work in this area:

TFS Performance & Excel 2007 Heat Map

Announcing TFS Performance Report Pack

If I could run only one query to get an initial breakdown of performance on a server, I would start here (excuse my ugly SQL) to get a general idea of the past two weeks on TFS 2008:

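```sql
-- TFS 2008 only: ExecutionCount does not exist in TFS 2005.
SELECT Application,
       SUM(ExecutionCount) AS Total_Executions,
       -- ExecutionTime is in microseconds; dividing by 1000000.0 avoids integer truncation
       CAST(SUM(ExecutionTime) / 1000000.0 / NULLIF(SUM(ExecutionCount), 0)
            AS decimal(10, 2)) AS avg_sec_per_exec
FROM tbl_Command WITH (NOLOCK)  -- a bare "nolock" here would be parsed as a table alias, not a hint
WHERE StartTime >= DATEADD(day, -14, GETUTCDATE())  -- StartTime is stored in UTC
GROUP BY Application;
```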

The results of this query give you a starting point for a general baseline.

What should the numbers look like? They vary quite a bit. On servers we have seen running acceptably, avg_sec_per_exec is under one second for every application area except the warehouse, which is a little over one second.
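Once you have saved the query's output on a day when the server felt healthy, you can compare later runs against it. The Python sketch below is one hypothetical way to do that: it takes two mappings of Application to avg_sec_per_exec and flags areas whose average grew by more than a chosen ratio. The function name, the 1.5 default, and the application names and numbers in the usage example are all illustrative placeholders; use the values your own query returns.

```python
def flag_slow_areas(
    current: dict[str, float],
    baseline: dict[str, float],
    ratio_threshold: float = 1.5,
) -> list[str]:
    """Return application areas whose avg_sec_per_exec exceeds ratio_threshold x baseline.

    current and baseline map Application -> avg_sec_per_exec, for example the output
    of the baseline query saved on two different dates. Areas with no usable
    baseline value are skipped.
    """
    flagged = []
    for application, avg_now in sorted(current.items()):
        avg_before = baseline.get(application)
        if avg_before is None or avg_before <= 0:
            continue  # nothing meaningful to compare against
        if avg_now / avg_before > ratio_threshold:
            flagged.append(application)
    return flagged


# Placeholder names and numbers, not real measurements.
baseline = {"Version Control": 0.20, "Work Item Tracking": 0.30, "Warehouse": 1.20}
current = {"Version Control": 0.45, "Work Item Tracking": 0.31, "Warehouse": 1.25}
print(flag_slow_areas(current, baseline))
```

Comparing against your own baseline rather than a fixed limit fits the point above: a warehouse averaging a little over a second may be perfectly normal for your server, while a jump from 0.2 to 0.45 seconds elsewhere is worth a closer look.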

In future episodes, I will try to illustrate how you can drill down from the baseline information into your server's data to help you troubleshoot your performance questions.