We have just completed our testing for TFS 2008 scalability and are ready to publish the final recommendations on server sizing and hardware configurations. If you want to compare this to the TFS 2005 recommendations, you will find them here.
Ultimately, making capacity recommendations is a little like throwing darts at a board. The problem is that no two teams are the same: they use different processes, have different usage patterns, have different sized applications, are organized differently, and so on. When we estimate things like how much load an average user puts on the system, we base that largely on what we observe on our own internal TFS installation. It’s not perfect, and it changes over time. In the details below, I’ll spell out all of the assumptions we made.
Quite a few things have changed since TFS 2005.
- Hardware has progressed and prices have changed.
- We’ve made an amazing number of performance improvements to TFS.
- We’ve reassessed the average amount of load that a user puts on the system.
- We’ve reevaluated the data size that teams of various sizes generate.
The net result, though, is that our recommendations, while more conservative, allow for more users on similarly sized hardware.
Before I go into any gory detail, I’ll spell out the configurations we tested and the results we got.
There are several things to note about this.
- There are fewer configs than we published for TFS 2005 – we found that the extra two configs really didn’t add much value given the current hardware market.
- All of the user ratings are higher than for similar configs in TFS 2005 – as I said, we did a lot of performance work :). And these improvements come despite the fact that we raised the load per user significantly.
- The hardware configs don’t match – unfortunately, in the intervening (almost) two years, the hardware in our lab has changed, and this is what we had available. You will note that we generally increased the memory recommendations; that’s based on our experience over the past couple of years.
- We added a TFS proxy for the higher end configurations – Proxies offload some of the download activity from the TFS server. The performance benefit isn’t huge but many of our larger installations use them so we’ve added them to the mix.
How we arrived at the recommendations
For a good background on the general approach we use to determine TFS’s scaling abilities, read http://blogs.msdn.com/bharry/archive/2005/10/24/how-many-users-will-your-team-foundation-server-support.aspx. While the numbers in that post are out of date, the methodology is still accurate.
Load per user
The biggest change between TFS 2005 and TFS 2008 is that we changed the assumption for the amount of load an average user puts on TFS. We measure this on our own DevDiv TFS server by looking at load patterns and dividing by the number of “active” users. When we shipped TFS 2005, an average user in DevDiv generated approximately 0.1 requests per second (in other words, an average of 1 request every 10 seconds during peak usage hours). That number has gone up quite a bit in the intervening year and a half or so. Why? Well, it’s hard to know for sure, but I can speculate on a few things.
- We’ve moved to a much more branch intensive development methodology. Every feature is now developed in a separate branch and merged when it is done. This has yielded quite a lot more activity around creating, deleting and merging branches.
- There are more automated tools built for TFS now. TFS is used much more widely now and many more processes and add-on tools have been developed around it. Automated tools often put substantially more load on the system than people do.
The end result is that we are now using 0.15 requests per second per user. That’s a 50% increase over the number that we used to compute TFS 2005 capacity. So just to maintain the same user recommendation, TFS 2008 has to be 50% faster on the same hardware.
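The arithmetic behind that is simple enough to sketch. The two per-user request rates below come straight from this post; the server throughput figure is a made-up placeholder for illustration, not a measured TFS number.

```python
# Back-of-the-envelope capacity math from the assumptions above.
# Per-user rates are from the post; the throughput figure is hypothetical.

TFS2005_LOAD_PER_USER = 0.10  # requests/sec per active user
TFS2008_LOAD_PER_USER = 0.15  # requests/sec per active user (50% higher)

def users_supported(server_throughput_rps: float, load_per_user: float) -> int:
    """Users a server can support at peak, given its sustainable throughput."""
    return round(server_throughput_rps / load_per_user)

# Example: a hypothetical server that sustains 90 requests/sec at peak.
throughput = 90.0
print(users_supported(throughput, TFS2005_LOAD_PER_USER))  # 900 users under 2005 assumptions
print(users_supported(throughput, TFS2008_LOAD_PER_USER))  # 600 users under 2008 assumptions

# To keep the same user rating, sustainable throughput must rise by the
# same factor the per-user load rose:
print(round(TFS2008_LOAD_PER_USER / TFS2005_LOAD_PER_USER, 2))  # 1.5
```

In other words, under the new assumptions the same hardware is rated for a third fewer users unless the server itself gets 50% faster.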
Another key change is that we’ve reassessed the amount of data that corresponds to various team sizes. We’ve done a survey of usage by different teams to determine how big their databases are on average. The result, in some cases, is almost a 10X increase in the size of the databases we tested with. This also, of course, causes TFS to have to work harder to accomplish the same throughput on the same hardware. Here are the sizes we used for TFS 2008:
These numbers are based on teams at the higher end of each range. They are also based on the amount of data accrued over about a 2 year period. Of course all teams are different and your numbers may be higher or lower but at least you know what assumptions we used.
Here’s an example of how these data size assumptions affect the performance of TFS: look at the Avg workspace size column. This is the number of files that users typically work with on teams of that size. When our load testing simulates a version control “get” operation, it gets that many files. So a get on a 3,600-person team is a 20-times-larger operation than a get on a 250-person team.
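To make that concrete, here is a tiny sketch of how the simulation’s per-operation cost scales with workspace size. The file counts below are hypothetical placeholders chosen only to match the 20x ratio stated above; they are not the actual test values.

```python
# Hypothetical average workspace sizes (files) by team size,
# chosen to illustrate the 20x ratio described in the post.
avg_workspace_files = {
    250: 5_000,      # placeholder for a 250-person team
    3_600: 100_000,  # placeholder for a 3,600-person team
}

def get_operation_cost(team_size: int) -> int:
    """Files transferred by one simulated version-control 'get'."""
    return avg_workspace_files[team_size]

ratio = get_operation_cost(3_600) / get_operation_cost(250)
print(f"A get on the large team moves {ratio:.0f}x as many files")
```

The point is that a single simulated user action is not constant cost: the same request mix generates far more server work as the assumed data size grows.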
The last substantial change we made was to the hardware configurations. Some of this was deliberate – for example, we decided to start officially including 8-proc data tier numbers because, with the advent of multi-core machines (particularly quad core), an 8-proc machine is no longer outrageously expensive. In fact, the 8P machine we tested on was actually a dual-proc, quad-core machine.
As I mentioned above, we also added TFS proxies to the two larger configs. We did this because many of our larger customers use proxies, and we use them internally quite a lot. In fact, we’ve set up proxies even on the same LAN for our highest-demand users. For example, our build lab has its own proxy because it does approximately 75 full gets of a several-million-file tree every day. It probably adds up to 3 or 4 million file downloads a day. In our simulation, we configured half of the users to use the proxy. That doesn’t mean half of their load went to the proxy, because the proxy handles only downloads. Downloads are comparatively inexpensive, and all other load goes straight to the TFS server.
Some of it was not deliberate. The hardware availability in our lab changes, and the drive arrays and machines we used last time had been repurposed for something else, so we picked machines that were generally close to what we tested last time. The only thing I regret is that we didn’t have higher-performance drive arrays to test. The 3,600-user configuration should have used a SAN, and the 2,200-user configuration should have at least used a SCSI array instead of SATA2. I suspect the differences wouldn’t have been huge, but the higher-capability I/O systems would have provided better performance and been more representative of what someone would use in a production environment.
The end result is that our hardware configurations for TFS 2008 allow for more users on similar hardware than our recommendations for TFS 2005, even though they are based on a substantially more conservative estimate of how much load a user puts on the system. I’d estimate that, between the increased request load, increased data size, and so on, the estimates for TFS 2008 assume about double the load per user.
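As a rough sanity check on the “about double” claim: the request-rate increase (0.15 vs. 0.10 requests/sec) is a hard number from this post, and if bigger databases make the average request roughly a third more expensive (a hypothetical factor, not a measured one), the combined effect lands near 2x.

```python
# Rough combined-load factor behind the "about double" estimate.
# request_rate_factor comes from the post; per_request_cost_factor is
# a hypothetical illustration of costlier requests on larger databases.

request_rate_factor = 0.15 / 0.10   # 1.5x more requests per user
per_request_cost_factor = 4 / 3     # hypothetical: ~33% costlier requests

combined = request_rate_factor * per_request_cost_factor
print(round(combined, 1))
```

Any such decomposition is speculative; the point is only that two modest factors multiply into a doubled per-user load assumption.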
TFS 2008 is more than twice as fast as TFS 2005 and can support extremely large teams. Of course, even larger teams can deploy multiple servers and scale to any size they need.
I’m interested in your own stories about your experience with TFS 2008 performance if you have them. Please feel free to share.