Determining the amount of space required to store files and work items

Buck Hodges

Recently someone asked how to estimate the space required for the databases on a Team Foundation Server.  To do that, we can take a look at our own dogfood server to see where the space goes.  With that information and some knowledge of how the system works, we can come up with a useful way to estimate the approximate space required by the TFS databases.

Version Control

If we start with the version control database, we currently have 1.7 M files and folders, so that works out to about 77 KB per item (size of the version control database divided by the number of items).  That’s averaged over all of the content for all of the versions for every item ever checked into dogfood (starting point is the import of code on Dec. 10, 2004) and all of the workspace information.  You can use sp_spaceused in a SQL query window on your data tier to calculate numbers for your own server.

If we look at just the file content table alone (tbl_Content), it’s 60 KB per item (only about 10% of the items are folders, so we’ll stick with round numbers).  That leaves approximately 17 KB per item.  That 17 KB is dominated by the local version table (tbl_LocalVersion), which records what items and versions users have in their workspaces and where those items reside on the local disks.  The local version table accounts for 15 KB of that 17 KB (again taking the data and index sizes and dividing by the total number of versioned items in the repository).  So, the growth of tbl_Content and tbl_LocalVersion will account for nearly all of the growth of the version control database.

There are 1,780 workspaces and about 1.7 M files and folders on our dogfood system.  To get those statistics for your own server, you can use the following URL, replacing “apptiername” with the name of your TFS server (application server).

TFS 2005 and 2008:

http://apptiername:8080/VersionControl/v1.0/administration.asmx/QueryRepositoryInformation

 

For TFS 2010 and newer you must also specify the name of the team project collection (replace CollectionName):

http://apptiername:8080/tfs/CollectionName/VersionControl/v1.0/Administration.asmx?op=QueryRepositoryInformation

The response will be an XML response that looks something like the following.  The QueryRepositoryInformation web service method is part of the version control administration web service.  You can more information about in the blog post, TFS Source Control administration web service.

 

  <?xml version=”1.0″ encoding=”utf-8″ ?>
  <AdminRepositoryInfo xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance xmlns:xsd=”http://www.w3.org/2001/XMLSchema UserCount=”88 GroupCount=”9 WorkspaceCount=”18 ShelvesetCount=”29 FileCount=”151 FolderCount=”18 MaxChangesetID=”33 PendingChangeCount=”13 xmlns=”http://schemas.microsoft.com/TeamFoundation/2005/06/VersionControl/Admin/03 />

For stats on our own dogfood server, check out John Lawrence’s latest post (the numbers I used are based on data from the dogfood server at the time of this writing).

Not every file and foler in the system is in every workspace.  If you take the number of rows in the local version table and divide it by the number of workspaces, you can determine the average number of files and folders in each workspace.  On our dogfood server, that works out to 27,536.

The growth per item added to version control will depend on how many workspaces have it mapped (that’s the local version table growth) and the compressed size of each version of the item (that’s the content table).  Note that content size is further reduced because TFS stores compressed “deltas” where old versions are just the differences from later versions.

Each new row in the local version table adds about 520 bytes (one row gets added for each workspace that gets the newly added item, and the size is dominated by the local path column).  If you have 100 workspaces that get the newly added item, the database will grow by 52 KB.  If you add 1,000 new files of average size (mix of source files, binaries, images, etc.) and have 100 workspaces get them, the version control database grows by approximately 112 MB (60 KB * 1,000 + 520 * 1,000 * 100).

Work Item Tracking

There are two work item databases, one for work items and one for work item attachments.  The work item attachment database on our dogfood server has lots of screenshots and log files that are associated with the work items, so it’s fairly large, per item.  So, if you average it out as above, each work item uses 163 KB on average, including attachment space.  Attachment space is about 75% of the average per work item.

Space Estimation

Given the analysis above, it’s easy to come with a formula to estimate the database space require as more items are added.

# of files/folders * avg. content size + 520 * # of workspaces * avg. # of files in workspace + # of work items * average size of work item

Substituting the averages from our dogfood server it, we get the following.

# of files/folders * 60 KB + 520 * # of workspaces * avg. # of files in workspace + # of work items * 163 KB

Substituting the current totals into the formula yields the following.

1.7 M * 60 KB + 520 * 1,780 * 27,536 + 67,460 * 163 KB = 138.4 GB

The actual sum of the dogfood database sizes is 173.7 GB, which includes free space in the database and additional areas, such as the warehouse, that can’t be easily counted by using the number of files, workspaces, and work items.  Dividing the actual size by the estimated size yields a factor 1.25 to account for other areas not included in the formula.

Conclusion

The final space estimation formula is as follows, providing an simple way to determine approximately how much space will be required as a system grows.

1.25 * (# of files/folders * avg. content size + 520 * # of workspaces * avg. # of files in workspace + # of work items * average size of work item)

Brian Harry read an early version of this post and provided valuable feedback along with work item analysis and overall space usage.

[UPDATE 8/30/07]  I’ve attached a spreadsheet that someone put together that is based on the final formula.

[UPDATE 8/27/2010]  I’ve added the URL to get to the info for TFS 2010 and newer.  For customers using SQL 2008 Enterprise or newer, TFS uses SQL data compression, so the version control data size will be somewhat smaller than predicted here.

CapacityPlanning.xls

0 comments

Leave a comment

Feedback usabilla icon