We’ve recently been having some problems with client hangs in our dogfood TFS environment. It has taken quite a while and quite a long breadcrumb trail to diagnose the problem. I thought I’d share it with you in case you have seen similar things or want to take preventative steps against it.
As I said, it usually manifests itself as a hang (most often while doing a “Get”). It can also manifest itself as a “server temporarily unavailable” error message. The problem turns out to be a couple of related bugs in IIS where connections are not properly closed during a process recycle. This can cause the client to hang waiting for a response from the server. The good news is that both of the issues have been fixed in Win2K3 SP2. I haven’t extensively tested SP2 with TFS yet but I believe it will work well and it will address this issue.
Most people won’t notice these issues much, if at all, because they only happen with process recycles. We’ve been hitting an unusual number of them lately. We are working on getting the build lab up on TFS. It does about 100 simultaneous gets of about 1 million files. Right now this causes the server to run out of memory and the process to recycle. That’s been causing this bug on and off for the past few weeks. As I said, the bug fix is in Win2K3 SP2, and we’re just getting ready to check in a fix that enables the server to handle very large numbers of simultaneous gets.
So, the short of it is, you should be thinking about upgrading to Win2K3 SP2 in the near future.