The build agent cannot connect to Team Foundation Server and fails with an “Unable to connect to the remote server” error.

Today I observed an interesting problem on one of our customers' setups and thought it would be good to share it with you all, so that if you encounter it you don’t have to spend a lot of time figuring out what’s going on. The setup was like this:

1. On a virtual environment, the testing and workflow capabilities are enabled.

2. The testing capability is in the Ready state, but the workflow capability is not.

3. The workflow capability remains in the Configuring state for around 10-15 minutes, with the following machine-level message.

TF266001: Team Foundation Server is configuring the build agent. The build service is attempting to connect to the following Team Foundation Server application-tier: https://aseemb-tfs10.mydomain.com:8080/tfs/DefaultCollection;

4. Later, the workflow capability goes to the Error state with the following machine-level error message.

The build service is unable to connect to the following Team Foundation Server application-tier: https://aseemb-tfs10.mydomain.com:8080/tfs/DefaultCollection . Exception type: System.Net.WebException. Exception message: Unable to connect to the remote server

Since the error message says “Unable to connect to the remote server”, it is quite evident that the build agent on the virtual machine is not able to connect to the team foundation server. To understand the problem, we did the following.

1. We tried pinging the team foundation server machine from within the virtual machine, and the ping succeeded.

ping aseemb-tfs10

2. Since the build agent connects to the team foundation server using the FQDN (aseemb-tfs10.mydomain.com), we thought that maybe the machine was not able to resolve the FQDN. So we tried pinging the FQDN from within the virtual machine, but that also succeeded.

ping aseemb-tfs10.mydomain.com

3. We have seen that sometimes a virtual machine is able to resolve the target machine name (so ping succeeds), but connection attempts still fail because of firewall problems, IPSec problems, etc. In those cases a telnet connection from the machine to the target machine on the relevant port fails. So we tried a telnet from within the virtual machine to the team foundation server machine, but that one also worked. :( (On Windows 7/Vista/Win2k8 machines, telnet is not installed by default; you have to install it by turning Windows features on or off.)

telnet aseemb-tfs10.mydomain.com 8080

So there was no machine name resolution issue and no firewall issue, but the build agent was still not able to connect to the team foundation server.
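If telnet is not installed on the virtual machine, the name resolution and port checks from steps 1 to 3 can be approximated with a few lines of Python. This is only a minimal sketch and not part of the original investigation; the function name `check_tcp` is hypothetical, and it assumes Python 3 is available on the machine.

```python
import socket


def check_tcp(host, port, timeout=5.0):
    """Resolve host, then try a plain TCP connect (roughly what telnet does).

    Returns (ok, detail) so that a name resolution failure can be told
    apart from a blocked or closed port.
    """
    try:
        # Rough equivalent of the ping steps: can the name be resolved at all?
        address = socket.gethostbyname(host)
    except socket.gaierror as exc:
        return False, f"name resolution failed: {exc}"

    try:
        # Rough equivalent of the telnet step: can a TCP connection be opened?
        with socket.create_connection((address, port), timeout=timeout):
            return True, f"connected to {address}:{port}"
    except OSError as exc:
        return False, f"resolved to {address} but connect failed: {exc}"
```

For the setup described here, the call would be `check_tcp("aseemb-tfs10.mydomain.com", 8080)`. Like telnet, this opens a direct socket and does not go through the IE proxy settings, so it can succeed even when HTTP requests from the same machine fail.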

4. We looked at the ‘Application’ event log on the virtual machine, filtered by source ‘TfsBuildServiceHost’, and observed a lot of warnings with a message similar to the machine-level error message mentioned earlier.

The build service is unable to connect to the following Team Foundation Server application-tier: https://aseemb-tfs10.mydomain.com:8080/tfs/DefaultCollection . Exception type: System.Net.WebException. Exception message: Unable to connect to the remote server

5. On the virtual machine, we opened an instance of IE and tried to open one of the web services hosted by the team foundation server, to check whether this was a generic connectivity issue or something specific to the build agent.

In the IE address bar, we tried opening the URL https://aseemb-tfs10.mydomain.com:8080/tfs/DefaultCollection/Services/v3.0/LocationService.asmx. IE failed to browse to this service, which suggested that applications on this virtual machine in general were not able to make an HTTP connection to the team foundation server and that the problem was not specific to the build agent.

6. Next we looked at the IE proxy settings to check whether a proxy was enabled (Tools -> Internet Options -> Connections -> LAN Settings -> Use a proxy server for your LAN). In our case it was enabled, so we checked whether the settings for HTTP connections were correct (Advanced -> HTTP). And guess what, they were not: they pointed to a non-existent proxy server. We disabled the proxy settings and then tried repairing the workflow capability. This time it worked, and the workflow capability went to the Ready state!!
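The proxy check in step 6 can also be scripted. The sketch below is an illustration under assumptions, not what we actually ran: it uses Python's `urllib.request.getproxies()`, which on Windows falls back to the Internet Options proxy settings when no proxy environment variables are set, and then probes each configured proxy with a TCP connect. The helper names `proxy_endpoint`, `can_connect` and `unreachable_proxies` are hypothetical.

```python
import socket
from urllib.parse import urlsplit
from urllib.request import getproxies


def proxy_endpoint(proxy_url, default_port=80):
    """Split a proxy setting such as 'http://proxy:8080' into (host, port)."""
    if "://" not in proxy_url:
        # Tolerate bare 'host:port' values by assuming an http scheme.
        proxy_url = "http://" + proxy_url
    parts = urlsplit(proxy_url)
    return parts.hostname, parts.port or default_port


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_proxies(proxies, probe=can_connect):
    """Return the subset of a scheme -> proxy URL mapping that fails the probe."""
    bad = {}
    for scheme, url in proxies.items():
        if scheme == "no":
            # The 'no' entry (from no_proxy) is a bypass list, not a proxy.
            continue
        host, port = proxy_endpoint(url)
        if host is None or not probe(host, port):
            bad[scheme] = url
    return bad
```

Running `unreachable_proxies(getproxies())` on the virtual machine would list any configured proxies that cannot be reached. A non-empty result for the `http` or `https` entry would point to the kind of stale proxy setting described above.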