Reconnecting a “disconnected” lab agent

There is a huge troubleshooting guide at blogs.msdn.com/lab_management/pages/troubleshooting.aspx.  I ran into a problem where my test agent always showed as “disconnected”.  The particular error I had was:

"TF276055: The machine is not ready to run tests because of the following error: Unable to connect to the controller on <controller_name:6901>. The agent can connect to the controller but the controller cannot connect to the agent because of following reason: A connection attempt failed because the connected party did not respond properly after a period of time, or established connection failed because connected host has failed to respond ."

I looked at this particular entry: blogs.msdn.com/lab_management/pages/troubleshooting.aspx#e3_8.  I found I couldn’t ping my VM from other machines.  My host resolved the VM’s name to a bizarre IP address, and my TFS server and controller had the wrong IP address for it as well.  After an “ipconfig /release” and “ipconfig /renew” two-step on the lab VM, I still couldn’t ping it from the controller or my host OS.  They were each pinging a different address, and neither address was one actually assigned to my lab VM.  I finally flushed the DNS cache and got past the problem.  If you have to do this, run the following on your controller and on any other machine that connects to your unreachable lab machine:

1. Click the Microsoft Windows 7/Vista/Windows Server Start logo in the bottom left corner of the screen
2. Click All Programs
3. Click Accessories
4. RIGHT-click on Command Prompt
5. Select Run As Administrator
6. In the command window, type the following and then press Enter: ipconfig /flushdns

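Once you see the message that the DNS Resolver Cache was successfully flushed, try pinging again.  If you can ping the machine, you should be in business.  You may also want to telnet to port 6901, the port named in the error message, to confirm that the connection itself works.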
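If you would rather script the check than ping and telnet by hand, here is a minimal Python sketch of the same two tests: what address a name resolves to on this machine, and whether a TCP port answers.  It is not part of the troubleshooting guide.  The function names and the machine name "my-lab-vm" are made up for illustration, and 6901 is simply the port from the error above.

```python
import socket


def resolve(hostname):
    """Return the sorted IPv4 addresses this machine resolves hostname to.

    An empty list means the name could not be resolved at all.
    """
    try:
        infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable names.
        return False


def report(host, port=6901):
    """Print what this machine thinks the host's address is and whether the port answers."""
    addresses = resolve(host)
    print(f"{host} resolves to: {', '.join(addresses) if addresses else 'nothing'}")
    print(f"TCP {host}:{port} reachable: {port_open(host, port)}")
```

Calling report("my-lab-vm") on the controller and again on the host OS makes it easy to compare the addresses side by side.  If the two machines print different addresses, or an address the lab VM does not actually have, stale DNS is a likely suspect.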

I’d bookmark that troubleshooting guide, as mismatched credentials, firewall issues, or other problems can surprise you. 

The machine in question was a test lab VM.  I had been upgrading my entire environment from the release candidate to the released version, and I’d recently moved it to new hardware.  To get that environment up and running, I:

  • Upgraded the test agent
  • Upgraded the lab agent
  • Upgraded the team build service
  • Added the build account to the local administrators group on the lab so that it could manage services and deploy code

After a few false starts and a few mistakes, all my build servers are up and running, and I can run builds.  I should be able to get the lab deployment workflow up and running again shortly.