TFS11 Beta – TfsBuildServiceHost.2012 service stopping unexpectedly


We have noticed several early adopters running into an issue where the build service in TFS11 Beta stops unexpectedly. The event log has no entry other than the notice that the service stopped, and restarting it works fine. The root cause appears to be connectivity to the TFS11 application tier (AT). If the build machine cannot reach the AT for longer than the 5-minute timeout, an exception causes the service to stop. Because the service actually shuts down normally, the service failure (recovery) configuration does not restart it. We also forgot to log the error to the event log, so nothing explains why the service stopped.
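To see why the recovery settings do not help here, it may help to model the rule Windows documents for service failure actions: the Service Control Manager queues them when the service process ends without reporting SERVICE_STOPPED (a crash), or, if the "failure actions on non-crash failures" flag is set, when the service reports SERVICE_STOPPED with a non-zero exit code. The sketch below is a simplified model under those assumptions; `ScmRecoveryModel` and its parameters are hypothetical names for illustration, not a Windows or .NET API.

```csharp
// Simplified model of when the Service Control Manager (SCM) queues a service's
// configured recovery actions. Hypothetical helper for illustration, not a Windows API.
static class ScmRecoveryModel
{
    // hasFailureActions:   recovery actions are configured (e.g. "Restart the Service")
    // reportedStopped:     the service reported SERVICE_STOPPED before its process ended
    // exitCode:            the Win32 exit code reported along with SERVICE_STOPPED
    // nonCrashFailureFlag: the "failure actions on non-crash failures" flag (sc.exe failureflag)
    public static bool RecoveryActionsQueued(
        bool hasFailureActions, bool reportedStopped, int exitCode, bool nonCrashFailureFlag)
    {
        if (!hasFailureActions)
            return false; // nothing configured, nothing to run

        if (!reportedStopped)
            return true; // process ended without reporting SERVICE_STOPPED: treated as a crash

        // A reported stop counts as a failure only with the flag set and a non-zero exit code
        return nonCrashFailureFlag && exitCode != 0;
    }
}
```

Assuming the Beta build service's clean shutdown reports exit code 0, `RecoveryActionsQueued(true, true, 0, true)` returns false: even with recovery actions and the flag configured, the SCM sees a normal stop and does not restart the service.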

The only workaround is to restart the service.

To make this a little easier, I wrote a simple console application that checks the service every five minutes and starts it again if it is not running. Unfortunately, the application has to run as an administrator because of the security around services. Create a new Console Application project in Visual Studio, add a reference to System.ServiceProcess, and replace Program.cs with this code:

```csharp
using System;
using System.Diagnostics;
using System.ServiceProcess;
using System.Threading;

namespace KeepAliveBuildService
{
    class Program
    {
        static void Main(string[] args)
        {
            // Watch the service named on the command line, or the TFS11 Beta build service by default
            String serviceName = args.Length > 0 ? args[0] : "TfsBuildServiceHost.2012";
            while (true)
            {
                // Make sure the service is up
                EnsureServiceIsStarted(serviceName);
                // Wait five minutes
                Thread.Sleep(TimeSpan.FromMinutes(5));
            }
        }

        // Returns true if the service was found stopped and a start was requested
        static bool EnsureServiceIsStarted(String serviceName)
        {
            try
            {
                using (var service = new ServiceController(serviceName))
                {
                    ServiceControllerStatus status;
                    try
                    {
                        // Reading Status throws InvalidOperationException if the service does not exist
                        status = service.Status;
                    }
                    catch (InvalidOperationException)
                    {
                        LogMessage(String.Format("No service with the name {0} could be found.", serviceName));
                        return false;
                    }

                    LogMessage(String.Format("{0} Status {1}", serviceName, status));

                    // Only start a fully stopped service; pending states are rechecked on the next pass
                    if (status == ServiceControllerStatus.Stopped)
                    {
                        LogMessage(String.Format("Starting {0}", serviceName));
                        service.Start();
                        return true;
                    }
                }
            }
            catch (Exception ex)
            {
                // Log any other failure (for example, insufficient rights to start the service)
                // and try again on the next pass
                LogMessage(ex.ToString());
            }

            return false;
        }

        static void LogMessage(String message)
        {
            String source = "KeepAliveBuildService";
            String log = "Application";

            // Checking for and creating an event source requires administrator rights
            if (!EventLog.SourceExists(source))
            {
                EventLog.CreateEventSource(source, log);
            }

            EventLog.WriteEntry(source, message);
            Console.WriteLine(message);
        }
    }
}
```
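The decision inside EnsureServiceIsStarted can also be separated from ServiceController, so the poll-and-restart logic can be exercised without administrator rights or an installed service. A minimal sketch, assuming a hypothetical `WatchedState` enum and `KeepAlive` helper (neither is part of .NET or TFS); the `getState` and `start` delegates stand in for `ServiceController.Status` and `ServiceController.Start()`.

```csharp
using System;

// Hypothetical stand-in for the ServiceControllerStatus values the loop cares about
enum WatchedState { Running, Stopped, StartPending, StopPending }

static class KeepAlive
{
    // Only a fully stopped service is restarted; pending states are left for the next poll
    public static bool ShouldStart(WatchedState state)
    {
        return state == WatchedState.Stopped;
    }

    // Runs `polls` iterations of the poll-and-restart loop and returns how many starts were requested
    public static int Run(Func<WatchedState> getState, Action start, int polls)
    {
        int starts = 0;
        for (int i = 0; i < polls; i++)
        {
            if (ShouldStart(getState()))
            {
                start();
                starts++;
            }
            // The real program sleeps five minutes here; omitted so the sketch runs instantly
        }
        return starts;
    }
}
```

Starting only from `Stopped` avoids calling `Start()` on a service that is already starting or stopping, which ServiceController reports as an InvalidOperationException.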

You can run the program interactively on the desktop (remember to run it as an administrator), or you can create a Windows service to run the same logic for you. Please note that this problem affects only the TFS11 Beta build service. We have since fixed it, and it should not appear in future versions.

Sorry about the bug 🙁

Comments (3)

  1. James Manning says:

    I haven't actually tried this, but off-hand I would guess that you could just reconfigure the build service to restart on failure?

    I think you can do so with sc.exe failure and/or sc.exe failureflag?

    technet.microsoft.com/…/cc742019.aspx

    technet.microsoft.com/…/cc742011.aspx

  2. Hi James,

    Good to see you are still reading my blog 😉

    We do set the restart-on-failure settings during install. Unfortunately, in this case we actually handle the error and shut down normally. Not a good thing. So the Service Control Manager assumes that we exited normally and doesn't restart the service. Hence the need for something else to restart the service.

    Thanks,

    Jason

  3. Martin Green says:

    Jason,

    We aren't experiencing the build service stopping, but we are having problems with build services showing "Running for 0 Seconds" and no logging. Take a look at social.msdn.microsoft.com/…/6b4c4feb-6770-4a10-b173-c61e14a5ef0c and social.msdn.microsoft.com/…/d0b7e372-4ae6-42bf-b774-98d1b7e39f51 for details.

    It's very frustrating because trace logging doesn't seem to work.