Windows Azure Role instances status showing busy/unhealthy due to reuse of Process ID

In this blog I am planning to write a known issue with Windows Azure SDK component, which causing the one of the Role instances to show busy/unhealthy status intermittently, even though entire Windows Azure Infrastructure Process tree build correctly, and running. Please note that this scenario will be applicable, only when you try to install the multiple executable from Windows Azure Web/Worker Role as start-up task.

 

SYMPTOM

  1. Intermittently one of the Role instances in the Web/Worker role will shows busy/unhealthy status in the Windows Azure Portal
  2. Once you remote desktop into the Guest VM (Virtual Machine were you have deployed your Web/Worker Role), and review the Windows Azure infrastructure process tree using Process Explorer, you can see that Windows Azure Process tree build as correct and running as expected.
  3. On WebRole application , you will be able to browse the pages with local DIP [Deployment IP, which is the ip address for the VM] with out any issue.
  4. WaHostBootstrapper.log file, which located under C:\Resources\Directory\<roleinstance name>.DiagnosticStore directory, and you can find below entry .

 

 <snip>
[00002620:00001056, 2013/07/04, 05:12:35.984, 00040000] Registering client with PID 2352.
[00002620:00001056, 2013/07/04, 05:12:35.984, 00040000] Client XXXX.YYYYY.exe (2352) registered. <======
[00002620:00001056, 2013/07/04, 05:12:41.671, 00040000] Registering client with PID 792.
[00002620:00001056, 2013/07/04, 05:12:41.671, 00040000] Client XXXX.ZZZZZ.exe (792) registered.
[00002620:00001056, 2013/07/04, 05:12:43.937, 00040000] Registering client with PID 3260.
[00002620:00001056, 2013/07/04, 05:12:43.937, 00040000] Client powershell.exe (3260) registered.
[00002620:00002576, 2013/07/04, 05:12:51.077, 00040000] Program "E:\approot\bin\XXXXagent\UnifiedBootstrap.bat" exited with 0. Working Directory = E:\approot\bin
[00002620:00002576, 2013/07/04, 05:12:51.077, 00100000] <- GetDebugger=0x1
[00002620:00002576, 2013/07/04, 05:12:51.077, 00100000] <- GetRoleHostDebugger=0x1
[00002620:00002576, 2013/07/04, 05:12:51.077, 00100000] <- WapXmlReadContainerId=0x1
[00002620:00002576, 2013/07/04, 05:12:51.077, 00100000] <- WapGetVirtualAccountName=0x1
[00002620:00002576, 2013/07/04, 05:12:51.077, 00040000] Executing base\x64\WaIISHost.exe .
[00002620:00002576, 2013/07/04, 05:12:51.077, 00040000] Role host process PID: 2352. <=================
[00002620:00003444, 2013/07/04, 05:12:51.187, 00040000] Registering client with PID 3560.
[00002620:00003444, 2013/07/04, 05:12:51.187, 00040000] Client XXXX.UUUUU.m.exe (3560) registered.
[00002620:00001056, 2013/07/04, 05:12:51.421, 00040000] Registering client with PID 3696.
[00002620:00001056, 2013/07/04, 05:12:51.421, 00040000] Client XXXX.UUUUU.m.exe (3696) registered.
[00002620:00001056, 2013/07/04, 05:12:54.968, 00040000] Registering client with PID 3232.
[00002620:00001056, 2013/07/04, 05:12:54.968, 00040000] Client WaIISHost.exe (3232) registered.
[00002620:00001056, 2013/07/04, 05:12:54.968, 00040000] Client process WaIISHost.exe is the role host.
[00002620:00001056, 2013/07/04, 05:12:54.968, 00040000] Role host process registered.
[00002620:00001892, 2013/07/04, 05:12:54.999, 00040000] Getting status from client DiagnosticsAgent.exe (2736).
[00002620:00001892, 2013/07/04, 05:12:54.999, 00100000] Failed to connect to client DiagnosticsAgent.exe (2736).
[00002620:00001892, 2013/07/04, 05:12:54.999, 00100000] <- CRuntimeClient::OnRoleStatusCallback(0x000000000062F5D0) =0x800706ba
[00002620:00001892, 2013/07/04, 05:12:54.999, 00040000] Getting status from client DiagnosticsAgent.exe (2764).
[00002620:00001892, 2013/07/04, 05:12:54.999, 00040000] Client reported status 0.
[00002620:00001892, 2013/07/04, 05:12:54.999, 00040000] Getting status from client RemoteAccessAgent.exe (2304).
[00002620:00001892, 2013/07/04, 05:12:54.999, 00100000] Failed to connect to client RemoteAccessAgent.exe (2304).
[00002620:00001892, 2013/07/04, 05:12:54.999, 00100000] <- CRuntimeClient::OnRoleStatusCallback(0x000000000062F6F0) =0x800706ba
[00002620:00001892, 2013/07/04, 05:12:54.999, 00040000] Getting status from client RemoteAccessAgent.exe (2432).
[00002620:00001892, 2013/07/04, 05:12:54.999, 00040000] Client reported status 0.
[00002620:00001892, 2013/07/04, 05:12:54.999, 00040000] Getting status from client APMInstaller.exe (3956).
[00002620:00001892, 2013/07/04, 05:12:54.999, 00100000] Failed to connect to client APMInstaller.exe (3956).
[00002620:00001892, 2013/07/04, 05:12:54.999, 00100000] <- CRuntimeClient::OnRoleStatusCallback(0x0000000000634BD0) =0x800706ba
[00002620:00001892, 2013/07/04, 05:12:54.999, 00040000] Getting status from client XXXX.UUUUU.m.exe (2324).
[00002620:00001892, 2013/07/04, 05:12:54.999, 00100000] Failed to connect to client XXXX.UUUUU.m.exe (2324).
[00002620:00001892, 2013/07/04, 05:12:54.999, 00100000] <- CRuntimeClient::OnRoleStatusCallback(0x0000000000634C30) =0x800706ba
    [00002620:00001892, 2013/07/04, 05:12:54.999, 00040000] Getting status from client XXXX.YYYYY.exe (2352). <====
[00002620:00001892, 2013/07/04, 05:12:54.999, 00100000] Failed to connect to client XXXX.YYYYY.exe (2352). <====
[00002620:00001892, 2013/07/04, 05:12:54.999, 00100000] <- CRuntimeClient::OnRoleStatusCallback(0x0000000000634C60) =0x800706ba
    [00002620:00001892, 2013/07/04, 05:12:54.999, 00040000] Client process 2352 is the role host. <=================
[00002620:00001892, 2013/07/04, 05:12:54.999, 00080000] Failed to contact the role host process. Treat role as unhealthy. <======
<snip>

 

WORKAROUND

Rebooting the VM Should fix this problem. Also please note that Microsoft will address this issue in the future release of Windows Azure SDK package.

 

References :