Help with Windows Azure role stuck in Initializing/Busy/Stopped state

I have seen a number of cases where a Windows Azure role gets stuck after you deploy it and you can’t get it started. There are a few great posts about this problem.

One of the problems outlined in those posts occurs when an exception is thrown in the OnStart function inside the WebRole.cs file of your Windows Azure WebRole. Because there is no try/catch around that code, the exception goes unhandled and the process it is running in is shut down. There is a great description of this in CLR Inside Out: Unhandled Exception Processing In The CLR.

One way to help in this situation is to add a try/catch inside the OnStart function. Below is such an example, but I took it a step further and added code in the catch that logs the exception to the Event Log. That way we can see what is happening and investigate the problem.

Note: If the exception is caused by the Diagnostics code, you can’t use that to get the event log data, but you would be able to remote into the machine and look at the event log on the machine. The steps for that are here.

Here is what that function would look like:

    public override bool OnStart()
    {
        try
        {
            // For information on handling configuration changes
            // see the MSDN topic at https://go.microsoft.com/fwlink/?LinkId=166357.

        }
        catch (Exception e)
        {
            #region "Handle Exception"
            System.Text.StringBuilder message = new System.Text.StringBuilder("\r\n\r\nException Occurred:\r\n\r\nappId=");

            String appId = (string)AppDomain.CurrentDomain.GetData(".appId");
            if (appId != null)
                message.Append(appId);

            message.AppendFormat("\r\n\r\ntype={0}\r\n\r\nmessage={1}\r\n\r\nstack=\r\n{2}\r\n\r\n",
                e.GetType().FullName,
                e.Message,
                e.StackTrace);

            string webenginePath = System.IO.Path.Combine(System.Runtime.InteropServices.RuntimeEnvironment.GetRuntimeDirectory(), "webengine.dll");

            if (!System.IO.File.Exists(webenginePath))
            {
                throw new Exception(String.Format(System.Globalization.CultureInfo.InvariantCulture,
                    "Failed to locate webengine.dll at '{0}'. This module requires .NET Framework.",
                    webenginePath));
            }

            System.Diagnostics.FileVersionInfo ver = System.Diagnostics.FileVersionInfo.GetVersionInfo(webenginePath);
            string _sourceName = string.Format(System.Globalization.CultureInfo.InvariantCulture, "ASP.NET {0}.{1}.{2}.0",
                ver.FileMajorPart, ver.FileMinorPart, ver.FileBuildPart);

            if (!System.Diagnostics.EventLog.SourceExists(_sourceName))
            {
                throw new Exception(String.Format(System.Globalization.CultureInfo.InvariantCulture,
                    "There is no EventLog source named '{0}'. This module requires .NET Framework.",
                    _sourceName));
            }

            System.Diagnostics.EventLog Log = new System.Diagnostics.EventLog();
            Log.Source = _sourceName;
            Log.WriteEntry(message.ToString(), System.Diagnostics.EventLogEntryType.Error);
            #endregion
        }

        return base.OnStart();
    }

I added a region so you can collapse the catch code and just focus on your code inside the try. I hope this helps you handle problems in this function better, so that they don't keep the Role from starting up.
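
One thing to watch in the catch block above: its two throw statements run inside the catch, so if webengine.dll or the event source is missing, a new exception escapes OnStart and the process is shut down anyway. Here is a minimal sketch of one way to guard against that. The StartupGuard class, its Describe and TryRun methods, and the log callback are hypothetical names I made up for illustration; they are not part of the Azure SDK.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of the Azure SDK.
public static class StartupGuard
{
    // Builds a message in the same layout as the catch block in this post.
    public static string Describe(Exception e, string appId)
    {
        var message = new StringBuilder("\r\n\r\nException Occurred:\r\n\r\nappId=");
        if (appId != null)
            message.Append(appId);

        message.AppendFormat("\r\n\r\ntype={0}\r\n\r\nmessage={1}\r\n\r\nstack=\r\n{2}\r\n\r\n",
            e.GetType().FullName,
            e.Message,
            e.StackTrace);
        return message.ToString();
    }

    // Runs startupWork. On failure, hands a description of the exception
    // to log and returns false. Exceptions thrown by log itself are
    // swallowed so that a logging failure cannot escape to the caller.
    public static bool TryRun(Action startupWork, Action<string> log)
    {
        try
        {
            startupWork();
            return true;
        }
        catch (Exception e)
        {
            try
            {
                log(Describe(e, null));
            }
            catch (Exception)
            {
                // A logging failure must not take the process down.
            }
            return false;
        }
    }
}
```

Inside OnStart you would pass the body of the try as startupWork and the Event Log write as log, for example StartupGuard.TryRun(() => ConfigureRole(), WriteToEventLog), where ConfigureRole and WriteToEventLog stand in for your own methods.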

One last thing: I write these entries under the ASP.NET source in the Event Log, so that is the source to look for when you search for them. You can change this to something else, but keep in mind that if you add your own event source, the process can’t use it until it is restarted. There is an example doing that here. The logging code is the same code that is shown in Unhandled exceptions cause ASP.NET-based applications to unexpectedly quit in the .NET Framework 2.0.
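
If you do want your own event source, here is a rough sketch of how that could look, assuming .NET Framework on Windows. The RoleEventLog class and the "MyRoleStartup" source name are made up for illustration. Creating a source normally needs administrator rights, so EnsureSource is meant to run elevated before the role process starts (for example from an installer or an elevated startup task), and the role itself only calls TryWriteError.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper; "MyRoleStartup" is a made-up source name.
public static class RoleEventLog
{
    public const string SourceName = "MyRoleStartup";

    // Registers the source in the Application log if it is missing.
    // Intended to run elevated, before the role process starts, so the
    // process does not need a restart to see the new source.
    public static bool EnsureSource()
    {
        try
        {
            if (!EventLog.SourceExists(SourceName))
                EventLog.CreateEventSource(SourceName, "Application");
            return true;
        }
        catch (Exception)
        {
            // For example, insufficient rights to create the source.
            return false;
        }
    }

    // Writes an error entry only if the source is already registered.
    // Never throws, so it is safe to call from a catch block.
    public static bool TryWriteError(string message)
    {
        try
        {
            if (!EventLog.SourceExists(SourceName))
                return false;

            EventLog.WriteEntry(SourceName, message, EventLogEntryType.Error);
            return true;
        }
        catch (Exception)
        {
            return false;
        }
    }
}
```

TryWriteError returns false instead of throwing when the source is missing, so calling it from the catch in OnStart can't introduce a second unhandled exception.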