Azure: Why did my role crash?

08/02/2010

One thing you might notice when you start developing on Windows Azure is that
there is an insane number of options available for logging. You can view a quick
primer here.
One of the things that I like about it is that you don’t necessarily need to learn
a whole new API just to use it. Instead, the logging facilities in Azure integrate
really well with the existing Debug and Trace logging API in .NET. This is a
really nice feature and is done very well in Azure. In fact, setting it up and
configuring it takes about five lines of code; the configuration itself is four
statements, one of which wraps:

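    public override bool OnStart()
    {
        // Start from the default diagnostics configuration.
        DiagnosticMonitorConfiguration dmc =
            DiagnosticMonitor.GetDefaultInitialConfiguration();
        // Transfer Trace/Debug output to storage once a minute, at every level.
        dmc.Logs.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);
        dmc.Logs.ScheduledTransferLogLevelFilter = LogLevel.Verbose;
        DiagnosticMonitor.Start("DiagnosticsConnectionString", dmc);

        // OnStart() must return a bool; defer to the base implementation.
        return base.OnStart();
    }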

One specific item to note is the ScheduledTransferPeriod property of the Logs property.
The minimum value you can set for that property is the equivalent of 1 minute.
The downside to this method of logging is that if your Azure role crashes before
the next scheduled transfer, whatever you have written to the built-in logging
since the last transfer will be lost. This also means that if you are logging
exceptions this way, the exception that caused the crash will be lost as well.
That is a problem if you’d like to know both when your role crashed and why
(the exception details).

Before we talk about a way to get around it, let’s review why a role might crash.
In the Windows Azure world, there are two primary roles that you will use: a
Worker role and a Web role. Their main characteristics, and the reasons each
would crash, are below:

Role Type	Analogous To	Why would it crash/restart?
Worker Role	Console Application	Any unhandled exception.
Web Role	ASP.NET Application hosted in IIS 7.0+	Any unhandled exception thrown on a background thread; a StackOverflowException; an unhandled exception on the finalizer thread.

As you can see, the majority of the reasons why an Azure role would recycle/crash/restart
are the same as with any other application: an unhandled exception. Therefore,
to mitigate this issue, we can subscribe to the AppDomain’s UnhandledException
event. This event is raised when your application experiences an exception that
is not caught, and it fires RIGHT BEFORE the application crashes. You can
subscribe to this event in the role’s OnStart() method:

    public override bool OnStart()
    {
        AppDomain appDomain = AppDomain.CurrentDomain;
        appDomain.UnhandledException +=
            new UnhandledExceptionEventHandler(appDomain_UnhandledException);
        ...
    }
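To see the event in isolation, outside of Azure, here is a minimal console sketch. It assumes nothing beyond the .NET base class library; the class name UnhandledExceptionDemo and the helper Describe are made up for illustration and are not part of the role code in this article. It throws on a background thread, which is one of the crash reasons listed in the table above.

```csharp
using System;
using System.Threading;

public static class UnhandledExceptionDemo
{
    // Hypothetical helper: turns the event payload into a log line.
    // ExceptionObject is typed as object, so it is cast defensively.
    public static string Describe(UnhandledExceptionEventArgs e)
    {
        var ex = e.ExceptionObject as Exception;
        string details = ex != null
            ? ex.ToString()
            : Convert.ToString(e.ExceptionObject);
        return string.Format("IsTerminating={0}: {1}", e.IsTerminating, details);
    }

    public static void Main()
    {
        // Subscribe before any work starts, just as OnStart() does.
        AppDomain.CurrentDomain.UnhandledException +=
            (sender, e) => Console.Error.WriteLine(Describe(e));

        // Nothing catches this exception, so the handler runs and the
        // runtime then terminates the process.
        var worker = new Thread(() =>
        {
            throw new InvalidOperationException("boom");
        });
        worker.IsBackground = true;
        worker.Start();
        worker.Join();
    }
}
```

The handler only observes the exception. It cannot stop the process from going down, which is why whatever you do inside it has to finish before the handler returns.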

You will now be notified right before your process crashes. The last piece of
this puzzle is logging the exception details. Since you must log the details
right when the crash happens, you can’t just use the normal Trace or Debug
statements. Instead, we will write to Azure storage directly. Steve Marx has a
good blog entry about printf in the cloud. While it works, it requires the
connection string to be placed right in the logging call, and he mentions that
you don’t want that in a production application. In our case, we will do things
a little bit differently. First, we must add the requisite variables and
initialize the storage objects:

    private static bool storageInitialized = false;
    private static object gate = new Object();
    private static CloudBlobClient blobStorage;
    private static CloudQueueClient queueStorage;

    private void InitializeStorage()
    {
        // Fast path: storage was already set up by an earlier call.
        if (storageInitialized)
        {
            return;
        }

        lock (gate)
        {
            // Check again in case another thread finished while we waited.
            if (storageInitialized)
            {
                return;
            }

            // read account configuration settings
            var storageAccount =
                CloudStorageAccount.FromConfigurationSetting("DataConnectionString");

            // create blob container for the error reports
            blobStorage =
                storageAccount.CreateCloudBlobClient();
            CloudBlobContainer container = blobStorage.
                GetContainerReference("webroleerrors");

            container.CreateIfNotExist();

            // configure container for public access
            var permissions = container.GetPermissions();
            permissions.PublicAccess =
                BlobContainerPublicAccessType.Container;
            container.SetPermissions(permissions);

            storageInitialized = true;
        }
    }
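The guard at the top of InitializeStorage() is the double-checked locking pattern. As a rough sketch of the pattern on its own, with the storage calls replaced by a counter, something like the following could be used. The names OnceGuard, EnsureInitialized and InitCount are hypothetical, and marking the flag volatile is an addition that the listing above does not make.

```csharp
using System;
using System.Threading;

public static class OnceGuard
{
    // volatile so that a write made inside the lock is visible to
    // threads taking the lock-free fast path.
    private static volatile bool initialized = false;
    private static readonly object gate = new object();
    private static int initCount = 0;

    // Number of times the setup body actually ran.
    public static int InitCount
    {
        get { return initCount; }
    }

    public static void EnsureInitialized()
    {
        // Fast path: no lock once setup has completed.
        if (initialized)
        {
            return;
        }

        lock (gate)
        {
            // Another thread may have finished setup while we waited.
            if (initialized)
            {
                return;
            }

            initCount++;        // stand-in for the expensive setup work
            initialized = true;
        }
    }
}
```

However many threads call EnsureInitialized() at once, the setup body runs a single time, and later callers skip the lock entirely.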

The static fields hold the storage clients, and the InitializeStorage() method
sets them to properly initialized values the first time it is executed; later
calls return immediately. Lastly, we must call this new method and then write to
the storage. We put this code in our UnhandledException event handler:

    void appDomain_UnhandledException(object sender, UnhandledExceptionEventArgs e)
    {
        //Initialize the storage variables.
        InitializeStorage();

        //Get Reference to error container.
        var container = blobStorage.
            GetContainerReference("webroleerrors");

        if (container != null)
        {
            //Retrieve last exception.
            Exception ex = e.ExceptionObject as Exception;

            if (ex != null)
            {
                //Will create a new entry in the container
                //and upload the text representing the
                //exception.
                container.GetBlobReference(
                    String.Format(
                        "<insert unique name for your application>-{0}-{1}",
                        RoleEnvironment.CurrentRoleInstance.Id,
                        DateTime.UtcNow.Ticks)
                    ).UploadText(ex.ToString());
            }
        }
    }
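Note that the handler above writes nothing when e.ExceptionObject is not an Exception, since the `as` cast yields null in that case. One possible way to cover that case, and to keep the blob naming in one place, is sketched below using only the base class library. CrashReport, BuildBlobName and BuildBody are hypothetical names, and the role instance id is passed in as a plain string so the sketch runs outside of Azure.

```csharp
using System;
using System.Globalization;

public static class CrashReport
{
    // Mirrors the "<app>-<role instance id>-<ticks>" naming used above.
    public static string BuildBlobName(
        string appName, string roleInstanceId, DateTime utcNow)
    {
        return string.Format(
            CultureInfo.InvariantCulture,
            "{0}-{1}-{2}",
            appName, roleInstanceId, utcNow.Ticks);
    }

    // Produces text to upload even when the thrown object is not an Exception.
    public static string BuildBody(object exceptionObject)
    {
        var ex = exceptionObject as Exception;
        return ex != null
            ? ex.ToString()
            : "Non-exception object thrown: " + exceptionObject;
    }
}
```

In the handler, the result of BuildBody(e.ExceptionObject) would be what gets passed to UploadText, under a blob name built from RoleEnvironment.CurrentRoleInstance.Id and DateTime.UtcNow.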

Now, when your Azure role is about to crash, you’ll find an entry in your blob storage
with the details of the exception that was thrown. For example, one of the leading
reasons why an Azure Worker Role crashes is because it can’t find a dependency it
needs. So, in that case, you’ll find an entry in your storage with the following
details:

    System.IO.FileNotFoundException: Could not load file or assembly
    'GuestBook_Data, Version=1.0.0.0, Culture=neutral,
    PublicKeyToken=f8a5fcb6c395f621' or one of its dependencies.
    The system cannot find the file specified.

    File name: 'GuestBook_Data, Version=1.0.0.0, Culture=neutral,
    PublicKeyToken=f8a5fcb6c395f621'

Some other common reasons why an Azure role might crash (especially when you
first deploy it) can be found at Anton Staykov’s excellent blog.

Hope this helps.

Enjoy!
