Azure@home Part 10: Worker Role Run Method (continued)

This post is part of a series diving into the implementation of the @home With Windows Azure project, which formed the basis of a webcast series by Developer Evangelists Brian Hitney and Jim O’Neil. Be sure to read the introductory post for the context of this and subsequent articles in the series.

In the last post, we started looking at the primary method of every worker role implementation: Run.  Run is typically implemented as an infinite loop, and its manifestation in Azure@home is no exception.

    1:  public override void Run()
    2:  {
    3:      // set up the local storage, including Folding@home executable
    4:      FoldingClient.SetupStorage();
    5:   
    6:      // poll for the client information in Azure table every 10 seconds
    7:      while (true)
    8:      {
    9:          ClientInformation clientInfo = FoldingClient.GetFoldingClientData();
   10:          if (clientInfo != null)
   11:          {
   12:              FoldingClient.LaunchFoldingClientProcess(clientInfo);
   13:              break;
   14:          }
   15:   
   16:          Thread.Sleep(10000);
   17:      }
   18:  }

I covered the implementation of SetupStorage (Line 4) already – this is where the Folding@home client executable is copied to the local storage within the Azure virtual machine hosting the WorkerRole instance.  Once that’s done, the main loop takes over:

  • checking to see if a client has provided his or her name and location via the default.aspx web page (Line 9), and then
  • launching work units through the Folding@home client to carry out protein folding simulations (Line 12).

If you’re wondering about the break statement at Line 13, it’s there to enable the Run loop to exit – and therefore recycle the role instance. 

Why?  The LaunchFoldingClientProcess method (Line 12) also includes a loop and will continually invoke the Folding@home client application – until there’s an uncaught exception or the client record is removed externally.  These are both somewhat unexpected events for which a reset is appropriate.  It’s not the only way to handle these situations, but it was a fairly straightforward mechanism that didn’t require a lot of additional exception handling code.  Remember, this is didactic code, not production code!

GetFoldingClientData Implementation

Recall that the workflow for Azure@home involves storing client information (Folding@home user name, latitude and longitude) in a table aptly named client in Azure storage (this was described in Part 4 of this series).   GetFoldingClientData is where the WorkerRole reads the information left by the WebRole:

    1:  internal static ClientInformation GetFoldingClientData()
    2:  {
    3:      ClientInformation clientInfo = null;
    4:      try
    5:      {
    6:          // access client table in Azure storage
    7:          CloudStorageAccount cloudStorageAccount =
    8:              CloudStorageAccount.FromConfigurationSetting("DataConnectionString");
    9:          var ctx = new ClientDataContext(
   10:              cloudStorageAccount.TableEndpoint.ToString(),
   11:              cloudStorageAccount.Credentials);
   12:   
   13:          // return the first (only) record or nothing
   14:          clientInfo = ctx.Clients.FirstOrDefault<ClientInformation>();
   15:      }
   16:      catch (Exception ex)
   17:      {
   18:          Trace.TraceWarning(
                  String.Format("Exception when polling for client data: {0} | {1}", 
                                 ex.Message, ex.StackTrace));
   19:          clientInfo = null;
   20:      }
   21:   
   22:      return clientInfo;
   23:  }

ClientInformation is the same class – extending TableServiceEntity – that I described earlier in this series (Part 3), so the code here should look rather familiar.   There are really three distinct scenarios that can occur when this code is executed (keep in mind it’s executed as part of a potentially infinite loop):

  1. Record in client table exists,
  2. Record in client table does not exist, or
  3. Client table itself does not exist.

Record in client table exists

A record will exist in the client table after the user submits the default.aspx page, which is where the Folding@home user name and location are provided.  At this point, Line 14 above will return the information provided in that entity – the entity will not be null – and so the Run loop (reprised below) can proceed, passing the client information to the LaunchFoldingClientProcess method (Line 12).  As we’ll see later, that method also checks the client table to determine if it should continue looping itself.  (The loop below will actually terminate if LaunchFoldingClientProcess ever returns – as mentioned in the callout above.)

    6:      // poll for the client information in Azure table every 10 seconds
    7:      while (true)
    8:      {
    9:          ClientInformation clientInfo = FoldingClient.GetFoldingClientData();
   10:          if (clientInfo != null)
   11:          {
   12:              FoldingClient.LaunchFoldingClientProcess(clientInfo);
   13:              break;
   14:          }
   15:   
   16:          Thread.Sleep(10000);
   17:      }



Record in client table does not exist

In this scenario, the line below will return a null object (‘default’) to clientInfo, so the Run loop will simply sleep for 10 seconds before checking again.  This is essentially a polling implementation, where the WorkerRoles are waiting for the end user to enter information via the default.aspx page.

   14:          clientInfo = ctx.Clients.FirstOrDefault<ClientInformation>();
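The polling pattern described above can be pulled out into a small reusable helper.  What follows is only a sketch, not code from Azure@home: the Poller class, the PollUntilAvailable name and the maxAttempts cap are all hypothetical, and the cap exists mainly so the helper can be exercised without looping forever (the Run method itself has no such limit).

```csharp
using System;
using System.Threading;

// Hypothetical helper (not part of Azure@home) illustrating the polling pattern.
public static class Poller
{
    // Calls probe until it returns a non-null value or maxAttempts is reached.
    // Returns null if nothing turned up within the allowed attempts.
    public static T PollUntilAvailable<T>(Func<T> probe, TimeSpan interval, int maxAttempts)
        where T : class
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            T result = probe();
            if (result != null)
                return result;

            // no point in sleeping after the final attempt
            if (attempt < maxAttempts)
                Thread.Sleep(interval);
        }
        return null;
    }
}
```

In the worker role, the probe would be FoldingClient.GetFoldingClientData and the interval TimeSpan.FromSeconds(10), which mirrors the Thread.Sleep(10000) call in Run.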

 

Client table itself does not exist

The WorkerRole is strictly a reader (consumer) of the client table and doesn’t have the responsibility for creating it – that falls to the processing in default.aspx (as described in an earlier post).  There’s a very good chance then that the WorkerRole instances, which start polling right away, will make a request against a table that doesn’t yet exist.  This is where the exception handling below comes in:

   16:      catch (Exception ex)
   17:      {
   18:          Trace.TraceWarning(
                  String.Format("Exception when polling for client data: {0} | {1}", 
                  ex.Message, ex.StackTrace));
   19:          clientInfo = null;
   20:      }

The exception handling is a bit coarse: the exception caught in this circumstance is a DataServiceQueryException, with the rather generic message:

An error occurred while processing this request.

You have to dig a bit further, into the InnerException (of type DataServiceClientException), to get the StatusCode (404) and Message:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<error xmlns="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <code>ResourceNotFound</code>
  <message xml:lang="en-US">The specified resource does not exist.</message>
</error>

which indicate that the resource was not found.  The “resource” is the client table, as can be seen in the RESTful RequestUri generated by the call to FirstOrDefault<ClientInformation> in Line 14:

https://snowball.cloudapp.net/client()?$top=1

The exception message and stack trace are written to the Azure logs via the TraceWarning call in Line 18 (and the configuration of a DiagnosticMonitorTraceListener as described in my post on Worker Role and Azure Diagnostics). 
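If you wanted to log something more specific than the generic outer message, one option is to pull the code and message out of the XML payload carried by the inner exception.  Here is a minimal sketch using LINQ to XML; the DataServiceError class and its TryParse method are hypothetical names of my own, not part of Azure@home or the storage client library, and the sketch assumes the payload has the shape shown above.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper for reading the error payload shown above.
public static class DataServiceError
{
    // Extracts the <code> and <message> values from the payload.
    // Returns false if the text is empty, is not XML, or has no <code> element.
    public static bool TryParse(string payload, out string code, out string message)
    {
        code = null;
        message = null;
        if (string.IsNullOrWhiteSpace(payload))
            return false;

        try
        {
            // Trim first: whitespace before the XML declaration is a parse error
            XElement root = XDocument.Parse(payload.Trim()).Root;

            // reuse whatever namespace the root element declares
            XNamespace ns = root.Name.Namespace;
            code = (string)root.Element(ns + "code");
            message = (string)root.Element(ns + "message");
            return code != null;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

Inside the catch block you could pass ex.InnerException.Message (after checking that InnerException isn’t null) to this method and log ResourceNotFound explicitly, which would distinguish the expected “table not created yet” case from genuine failures.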
 

LaunchFoldingClientProcess Implementation

The entire implementation of LaunchFoldingClientProcess is provided below for context, and then beyond that I’ll break out the major sections of the implementation, eventually repeating each segment of this code:

    1:  internal static void LaunchFoldingClientProcess(ClientInformation clientInfo)
    2:  {
    3:   
    4:      // write the configuration file with user information
    5:      FoldingClient.WriteConfigFile(clientInfo.UserName, clientInfo.Team, 
                                         clientInfo.PassKey);
    6:   
    7:      // get path to the Folding@home client application
    8:      LocalResource foldingIo = RoleEnvironment.GetLocalResource("FoldingClientStorage");
    9:      String targetPath = string.Format(@"{0}client", foldingIo.RootPath);
   10:      String targetExecutable = string.Format(@"{0}client\{1}", foldingIo.RootPath,
   11:          RoleEnvironment.GetConfigurationSettingValue("FoldingAtHome_EXE"));
   12:   
   13:      // get progress polling interval (default to 15 minutes)
   14:      Int32 pollingInterval;
   15:      if (!Int32.TryParse(
   16:          RoleEnvironment.GetConfigurationSettingValue("AzureAtHome_PollingInterval"),
   17:          out pollingInterval))
   18:          pollingInterval = 15;
   19:   
   20:      // setup process
   21:      ProcessStartInfo startInfo = new ProcessStartInfo()
   22:          {
   23:              UseShellExecute = false,
   24:              FileName = targetExecutable,
   25:              WorkingDirectory = targetPath,
   26:              WindowStyle = ProcessWindowStyle.Hidden,
   27:              Arguments = "-oneunit"
   28:          };
   29:   
   30:      // loop while there's a client info record in Azure table storage
   31:      while (clientInfo != null)
   32:      {
   33:   
   34:          // start a work unit
   35:          using (Process exeProcess = Process.Start(startInfo))
   36:          {
   37:   
   38:              while (!exeProcess.HasExited)
   39:              {
   40:                  // get current status
   41:                  FoldingClientStatus status = ReadStatusFile();
   42:   
   43:                  // update local status table (workunit table in Azure storage)
   44:                  if (!status.HasParseError)
   45:                  {
   46:                      UpdateLocalStatus(status);
   47:                      UpdateServerStatus(status, clientInfo);
   48:                  }
   49:   
   50:                  Thread.Sleep(TimeSpan.FromMinutes(pollingInterval));
   51:              }
   52:   
   53:              // when work unit completes successfully
   54:              if (exeProcess.ExitCode == 0)
   55:              {
   56:                  // make last update for completed role
   57:                  FoldingClientStatus status = ReadStatusFile();
   58:   
   59:                  if (!status.HasParseError)
   60:                  {
   61:                      UpdateLocalStatus(status);
   62:                      UpdateServerStatus(status, clientInfo);
   63:                  }
   64:   
   65:                  // re-poll table (if empty, this provides a means to exit loop)
   66:                  clientInfo = GetFoldingClientData();
   67:              }
   68:              else
   69:              {
   70:                  Trace.TraceError(String.Format(
   71:                   "Folding@home process has exited with code {0}", exeProcess.ExitCode));
   72:   
   73:                  // this will leave orphan progress record in the Azure table
   74:              }
   75:          }
   76:      }
   77:  }



Creating Configuration File

    4:      // write the configuration file with user information
    5:      FoldingClient.WriteConfigFile(clientInfo.UserName, clientInfo.Team, 
                                         clientInfo.PassKey);

The Folding@home console client application (Folding@home-Win32-x86.exe) can prompt for the information it needs to run – user name, team number, passkey, size of work unit, etc. – or it can run using a configuration file, client.cfg.  Since the executable is running in the cloud, it can’t be interactive, so the WriteConfigFile code sets up this configuration file (in local storage).  You can read more about the console application’s configuration options on the Stanford site.  In the interest of space, I’m leaving out the WriteConfigFile implementation, but you can download all the code from the distributed.cloudapp.net site for self-study.
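Since the WriteConfigFile implementation isn’t shown here, the following is only a rough sketch of what writing such a file might look like; it is not the Azure@home code.  It assumes an INI-style layout (a bracketed section header followed by key=value lines), and the ConfigFileSketch class, its method names and any section or key names you pass in are assumptions that should be checked against the Stanford documentation.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical sketch; the real WriteConfigFile is in the downloadable source.
public static class ConfigFileSketch
{
    // Renders a [section] header followed by one key=value line per setting.
    public static string Render(string section,
        IEnumerable<KeyValuePair<string, string>> settings)
    {
        var sb = new StringBuilder();
        sb.Append('[').Append(section).Append("]\r\n");
        foreach (var pair in settings)
            sb.Append(pair.Key).Append('=').Append(pair.Value).Append("\r\n");
        return sb.ToString();
    }

    // Writes the rendered text to directory\fileName, creating the directory if needed.
    public static void Write(string directory, string fileName, string section,
        IEnumerable<KeyValuePair<string, string>> settings)
    {
        Directory.CreateDirectory(directory);
        File.WriteAllText(Path.Combine(directory, fileName), Render(section, settings));
    }
}
```

A call might pass the client folder in local storage, the file name client.cfg, and pairs built from clientInfo.UserName, clientInfo.Team and clientInfo.PassKey.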

Constructing Path to Folding@home application

    7:      // get path to the Folding@home client application
    8:      LocalResource foldingIo = RoleEnvironment.GetLocalResource("FoldingClientStorage");
    9:      String targetPath = string.Format(@"{0}client", foldingIo.RootPath);
   10:      String targetExecutable = string.Format(@"{0}client\{1}", foldingIo.RootPath,
   11:          RoleEnvironment.GetConfigurationSettingValue("FoldingAtHome_EXE"));
   12:   

This section of code constructs the full path to the Folding@home client application as installed in local storage on the VM housing the WorkerRole instance.  Code here should look quite similar to that of my previous blog post.  Note that FoldingAtHome_EXE  (Line 11) is a configuration variable set in the ServiceConfiguration.cscfg file and points to the name of Stanford’s console client (default: Folding@home-Win32-x86.exe).
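Note that the format strings in Lines 9 and 10 only produce a valid path if RootPath already ends with a directory separator.  An alternative that doesn’t depend on that is Path.Combine, sketched below; the ClientPaths class and its method names are hypothetical and not part of Azure@home.

```csharp
using System;
using System.IO;

// Hypothetical helper showing Path.Combine as an alternative to string.Format.
public static class ClientPaths
{
    // Folder holding the Folding@home client beneath the local storage root.
    public static string GetClientDirectory(string rootPath)
    {
        // Path.Combine adds a separator only when rootPath doesn't end with one
        return Path.Combine(rootPath, "client");
    }

    // Full path of the client executable inside that folder.
    public static string GetExecutablePath(string rootPath, string exeName)
    {
        return Path.Combine(GetClientDirectory(rootPath), exeName);
    }
}
```

In the worker role, rootPath would be foldingIo.RootPath and exeName the value of the FoldingAtHome_EXE setting.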

Setting Reporting Interval

   13:      // get progress polling interval (default to 15 minutes)
   14:      Int32 pollingInterval;
   15:      if (!Int32.TryParse(
   16:          RoleEnvironment.GetConfigurationSettingValue("AzureAtHome_PollingInterval"),
   17:          out pollingInterval))
   18:          pollingInterval = 15;

[Figure: architecture diagram, Worker Role in Azure@home]

Recall from the architecture diagram that progress on each work unit is reported to the workunit table in your own Azure storage account (labeled 7 in the diagram) and to the distributed.cloudapp.net application (labeled 8).

Using development storage, you may want to report status frequently for testing and debugging. During production though, you’re charged for storage transactions and potentially bandwidth (if your deployment is not collocated in the same data center as distributed.cloudapp.net).  Add to that the fact that the progress on most work units is rather slow – some work units take days to complete – and it’s easy to conclude that the polling interval doesn’t need to be subsecond!   In Azure@home, the configuration includes the AzureAtHome_PollingInterval value (in minutes), and that’s the value being configured here.  If the parameter doesn’t exist, the default is 15 minutes (Line 18).
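The parse-with-default logic is easy to isolate and test on its own.  The sketch below mirrors Lines 14 – 18 but goes one step further than the original by also rejecting zero and negative values, since a zero interval would make the reporting loop spin and a negative one isn’t a valid sleep duration.  That extra check, along with the PollingSettings class and ParseIntervalMinutes method names, is my own assumption and not part of Azure@home.

```csharp
using System;

// Hypothetical helper; the original code inlines this logic.
public static class PollingSettings
{
    public const int DefaultMinutes = 15;

    // Returns the configured interval in minutes, or the default when the raw
    // value is missing, is not an integer, or is not a positive number.
    public static int ParseIntervalMinutes(string rawValue)
    {
        int minutes;
        if (!Int32.TryParse(rawValue, out minutes) || minutes <= 0)
            return DefaultMinutes;

        return minutes;
    }
}
```

The raw value would come from RoleEnvironment.GetConfigurationSettingValue("AzureAtHome_PollingInterval"), exactly as in the original.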

Setting Process Info

   20:      // setup process
   21:      ProcessStartInfo startInfo = new ProcessStartInfo()
   22:          {
   23:              UseShellExecute = false,
   24:              FileName = targetExecutable,
   25:              WorkingDirectory = targetPath,
   26:              WindowStyle = ProcessWindowStyle.Hidden,
   27:              Arguments = "-oneunit"
   28:          };
  

With the various configuration parameters set – either in code or the client.cfg file – the Folding@home executable has enough information to run unattended.  The snippet of code above sets up the ProcessStartInfo; note the use of targetExecutable and targetPath (Lines 24 – 25) that were obtained from local storage a bit earlier in this routine.  Arguments (Line 27) are command-line switches passed to the executable; in this case -oneunit tells the Folding@home application to exit after each work unit, versus starting a new one.

Getting Loopy

Up next is a nested loop covering the rest of the implementation, with pseudo-code something like this:

 while there’s data in the client table
    start a new Folding@home process
       while the process is still running
          read the status file (unitinfo.txt)
          report progress
          sleep for designated period
       continue
       if process ended successfully
          read the status file
          report progress
          check if there's still a row in client table
       else
          write exception data to log
       end
    continue
continue

The outer loop checks that there’s still a row in the client table (initially put there by submitting default.aspx within the WebRole).  The Azure@home implementation doesn’t expose a way to delete this row once it’s there, but we included the condition to allow for a ‘backdoor’ mechanism to cleanly stop the loop (by deleting the client row through some external means).

   30:      // loop while there's a client info record in Azure table storage
   31:      while (clientInfo != null)
   32:      {

Within the loop, the process is started and an inner loop executed (Line 38 below) so long as the process is still running.   This is a polling loop, with the interval determined by the AzureAtHome_PollingInterval configuration parameter.  On each iteration, progress is reported to the workunit Azure table (Line 46) and to the distributed.cloudapp.net service (Line 47).  We’ll look at their implementations in the next post of this series.

   34:          // start a work unit
   35:          using (Process exeProcess = Process.Start(startInfo))
   36:          {
   37:   
   38:              while (!exeProcess.HasExited)
   39:              {
   40:                  // get current status
   41:                  FoldingClientStatus status = ReadStatusFile();
   42:   
   43:                  // update local status table (workunit table in Azure storage)
   44:                  if (!status.HasParseError)
   45:                  {
   46:                      UpdateLocalStatus(status);
   47:                      UpdateServerStatus(status, clientInfo);
   48:                  }
   49:   
   50:                  Thread.Sleep(TimeSpan.FromMinutes(pollingInterval));
   51:              }
   52:  ...
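One consequence of sleeping inside the inner loop is that the code may not notice the process has exited until the current sleep finishes.  A possible variation, sketched below, waits on the process itself with a timeout, so the loop ends as soon as the process does.  This is not how Azure@home does it: the ProcessPolling class and ReportUntilExit method are hypothetical, and unlike the original the first report happens after one full interval rather than immediately.

```csharp
using System;

// Hypothetical variation on the inner polling loop.
public static class ProcessPolling
{
    // waitForExit blocks for up to the given number of milliseconds and returns
    // true once the process has exited (the contract of Process.WaitForExit(int)).
    // reportProgress runs once per elapsed interval while the process is alive.
    // Returns the number of progress reports made.
    public static int ReportUntilExit(Func<int, bool> waitForExit,
        Action reportProgress, TimeSpan interval)
    {
        int reports = 0;
        int milliseconds = checked((int)interval.TotalMilliseconds);

        while (!waitForExit(milliseconds))
        {
            reportProgress();
            reports++;
        }
        return reports;
    }
}
```

With a real process, the first argument would be ms => exeProcess.WaitForExit(ms), and the second would wrap the ReadStatusFile, UpdateLocalStatus and UpdateServerStatus calls from Lines 41 – 48.  Taking delegates rather than a Process keeps the sketch testable without launching anything.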

The process may exit because it’s finished a work unit (recall, we supplied the -oneunit parameter to the Folding@home client process) or because of some internal error, in which case the exit code (Line 54 below) will be non-zero.   Assuming successful completion of the work unit, Lines 57 - 63 repeat the progress reporting of the previous loop, thus providing the final status report of 100% completion.

If the process was aborted, the exit code is recorded in the Azure logs (Lines 70-71), but note from the subsequent comment that the corresponding progress record in the workunit table remains, stuck at whatever percentage completion it had logged prior to the unexpected termination of the process.   Of course, we could have removed the record, moved it to an ‘aborted process’ table, or implemented some other business rule as well.

   35:          using (Process exeProcess = Process.Start(startInfo))
               
                   // lines removed for brevity
 
   53:              // when work unit completes successfully
   54:              if (exeProcess.ExitCode == 0)
   55:              {
   56:                  // make last update for completed role
   57:                  FoldingClientStatus status = ReadStatusFile();
   58:   
   59:                  if (!status.HasParseError)
   60:                  {
   61:                      UpdateLocalStatus(status);
   62:                      UpdateServerStatus(status, clientInfo);
   63:                  }
   64:   
   65:                  // re-poll table (if empty, this provides a means to exit loop)
   66:                  clientInfo = GetFoldingClientData();
   67:              }
   68:              else
   69:              {
   70:                  Trace.TraceError(String.Format(
   71:                   "Folding@home process has exited with code {0}", exeProcess.ExitCode));
   72:   
   73:                  // this will leave orphan progress record in the Azure table
   74:              }
   75:          }
   76:      }
   77:  }
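Notice that after a non-zero exit code the client table isn’t re-polled (Line 66 runs only on success), so the outer loop simply starts the client again.  As an example of the kind of alternative business rule mentioned above, here is a sketch that gives up after a number of consecutive failures.  It is not part of Azure@home: the WorkUnitLoop class, the Run signature and the maxConsecutiveFailures parameter are all hypothetical.

```csharp
using System;

// Hypothetical outer loop with a cap on consecutive failed work units.
public static class WorkUnitLoop
{
    // runWorkUnit starts one work unit and returns the process exit code.
    // clientRecordExists re-polls the client table after a successful unit.
    // Returns the number of work units that completed successfully.
    public static int Run(Func<int> runWorkUnit, Func<bool> clientRecordExists,
        int maxConsecutiveFailures)
    {
        int completed = 0;
        int failures = 0;
        bool keepGoing = true;

        while (keepGoing)
        {
            int exitCode = runWorkUnit();
            if (exitCode == 0)
            {
                completed++;
                failures = 0;                      // a success resets the streak
                keepGoing = clientRecordExists();  // same exit path as Line 66
            }
            else if (++failures >= maxConsecutiveFailures)
            {
                keepGoing = false;                 // stop instead of restarting forever
            }
        }
        return completed;
    }
}
```

Returning from the loop would then cause Run to hit its break statement and recycle the role instance, which is the same reset mechanism described at the start of this post.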

In the next post, we’ll delve into how progress is reported both to the local instance of Azure storage and to the main Azure@home application (distributed.cloudapp.net).