Azure@home Part 7: Asynchronous Table Storage Pagination

This post is part of a series diving into the implementation of the @home With Windows Azure project, which formed the basis of a webcast series by Developer Evangelists Brian Hitney and Jim O’Neil. Be sure to read the introductory post for the context of this and subsequent articles in the series.

In the last post of this series, I morphed the original code into something that would incorporate pagination across Azure table storage – specifically for displaying completed Folding@home work units in the GridView of status.aspx (shown to the right).

That implementation focuses on the use of DataServiceQuery and requires some code to specifically handle continuation tokens.  In this post, we’ll look at using CloudTableQuery instead, which handles continuation tokens ‘under the covers,’ and we’ll go one step further to incorporate asynchronous processing, which can increase the scalability of an Azure application.

DataServiceQuery versus CloudTableQuery

If you’ve worked with WCF Data Services (or whatever it was called at the time: ADO.NET Data Services or “Astoria”), you’ll recognize DataServiceQuery as “the way” to access a data service – like Azure Table Storage.   DataServiceQuery, though, isn’t aware of the nuances of the implementation of cloud storage, specifically the use of continuation tokens and retry logic.  That’s where CloudTableQuery comes in – a DataServiceQuery can be adapted into a CloudTableQuery via the AsTableServiceQuery extension method or via a constructor.

RetryPolicy

Beyond this paragraph, I won’t delve into the topic of retry logic in this series.  The collocation of the Azure@home code and storage in the same data center definitely minimizes latency and network unpredictability; nevertheless, retry logic is something to keep in mind for building a robust service (yes, even if everything is housed in the same data center).  There are a few built-in retry policies you can use, including none or retrying n times with a constant interval or with an exponential backoff scheme.  Since RetryPolicy is a delegate, you can even build your own fancy logic.  Note, Johnny Halife provides some food for thought regarding scalability impact, and he even recommends using no retry policy and just handling the failures yourself.

If you’re wondering what the default retry policy is, I didn’t find it documented and actually would have assumed it was NoRetry, but empirically it looks like RetryExponential is used with a retry count of 3, a minBackoff of 3 seconds, a maxBackoff of 90 seconds, and a deltaBackoff of 2 seconds.   The backoff calculation is:

minBackoff + (2^currentRetry - 1) * random(deltaBackoff * 0.8, deltaBackoff * 1.2)

so if I did my math right, the attempt will be retried three times if needed: the first retry 4.6 to 5.4 seconds after the initial attempt, the second 7.8 to 10.2 seconds later, and the last 14.2 to 19.8 seconds after that.   Keep in mind, these values don’t appear to be currently documented, so presumably could change.
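As a quick check of that arithmetic, here is a minimal C# sketch that plugs the empirically observed values (a minBackoff of 3 seconds and a deltaBackoff of 2 seconds) into the formula, replacing the random factor with its two extremes. The class and method names are made up for illustration and are not part of the Storage Client library, and I'm assuming currentRetry is 1 for the first retry, since that is what reproduces the numbers quoted here.

```csharp
using System;
using System.Globalization;

public static class BackoffRanges
{
    // Observed values from the text, not documented constants (assumption).
    private const decimal MinBackoff = 3m;    // seconds
    private const decimal DeltaBackoff = 2m;  // seconds

    // Returns { shortest, longest } delay in seconds for a 1-based retry number.
    public static decimal[] Range(int currentRetry)
    {
        int factor = (1 << currentRetry) - 1;  // 2^currentRetry - 1
        return new[]
        {
            MinBackoff + factor * (DeltaBackoff * 0.8m),
            MinBackoff + factor * (DeltaBackoff * 1.2m)
        };
    }

    public static void Main()
    {
        for (int retry = 1; retry <= 3; retry++)
        {
            decimal[] range = Range(retry);
            Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
                "retry {0}: {1:0.0} to {2:0.0} seconds", retry, range[0], range[1]));
        }
        // Prints:
        // retry 1: 4.6 to 5.4 seconds
        // retry 2: 7.8 to 10.2 seconds
        // retry 3: 14.2 to 19.8 seconds
    }
}
```

None of the three delays comes close to the 90-second maxBackoff, so the cap never comes into play with a retry count of 3.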

Query Execution Methods

You may have noticed that DataServiceQuery has both the synchronous Execute method and an analogous asynchronous pair (BeginExecute/EndExecute), so the implementation described in my last post could technically be made asynchronous without using CloudTableQuery.  But where’s the fun in that, especially since CloudTableQuery exposes some methods that more transparently handle the continuation token logic?

DataServiceQuery methods | CloudTableQuery methods
Execute                  | Execute
BeginExecute             | BeginExecuteSegmented
EndExecute               | EndExecuteSegmented

CloudTableQuery similarly has an Execute method, which will generate multiple HTTP requests and traverse whatever continuation tokens necessary to fulfill the original query request.   As such, you don’t really get (or need) direct visibility into continuation tokens as you do via the QueryOperationResponse for a DataServiceQuery.   There is a drawback though.  In our pagination scenario for instance, a single Execute would return the first page of rows (with page size defined by the Take extension method), but there seems to be no convenient way to access the continuation token to signal to a subsequent Execute where to start getting the next page of rows.

BeginExecuteSegmented and EndExecuteSegmented do provide the level of control we’re looking for, but with a bit more ceremony required.  A call to this method pair will get you a segment of results formatted as a ResultSegment<T> instance (see right).  You can see right away that each segment maintains a continuation token (class ResultContinuation) and encapsulates a collection of Results corresponding to the query; however, that may or may not be the complete set of data!

For the sake of example, consider a page size of 1200 rows.  As you may recall from the last post, there are occasions where you will not get all of the entities in a single request.  There’s a hard limit of 1000 entities per response, and there are additional scenarios that have similar consequences and involve far fewer rows, such as execution timeouts or results crossing partition boundaries.

At the very least, due to the 1000-entity limit, the ResultSegment you get after the asynchronous call to BeginExecuteSegmented will contain only 1000 rows in its Results collection (well, ideally; see callout box below), but it will have HasMoreResults set to true, because it knows the desired request has not been fulfilled: another 200 entities are needed.

That’s where GetNext (and its asynchronous analogs) comes in.  You use GetNext to make repeated requests to flush all results for the query, here 1200 rows.  Once HasMoreResults is false, the query results have been completely retrieved, and a request for the next page of 1200 rows would then require another call to BeginExecuteSegmented, passing in the ContinuationToken stored in the previous ResultSegment as the starting point.

Think of it as a nested loop; pseudocode for accessing the complete query results would look something like this:

token = null
repeat
   call Begin/EndExecuteSegmented with token
   while HasMoreResults
      call GetNext to retrieve additional results
   set token = ContinuationToken
until token is null

There’s a bug in the Storage Client (I'm using Version 1.2) where the entity limit is set to 100 versus 1000, so if you are running a tool such as Fiddler against the scenario above, you’ll actually see 12 requests and 12 responses of 100 entities returned.  The logic to traverse the segments is still accurate, just the maximum segment size is off by an order of magnitude.

What’s essentially happening is that the “server” acknowledges your request for a bunch of data, but decides that it must make multiple trips to get it all to you.  It’s kind of like ordering five glasses of water for your table in a restaurant: the waiter gets your order for five, but having only two hands (and no tray) makes three trips to the bar to fulfill your request.  BeginExecuteSegmented represents your order, and GetNext represents the waiter’s subsequent trips to fulfill it.
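To make the two levels of looping concrete, here is a small simulation that runs without any Azure dependency. FakeTable and FakeSegment are hypothetical stand-ins invented for illustration (they are not Storage Client types, and their "token" is just a row index); only the shape of the loops mirrors the pseudocode above: an outer loop per page driven by the continuation token, and an inner loop per segment driven by HasMoreResults.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for ResultSegment<T>; rows are plain integers.
public class FakeSegment
{
    private readonly FakeTable table;
    private readonly int next;       // index of the first row not yet returned
    private readonly int remaining;  // rows still owed to the current page

    public FakeSegment(FakeTable table, List<int> results, int next, int remaining)
    {
        this.table = table;
        this.next = next;
        this.remaining = remaining;
        Results = results;
    }

    public List<int> Results { get; private set; }

    // True while the current page request has not been completely fulfilled.
    public bool HasMoreResults
    {
        get { return remaining > 0 && next < table.RowCount; }
    }

    // Marks where the next page starts; null once the table is exhausted.
    public int? ContinuationToken
    {
        get { return next < table.RowCount ? next : (int?)null; }
    }

    public FakeSegment GetNext()
    {
        return table.Fetch(next, remaining);
    }
}

// Hypothetical stand-in for table storage with a per-response entity limit.
public class FakeTable
{
    public FakeTable(int rowCount, int segmentLimit)
    {
        RowCount = rowCount;
        SegmentLimit = segmentLimit;
    }

    public int RowCount { get; private set; }
    public int SegmentLimit { get; private set; }
    public int Requests { get; private set; }

    public FakeSegment ExecuteSegmented(int? token, int take)
    {
        return Fetch(token ?? 0, take);
    }

    // One simulated round trip: returns at most SegmentLimit rows.
    public FakeSegment Fetch(int start, int wanted)
    {
        Requests++;
        int count = Math.Min(Math.Min(wanted, SegmentLimit), RowCount - start);
        return new FakeSegment(this, Enumerable.Range(start, count).ToList(),
                               start + count, wanted - count);
    }
}

public static class PagingDemo
{
    // Walks the whole table and returns the number of rows in each page.
    public static List<int> ReadAllPages(FakeTable table, int pageSize)
    {
        var pageSizes = new List<int>();
        int? token = null;
        do
        {
            FakeSegment segment = table.ExecuteSegmented(token, pageSize);
            int rows = segment.Results.Count;
            while (segment.HasMoreResults)      // inner loop: segments of one page
            {
                segment = segment.GetNext();
                rows += segment.Results.Count;
            }
            pageSizes.Add(rows);
            token = segment.ContinuationToken;  // outer loop: next page
        } while (token != null);
        return pageSizes;
    }

    public static void Main()
    {
        var table = new FakeTable(2500, 1000);
        List<int> pages = ReadAllPages(table, 1200);
        // Prints: 1200, 1200, 100 rows in 5 requests
        Console.WriteLine(string.Join(", ", pages) + " rows in " + table.Requests + " requests");
    }
}
```

With a segment limit of 1000, the 2500-row table comes back as pages of 1200, 1200 and 100 rows in five requests. Constructing the table as new FakeTable(1200, 100) instead reproduces the 12 requests described in the callout above.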

If all this is still a bit fuzzy, stick with me; hopefully it will become clearer as we apply this to the Azure@home sample.

Updating Azure@home

The general structure of the new status.aspx implementation using CloudTableQuery will be roughly the same as what we created last time; I’ll call the page statusAsync.aspx from here on out to avoid confusion. 

AsyncPageData class

As before, session state is used to retain the lists of completed and in-progress work units, as well as the continuation token marking the current page of results.   Last time the class was PageData, and this time I’ll call it AsyncPageData:

 protected class AsyncPageData
{
    public List<WorkUnit> InProgressList = new List<WorkUnit>();
    public List<CompletedWorkUnit> CompletedList = new List<CompletedWorkUnit>();
    public ResultContinuation ContinuationToken = null; 
    public Boolean QueryResultsComplete = false; 
}

The only difference from PageData is the use of the ResultContinuation class to track the continuation token, versus individual string values for the partition key and row key.  That continuation token is captured from a ResultSegment described earlier. 

RetrieveCompletedUnits

If you review the refactoring we did in the last post, you’ll see that the logic to return one page of data is already well encapsulated in the RetrieveCompletedUnits method, and in fact, the only code change strictly necessary is in that method.  Here’s an updated implementation, which works, but as you’ll read later in this post is not ideal:

    1:  protected void RetrieveCompletedUnits()
    2:  {
    3:      if (!pageData.QueryResultsComplete)
    4:      {
    5:          // select enough rows to fill a page of the GridView
    6:          // GridView.PageSize < 1000 or UI paradigm will fail
    7:          Int32 maxRows = GridViewCompleted.PageSize;
    8:   
    9:          // add one if first page, to force a page 2 indicator
   10:          if (pageData.ContinuationToken == null)
   11:              maxRows++;
   12:   
   13:          // set up query
   14:          var qry = (from w in ctx.WorkUnits
   15:                      where w.Progress == 100
   16:                      select w).Take(maxRows).AsTableServiceQuery<WorkUnit>();
   17:                  
   18:          // use an event to synchronize
   19:          using (System.Threading.ManualResetEvent evt = 
                   new System.Threading.ManualResetEvent(false))
   20:          {
   21:              // asynchronously fetch next page of data 
   22:              qry.BeginExecuteSegmented(pageData.ContinuationToken, (ar) =>
   23:                  {
   24:                      var response = (ar.AsyncState as CloudTableQuery<WorkUnit>)
                               .EndExecuteSegmented(ar);
   25:   
   26:                      // add first segment of data
   27:                      pageData.CompletedList.AddRange(
   28:                          from wu in response.Results
   29:                          select new CompletedWorkUnit(wu));
   30:   
   31:                      // continue fetching segments to complete page
   32:                      while (response.HasMoreResults)
   33:                      {
   34:                          response = response.GetNext();
   35:                          pageData.CompletedList.AddRange(
   36:                              from wu in response.Results
   37:                              select new CompletedWorkUnit(wu));
   38:                      }
   39:   
   40:                      // set continuation token for next page request
   41:                      pageData.ContinuationToken = response.ContinuationToken;
   42:                      evt.Set();
   43:                  }
   44:                  , qry
   45:              );
   46:   
   47:              // wait until async retrieval is complete 
   48:              evt.WaitOne();
   49:          }
   50:   
   51:          // end of data reached if there's no continuation token
   52:          pageData.QueryResultsComplete = pageData.ContinuationToken == null;
   53:      }
   54:  }

The first 16 lines are nearly identical; the only substantive modification is the casting of the query in Line 16 to a CloudTableQuery (from the default of DataServiceQuery).

From there, though, it diverges considerably!  Let’s take a look at it in sections:

  • Lines 19 – 20:  Since we’re planning to make an asynchronous call to retrieve the data while the ASP.NET page lifecycle is underway, we need to block until we have the data back, and here I’m using an event to handle that.  RetrieveCompletedUnits blocks at Line 48 until the asynchronous callback has completed and set the event in Line 42.

  • Lines 22 – 45 technically comprise one line of code.  The BeginExecuteSegmented call accepts three parameters:

    • a continuation token marking where the next page starts (or null for the very first call),
    • a callback, here a lambda function, that should execute when the asynchronous retrieval is complete, and
    • a state object, here the query itself (Line 44), which the callback can read back via AsyncState.

Within the callback, the original query from the AsyncState (cf. Line 44) is reconstituted to obtain the response (Line 24).
 
From there, it’s fairly straightforward: Lines 27 - 29 take the data returned in the response, massage it from a WorkUnit to a CompletedWorkUnit, and add it to the collection of completed work units being stored in the session state object, pageData.

Depending on the page size (and other circumstances), that initial set of results may not be complete; for instance, a “page size” of 1200 would return at most 1000 entities in that first batch of response.Results, so Line 27 would see only those 1000 rows.   The response reference (a ResultSegment), though, realizes it still needs to get 200 more rows to meet the request, so HasMoreResults is set to indicate that.

Retrieving those additional rows to fulfill the request is the job of the loop in Lines 32 – 38.  Once the ResultSegment indicates it has no more results (i.e., HasMoreResults is false), processing is complete.  To unblock the main thread and let RetrieveCompletedUnits continue, the event is set in Line 42.

  • Line 48 is where the main thread blocks until the event is set (in Line 42, by the asynchronous callback specified for BeginExecuteSegmented).
  • As results are being fetched, the ContinuationToken in the ResultSegment is getting updated appropriately and stored in the session variable, pageData, on Line 41.
  • In the last line of this routine, Line 52, a boolean is set indicating that all of the data has been fetched from Azure table storage and is currently in memory.  Subsequent paging requests will traverse the in-memory cache to save storage transactions (granted, at the sacrifice of data currency).

Now this certainly works, but we can do better. 

The ASP.NET Page Lifecycle Revisited

Recall in my last post, I laid out a graphic showing the page lifecycle.  That lifecycle all occurs on an ASP.NET thread, specifically a worker thread from the CLR thread pool.  The size of the thread pool then determines the maximum number of concurrent ASP.NET requests at any given time.  In ASP.NET 3.5 and 4, the default size of the thread pool is 100 worker threads per processor.

Now consider what happens with the implementation I just laid out above.  The page request, and the execution of RetrieveCompletedUnits, occurs on a thread pool worker thread.  In Line 22, an asynchronous request begins, using an I/O completion port under the covers, and the worker thread is left blocking at Line 48 just waiting for the asynchronous request to complete.  If there are already (n x CPUs) such requests being processed, n being the per-processor worker thread limit, and a request comes in for a quick and simple ASP.NET page, what happens?  Chances are the server will respond with a “503 Service Unavailable” message, even though all the web server may be doing is waiting for external I/O processes – like an HTTP request to Azure table storage – to complete.  The server is busy… busy waiting!

So what has the asynchronous implementation achieved?  Not much!  First of all, the page doesn’t return to the user any quicker, and in fact may be marginally (though probably imperceptibly) slower because of the threading choreography.   And then, as described above, we haven’t really done anything to improve scalability of our application.  So why bother?

Well, the idea of using asynchronous calls was the right one, but the implementation was flawed.  In ASP.NET 2.0, asynchronous page support was added to the ASP.NET page lifecycle.  This support allows you to register “begin” and “end” event handlers to be processed asynchronously at a specific point in the lifecycle known as the “async point” – namely between PreRender and PreRenderComplete, as you can see below:

ASP.NET Asynchronous Page Lifecycle

 

Since the asynchronous I/O (green boxes) is handled via I/O completion ports (which form the underlying implementation of the familiar BeginExecute/EndExecute implementations in .NET), the ASP.NET worker thread can be returned to the pool as soon as the I/O request starts.  That worker thread is now free to service potentially many more additional requests versus being forced to sit around and wait for completion of the I/O on the current page.  When the asynchronous I/O operation is complete, the next available worker thread can pick right up and resume the rest of the page creation.

Now here’s the key part to why this improves scalability: the thread listening on the I/O completion port is an I/O thread versus a worker thread, so it’s not cannibalizing a thread that could be servicing another page request.  Additionally, a single I/O completion port (and therefore I/O thread) could be handling multiple outstanding I/O requests, so with I/O threads there isn’t necessarily a one-to-one correspondence to ASP.NET page requests as there is with worker threads.

To summarize: by using the asynchronous page lifecycle, you’re deferring the waiting to specialized non-worker threads, so you can put the worker threads to use servicing additional requests versus just waiting around for some long running, asynchronous event to complete.


Implementing ASP.NET Asynchronous Pages

Implementing ASP.NET asynchronous pages is much easier than explaining how they work!   First off, you need to indicate the page will perform some asynchronous processing by adding the Async property to the @ Page directive.  You can also set an AsyncTimeout property so that page processing will terminate if it hasn’t completed within the specified duration (defaulting to 45 seconds).

   <%@ Page Language="C#" AutoEventWireup="true" CodeBehind="StatusAsync.aspx.cs" 
       Inherits="WebRole.StatusAsync" EnableViewState="true" Async="true" %>

Setting Async to true is important!   If not set, the tasks registered to run asynchronously will still run, but the main thread for the page will block until all of the asynchronous tasks have completed, and then that main thread will resume the page generation.  That, of course, negates the benefit of releasing the request thread to service other clients while the asynchronous tasks are underway.

There are two different mechanisms to specify the asynchronous methods: using PageAsyncTask or invoking AddOnPreRenderCompleteAsync.  In the end, they accomplish the same thing, but PageAsyncTask provides a bit more flexibility by:

  1. enabling specification of a callback method if the page times out,
  2. retaining the HttpContext across threads, including things like User.Identity,
  3. allowing you to pass in state to the asynchronous method, and
  4. allowing you to queue up multiple operations.

I like flexibility, so I’m opting for that approach.  If you want to see AddOnPreRenderCompleteAsync in action, check out Scott Densmore’s blog post which similarly deals with paging in Windows Azure.

Registering Asynchronous Tasks

Each PageAsyncTask specifies up to three event handlers: the method to call to begin the asynchronous task, the method to call when the task completes, and optionally a method to call when the task does not complete within a time-out period (specified via the AsyncTimeout property of the @ Page directive). Additionally, you can specify whether or not multiple asynchronous tasks can run in parallel.

When you register a PageAsyncTask, via Page.RegisterAsyncTask, it will be executed at the ‘async point’ described above – between PreRender and PreRenderComplete.  Multiple tasks will be executed in parallel if the ExecuteInParallel property for a task is set to true via the constructor.

In the current implementation of status.aspx, there are two synchronous methods defined to retrieve the work unit data, RetrieveInProgressUnits and RetrieveCompletedUnits.  For the page to execute those tasks asynchronously, each invocation of those methods (there are several) must be replaced by registering a corresponding PageAsyncTask.

In the Page_Load code, for instance:

   27:          cloudClient.CreateTableIfNotExist("workunit");
  28:          if (cloudClient.DoesTableExist("workunit"))
  29:          {
  30:              Page.RegisterAsyncTask(new PageAsyncTask(this.BeginRetrieveInProgressUnits, 
                      this.EndRetrieveInProgressUnits, this.OnTimeout, null, true));
  31:              Page.RegisterAsyncTask(new PageAsyncTask(this.BeginRetrieveCompletedUnits,  
                      this.EndRetrieveCompletedUnits, this.OnTimeout, null, true));
  32:          }
  33:          else
  34:          {
  35:              System.Diagnostics.Trace.TraceError(
                       "Unable to create 'workunit' table in Azure storage");
  36:          }

Lines 30 and 31 show the registration of two new PageAsyncTasks.  One task handles the retrieval of the in-progress units, and the other handles the retrieval of the paginated, completed units.   Each task specifies a complementary Begin/End method pair of delegates to implement the asynchronous invocation pattern.


Begin Retrieval Event Handler Implementation for In-Progress Work Units

    1:  protected IAsyncResult BeginRetrieveInProgressUnits(object sender, EventArgs e, 
                                                           AsyncCallback cb, object state)
    2:  {
    3:      var qry = (from w in ctx.WorkUnits
    4:                  where w.Progress < 100
    5:                  select w).AsTableServiceQuery<WorkUnit>();
    6:      return qry.BeginExecuteSegmented(cb, qry);
    7:  }

The number of in-progress units is bounded by the number of worker role instances (a maximum of 19 for typical Azure accounts) so no paging logic is implemented here.  The query is set up in Lines 3 - 5, just as it was for the synchronous implementation discussed in the previous post.   The only difference here is casting the query to a CloudTableQuery via the AsTableServiceQuery extension method.  Doing so provides access to additional methods, including BeginExecuteSegmented which fires off the actual asynchronous query to Azure table storage.


End Retrieval Event Handler Implementation for In-Progress Work Units

    1:  protected void EndRetrieveInProgressUnits(IAsyncResult ar)
    2:  {
    3:      var qry = ar.AsyncState as CloudTableQuery<WorkUnit>;
    4:      ResultSegment<WorkUnit> segment = qry.EndExecuteSegmented(ar);
    5:   
    6:      // continue fetching all segments
    7:      pageData.InProgressList.AddRange(segment.Results);
    8:      while (segment.HasMoreResults)
    9:      {
   10:          segment = segment.GetNext();
   11:          pageData.InProgressList.AddRange(segment.Results);
   12:      }
   13:  }

When the results from the BeginExecuteSegmented invocation are ready, the end retrieval handler kicks in.  In Line 3, the original query is reconstituted from the state object passed to BeginExecuteSegmented and the first segment of results is accessed in Line 4.

As mentioned earlier, each ResultSegment includes an IEnumerable<T> collection of Results.  Those results, though, may not be complete.  Due to the 1000-entity limitation and internal partitioning of data, additional requests – via GetNext – may be required to return all of the data.  The boolean flag, HasMoreResults, indicates whether additional invocations are indeed required.  Lines 7 - 12 above handle this iteration, accumulating all of the results into a collection stored as part of the session state in pageData.

A ResultSegment also includes a ContinuationToken property marking the current position in the partition and row space; that value is important for paging scenarios as we’ll see next.

Begin Retrieval Event Handler Implementation for Completed Work Units

    1:  protected IAsyncResult BeginRetrieveCompletedUnits(object sender, EventArgs e, 
                                                          AsyncCallback cb, object state)
    2:  {
    3:      // select enough rows to fill a page of the GridView
    4:      // GridView.PageSize < 1000 or UI paradigm will fail
    5:      Int32 maxRows = GridViewCompleted.PageSize;
    6:   
    7:      // add one if first page, to force a page 2 indicator
    8:      if (pageData.ContinuationToken == null)
    9:          maxRows++;
   10:              
   11:      // set up query
   12:      var qry = (from w in ctx.WorkUnits
   13:                  where w.Progress == 100
   14:                  select w).Take(maxRows).AsTableServiceQuery<WorkUnit>();
   15:   
   16:      // execute the query (with continuation token, if present)
   17:      if (pageData.ContinuationToken == null)
   18:          return qry.BeginExecuteSegmented(cb, qry);
   19:      else
   20:          return qry.BeginExecuteSegmented(pageData.ContinuationToken, cb, qry);
   21:  }

Much of the code above is similar to that of the synchronous implementation described in the previous post.   A CloudTableQuery is constructed in Lines 12 - 14, and then executed conditionally in Lines 17 - 20.

Since this implementation accommodates paginating the display of completed work units, each page of GridView data corresponds to one invocation of BeginRetrieveCompletedUnits.  If it’s the first page, it’s a fresh query and there’s no continuation token, so the statement on Line 18 is executed.  For subsequent pages, the query needs to return results beginning where the previous page left off.  That’s where the continuation token comes in, so it’s passed in as a parameter in Line 20.


End Retrieval Event Handler Implementation for Completed Work Units

    1:  protected void EndRetrieveCompletedUnits(IAsyncResult ar)
    2:  {
    3:      var qry = ar.AsyncState as CloudTableQuery<WorkUnit>;
    4:      ResultSegment<WorkUnit> segment = qry.EndExecuteSegmented(ar);
    5:   
    6:      // continue fetching segments to complete page
    7:      pageData.CompletedList.AddRange(
    8:          from wu in segment.Results
    9:          select new CompletedWorkUnit(wu)
   10:      );
   11:   
   12:      while (segment.HasMoreResults)
   13:      {
   14:          segment = segment.GetNext();
   15:          pageData.CompletedList.AddRange(
   16:              from wu in segment.Results
   17:              select new CompletedWorkUnit(wu)
   18:          );
   19:      }
   20:   
   21:      // set continuation token for next page request
   22:      pageData.ContinuationToken = segment.ContinuationToken;
   23:      pageData.QueryResultsComplete = pageData.ContinuationToken == null;
   24:  }

The pattern for retrieving result segments is identical to what we just saw for in-progress units: cycle through invocations of GetNext until HasMoreResults is set to false, and add the results to the in-memory collection pageData.CompletedList.    To keep track of where we are for subsequent page requests, the ContinuationToken is saved to session state in Line 22, and in Line 23 a flag is set to indicate whether all results have now been retrieved, in which case later requests can be served from the session cache.

Keep in mind there’s no transaction isolation in Azure table storage.  As you’re paging through data and new records have been added, you’ll see those ‘new’ records as you loop through the result segments, even if that data was not there when you initially issued the query.  Likewise, in the paging implementation described here, if you back up through pages, only the session cache is consulted, so data that was added since the page was first retrieved will not be seen until the entire web page is refreshed.


Timeout Event Handler Implementation

One key benefit of using PageAsyncTask is being able to specify what happens if the tasks don’t complete within a reasonable amount of time – that time being expressed in the AsyncTimeout property of the @ Page directive, or defaulting to 45 seconds if not explicitly set.

I opted for a pretty simplistic, but far from subtle, implementation here – the entire page background will be a bright red if the page generation is not completed within the allotted time.  You’d probably want something a bit less garish, but I went this route to keep non-essential changes to the original status.aspx page to a minimum.

 public void OnTimeout(IAsyncResult ar)
{
    if (!this.ClientScript.IsClientScriptBlockRegistered(this.GetType(), "changeColor"))
        ClientScript.RegisterClientScriptBlock(this.GetType(), "changeColor",
            "<script>document.body.style.backgroundColor = \"#FF0000\"</script>");
}

 

Wrapping it up

I’ve bundled the complete code for the implementation of the asynchronous page retrieval as an attachment to this blog post.  I also wanted to provide a few additional resources that I found invaluable in understanding how asynchronous paging in general works in ASP.NET:

Asynchronous Pages in ASP.NET 2.0, Jeff Prosise
ASP.NET Thread Usage on IIS 7.0 and 6.0, Thomas Marquardt
Performing Asynchronous Work, or Tasks, in ASP.NET Applications, Thomas Marquardt
Multi-threading in ASP.NET, Robert Williams
Improve scalability in ASP.NET MVC using Asynchronous requests, Steve Sanderson

For the next post, I’ll be moving on (finally!) to the Web Role implementation in Azure@home.

StatusPagingAsync.zip