Caching to and Serving Files from WebRole's Local Storage

UPDATE: A sample solution is now available on CodePlex under AzureFileCache. It still has a lot of rough edges and needs work, but it might serve as a good starting point.

I’ve been working with Symon Communications on a proof of concept for porting some of their applications to Azure. In the simplest terms, that means moving their SQL data to SQL Azure and some of their web sites to Windows Azure web and/or worker roles. However, keeping it this simple could really complicate maintenance of a site. Consider that every time the file-based content contained in the site needs an update, a new package must be deployed to the cloud (and that deployment, by the way, gets billed as one compute hour). So, with that in mind, let’s review the two simplest and most obvious options:

Solution 1: Put all content in SQL Azure

Upside:

· No storage cost above the database size

· No per read/write transaction cost

· Can update content fairly easily

Downside:

· Limited size

· Inefficient compared to serving from local disk

· Bottleneck resource for multiple instances (not factoring in web caching)

· Can’t link directly to a resource from a web browser

Solution 2: Put all content into Azure Storage

Upside:

· Doesn’t suffer from the SQL Azure size constraints

· Can update content easily

· Can link directly from a web browser

Downside:

· Depending on how it is implemented, could introduce network latency

· Per read/write transaction cost

· Bottleneck resource at high load

 

Keep in mind that the cost for storage is $0.000001/transaction, which is pretty small, but under high throughput on the internet it could quickly add up: at that rate, a site serving ten million file requests a day would pay about $10 a day, or roughly $300 a month, in transaction charges alone. So, to optimize, you might put thumbnails or layout graphics into SQL Azure but other content into Azure Storage, and then work out the balance of which resources belong where and how they get served from the web.

The problem remains that this doesn’t help alleviate the challenge of scaling predictably. If I cache the files kept in SQL Azure in memory, then I mitigate the performance implications of fetching all content from the database, but I am still left with the problem of the content kept in Azure Storage. I could do the same there, that is, cache that content in memory too. I would definitely have to start keeping track of memory pressure and such, but I do think that an in-memory cache is a good option for a number of things.

That being said, I’d prefer to just have the files on the local file system, like I would when traditionally hosting my web application, but without the hassle of redeploying every time I make a content update. Fortunately, when you configure a webrole you can set aside some local storage. This, I think, might be the best option for locally caching files that are served frequently and are prone to change more often than a web application feature update or bug fix gets deployed. That could be images, style sheets, video, or anything else that I might want to update outside of code versioning and/or that I want to massively and predictably scale. By doing this sort of caching in local storage I get a number of benefits:

· Less latency than fetching from a remote server

· Mass scale across the instances

· No need to redeploy/upgrade to freshen the content

· No per-request charge for the file (requests are reduced to those needed to freshen the file system cache)

That being said, I have complicated the matter a bit, because now I will have to deploy and manage the resources and content for my apps differently, as most of these types of files would live somewhere other than the webrole. Additionally, I will need to figure out the rules/policies for determining what gets put into the webrole’s local storage versus what remains in Azure Storage. One more complicating factor is that I will have to manage the lifetime of the locally cached content. The last complication is that I must develop a means to categorize and deliver the content from the webrole. Ideally, this caching would happen asynchronously at startup, with an initial load of content that is known to be needed, after which further caching would happen on a per-request basis; a sketch of that startup piece follows.
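The sample later in this post loads the cache synchronously for simplicity, but purely as an illustration, the asynchronous startup load could be queued from Application_Start on a thread-pool thread. This is only a sketch; LoadCacheFromStorage is a hypothetical helper that would wrap the blob download code shown below:

protected void Application_Start(object sender, EventArgs e)
{
    //queue the initial cache load on a background thread so role startup is not
    //blocked while the known-needed content is pulled down from Azure Storage
    System.Threading.ThreadPool.QueueUserWorkItem(delegate
    {
        try
        {
            //hypothetical helper wrapping the blob download code shown below
            LoadCacheFromStorage();
        }
        catch (Exception ex)
        {
            //a failed warm-up just means the first requests fall back to
            //fetching from Azure Storage on demand; log and move on
            System.Diagnostics.Trace.TraceError(ex.ToString());
        }
    });
}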

What follows is a quick sample of the code needed to load a webrole’s local storage with content from Azure Storage, and then a subsequent example of the code needed to create an HttpModule that serves that content back as the web browser requests it. For the simple example here, I’m going to load the content during Application_Start. The setup will be as follows:

· Using dev fabric and dev storage

· Images are kept as blobs in a container named “icons”

· We’ll configure the webrole to have 10 MB of local storage named “IconsStorage”

The first step is to configure the webrole. Open the properties page of the webrole by double-clicking the role in Solution Explorer. On the properties page, click the Local Storage tab and then click Add Local Storage. Type “IconsStorage” in the Name field and 10 in the Size field.
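Behind the scenes, this adds a LocalStorage element to the webrole’s definition in ServiceDefinition.csdef. It should end up looking roughly like the following (the role name WebRole1 and the cleanOnRoleRecycle value are assumptions; set the latter depending on whether you want the cache to survive a role restart):

<WebRole name="WebRole1">
  <LocalResources>
    <LocalStorage name="IconsStorage" sizeInMB="10" cleanOnRoleRecycle="false" />
  </LocalResources>
</WebRole>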

With that in place, open the Global.asax.cs file and make the following additions:

Add using statements

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;

 

Edit the Application_Start method

This part of the code sets up the storage-related objects and prepares to fetch the blobs:

//create the StorageAccount object, pointing it at the correct endpoints and providing
//credentials. These are the well-known development storage account name and key; note
//that development storage listens on http (not https) and its endpoints include the
//account name in the path. CloudStorageAccount.DevelopmentStorageAccount is a handy
//shortcut for all of this.
CloudStorageAccount StorageAccount = new CloudStorageAccount(
    new StorageCredentialsAccountAndKey("devstoreaccount1", "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw=="),
    new Uri("http://127.0.0.1:10000/devstoreaccount1"),
    new Uri("http://127.0.0.1:10001/devstoreaccount1"),
    new Uri("http://127.0.0.1:10002/devstoreaccount1"));
//create the blob client and use it to create the container object.
CloudBlobClient BlobClient = StorageAccount.CreateCloudBlobClient();
//Note that this is where the container name is passed in order to get to the files we want
CloudBlobContainer BlobContainer = BlobClient.GetContainerReference("icons");
//create the options needed to list the blobs in the container
BlobRequestOptions options = new BlobRequestOptions();
options.AccessCondition = AccessCondition.None;
options.BlobListingDetails = BlobListingDetails.All;
options.UseFlatBlobListing = true;
options.Timeout = new TimeSpan(0, 1, 0);
//retrieve the list of files/blobs in the container. There are ways to fetch by prefix,
//but in this instance we want them all.
System.Collections.Generic.IEnumerable<IListBlobItem> Blobs = BlobContainer.ListBlobs(options);

Now we get hold of the local storage that we configured for the webrole, and then fetch the blobs and save them into it:

//get the local resource we configured for the webrole
LocalResource CacheFolder = RoleEnvironment.GetLocalResource("IconsStorage");

//iterate over the collection, grab the files, and save them locally
foreach (IListBlobItem item in Blobs)
{
    //evaluating the URI in the debugger does not seem to work well, but Console.WriteLine is fine
    Console.WriteLine(item.Uri.ToString());
    //get a blob reference through the client so that it carries the storage credentials;
    //GetBlobReference works for block and page blobs alike (the original CloudPageBlob
    //would fail here, because uploaded images are block blobs)
    CloudBlob blob = BlobClient.GetBlobReference(item.Uri.ToString());
    //tease the file name out of the URI and save the blob under that name in local storage
    string[] SplitUri = item.Uri.ToString().Split(new char[] { '/' });
    string filename = SplitUri[SplitUri.Length - 1];
    blob.DownloadToFile(System.IO.Path.Combine(CacheFolder.RootPath, filename));
}

That is all that is needed to fetch the files from Azure Storage into the webrole’s local storage. Of course, these examples are all hard-coded; for a production system this would likely be driven by configuration, for which I would probably use SQL Azure or Azure Storage, setting a connection string via the Settings tab on the webrole’s properties page to do the initial configuration lookup.

The next part is the implementation of an ASP.NET module to handle requests for resources that IIS would normally handle by simply returning the file from the file system. In my test app I added an ASP.NET module named LocalStorageCacheModule.cs. As in the previous sample, you’ll need to add the using statements to the top of the file. Next, expand the IHttpModule Members region and add the following code to the Init() method:

context.BeginRequest += new EventHandler(context_BeginRequest);

 
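Note that the IHttpModule interface also requires a Dispose() method; since this module holds no resources to release, an empty implementation is sufficient:

public void Dispose()
{
    //nothing to clean up; this module holds no disposable resources
}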

If you type the BeginRequest line out, IntelliSense should kick in and simplify creating the handler method. Before we add the code to fetch the resource from the webrole’s local storage, you will need to decide how to identify that a requested resource is one this module should handle, and in which storage location the code should look. In this example that is pretty easy, because everything is hard-coded and there is only one location in use. I will use the presence of “localresource” in the URL as the indicator that the code should handle fetching and returning the resource. This could be done with some combination of URL and querystring values, but since we are dealing with a single location it is easy enough to designate that incoming URLs of the form https://127.0.0.1/localresource/[filename] are the ones this module will handle. Navigate to the newly created method void context_BeginRequest(object sender, EventArgs e) and add the following inside the body of the function:

//ensure that this is responding to the proper sender
if (sender is HttpApplication)
{
    //cast the object in order to use it
    HttpApplication context = sender as HttpApplication;

    //look for the URL indicator that you arbitrarily decided would be associated with content
    //that you have saved in the webrole's local storage. Note that you could have multiple
    //storage locations and thus need to associate multiple URLs, each with its own storage
    //location. Lower-case the URL and look for the indicator.
    if (context.Request.Url.ToString().ToLower().IndexOf("localresource") >= 0)
    {
        //tease out the file name so we can use it to fetch the file from local storage
        string[] SplitUrl = context.Request.Url.ToString().Split(new char[] { '/' });
        string filename = SplitUrl[SplitUrl.Length - 1];
        //get the local storage folder, as we need it to build the path to the resource
        LocalResource CacheFolder = RoleEnvironment.GetLocalResource("IconsStorage");
        string fullpath = System.IO.Path.Combine(CacheFolder.RootPath, filename);
        //set the content type so that the browser handles the response properly
        //(hard-coded here; a real implementation would map it from the file extension)
        context.Response.ContentType = "image/jpeg";
        context.Response.AddHeader("content-disposition", "filename=" + filename);
        //send the file back to the browser and end the response
        context.Response.WriteFile(fullpath);
        context.Response.End();
    }
}

 
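One step worth calling out explicitly: for ASP.NET to invoke the module, it must be registered in web.config. A minimal sketch follows; the namespace and assembly name (WebRole1 here) are assumptions, so adjust the type attribute to match your project:

<!-- classic pipeline -->
<system.web>
  <httpModules>
    <add name="LocalStorageCacheModule" type="WebRole1.LocalStorageCacheModule, WebRole1" />
  </httpModules>
</system.web>

<!-- IIS 7 integrated pipeline -->
<system.webServer>
  <modules>
    <add name="LocalStorageCacheModule" type="WebRole1.LocalStorageCacheModule, WebRole1" />
  </modules>
</system.webServer>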

And that is it; not too much code to get some local caching of resources from Azure Storage, allowing us to scale more broadly and without additive file-operation transaction charges. As I mentioned before, a real-world implementation would need some more complexity, but this sample should be easy enough to modify to work from a set of configuration properties. Additionally, the same could be done to fetch resources from a database table in SQL Azure and store them locally. I’m considering adding that code to my working sample, getting it working from configuration settings, and packaging it up for CodePlex, but whether I’ll get to it remains to be seen. I’ll also try to run some tests and post the results comparing serving directly from Azure Blob Storage, this implementation, this implementation using TransmitFile, and lastly using BinaryWrite.
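As a first step toward that configuration-driven version, the hard-coded account could be replaced by reading a connection string from the role’s settings. A minimal sketch, assuming a setting named DataConnectionString has been added on the Settings tab of the webrole’s properties page:

//read the connection string from role configuration instead of hard-coding it;
//"DataConnectionString" is an assumed setting name added on the Settings tab
string connectionString = RoleEnvironment.GetConfigurationSettingValue("DataConnectionString");
//for development storage the setting value would simply be "UseDevelopmentStorage=true"
CloudStorageAccount StorageAccount = CloudStorageAccount.Parse(connectionString);
CloudBlobClient BlobClient = StorageAccount.CreateCloudBlobClient();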
