Ruby and Windows Azure Local Storage

Recently I’ve been hashing out an idea with Brian for a sample multi-tenant application. One of the features we’ve talked about is letting users upload and share photos, so this morning I’ve been thinking about storage. Windows Azure Storage Services are pretty well covered for Ruby (as well as other languages: https://blog.smarx.com/posts/windows-azure-storage-libraries-in-many-languages), but there’s another storage option for non-relational data that I think gets overlooked: local storage.

Local Storage

Local storage is a section of drive space allocated to a role. Unlike Windows Azure Blob storage, which has to be accessed via REST, local storage shows up as a directory off c:\ within the role. Before you get all excited about being able to do fast file IO instead of REST, there are a couple of drawbacks that make local storage undesirable for anything other than temp file storage:

  • It isn’t shared between multiple role instances
  • It isn’t guaranteed to persist the data*

*While you can set local storage to persist when a role recycles, it does not persist when a role is moved to a different hardware node. Since you have no control over when this happens, don’t depend on data in local storage to persist. If you've set it to persist between recycles and it does, great; you lucked out and are running on the same hardware you were before the instance recycled, but don't count on this happening all the time.

Local storage is great for holding things needed by your instance while it's running, like your Ruby installation, gems, and application code, or for holding temp files like uploads that haven’t been pushed over to blob storage yet. In fact, your application code lives in, and is run from, local storage. But for long-term persistence of data you should use Windows Azure Storage services like tables and blobs.

How to Allocate

If you happened to look through the .NET code for the deployment options mentioned in Deploying Ruby (Java, Python, and Node.js) Applications to Windows Azure, or checked properties on the web role entry in Solution Explorer, you probably noticed some entries for local storage. Here’s a screenshot of the local storage settings from the Smarx Role as an example, which allocates local storage for Git, Ruby, Python, Node.js, and your web application:

[Screenshot: Local Storage section in WebRole properties]

You can also modify this in the Service Definition file (ServiceDefinition.csdef), which contains the following XML:

  <LocalResources>
    <LocalStorage name="Git" cleanOnRoleRecycle="true" sizeInMB="1000" />
    <LocalStorage name="Ruby" cleanOnRoleRecycle="true" sizeInMB="1000" />
    <LocalStorage name="Python" cleanOnRoleRecycle="true" sizeInMB="1000" />
    <LocalStorage name="App" cleanOnRoleRecycle="true" sizeInMB="1000" />
    <LocalStorage name="Node" cleanOnRoleRecycle="true" sizeInMB="1000" />
  </LocalResources>

Note that the total amount of local storage you can allocate is determined by the size of the VM you’re using.  You can see a chart of the VM size and the amount of space at https://msdn.microsoft.com/en-us/library/ee814754.aspx.

Unfortunately, the Service Definition file is packed into the pre-built deployment packages provided by Smarx Role or AzureRunMe, so you’ll need Visual Studio and the source for these projects to change existing allocations or allocate custom stores. You also need some way to make the local storage directory visible to Ruby, since it's allocated at runtime and the directory path is slightly different for each role instance. This can be accomplished by storing the path in an environment variable, which your application can then read. We'll get to an example of this in a minute.
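On the Ruby side, reading such a variable could look something like the sketch below. It assumes the role's startup code has exported the local storage path under a variable name of your choosing; APP_STORAGE_PATH is a made-up name for illustration, not something Smarx Role or AzureRunMe sets. When the variable is missing (for example, on a development machine), the sketch falls back to the system temp directory.

```ruby
require 'tmpdir'

# Returns the directory to use for per-instance scratch files.
# 'APP_STORAGE_PATH' is a hypothetical variable name; substitute whatever
# name your role's startup code actually exports.
def local_storage_path(env_name = 'APP_STORAGE_PATH', env = ENV)
  path = env[env_name]
  # Fall back to the system temp directory when the variable is unset,
  # empty, or points at a directory that does not exist.
  return Dir.tmpdir if path.nil? || path.empty? || !File.directory?(path)
  path
end
```

Passing the environment in as a parameter is only there to make the helper easy to exercise outside of a role instance; in the application you would just call local_storage_path with no arguments.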

So why is this important?

One thing that I think gets overlooked or forgotten during application planning is the space used by transient temporary files. For example, file uploads have to go somewhere locally before you can send them over to blob storage. Most applications read the path assigned to the TEMP or TMP environment variable, and while this generally works, there's a gotcha with the default value of these two on Windows Azure: the directory they point to is limited to a maximum of 100MB (https://msdn.microsoft.com/en-us/library/hh134851.aspx). You can't increase the limit for this default path, but you can allocate a larger chunk of local storage and then point the TEMP and TMP environment variables at it.
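To get a feel for how close an application is to that limit, you could measure what is currently sitting in the temp directory. The following is a minimal sketch using only the Ruby standard library; the helper names and the 80% warning threshold are arbitrary choices for illustration, and the 100MB constant simply mirrors the limit described above.

```ruby
require 'find'
require 'tmpdir'

# Limit on the default TEMP/TMP directory, as described above.
DEFAULT_TEMP_LIMIT_BYTES = 100 * 1024 * 1024

# Total size in bytes of all regular files under dir (recursive).
def directory_size(dir)
  total = 0
  Find.find(dir) do |path|
    begin
      total += File.size(path) if File.file?(path)
    rescue SystemCallError
      # The file vanished or is unreadable; skip it.
    end
  end
  total
end

# True when usage is at or above the given fraction of the limit.
# The 0.8 default threshold is an arbitrary illustrative value.
def temp_nearly_full?(dir = Dir.tmpdir, limit = DEFAULT_TEMP_LIMIT_BYTES, threshold = 0.8)
  directory_size(dir) >= limit * threshold
end
```

Walking the directory on every request would be wasteful, so a check like this is better suited to a periodic task or a diagnostics page.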

Matias Woloski recently blogged about this at https://blogs.southworks.net/mwoloski/2011/08/04/not-enough-space-on-the-disk-windows-azure/ and provided an example of how to work around the 100MB limit by pointing the TEMP & TMP environment variables to a local storage path he’d allocated (tempdir is the name of the local storage allocation in his example.) You should be able to use the same approach by modifying the AzureRunMe, Smarx Role, or whatever custom .NET solution you're using to host your Ruby application.

As an example, here's how to accomplish this with the Smarx Role project (https://smarxrole.codeplex.com/):

  1. Download and open the project in Visual Studio. Double click on the WebRole entry in Solution Explorer. You'll see a dialog similar to the following:

  2. Click the Add Local Storage entry, and populate the new field with the following values:

    1. Name: TempDir
    2. Size (MB): 1000 (or whatever max size you need)
    3. Clean on role recycle: Checked
  3. Select File, and then select Save WebRole to save your changes.

  4. Back in Solution Explorer, find Program.cs and double click to open it. Find the line that contains the string "public static SyncAndRun Create(string[] args)".  Within this method, find the line that states "if (RoleEnvironment.IsAvailable)" and add the following three lines immediately after the opening brace following the if statement.

     string customTempLocalResourcePath = RoleEnvironment.GetLocalResource("TempDir").RootPath;
     Environment.SetEnvironmentVariable("TMP", customTempLocalResourcePath);
     Environment.SetEnvironmentVariable("TEMP", customTempLocalResourcePath);
    

    The if statement should now appear as follows:

     if (RoleEnvironment.IsAvailable)
     {
         string customTempLocalResourcePath = RoleEnvironment.GetLocalResource("TempDir").RootPath;
         Environment.SetEnvironmentVariable("TMP", customTempLocalResourcePath);
         Environment.SetEnvironmentVariable("TEMP", customTempLocalResourcePath);

         paths.Add(Path.Combine(RoleEnvironment.GetLocalResource("Node").RootPath, "bin"));
         paths.Add(RoleEnvironment.GetLocalResource("Python").RootPath);
         paths.Add(Path.Combine(RoleEnvironment.GetLocalResource("Python").RootPath, "scripts"));
         paths.Add(Path.Combine(
             Directory.EnumerateDirectories(RoleEnvironment.GetLocalResource("Ruby").RootPath, "ruby-*").First(),
             "bin"));
         Environment.SetEnvironmentVariable("HOME", Path.Combine(RoleEnvironment.GetLocalResource("Ruby").RootPath, "home"));
         localPath = RoleEnvironment.GetLocalResource("App").RootPath;
         gitUrl = RoleEnvironment.GetConfigurationSettingValue("GitUrl");
     }
    
  5. Select File, and then select Save Program.cs.

  6. Build the solution, and then right click the SmarxRole entry in Solution Explorer and select Package.  Accept the defaults and click the Package button.

  7. You will be presented with a new deployment package and ServiceConfiguration.cscfg.

Once you've uploaded the new deployment package, your web role will use the TempDir local storage that you allocated instead of the default TEMP/TMP path.  You can use the following Sinatra application to retrieve the TEMP & TMP paths and verify that they now point to the TempDir local storage path:

 require 'sinatra'

 # Show the TEMP and TMP paths as seen by the Ruby process
 get '/' do
   "Temp= #{ENV['TEMP']}<br/>Tmp= #{ENV['TMP']}"
 end

If you are using the default TEMP/TMP path, the return value should contain a path that looks like:

 C:\Resources\temp\b807eb0fa61741d9b2e77d2d686fce98.WebRole\RoleTemp\

Once you've switched to the TempDir local storage, the path will look something like this:

 C:\Resources\directory\b807eb0fa61741d9b2e77d2d686fce98.WebRole.TempDir\

* The GUID value in the paths above will be different for each instance of your role.
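If you'd rather have the application check this itself than eyeball the output, a small helper can compare the last directory name against the allocation name. This is only a sketch: it assumes the directory layout matches the two example paths above, which is an observation from those examples rather than a documented contract, and the method name is made up for illustration.

```ruby
# Returns true when path looks like the local storage allocation named
# resource_name, judging only by the example paths shown above
# (C:\Resources\directory\<id>.<role>.<resource>\). This layout is an
# assumption drawn from those examples, not a documented guarantee.
def local_storage_path?(path, resource_name = 'TempDir')
  # Split on either separator; trailing separators are dropped by split.
  last_segment = path.to_s.split(%r{[\\/]}).last.to_s
  last_segment.downcase.end_with?(".#{resource_name.downcase}")
end
```

For example, local_storage_path?(ENV['TEMP']) could be logged at startup so a misconfigured deployment shows up early.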

Summary

While there's a lot of information on using Windows Azure Storage Services like blobs and tables from a Ruby application, don't forget about local storage. It's very useful for per-instance, runtime data, and it's probably where your data is going when you accept file uploads. If you rely on TEMP & TMP, be sure to clean up your temp files after you're done with them, and it's a good idea to estimate how much temp storage space you'll need in a worst-case scenario. If you're going to be anywhere near the 100MB limit of the default TEMP & TMP path, consider allocating local storage and remapping the TEMP & TMP environment variables to it.
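As a closing illustration of the clean-up advice, here is one way an upload could be staged in the temp directory and removed afterwards no matter what happens in between. It's a sketch rather than a recommended implementation: with_staged_upload is a hypothetical helper name, and the actual push to blob storage is left to the block you pass in. Ruby's Tempfile creates its files under Dir.tmpdir, which honors the TMP and TEMP environment variables.

```ruby
require 'tempfile'

# Copies io into a temp file, yields the temp file's path to the block,
# and always deletes the file afterwards, even if the block raises.
# (Hypothetical helper; the block is where a blob upload would go.)
def with_staged_upload(io, basename = 'upload')
  temp = Tempfile.new(basename)
  temp.binmode
  begin
    # Copy in chunks so large uploads are not held in memory at once.
    while (chunk = io.read(64 * 1024))
      temp.write(chunk)
    end
    temp.flush
    yield temp.path
  ensure
    temp.close
    temp.unlink
  end
end
```

In a Sinatra route this might be called with the uploaded file's IO object, with the block handing the staged path to whichever storage library you use.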