Querying Azure Perf Counter Data with LINQ

For a recent project I needed to collect performance counter data for the Web Roles that I had running.  There are various blog and forum posts and some documentation on how to accomplish this, but I was still getting a little caught up on how to get the data from the table in a useable form without a lot of brute force programming.  I finally decided to just quit trying to find an answer from someone else and go get it myself.  I will walk through my journey learning about the collection of Performance Countes in Azure and show how to implement the needed classes and use them to retreive and use data from the Azure Storage table that holds the performance counter data.

Of note, when I was first dealing with the creation of the performance counters one of the things that I got caught up trying to understand was the magic behind the creation of the Azure Storage Table as a landing place for the performance counter data that I was collecting.  I guess I never expect anything for free, but in this case that is exactly what I received since the framework creates the table for you as part of the cloud services API when you configure the performance counters.  Starting with Development Storage is always the best practice both from user experience during debugging and a cost control perspective.  Once I ran the project locally I could use the Server Explorer to browse my local storage and can see that it had created a table in Storage in which it could start placing the performance counter data.

I note the table named WADPerformanceCountersTable as I’ll need to know that as I look under the hood.  What I need to do is get to the data that is stored in that table.  My goal is to use the StorageClient to access the data stored in that table.  I’d like to do this in a programmer friendly way by using an object to represent that data and LINQ to access it.  To that end I open up SQL Management Studio and open up the Development Storage database, because I’d like to see what data is stored in the table.  The first thing I do is find the row that defines the table in the TableContainer table:

I see the entry for WADPerformanceCountersTable and the SchemaXml.  If I click the hyperlink presented to me in query results window and get the schema for what I’d get back from queries to that table:

If I scroll to the right in that window I can find the type of each of the columns that would be presented as the table.  Just a little more investigation so I feel more complete in my understanding.  I take a look at the TableRow table and select a random row out of it that is for the TableName = ‘WADPerformanceCountersTable’.  Pulling out the data field I find an entry like this one:

<Properties>

  <EventTickCount>634117372552460000</EventTickCount>

  <DeploymentId>deployment(62)</DeploymentId>

  <Role>WebRole1</Role>

  <RoleInstance>deployment(62).TestDeployViaREST.WebRole1.0</RoleInstance>

  <CounterName>\Processor(_Total)\% Processor Time</CounterName>

  <CounterValue>94.50037</CounterValue>

</Properties>

With that information in mind and a diet soda in hand I add a new C# code file to a helper library that I’m building.  Next, I create two classes to help me get to the data in a way that will be easy to retrieve and consume.  Intellisense, rather the lack thereof, reminds me that I need to add a reference to System.Data.Services.Client.  The first one to declare is the entity that represents the properties that make up the schema for the table.  I do this by inheriting from TableServiceEntity and defining the appropriate types for the data in the table.  As I mentioned earlier, the types are found in the schema for the table that is found in the TableContainer table.  I’m sure that there is documentation somewhere as to the schema for WADPerformanceCountersTable, but I had already found it so I just went with that.  Truncating some of the text in each of the elements so that it is easier to see the type, here is the schema xml that is the same as above:

<Properties>

  <PartitionKey xmlns="https://schemas.microsoft.com/ado/2007/08/dataservices" />

  <RowKey xmlns="https://schemas.microsoft.com/ado/2007/08/dataservices" />

  <EventTickCount … p2:type="Edm.Int64" />

  <DeploymentId … p2:type="Edm.String" />

  <Role … p2:type="Edm.String" />

  <RoleInstance … p2:type="Edm.String" />

  <CounterName … p2:type="Edm.String" />

  <CounterValue … p2:type="Edm.Double" />

</Properties>

Using this information I copy the names and types of the properties to define my entity class for the data as:

public class PerformanceData: Microsoft.WindowsAzure.StorageClient.TableServiceEntity

{

   public Int64 EventTickCount { get; set; }

   public string DeploymentId { get; set; }

   public string Role { get; set; }

   public string RoleInstance { get; set; }

   public string CounterName { get; set; }

   public double CounterValue { get; set; }

}

To go along with the entity I’ll need a TableServiceContext object to fetch the entities for me. I define that as:

public class PerformanceDataContext: TableServiceContext

{

   public IQueryable<PerformanceData> PerfData

   {

   get{return this.CreateQuery<PerformanceData>("WADPerformanceCountersTable");}

   }

   public PerformanceDataContext(string baseAddress, StorageCredentials credentials)

            : base(baseAddress, credentials)

    {}

}

My class is declared as PerformanceDataContext. In the getter for the PerfData property I return a DataServiceQuery object using the CreateQuery, templated from my entity type (PerformanceData), and passing the name of the table in storage (WADPerformanceCountersTable) that I wish to query. Now I’ve got the code in place that I need in order to get to the performance counter data. The next step is to actually get the data and subsequently perform some discrete test against it in order to determine what action should be taken. To this end I add a method to the WorkerRole class that I’m using and name it TestPerfData(). At the top of the method I create all of the objects that I’ll need to write a little bit of LINQ and get back the data I want.

var account = CloudStorageAccount.FromConfigurationSetting("DataConnectionString");

var context = new PerformanceDataContext(account.TableEndpoint.ToString(), account.Credentials);

var data = context.PerfData;

The account is using the configured settings and for the moment it is point at Development Storage. The second line of code is where the previous bit of coding starts to come into the picture. I instantiate the context class that I declared earlier and pass to it the endpoint and credentials that are available via the account object. I need the IQueryable object that is returned from the getter for PerfData as that object is the target for my initial LINQ statement. Here I run into a little trouble as it seems that the LINQ processor DataServiceQuery object doesn’t quite handle some of the valid operations that I need for this part of the code (thanks to David Browne for figuring that out). My intention is to test the average CPU utilization for the records available and based on that value scale the service up or down by increasing or decreasing the instance count in the Web Role configuration. To that end I would write something like:

var AvgCPU = from d in data

  where d.CounterName == @"\Processor(_Total)\% Processor Time"

  select d.CounterValue.Average();

However, as I mentiond before, this client object doesn’t seem to handle this and forces me to go a little more brute force and first materialize the results of a general selection into a list object. Once that is done I perform the operations to filter and average the value. Thus, my single statement turns into two:

List<PerformanceData> selectedData = (from d in data

where d.CounterName == @"\Processor(_Total)\% Processor Time"

select d).ToList<PerformanceData>();

double AvgCPU = (from d in selectedData

where d.CounterName == @"\Processor(_Total)\% Processor Time"

select d.CounterValue).Average();

The first query materializes all of the data into a list and in the second statement I perform the aggregate function. The where clause is redundant and only needed in one of the two statements, but I left it in for my own visual clarity. At the point that the DataClient can properly handle the aggregate function only one statement will be needed (at least one without a bit of ugly nesting).

That is it!