Time Series Reporting Stored Procedure - part 2 of 3

This post continues part 1 of the series.  In this part of the series, I will walk through the code of the stored procedure and explain what it does and why - and maybe expound on what could be as well.

First, when creating an Analysis Services stored procedure, you generally need to add a reference to AdomdServer.  This shows up in the references section of your project as "msmgdsrv" (which I supposed stands for Microsoft managed server - as if!) and you add a using clause that looks like this (you will also want a clause for System.Data while you're at it):

using Microsoft.AnalysisServices.AdomdServer;

The next thing you want to do is to mark the code "SafeForPrepare."  This indicates that the code will not make any changes if executed during a prepare statement.  Usually, during a prepare you validate the parameters and return an empty result set, whatever that means for your function.  For our function it means an empty data table.  In any case, the header for our function with the SafeForPrepare property set looks like this:

[SafeToPrepare(true)]
public DataTable TimeSeriesReportDMV(string ModelName,
string ColumnName,
string SeriesName,
int NumHistoricalPoints, int NumPredictions)

Now let's go through each phase of the code and see how things work and why they are there - I'll leave out bits and pieces, but you can get all of that from the attached TSSprocs.cs file.

Validation

In the validation section I needed to validate that all the parameters were semantically correct based on the state of the server at the time.  For instance, if the query were for model X, I not only needed to validate that model X existed, but also that model X was a time series model that didn't have nested tables, for instance.

Model Validation

Here I validate that the model exists and is the right type.  Notice that access to the object model in server ADOMD.Net is through the Context object, which is globally accessible to all stored procedures.  The Context object contains all information about the connection, session, and database in which the stored procedure is running.

MiningModel model = Context.MiningModels.Find(ModelName);
if (model == null)
throw new SystemException("Model not found");
if (model.Algorithm != "Microsoft_Time_Series")
throw new SystemException("Specified model is not time series");

Predictable Column Validation

Here I validate that the column chosen for forecasting is actually a predictable column in the model.

MiningModelColumn PredictableColumn = model.Columns.Find(ColumnName);
if (PredictableColumn == null)
     throw new SystemException("Column not found");
if (!PredictableColumn.IsPredictable)
     throw new SystemException("Specified column not predictable");

Find Key Columns

Here I loop through the columns to find the key columns of the model by inspecting the columns content field.  I also check to ensure that the model doesn't have nested tables by checking if the Key Time column has a parent.

MiningModelColumn KeyColumn = null;
foreach (MiningModelColumn column in model.Columns)
{
if (column.Content == "Key Time")
{
KeyTimeColumnName = column.Name;
if (column.ContainingColumn != null)
throw new SystemException("Function does not support time series with nested tables");
}
if (column.Content == "Key")
{
KeyColumn = column;
KeyColumnName = column.Name;
}
}
if (KeyTimeColumnName == "") // This could only happen with malformed time series models, which shouldn't be possible
throw new SystemException("Something really bad happened");

Validate Series Name

Here I use the key columns value collection to see if I can find the series that was specified.  Of course, if the model is of the wrong type and does not have a key column, the code will throw a null value exception - probably should have checked for that as well to give a better error message.  Also, I probably should have checked that there isn't a key column if the series name was left blank - that could produce some odd results as well.

if (SeriesName != "")
{
count = 0;
foreach (MiningValue value in KeyColumn.Values)
{
if (value.ToString() == SeriesName)
count++;
}
if (count != 1)
throw new SystemException("Invalid series name");
}

Validate Counts

The last step of the validation is to simply ensure that there have positive numbers for the integer inputs.  Since I used the convention that "NumHistoricalPoints=0" implies that all historical data should be returned, 0 is allowed as well.

if (NumPredictions < 1 || NumHistoricalPoints < 0)
throw new SystemException("Negative count");

Create Result Set

Next I create a datatable and add the columns required to hold the results.  In this example, I'm only returning values, but I could have easily added columns to hold additional statistical information such as the standard deviation.  At this point, I also check the context to see if the stored procedure was called during a prepare statement.  If so, I return the empty datatable.

DataTable Result = new DataTable(SeriesName);
Result.Columns.Add("Ordinal", typeof(int));
Result.Columns.Add(ColumnName, typeof(double));
Result.Columns.Add("Predicted " + ColumnName, typeof(double));

// Return empty result set if preparing
if (Context.ExecuteForPrepare)
return Result;

Populate Historical Data

The steps involved in populating the historical data are to generate the query, execute the query and copy the results to the result set datatable, and then do some fixups.

Generating the historical query

Getting the historical data proved to be a little tricky.  Normally this is a simple .CASES query, but in this case we wanted to limit the amount of historical data returned, so I had to be a little creative.  It's easy enough to do a TOP ... ORDER BY query to get only the last n rows, but then they com back in reverse order.  To get the correct number in the correct order, I did another ORDER BY using the result of the first query as the source.  Of course, I had to create different queries depending on the structure of the model - i.e. if a series name was present.  The code builds up the query piece by piece because it was easier for me to conceive of it this way.  The completed query in its most complicated form looks like this:

SELECT * FROM
(SELECT TOP 25 * FROM
(SELECT [Quarter], [Sales] FROM [Wine Forecasting].CASES
WHERE [Series] = 'Red') AS t ORDER BY [Quarter] DESC) AS s
ORDER BY [Quarter]

Additionally I had to protect against the possibility that object names would contain "]" characters. A closing brace character would allow for the potential for SQL injection attacks. Since Analysis Services currently doesn't allow multiple statements in a single query, injection attacks are not currently possible with DMX. However, it's always better to be safe, and some future version of Analysis Services may support multiple statements. To protect against such a possibility, I used a helper function "ObjectEscape" that replaces "]" characters with "]]".

if (SeriesName != "")
{
CasesCommandText = "SELECT [" + ObjectEscape(KeyTimeColumnName)
+ "], [" + ObjectEscape(ColumnName) + "] FROM ["
+ ObjectEscape(ModelName) + "].CASES WHERE ["
+ ObjectEscape(KeyColumnName) + "] = @Series";
cmd.Parameters.Add("Series",SeriesName);
}
else
{
CasesCommandText = "SELECT [" + ObjectEscape(KeyTimeColumnName)
+ "], [" + ObjectEscape(ColumnName) + "] FROM ["
+ ObjectEscape(ModelName) + "].CASES ";
}
if (NumHistoricalPoints > 0)
{
// Change command text to retrieve the last N points
CasesCommandText = "SELECT TOP " + NumHistoricalPoints.ToString()
+ " t.* FROM (" + CasesCommandText + ") AS t ORDER BY t.["
+ ObjectEscape(KeyTimeColumnName) + "] DESC";
CasesCommandText = "SELECT * FROM (" + CasesCommandText
+ ") AS s ORDER BY [" + ObjectEscape(KeyTimeColumnName)
+ "]";

}

Executing the query and fetching the data

Executing the query was the most straightforward part of the process.  I called ExecuteReader on the command object, iterated the data that was returned and populated the appropriate column of my result table.  Notice that I did not set a connection object to the command - which for most ADO.Net programming is simply odd.  However, since I'm using server ADOMD.Net, you are already executing within the context of a query, with a session complete with a database and credentials and all that, so there's is no need for any "connection" object.  If you wanted to connect to a different Analysis Services server/instance/database from your stored procedure, you would use client ADOMD.Net.  Using both in the same procedure - confusing as hell.

In any case, in this code fragment, I zero out the "ordinal" column (I could as well put nulls) and null out the predicted value column.  I also maintain a reference to the last row for use in the next section.

object[] values = new object[3];
cmd.CommandText = CasesCommandText;
rdr = cmd.ExecuteReader();
values[0]=0;
values[2]=null;
while (rdr.Read())
{
values[1] = rdr.GetDouble(1);
lastRow = Result.Rows.Add(values);
}
rdr.Close();

Fixing up the historical data

Now that I have the historical data in my table, there's some clean up work.  First, since I'm returning ordinals instead of the actual timestamp returned by the algorithm, I need to clean that up.  I could have made it easy and started with 1, but I wanted the "time horizon" to be 0, with negative numbers indicating the past and positive numbers indicating the future, so it takes another step.  Also, to make charts look nice, I have to put the last historical value in the predicted value column so I have a complete line.

lastRow[2] = lastRow[1];
RowOrdinal = -Result.Rows.Count;
foreach (DataRow row in Result.Rows)
{
row[0] = ++RowOrdinal;
}

Executing the Prediction

Yay! we're almost done!  All that is left now is to generate the prediction query, execute it, add the results to my table and return home.

Generating the prediction query

The prediction query is much simpler than the historical query - just a PredictTimeSeries(Sales, 5) with a FLATTENED directive and optionally a WHERE clause.  The only "interesting" thing in this code is that I do reuse the same command object as before.  Rather than trying to be "clever" and remembering if there are any shared parameters I clean out the parameters collection completely on reuse.  This makes the code much more modular and easy to debug if something goes wrong.

cmd.Parameters.Clear();
if (SeriesName != "")
{
PredictionCommandText = "SELECT FLATTENED PredictTimeSeries(["
+ ObjectEscape(ColumnName) +
"], @NumPredictions ) FROM [" + ObjectEscape(ModelName)
"] WHERE [" + ObjectEscape(KeyColumnName) + "] = @Series";
cmd.Parameters.Add("Series", SeriesName);
}
else
{
PredictionCommandText = "SELECT FLATTENED PredictTimeSeries(["
+ ObjectEscape(ColumnName) + "], @NumPredictions ) FROM ["
+ ObjectEscape(ModelName) + "]";
}
cmd.Parameters.Add("NumPredictions", NumPredictions);

Executing the prediction query and going home

Again, the execution of the query is the simplest part.  In this snippet, I null out the historical values and populate the ordinal (since I know where to start) and the predicted values and then return the datatable from the stored procedure.  Analysis Services knows how to handle datareader or a datatable return values and properly remotes them back to the caller (or incorporates the results into larger queries, if applicable)

cmd.CommandText = PredictionCommandText;
rdr = cmd.ExecuteReader();
values[1] = null;
while (rdr.Read())
{
values[0] = ++RowOrdinal;
values[2] = rdr.GetDouble(1);
Result.Rows.Add(values);
}
rdr.Close();

return Result;

Installing and calling the stored procedure

Once you have compiled and built the stored procedure into an assembly you need to install it into your Analysis Services instance.  I'm not going to talk about that, it's well documented here.  You probably want to register the assembly as a server assembly so that you have access to it from all your databases, but that's up to you.  Since everything I did is internal, there are no issues with security, so the assembly can be registered in the most secure mode.

An example of how you would call the stored procedure (assuming you named the assembly the ultra-attractive name of "TSSprocs") is like this:

call TSSprocs.TimeSeriesReport("Reds Interleaved","Sales","Red",10,5)

Which provides a result that looks like this:

image

Next Up

In the third any final installment of this series (originally I thought I could do it all in just one - HAH!), I will show how to create a Reporting Services report containing a chart with the results.  It should be straightforward, but as usual, there are some gotchas.

Attached to this post is the source code for this stored procedure.  AS AN ADDED BONUS, I created a version that uses the new Analysis Services DMV's instead of the object model for validation and discovery.  Why?  No, not just because I'm a nice guy, but because that's actually how I originally wrote it.  You can use the DMV approach from anywhere, independent of whether or not you have an object model, so the code is more portable - in case you wanted to, say, use XMLA from your cell phone or something.

Anyway, till next time - enjoy!

TSSprocs.cs