StreamInsight: Understanding dynamic query composition

11/22/2010

Been tied up with PASS for the past (pun intended) couple of weeks, so it’s time to get caught up on writing. One of the key technical features of StreamInsight is the ability for one query to consume the output of another. This lets the system avoid processing events twice and opens up a new world of flexibility. The feature that unlocks these capabilities is Dynamic Query Composition (DQC). That said, the way StreamInsight constructs queries and flows events is not always obvious. This has led me to make a few sub-optimal design choices in the past, so I figured I’d put together a short article demonstrating the nuances involved in using (or not using) DQC.

For more background on DQC, please refer to:

Composing Queries at Runtime

Take the diagram below, illustrating a common query composition pattern. Given a single input stream (represented by the inputAdapter shape in the diagram), we would like to consume the stream of events from that adapter in both queryOne and queryTwo.

At first glance, this query syntax would seem to fit the bill. Note that I’m using the new IEnumerable/IObservable support for the input, but the classic adapter syntax to create queries bound to an output adapter, so that query creation is explicit:

Code Snippet

// Convert the data source into a temporal stream
var orderStream = ordersSimple.ToPointStream(cepApp, s =>
    PointEvent.CreateInsert(s.StartTime, s),
    AdvanceTimeSettings.IncreasingStartTime);
// Create a query in two parts
var queryOne = from e in orderStream where e.OrderID > 50 select e;
var queryTwo = from e in queryOne.TumblingWindow(
                   TimeSpan.FromDays(1), HoppingWindowOutputPolicy.ClipToWindowEnd)
               select new
               {
                   OrderCount = e.Count()
               };
// Convert these templates into queries, bound to an output adapter
var tracerConfig = new TracerConfig()
{
    DisplayCtiEvents = false,
    SingleLine = true,
    TracerKind = TracerKind.Console
};
var queryOneRun = queryOne.ToQuery(cepApp, "QueryOne", "",
    typeof(TracerFactory), tracerConfig, EventShape.Point,
    StreamEventOrder.FullyOrdered);
var queryTwoRun = queryTwo.ToQuery(cepApp, "QueryTwo", "",
    typeof(TracerFactory), tracerConfig, EventShape.Interval,
    StreamEventOrder.FullyOrdered);

However, this wouldn’t produce the desired result. Instead, the StreamInsight engine constructs the query pattern seen in the diagram below. Here queryTwo composes the definition of queryOne but not its runtime stream, so two instances of the input adapter are created.

You can verify this by using the Event Flow Debugger to check the input source of the query. For a walkthrough of the debugger, see my blog post on it.

Open the Event Flow Debugger and connect to your StreamInsight instance.
Navigate to your queryTwo definition and select Show Query.
As seen in the diagram below, if the input source is an actual input adapter (and not a published stream), you have two independent streams, each with its own input adapter.
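To make the difference concrete outside of StreamInsight, here is a plain C# analogy that uses LINQ to Objects rather than the StreamInsight API. The iterator method OrderIds is a hypothetical stand-in for an input adapter, and the SourceStarts counter (also hypothetical) records how many times it is started. Because queryTwo is built on the definition of queryOne, running both queries reads the source twice. This roughly mirrors the two input adapter instances seen in the debugger.

```csharp
using System.Collections.Generic;
using System.Linq;

// Analogy only: LINQ to Objects, not StreamInsight. A deferred query is a
// template, and every query that runs it starts its own copy of the source.
public static class DefinitionCompositionDemo
{
    // Number of times the hypothetical input source has been started.
    public static int SourceStarts;

    // Hypothetical stand-in for an input adapter: the body runs once per
    // enumeration, so each enumeration acts like a new "adapter instance".
    private static IEnumerable<int> OrderIds()
    {
        SourceStarts++;
        for (int id = 1; id <= 100; id++)
            yield return id;
    }

    public static (int QueryOneCount, int QueryTwoCount) Run()
    {
        SourceStarts = 0;

        // queryOne is only a definition; nothing has been read yet.
        var queryOne = OrderIds().Where(o => o > 50);

        // queryTwo composes the *definition* of queryOne, as in the first snippet.
        var queryTwo = queryOne.Select(o => o % 10);

        // Running both queries starts the source twice (SourceStarts == 2).
        int queryOneCount = queryOne.Count();
        int queryTwoCount = queryTwo.Count();
        return (queryOneCount, queryTwoCount);
    }
}
```

Each query returns 50 results, but the source is started once per query rather than once overall.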

To have the two queries compose at runtime, we need to:

Publish the output of the first query as a published stream with a strongly typed payload (published schemas cannot use anonymous types).
Consume the published stream as the input source of the second query.
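The same LINQ-to-Objects analogy can illustrate those two steps. In the sketch below, PublishedStream<T> is a hypothetical hub that plays the role of the published stream; it is not a StreamInsight type. queryOne's filter runs once over a single read of the source and pushes each result into the hub. queryTwo then subscribes to the hub rather than re-running queryOne's definition, so the source is started only once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Analogy only, not the StreamInsight API: a hypothetical hub that one
// running query pushes into and any number of downstream queries consume.
public sealed class PublishedStream<T>
{
    private readonly List<Action<T>> _subscribers = new List<Action<T>>();

    // A downstream query binds to the hub instead of to the original source.
    public void Subscribe(Action<T> onNext) => _subscribers.Add(onNext);

    // The upstream query pushes each result once; the hub fans it out.
    public void Publish(T value)
    {
        foreach (var subscriber in _subscribers)
            subscriber(value);
    }
}

public static class RuntimeCompositionDemo
{
    // Number of times the hypothetical input source has been started.
    public static int SourceStarts;

    // Hypothetical stand-in for an input adapter.
    private static IEnumerable<int> OrderIds()
    {
        SourceStarts++;
        for (int id = 1; id <= 100; id++)
            yield return id;
    }

    public static (int QueryOneCount, int QueryTwoCount) Run()
    {
        SourceStarts = 0;
        var queryOneOutput = new PublishedStream<int>();

        int queryOneCount = 0; // stands in for queryOne's own output adapter
        int queryTwoCount = 0; // queryTwo consumes the published results
        queryOneOutput.Subscribe(_ => queryOneCount++);
        queryOneOutput.Subscribe(_ => queryTwoCount++);

        // queryOne runs once over a single read of the source and publishes
        // each result; both consumers see every event (SourceStarts == 1).
        foreach (var orderId in OrderIds().Where(o => o > 50))
            queryOneOutput.Publish(orderId);

        return (queryOneCount, queryTwoCount);
    }
}
```

Both consumers still see all 50 filtered events, but the filter and the source each run only once. This is the saving DQC is meant to provide.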

Updated to take advantage of DQC, the query code looks like this:

Code Snippet

// Create a simple data source (OData feed) using a non-anonymous type
var ordersSimple = from o in northwind.Orders
                   where o.OrderDate != null && o.ShippedDate != null
                   select new NorthwindOrderResult
                   {
                       StartTime = (DateTime)o.OrderDate,
                       EndTime = (DateTime)o.ShippedDate,
                       OrderID = o.OrderID,
                       ShipRegion = o.ShipRegion,
                       CompanyName = o.Customer.CompanyName
                   };
// Convert the data source into a temporal stream
var orderStream = ordersSimple.ToPointStream(cepApp, s =>
    PointEvent.CreateInsert(s.StartTime, s),
    AdvanceTimeSettings.IncreasingStartTime);
// Create the first query, bound to an output adapter
var queryOne = from e in orderStream where e.OrderID > 50 select e;
var queryOneRun = queryOne.ToQuery(cepApp, "QueryOne", "",
    typeof(TracerFactory), tracerConfig, EventShape.Point,
    StreamEventOrder.FullyOrdered);
// Convert the query's output into a published stream (with a
// non-anonymous type)
var queryOneStream = queryOneRun.ToStream<NorthwindOrderResult>();
// Bind the second query to the published stream
var queryTwo = from e in queryOneStream.TumblingWindow(
                   TimeSpan.FromDays(1), HoppingWindowOutputPolicy.ClipToWindowEnd)
               select new
               {
                   OrderCount = e.Count()
               };
// Create the second query, bound to an output adapter
var queryTwoRun = queryTwo.ToQuery(cepApp, "QueryTwo", "",
    typeof(TracerFactory), tracerConfig, EventShape.Interval,
    StreamEventOrder.FullyOrdered);
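The snippet above relies on a NorthwindOrderResult type that the post does not show. Below is a minimal sketch of what it might look like. The members are inferred only from the properties the query assigns, and the property types are assumptions based on the DateTime casts and the Northwind schema. A named class like this, rather than an anonymous type, is what lets ToStream<NorthwindOrderResult>() name the payload type of the published stream.

```csharp
using System;

// Hypothetical payload type, inferred from the members assigned in the
// query above; not taken from the original post. Kept flat (scalar members
// only), on the assumption that StreamInsight payloads should use simple
// field types.
public class NorthwindOrderResult
{
    // Used as the point event's start time in ToPointStream.
    public DateTime StartTime { get; set; }
    public DateTime EndTime { get; set; }
    public int OrderID { get; set; }
    public string ShipRegion { get; set; }
    public string CompanyName { get; set; }
}
```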

With the queries updated to use DQC, let’s take another look at the query structure in the debugger. The input source for the second query is now a published stream: it is bound to the published stream’s URI rather than directly to an input adapter.

There we go – we can now feed the run-time results of one query into another query using Dynamic Query Composition.
