Windows in StreamInsight: The Basics

Article
11/23/2010

If you start learning how to write StreamInsight queries, most likely one of the first concepts you come across are windows. In this article, we’d like to shed some light on the usage of windows, hoping to provide a better understanding for the benefits as well as constraints of using windows. And I will spare you any funny or half-funny puns regarding Microsoft Windows. On a side note, isn’t it striking how people always use the phrase “no pun intended” very intentionally?

In the event processing world, windows are used to define sets, which are then input to some set-based operation. Such operations are either

aggregations, like sum, count, min, max, or a user-defined aggregate, computing a single result for the input set, or
window-based operators, which can return more than one result event for an input set, like TopK or a user-defined operator.

Think of a StreamInsight window as a means to chop up the timeline into finite pieces, each including a number of events. The length of the window can be a fixed time span (hopping & tumbling windows), or determined by the events themselves (count & snapshot windows). Here is an example of a “tumbling window”, with the aggregate “sum” applied to one of the payload fields:

It is important to understand that the window and the set-based operations always need to be used together. The window with the set-based operation can be regarded as a single building block in a StreamInsight query. This is also reflected in the way these constructs are written in LINQ:

 var result = from window in input
                 .TumblingWindow(TimeSpan.FromMinutes(5),
                                 HoppingWindowOutputPolicy.ClipToWindowEnd)
             select new { sum = window.Sum(e => e.Value) };

This component – the window together with the set-based operation on top of it – receives a CepStream and produces a CepStream. The window extension method returns the type CepWindowStream, but there is nothing else to do with this type than to apply the set-based operation. We will discuss the specific window types in another posting.

People sometimes ask about different ways to “access” this window, since it must be “just a set of events in memory”, which they want to process somehow. Yes, it is a set of events in StreamInsight-controlled memory, and it actually can be accessed programmatically: in the form of user-defined aggregates (UDAs) or operators (UDOs). These extensibility interfaces let you write C# code against the events contained in a window and produce a single scalar aggregation result (in the case of a UDA) or one or more entire events (with a UDO). Both are implemented as subclasses of types provided by the StreamInsight API. The nice thing about these interfaces is that, even though they introduce imperative code into the StreamInsight event processing flow, the APIs encourage a declarative model: I receive a set of events (the window) and I produce some output for it. I don’t have to worry about when or in what order the UDA or UDO will be called at runtime. And since I receive the set of events as an IEnumerable in this interface, I can write good old LINQ to Objects against them!

Although it might be tempting for developers who haven’t become really comfortable with high-level, declarative programming paradigms such as LINQ to take the “back door” into a UDA/UDO, I would advise to first try expressing a query through built-in StreamInsight operators alone and harness the power of the StreamInsight runtime.

Regards,
Roman

Windows in StreamInsight: The Basics

Additional resources