Counting ‘Live’ Things


A somewhat common scenario in stream analytics is the need to count how many things are still “outstanding” at a point in time.  For instance, imagine that we are getting  session start/stop events with the following schema:

Timestamp

SessionID

IsStart

2pm

Abc

1

3pm

Xyz

1

3:30pm

Xyz

0

4pm

Abc

0

 

Stream Analytics QL is good at counting events (within windows).  For instance, it would be quite trivial to count how many sessions started in any given 2-hour window:

SELECT
    COUNT(*) as NumberOfStarts
FROM
    Input
WHERE
    IsStart = 1
GROUP BY
    TumblingWindow(hour, 2)

What if you wanted to count how many sessions started and how many stopped within each 2-hour window?  Well, you can no longer use the ‘WHERE’ clause, but a nice trick is to use a conditional SUM — after all, a COUNT is nothing more than a SUM where each matching row contributes 1 to the total:

SELECT
    SUM(CASE WHEN IsStart = 1 THEN 1 ELSE 0 END) as NumberOfStarts,
    SUM(CASE WHEN IsStart = 0 THEN 1 ELSE 0 END) as NumberOfStops
FROM
    Input
GROUP BY
    TumblingWindow(hour, 2)

That was easy.  But what if I wanted to figure out, at any point of time, how many sessions were actually running at that point in time?  In other words, how many sessions have already started but not yet ended? Determining this from a stream of start/stop events is not as easy.  First of all, there are a couple of problems with our question itself.

  1. How often do we want output to be produced?  The answer could be “every time a session starts or stops”, or it could be something like “every 5 minutes”.  Let’s go with the latter for now.
  2. How far back do we need to look?  After all, if there is no limit on how long sessions can run, then a session might have started last year, and still be going now!  Do we really need to search all data since the beginning of time to produce a correct result?!

    To address this issue, let’s give ourselves an extra requirement: no session can run longer than 10 hours; after that, we should stop ‘counting’ it.

Now, our question is well-defined.  But how can we craft a query to answer it?  There is a number of ways, but one particularly nifty one was contributed by our friends and customers at Unify Square (thanks guys!).

Any seasoned Stream Analytics query writer can tell that “output every 5 minutes, but look back over 10 hours” sounds like a HoppingWindow.  This is great, but how can we tell if a session is alive at the end of the window?  Easy: it is if and only if the session started within the window, but didn’t end within the window.  To put it in other words — all of the events for this session within the window were ‘starts’.  We could write that using the SUM(CASE WHEN… END) technique above, but in this case, using MIN is even simpler

WITH ActiveSessions AS
(
SELECT
    SessionId
FROM
    Input
GROUP BY
    HoppingWindow(minute, 600, 5),
    SessionId
HAVING
    MIN(IsStart) = 1
)

OK, this produced the list of all the sessions that were active for each window.  But we wanted a count of them.  This is easy — just add a COUNT over the top of the above, but this time without GROUPing BY SessionId:

SELECT
    COUNT(*) as NumActiveSessions
FROM
    ActiveSessions
GROUP BY
    SlidingWindow(ms, 1)

You may be puzzled by the somewhat puzzling SlidingWindow grouping.  We are trying to group all of the events generated by a particular HoppingWindow.  But all such events have the same timestamp — it is the end of the hopping window in question.  So even a 1ms SlidingWindow will do.

The resulting two-step query is nice and efficient.

Finally, what could you do if, instead of a single stream with the start and stop indicators as above, you were given two streams, one with the starts and one with the stops? E.g. 

Starts

Timestamp

SessionID

2pm

abc

3pm

xyz

 

Stops

Timestamp

SessionID

3:30pm

xyz

4pm

abc

 

This is easily handled with a UNION

WITH Input AS
(
    SELECT
        SessionID, IsStart = 1
    FROM
        Start
UNION
    SELECT
        SessionID, IsStart = 0
    FROM
        Stops
)

Enjoy counting live things!


Comments (0)

Skip to main content