How To: Query for ‘All’ events and ‘No’ event scenarios


All and Nothing

Azure Stream Analytics aspires to make low-latency, near-real-time stream computations as easy to use as typing a SQL query.  But while SQL (and its Stream Analytics dialect) is a powerful language, there are some “tricks of the trade” to quickly and easily expressing common stream-analytic concepts.

Today, we look at two common patterns: figuring out how to express “there were no events such that…” and “all of the events were such that…”

Nothing

Let’s start simple. How does one express:

Tell me when a Reboot event is followed by a Login event on the same machine within 30 seconds?

Easy — it’s a normal JOIN:

SELECT Reboot.machine, Login.user
FROM Reboot 
           JOIN Login
                   ON Reboot.machine = Login.machine
                   AND DATEDIFF(s, Reboot, Login) BETWEEN 0 and 30

The JOIN picks up matching pairs of events from Reboot and Login, and nothing else.

OK, but what about

Tell me when a Reboot event is not followed by a Login event on the same machine within 30 seconds

To do this, we use a LEFT JOIN.  A LEFT JOIN also picks up all matching pairs across the two streams, but with a twist: if an event from the left-hand-side stream doesn’t match anything on the right-hand side, it produces a result that pairs that left-hand-side event with a NULL.  So, every left-hand-side event is guaranteed to appear in the output at least once — either paired with one or more matching right-hand-side rows, or with a NULL.

That’s exactly what we need — we will left-join Reboots to Logins, and pick up just the ones that didn’t match!

 SELECT Reboot.machine
 FROM Reboot
             LEFT JOIN Login
                       ON Reboot.machine = Login.machine
                       AND DATEDIFF(s, Reboot, Login) BETWEEN 0 and 30
 WHERE Login IS NULL

Easy-peasy.  Let’s look at a more complex, yet very common case — finding “consecutive” events:

Tell me when two consecutive logins to the same machine within 10 minutes of each other were by the same user.

What does “consecutive” mean?  Let’s re-phrase:

Tell me when two logins occur on the same machine within 10 minutes of each other by the same user, with no other logins  o the same machine between them

The first part (two logins by the same user within 10 minutes) is a regular join, but the second part (no other logins in between) is, as we now know, a LEFT JOIN:

SELECT
            l1.user, l1.machine,
            l1.terminalId as terminal1, l2.terminalId as terminal2
FROM Login l1
            JOIN Login l2
                     ON l1.machine = l2.machine AND l1.user = l2.user
                     AND DATEDIFF(mi, l1, l2) BETWEEN 0 and 10
 LEFT JOIN Login lmiddle
                    ON lmiddle.machine = l1.machine AND lmiddle.user = l1.user
                    AND DATEDIFF(ms, l1, lmiddle) > 0
                    AND DATEDIFF(ms, lmiddle, l2) > 0
WHERE
           lmiddle IS NULL

See what we did there? We required ‘lmiddle’ to have the same ‘machine’ and ‘user’ as the other two, and to occur after ‘l1’ but before ‘l2’.  And then we required it to not actually happen!

All

OK, there goes nothing. But what if we want it all?

Tell me when all the logins that occurred on the same machine within 5 minutes before a reboot failed.

To solve this puzzle, we simply turn this sentence around:

Tell me when none of the logins that occurred on the same machine within 5 minutes before a reboot succeeded.

That’s nothing!  And we already know how to do nothing:

SELECT Reboot.machine
FROM Reboot LEFT JOIN Login
               ON Reboot.machine = Login.machine
               AND DATEDIFF(ss, Login, Reboot) BETWEEN 0 and 300
               AND Login.failed = 0

In this way, we can easily determine when all the events that occurred near another event satisfied our condition.

But what if there is no other event, and we are just trying to see if all of the events in a given window satisfied a condition?

Tell me when all Logins that occurred within a 5-minute interval on a given machine failed

We could represent that as a LEFT JOIN, too, like nothing:

Tell me when a Login fails such that no other login on the same machine in the preceding 5 minutes succeeded.

But there is an easier way to do it is with a Sliding Window aggregation.  “Aggregation?  But there is no ‘ALL’ aggregate is there?” you might ask.  No, but the exact same semantics can be achieved using MIN or MAX:

Tell me when the minimum of the ‘failed’ property of all Login events within a 5-minute interval on a given machine is 1.

Since ‘failed’ is either 0 or 1, the only way the minimum would be 1 is if all of them are 1, and so all of them failed!

SELECT machine, count(*)
FROM Login
GROUP BY SlidingWindow(mi, 5), machine
HAVING MIN(failed) = 1

This trick works even for more complex conditions using CASE.  For instance, to accomplish:

Tell me when every Login that occurs within a 5-minute interval on a given machine took more than 2 seconds to complete or failed.

Would look like this:

SELECT machine, count(*)
FROM Login
GROUP BY SlidingWindow(mi, 5), machine
HAVING MIN(
             CASE WHEN SecondsToComplete > 2 or failed THEN 1 ELSE 0 END
              ) = 1

Enjoy all and nothing!

Comments (0)

Skip to main content