Can I use multiple WITH clauses?


or How many ASes to a WITH?


By far the easiest way to organize a complex Stream Analytics query is to break it up into several parts using the WITH clause.  You can test individual parts one at a time, and the query is so much easier to read!  But a quirk of SQL syntax often trips people up, and makes them think that only a single sub-query can be used, because a query can only have one WITH statement.  Lucky for all of us, this is not true!

While a query can only have one WITH, the WITH can have as many AS clauses as it wants to, separated by commas.  For instance:

WITH
NormalReadings AS
(
      SELECT
      FROM Sensor
      WHERE Reading < 100 AND Reading > 0
),
Averages AS
(
      SELECT SensorId, AVG(Reading) as AvgNormalReading
      FROM NormalReadings
      GROUP BY SensorId, TumblingWindow(minute, 1)
),
BadAverages AS
(
      SELECT *
      FROM Averages
      WHERE AvgNormalReading < 10
)
SELECT * INTO outputAlerts FROM BadAverages
SELECT * INTO outputLog FROM NormalReadings

 

While we are at it, note that in Stream Analytics the WITH clause applies to the entire job, not just the SELECT statement that immediately follows it.  That is why the

SELECT * INTO outputLog FROM NormalReadings

above is valid. 

Convenient, eh?  Check out the docs for more details.


Comments (6)

  1. Ali says:

    Thanks for interesting Article.

    Using With clauses i'm trying to achieve a Missing value substitution scenario, but having a tough time.

    The scenario is that Data is flowing through the stream, there are records which have Null values which need  to be substituted with the running average of last e.g. 10 minutes.

    At this point im trying to use the With clause to calculate the average and Case statement to substitute where i find a Null. however, i have been unable to figure out how to join the average from With clause to the main Select clause.

    Any help will be appreciated.

  2. lev_n@msn.com says:

    Hi Ali!

    I think you were on the right path!  Here is how I would do it.  Assume that the input is 'requests', and that the property named 'value' is sometimes missing.

    WITH averages AS

    (

       SELECT AVG(value) as avg

       FROM requests TIMESTAMP BY ts

       GROUP BY SlidingWindow(ss, 5)

    )

    SELECT

       requests.ts,

       value = CASE WHEN requests.value IS NULL THEN averages.avg ELSE requests.value END

    FROM averages

       JOIN requests TIMESTAMP BY ts

       ON DATEDIFF(ms, requests, averages) = 0

  3. Ali says:

    Hi Lev,

    Thanks for your reply, following is my query which i have built by taking guidance from your reply.

    WITH MV AS

    (

      Select AVG([Sensor_1]) AS [Sensor_1]

      From [input]

      GROUP BY SlidingWindow(second, 5)

    )

    SELECT [input].[ID]

    ,[input].[Result]

    ,CASE

       WHEN [input].[Sensor_1] = 0

       THEN [MV].[Sensor_1]

       ELSE [input].[Sensor_1]

    END [Sensor_1]

    ,[input].[Sensor_2]

    ,[input].[Sensor_3]

    ,[input].[Sensor_4]

    ,[input].[Sensor_5]

    FROM [MV]

    JOIN [input]

    ON DateDiff(second, [input], [MV]) = 0

    Using following data, sometimes i get duplicated records, i think the join is doing 6×2 rather than 6×1

    [{"ID":0,"Sensor_1":3030.93,"Sensor_2":2564.0,"Sensor_3":2187.7333,"Sensor_4":1411.1265,"Sensor_5":1.3602,"Result":0},

    {"ID":1,"Sensor_1":0,"Sensor_2":2465.14,"Sensor_3":2230.4222,"Sensor_4":1463.6606,"Sensor_5":0.8294,"Result":0},

    {"ID":2,"Sensor_1":2932.61,"Sensor_2":2559.94,"Sensor_3":2186.4111,"Sensor_4":1698.0172,"Sensor_5":1.5102,"Result":0},

    {"ID":3,"Sensor_1":2988.72,"Sensor_2":2479.9,"Sensor_3":2199.0333,"Sensor_4":909.7926,"Sensor_5":1.3204,"Result":0},

    {"ID":4,"Sensor_1":3032.24,"Sensor_2":2502.87,"Sensor_3":2233.3667,"Sensor_4":1326.52,"Sensor_5":1.5334,"Result":0},

    {"ID":5,"Sensor_1":2946.25,"Sensor_2":2432.84,"Sensor_3":2233.3667,"Sensor_4":1326.52,"Sensor_5":1.5334,"Result":0}]

    Any ideas where i'm going wrong?

  4. lev_n@msn.com says:

    This will happen if two of your events have the same timestamp, or precisely 5 seconds apart.  Is that what is happening, you think?  Where do you get your timestamps from?  

  5. Ali says:

    I'm not using the timestamp. For us the time at which a record reaches Stream Analytics is its timestamp.

  6. Bhushan says:

    Thanks for the article. Its useful.

    I have written query same like this one. and for my stream analytics i have two outputs defined, 1 Azure Table storage and 2. PowerBI. Whenever I write two select statements like above  –

    SELECT * INTO outputAlerts FROM BadAverages

    SELECT * INTO outputLog FROM NormalReadings

    I'm getting the following stream analytics validation error –

    Stream Analytics job has validation errors: Job will exceed the maximum amount of Event Hub Receivers. Activity Id: '175b8fcc-cacc-42b0-8a38-26fa2cb0ef42-2015-09-24 06:40:24Z'.

    Can you please help me out ?

Skip to main content