Troubleshooting Azure Stream Analytics jobs with SELECT INTO


Azure Stream Analytics is a fully managed service to do real time processing of data with a flexible SQL-like language. You can easily construct queries to complete complex analysis. However, it also means sometimes the system can be hard to troubleshoot when it does not run as expected. Here is one trick that can help troubleshooting Azure Stream Analytics jobs.

The SELECT INTO statement

Sometimes knowing what the data looks like in the middle of the query can be very helpful. Since inputs or steps of an Azure Stream Analytics job can be read multiple times, we can write extra SELECT INTO statements to output intermediate data into storage and inspect the correctness of the data, just like “watch variables” when debugging a program. Let’s look at an example.

Example

Here we have a simple Azure Stream Analytics job. It has one stream input, two reference data inputs and an output to Azure Table Storage.

1

This query joins data from the event hub and two reference blobs to get the name and category information:

2

This job is running fine. However, no events are being produced in the output. From the Monitoring tile we can see that input is producing data, but we don’t know which step of the JOIN caused all the events to be dropped.

3

In this situation, we can add a few extra SELECT INTO statements to “log” the intermediate JOIN results as well as the data read from the input.

Let’s first add two new “temporary outputs”. They can be any sink you like. Here we use Azure Storage as an example:

4

Then let’s rewrite the query like this:

5

 

Now start the job again and let it run for a few minutes. Then we can query temp1 and temp2 with Visual Studio Cloud Explorer:

6

7

As we can see, temp1 and temp2 both have data, and the name column is populated correctly in temp2. However, there is still no data in output:

8

Now we are almost certain that the issue is with the 2nd JOIN. Let’s download the reference data from blob and take a look:

9

Aha! The format of GUID in this reference data is different from the format of the [from] column in temp2. That’s why our data wasn’t arriving in output1 as expected. Let’s fix the data format, upload to reference blob and try again:

10

And then we got the data in output with name and category nicely populated!

11

 

Conclusion

As you can see, this trick is very helpful in troubleshooting Azure Stream Analytics jobs. Beyond the scenario we showed in the example, it can also be used to troubleshoot issues when wrong data are produced by a step. Just use the “SELECT INTO” statement and you can troubleshoot your jobs like debugging a program!

 

Comments (1)

  1. Allan Mitchell says:

    Really? This is a trick? This creates loads of artefacts that then need to be cleaned up (of course they never are making things messy). This kind of debugging was sort of OK in 1992 but this is 2016. How about allowing us to attach a debugger to the job? Do you remember the StreamInsight graph where we could see the data?
    if this is a “debug” routine then putting the “debug” output in the same job as your actual proper query means that if you want to remove the “debug” step/s then you have to stop EVERYTHING , remove the code and start the job again.

Skip to main content