This blog post explains, at a high level, how Azure Stream Analytics (ASA) jobs work and where to look for information when debugging your job's inputs, query, and output.
Issues with ASA jobs can show up in several areas:
- Connecting to data sources (inputs and outputs)
- Consuming/reading and deserializing data
- Query execution
- Writing data to the output sink
Connecting to data sources (inputs and outputs)
In general, all connectivity-related errors are written to both the Operations Logs and the Output Diagnostics. As shown in the snapshots below, these are the two places to check for errors.
Operations Logs Link on the Dashboard page:
The image above also shows a few other details that can help you identify problems, such as when the job was started, when it started processing data, and when the last output was generated. Note that these metrics are refreshed once a minute, so the last output may sometimes appear to be almost a minute old. If you refresh the page manually, it shows the latest values.
No-Output Diagnostics Information:
To see more details about this information, go to the Inputs or Outputs tab.
Clicking the affected resource (Input2 in this example) shows more information, as in the snapshot below.
Expected behavior if you run into this issue: the ASA runtime retries and writes an Operations Logs message about the failure, and the job goes into the Degraded state. You will also see a message in the No-Output Diagnostics section, and the Runtime Errors metric on the Monitor dashboard will go up.
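To make that expected behavior concrete, here is a minimal Python sketch that simulates it. Everything in it is hypothetical: `JobStatus`, `connect_with_retry`, and the `max_attempts` cap are illustrative names and assumptions, not part of any ASA API, and the sketch does not model the job recovering from the Degraded state.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class JobStatus:
    """Hypothetical stand-in for the job state visible in the portal."""
    state: str = "Running"
    runtime_errors: int = 0                  # mirrors the Runtime Errors metric
    operations_log: List[str] = field(default_factory=list)
    diagnostics: str = ""                    # mirrors the No-Output Diagnostics message


def connect_with_retry(source: str, connect: Callable[[], bool],
                       status: JobStatus, max_attempts: int = 5) -> bool:
    """Retry a connection, recording each failure as described above.

    max_attempts is an assumption that keeps the sketch finite; recovery
    from Degraded back to Running is not modeled.
    """
    for attempt in range(1, max_attempts + 1):
        if connect():
            return True
        # Each failure: log entry, diagnostics message, metric bump, Degraded state.
        status.operations_log.append(f"{source}: connection attempt {attempt} failed")
        status.diagnostics = f"{source}: unable to connect"
        status.runtime_errors += 1
        status.state = "Degraded"
    return False


# Usage: two failed attempts, then a successful one.
attempts = iter([False, False, True])
status = JobStatus()
ok = connect_with_retry("Input2", lambda: next(attempts), status)
print(ok, status.state, status.runtime_errors)  # True Degraded 2
```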
Consuming/reading and deserializing data
ASA currently supports a couple of input sources (Azure Blob storage and Azure Event Hubs) and a few serialization formats (JSON, CSV, and Avro). Inputs to ASA jobs often come directly from devices, so the incoming data may not be cleaned up and may not follow a predefined schema. When it receives such data, ASA tries to translate it to the schema and types the query needs. If you define the input with a "CREATE TABLE" statement as part of the ASA query, the job drops/ignores all events that do not conform to that schema definition. If you do not use "CREATE TABLE" to define the input schema, the job reads the events as they are. As a result, several things can go wrong here:
- Unrecognized serialization format: ASA writes an Operations Logs entry with an error message describing the serialization issue.
- Field type conversion issues: when CREATE TABLE is used and a field cannot be converted to the declared type, ASA sets the field to NULL, so incorrectly formatted fields can lead to incorrect results. ASA increments the “Data Conversion Errors” counter each time this happens.
- Missing fields: when CREATE TABLE is not used, any field the query accesses that is missing from an event is treated as NULL. A misspelled field name, for example, can therefore silently produce incorrect results.
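The three cases above can be sketched in Python as follows. This is a simplified simulation under stated assumptions, not ASA's implementation: `SCHEMA` stands in for a CREATE TABLE definition, only JSON is handled, `Counters` is a hypothetical stand-in for the Operations Logs and the Data Conversion Errors counter, and treating an absent field as NULL without counting it is an assumption.

```python
import json
from typing import Any, Callable, Dict, List, Optional

# Hypothetical input schema in the spirit of a CREATE TABLE definition:
# field name -> conversion function. The field names are illustrative.
Schema = Dict[str, Callable[[Any], Any]]
SCHEMA: Schema = {"DeviceId": str, "Temperature": float}


class Counters:
    def __init__(self) -> None:
        self.data_conversion_errors = 0          # mirrors "Data Conversion Errors"
        self.operations_log: List[str] = []      # mirrors Operations Logs entries


def read_event(raw: str, counters: Counters,
               schema: Optional[Schema] = None) -> Optional[Dict[str, Any]]:
    """Deserialize one JSON event and, if a schema is given, convert its fields."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError as err:
        # Format not understood: record a serialization error and skip the event.
        counters.operations_log.append(f"serialization error: {err.msg}")
        return None
    if not isinstance(event, dict):
        counters.operations_log.append("serialization error: expected a JSON object")
        return None
    if schema is None:
        return event                             # no schema: read the event as it is
    typed: Dict[str, Any] = {}
    for name, convert in schema.items():
        value = event.get(name)
        if value is None:
            typed[name] = None                   # assumption: absent field -> NULL, not counted
            continue
        try:
            typed[name] = convert(value)
        except (TypeError, ValueError):
            typed[name] = None                   # conversion failed: field becomes NULL
            counters.data_conversion_errors += 1
    return typed


def get_field(event: Dict[str, Any], name: str) -> Any:
    """Accessing a missing (e.g. misspelled) field yields None, not an error."""
    return event.get(name)


counters = Counters()
typed = read_event('{"DeviceId": "d1", "Temperature": "hot"}', counters, SCHEMA)
raw = read_event('{"DeviceId": "d1", "Temperature": 21.5}', counters)
bad = read_event("DeviceId=d1", counters)
print(typed, counters.data_conversion_errors)
print(get_field(raw, "Temprature"))              # misspelled name -> None
```

Note how the misspelled "Temprature" produces no error at all, which is exactly why this kind of bug tends to surface only as wrong results.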
In the future, we plan to expose a policy that lets you choose what the ASA job should do when it encounters bad data or data in an unexpected format.
Query development: Complex queries may not work the first time. So how do you debug and fix issues in your query?
The best approach is to start with a simpler form of the query, or with one part of it, get that working, and then progressively extend it until you have the query you want. This narrows down problems so you can fix them faster.
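Here is a small Python sketch of that workflow, assuming the query can be broken into stages that you test one at a time against sample data. The stages, the sample events, and the `first_failing_stage` helper are hypothetical; the point is that adding one piece at a time tells you exactly which piece breaks.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Event = Dict[str, Any]
Stage = Callable[[List[Event]], List[Event]]

# Hypothetical sample events; one reading is missing its temperature.
SAMPLE: List[Event] = [
    {"DeviceId": "d1", "Temperature": 20.0},
    {"DeviceId": "d1", "Temperature": None},
    {"DeviceId": "d2", "Temperature": 30.0},
]


def drop_nulls(events: List[Event]) -> List[Event]:
    return [e for e in events if e["Temperature"] is not None]


def to_fahrenheit(events: List[Event]) -> List[Event]:
    return [{**e, "TempF": e["Temperature"] * 9 / 5 + 32} for e in events]


def average_by_device(events: List[Event]) -> List[Event]:
    readings: Dict[str, List[float]] = {}
    for e in events:
        readings.setdefault(e["DeviceId"], []).append(e["TempF"])
    return [{"DeviceId": d, "AvgTempF": sum(v) / len(v)} for d, v in readings.items()]


def first_failing_stage(stages: List[Tuple[str, Stage]],
                        events: List[Event]) -> Tuple[Optional[str], List[Event]]:
    """Apply stages in order; return the name of the first stage that raises
    (or None if all succeed) together with the last successful output."""
    current = events
    for name, stage in stages:
        try:
            current = stage(current)
        except Exception:  # broad on purpose: any failure pinpoints the stage
            return name, current
    return None, current


# Step 1: start small. The conversion alone fails on the NULL reading.
failed, _ = first_failing_stage([("to_fahrenheit", to_fahrenheit)], SAMPLE)
print(failed)  # to_fahrenheit

# Step 2: add a filter in front, then the aggregation, re-checking each time.
failed, output = first_failing_stage(
    [("drop_nulls", drop_nulls), ("to_fahrenheit", to_fahrenheit),
     ("average_by_device", average_by_device)], SAMPLE)
print(failed, output)
```

The first run shows that the conversion step breaks on a NULL temperature, which suggests filtering (or guarding) before converting; the second run confirms the fix before the next piece is added.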
Query execution issues: The most common problem here is unclean data that causes the query to abort and the job to fail. In this case, an Operations Logs message will help you identify the problem. We also recommend using CASE expressions to write more defensive queries that check for NULL and 0 values, which can otherwise cause runtime errors such as DivideByZeroException.
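Example of using a CASE expression to avoid dividing by zero (the input name Input and the output column PERatio are illustrative):
SELECT Ticker, CurrentPrice, CAST(EPSCurrentYear AS float) AS EPS,
    CASE WHEN CAST(EPSCurrentYear AS float) <= 0 THEN 0
         ELSE CurrentPrice / CAST(EPSCurrentYear AS float) END AS PERatio
FROM Input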
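To sanity-check a guard like this before deploying it, you can mirror it on a few sample rows. The Python sketch below assumes the ELSE branch divides CurrentPrice by the cast EPS value (a price-to-earnings ratio) and models NULL as `None` under standard SQL NULL semantics; `pe_ratio` is a hypothetical name.

```python
from typing import Optional


def pe_ratio(current_price: Optional[float],
             eps_current_year: Optional[str]) -> Optional[float]:
    """Python mirror of the guarded division, with SQL NULL modeled as None.

    Assumes standard SQL semantics: comparing NULL yields UNKNOWN, so CASE falls
    through to ELSE, and arithmetic with NULL yields NULL instead of raising.
    """
    eps = None if eps_current_year is None else float(eps_current_year)  # CAST(... AS float)
    if eps is not None and eps <= 0:
        return 0.0                                  # WHEN EPS <= 0 THEN 0
    if eps is None or current_price is None:
        return None                                 # ELSE branch with a NULL operand
    return current_price / eps                      # ELSE CurrentPrice / EPS


rows = [("AAA", 10.0, "2"), ("BBB", 5.0, "0"), ("CCC", 8.0, None)]
for ticker, price, eps in rows:
    print(ticker, pe_ratio(price, eps))
```

The zero row takes the THEN branch instead of dividing, and the NULL row yields NULL rather than an error.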
Writing data to the output sink
Exceptions related to writing output to the sink are categorized as either retryable or user-actionable. Retryable exceptions are transient issues that can be recovered from, such as network or connectivity errors. User-actionable errors require user attention; without it, the job cannot proceed. ASA retries on retryable errors, in some cases indefinitely, and does not retry on user-actionable errors. In both cases, it emits Operations Logs and diagnostics messages as appropriate.
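A minimal Python sketch of that retry policy follows. `RetryableError`, `UserActionableError`, and `write_with_retry` are hypothetical; the attempt cap and the exponential backoff are assumptions added to keep the example finite, whereas ASA may retry some errors indefinitely.

```python
import time
from typing import Callable, List


class RetryableError(Exception):
    """Hypothetical: a transient failure such as a network/connectivity error."""


class UserActionableError(Exception):
    """Hypothetical: a failure the job cannot get past without user action."""


def write_with_retry(write: Callable[[], None], log: List[str],
                     max_attempts: int = 5, base_delay: float = 1.0) -> bool:
    """Write to a sink, retrying only retryable errors with exponential backoff.

    max_attempts and the backoff schedule are assumptions, not ASA settings.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            write()
            return True
        except RetryableError as err:
            log.append(f"retryable error on attempt {attempt}: {err}")
            if attempt < max_attempts:
                time.sleep(base_delay * 2 ** (attempt - 1))
        except UserActionableError as err:
            log.append(f"user action required: {err}")
            return False                    # never retried
    return False


log: List[str] = []
outcomes = iter([RetryableError("connection timed out"), None])


def flaky_write() -> None:
    outcome = next(outcomes)
    if outcome is not None:
        raise outcome


def misconfigured_write() -> None:
    raise UserActionableError("output sink not found")


ok = write_with_retry(flaky_write, log, base_delay=0.0)
ok2 = write_with_retry(misconfigured_write, log, base_delay=0.0)
print(ok, ok2)  # True False
```

Keeping the two error types as separate exception classes makes the policy explicit: only one branch loops, and the other logs and stops immediately.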
We recommend setting alerts on the Operations Logs so that, if an error or failure occurs, you are notified with as much information as possible and can act on it as needed.
Here is guidance on how to set up alerts on Operations Logs: https://code.msdn.microsoft.com/windowsazure/Receive-Email-Notifications-199e2c9a
You can also set up alerts on monitoring metrics. For example: “If Output Events for the last 15 minutes is < 100, send an email notification to firstname.lastname@example.org”.
Use an alert like this when you expect the job to produce a minimum number of output events over the last 15 minutes; if it does not, the alert triggers.
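A metric alert like the one above boils down to a threshold check over a sliding window. The Python sketch below is hypothetical (in practice the rule is configured in the portal, not written as code); `should_alert` and its parameters are illustrative, with defaults matching the 15-minute, 100-event example.

```python
from datetime import datetime, timedelta
from typing import Iterable


def should_alert(output_times: Iterable[datetime], now: datetime,
                 window: timedelta = timedelta(minutes=15),
                 threshold: int = 100) -> bool:
    """True when fewer than `threshold` output events fall inside (now - window, now]."""
    start = now - window
    count = sum(1 for t in output_times if start < t <= now)
    return count < threshold


now = datetime(2015, 1, 1, 12, 0)
busy = [now - timedelta(seconds=5 * i) for i in range(150)]   # 150 events in under 13 minutes
sparse = [now - timedelta(minutes=1)] * 10                     # only 10 events
print(should_alert(busy, now), should_alert(sparse, now))      # False True
print(should_alert([], now, threshold=1))                      # True: not even one event
```

Setting `threshold=1` models the weaker expectation of at least one event per window.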