How to debug your ASA job, step by step


In a recent series of blog posts, we have gone over how diagnostic information is surfaced in Azure Stream Analytics, as well as the potential types of issues a job could encounter. This post brings it all together with a step-by-step flow for debugging issues with ASA jobs.

  1. First, when I create and define the job, I verify that the Input and Output connectivity looks fine by clicking the “Test Connection” button for each of the Inputs and Outputs.
  2. Next, I use the “Sample Data” button for each of the Inputs and download the input sample data.
  3. I inspect the sample data to make sure I understand the shape of the data: the schema and the data types.
  4. Now, on the Query tab, I use the “Test” button and provide the downloaded sample data to test my query. This returns either an error or, if everything looks fine, output.
  5. I build the query progressively, from a simple SELECT statement to more complex aggregates, using the WITH clause to build up the query logic step by step (see the first sketch after this list).
  6. If all these steps worked fine, I go to the Configure tab and make sure the event-time-related policies for the job are set as I need them.
    1. Note that these policies are not applied when you use the “Test” button to test the query. This is one difference between testing in the browser and running the job for real.
    2. Once this is all done, I start my job to make sure it is working functionally under low load.
    3. While all this is going on, every operation we did above produces Operation Logs entries that show whether it succeeded or failed, and the reason for any failure.
    4. Once the job status changes to “Running”, within a few seconds to a minute you will start seeing output in the sink data source.
    5. If you do not see any output even after a couple of minutes, try the following:
      1. Look at the monitoring metrics on the Monitor tab. The metrics here are delayed by a couple of minutes, as they are values aggregated over the last minute or more.
      2. Look at Input Events, Runtime Errors, and Data Conversion Errors.
        1. If Input Events > 0, the ASA job is able to read data. If not, the problem may be one of the following:
          1. If TIMESTAMP BY is used, make sure the events have timestamps greater than the job start time (see the TIMESTAMP BY sketch after this list).
          2. Look at the data source and check whether it has valid data for this job.
          3. Check that the data serialization format and encoding are as expected.
          4. If you are using Event Hub, the body of the message may be null.
          5. For Event Hub inputs, you can use Service Bus Explorer to view the raw events and make sure they look as expected.
        2. If Data Conversion Errors > 0 and climbing, that means one of the following:
          1. The job may not be able to deserialize the events.
          2. The event schema may not match the defined/expected schema of the events.
          3. The data type of some of the fields in the event may not be what is expected (see the TRY_CAST sketch after this list).
        3. If Runtime Errors > 0, the ASA job is able to receive the data but is hitting errors while processing the query. Go to the Operation Logs and filter on the “Failed” status to find these errors.
        4. If Input Events > 0 and Output Events = 0, one of the following is happening (a pass-through diagnostic query is sketched after this list):
          1. Query processing resulted in zero output events.
          2. The events or their fields may be malformed, resulting in zero output after query processing.
          3. The job is unable to push data to the output sink for various reasons.
        5. In all these error cases you will see Operation Logs messages that explain what is happening, except for the cases where the query logic filtered out all events.
  7. In some cases, because this is a stream processing system, if every event generates a processing error we log only the first 3 error messages of the same type within 10 minutes to the Operation Logs, then suppress the rest of the errors and write another message saying “Errors are happening too rapidly and so suppressing those errors….”. If you see one of these, look for the real errors preceding this message and examine their details.
  8. When you do not see output going to the specific output type you are using, but the rest of the job seems fine, you can use the multiple-outputs feature or redirect the output to a different, less complex output type (such as Azure Blobs) and see if the output shows up there (see the multiple-outputs sketch after this list).
  9. After all this, if you are still not able to figure out what is going on, go to the Operation Logs, select one of the latest entries, click the Details button at the bottom of the screen, copy all the details on that page, and supply that information to Microsoft Support.
  10. Once your job is working functionally, you should look at configuring one more knob: scalability.
    1. Go to the Scale tab and set the number of Streaming Units the job will need. ASA uses a processing engine that optimizes for speed/latency and throughput, and so it uses more memory to make the processing faster. Given that, I give myself enough buffer and set the Streaming Unit count higher than what is strictly needed. In general, start with 6 Streaming Units for queries not using PARTITION BY, figure out the sweet spot, and reduce to that number later; the SU % Utilization metric can help here (see the PARTITION BY sketch after this list).
    2. More about scalability and configuring your job to scale can be found here: https://azure.microsoft.com/en-us/documentation/articles/stream-analytics-scale-jobs/
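
The sketches below illustrate some of the steps above, written in the Stream Analytics query language. In all of them, `input`, `output`, and the field names (`deviceId`, `temperature`, and so on) are hypothetical placeholders; substitute the aliases and fields of your own job.

For step 5, a minimal sketch of building the query progressively with a WITH clause: verify the simple intermediate step on its own first, then layer the aggregate on top of it.

```sql
-- A named intermediate step: verify this simple SELECT works on its own first.
WITH FilteredReadings AS (
    SELECT
        deviceId,
        temperature
    FROM input                  -- alias defined on the Inputs tab
    WHERE temperature IS NOT NULL
)
-- Then add the more complex aggregate on top of the verified step.
SELECT
    deviceId,
    AVG(temperature) AS avgTemperature
INTO output                     -- alias defined on the Outputs tab
FROM FilteredReadings
GROUP BY deviceId, TumblingWindow(minute, 5)
```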
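
For the TIMESTAMP BY check: when events are timestamped by a payload field, events whose timestamps fall before the job start time produce no output even though Input Events keeps climbing. Here `readingTime` is an assumed field in the payload.

```sql
-- Events are timestamped by a field in the payload rather than by arrival
-- time; make sure readingTime values are newer than the job start time.
SELECT
    deviceId,
    temperature
INTO output
FROM input TIMESTAMP BY readingTime
```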
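
When Data Conversion Errors climb because of data-type mismatches, one way to probe a suspect field is TRY_CAST, which returns NULL instead of raising a conversion error when a value cannot be converted to the expected type.

```sql
-- Mismatched values show up as NULL instead of failing the conversion,
-- which makes the offending events easy to spot in the output.
SELECT
    deviceId,
    TRY_CAST(temperature AS float) AS temperature
INTO output
FROM input
```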
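
When Input Events > 0 but Output Events = 0, a quick way to tell whether the query logic is filtering everything out is to temporarily replace the query with a plain pass-through:

```sql
-- Temporary diagnostic: if this produces output but the real query does not,
-- the WHERE/JOIN/window logic is filtering out all events.
SELECT *
INTO output
FROM input
```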
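
For step 8, a sketch of the multiple-outputs approach: the same query shape feeds both the real sink and a simpler diagnostic sink, so you can see whether events make it through the query at all. `blobdiagnostics` is an assumed alias for an Azure Blob output.

```sql
-- Statement 1: the normal output path.
SELECT deviceId, temperature
INTO output
FROM input

-- Statement 2: the same data to a simpler sink (Azure Blobs) for comparison.
SELECT deviceId, temperature
INTO blobdiagnostics
FROM input
```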
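
For step 10, a PARTITION BY sketch: partitioning the query by the input's partitions lets each partition be processed in parallel, which is what allows a job to scale beyond 6 Streaming Units. `PartitionId` assumes a partitioned input such as an Event Hub.

```sql
-- Each input partition is processed independently and in parallel,
-- so the job can use more Streaming Units.
SELECT
    deviceId,
    AVG(temperature) AS avgTemperature
INTO output
FROM input PARTITION BY PartitionId
GROUP BY PartitionId, deviceId, TumblingWindow(minute, 5)
```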