Streaming event processing usually implies low end to end latency. While ASA is designed to process incoming events with low latency, there are several factors from both the query semantics and the underlying infrastructure that can introduce latency. In this blog post, I will explain where the additional latency is from, and how to reduce it.
At the center of ASA's query semantics is temporal transformation logic, including windowed aggregates and temporal joins. In order to perform the temporal transformation correctly and efficiently, we sort all incoming events by timestamp order. When Event Hub is used as input, we also need to merge the events from all Event Hub partitions by timestamp order. The merger essentially round robin through the Event Hub partitions reading from the partition with the lowest timestamp seen so far, and performs a merge sort. So far so good.
However, when events do not arrive at all Event Hub partitions for some reason (e.g. low event rate or uneven distribution), the merge sort algorithm has to wait for the event from the empty partition, potentially indefinitely, which halts the whole event processing pipeline. To combat the problem, we do two things.
- For queries that don't use "timestamp by," we confirm with Event Hub partition it's indeed out of events to read (as opposed to network delay). If so, we move the clock from the partition forward without waiting for new events.
- For queries that use "timestamp by," in addition to #1, we use the late arrival tolerance window specified on the query's configuration tab to figure out how to move user's clock forward. For example, if the local clock is at 2pm, and the late arrival tolerance window is 10 seconds, we will move user's clock to 1:59:50pm.
As you can see, both mechanisms can introduce delay in the processing. For #1, it's the time we check with each and every Event Hub partition on their emptiness, and for #2, it's the tolerance window specified by the user.
My experiment shows with a select * query, if I send 1 event per second to Event Hub, the end to end delay is around 6 seconds.
So, how can we reduce the latency?
Remember, at the center of the delay is the merger. While ASA team continues to improve its design to reduce latency, if you can avoid the merger in the query, you have effectively removed that source of delay. To achieve that, you need to make your query a partitioned query, by adding the "partition by" clause. For example, select * from input partition by partitionid.
The partitionid here is the partitionid from Event Hub. This effectively tells ASA to not merge events from Event Hub partitions during processing. Just note, you need to configure your Event Hub output to use partitionid as the partition key as well; otherwise, the merger is applied again. The drawback of doing this is the output events are no longer guaranteed to be sorted by timestamp, because they are processed in parallel without getting merged together.
My experiment shows with this query change, if I send 1 event per second to Event Hub, the end to end delay is reduced to 2 seconds.