Some users have reported seeing a high number of out-of-order events in their queries and have asked what can be done to reduce it. In this blog post, I will explain how events get out of order and how to reduce them by changing the query and tuning the query's configuration.
First, because Azure Stream Analytics (ASA) applies temporal transformations when processing incoming events (e.g., windowed aggregates and temporal joins), we need to sort the incoming events in timestamp order. You choose which timestamp to use with the "timestamp by" clause in the query (e.g., select * from input timestamp by time, where time is a field in the event payload).
When "timestamp by" is not present, we use the Event Hub enqueue time by default. Because Event Hub guarantees that enqueue times are monotonic within each partition, and we merge events from all partitions in timestamp order, there are no out-of-order events in this case.
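To see why merging already-ordered partitions cannot produce disorder, here is a simplified sketch. It assumes each partition is a finite, already-available list of (enqueue_time, payload) tuples with non-decreasing enqueue times; the function names are hypothetical, and ASA's real merge (which works on live, unbounded streams) is more involved.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical event shape: (enqueue_time, payload). Within one partition,
# enqueue times are non-decreasing, mirroring Event Hub's per-partition guarantee.
Event = Tuple[float, str]

def merge_partitions(partitions: List[Iterable[Event]]) -> List[Event]:
    """K-way merge of per-partition streams by enqueue time.

    heapq.merge requires each input to be sorted already, which holds here,
    so the merged output is globally ordered by enqueue time.
    """
    return list(heapq.merge(*partitions, key=lambda e: e[0]))

def count_out_of_order(events: List[Event]) -> int:
    """Count events whose timestamp is earlier than one already seen."""
    count = 0
    max_seen = float("-inf")
    for ts, _ in events:
        if ts < max_seen:
            count += 1
        max_seen = max(max_seen, ts)
    return count

p0 = [(1.0, "a"), (4.0, "b"), (7.0, "c")]
p1 = [(2.0, "d"), (3.0, "e"), (9.0, "f")]
merged = merge_partitions([p0, p1])
print([payload for _, payload in merged])  # ['a', 'd', 'e', 'b', 'c', 'f']
print(count_out_of_order(merged))          # 0
```

Because every input to the merge is sorted, the merged stream is sorted too, so the out-of-order count is always zero when the enqueue time is the event time.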
When it's important for you to use the sender's timestamp, so you choose a timestamp from the event payload with "timestamp by," several sources of disorder can come into play:
- Clock skew among event producers. This is common when producers run on different machines, each with its own clock.
- Varying network delay as producers send events to Event Hub.
- Clock skew between Event Hub partitions. This also matters because we first sort events from all Event Hub partitions by enqueue time and only then examine the disorder in the payload timestamps.
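The first two sources can be illustrated with a small, deterministic simulation. Everything here is hypothetical: the per-producer clock skew, the per-event network delays, and the assumption that Event Hub's own clock is accurate (so partition clock skew is not modeled). Events are ordered by when they reach Event Hub, and the payload timestamps are then checked for disorder.

```python
from typing import Dict, List, NamedTuple

class Event(NamedTuple):
    producer: str
    true_time: float  # when the event actually happened (seconds)
    delay: float      # hypothetical network delay to Event Hub (seconds)

# Hypothetical clock skew per producer: producer B's clock runs 2 s behind.
CLOCK_SKEW: Dict[str, float] = {"A": 0.0, "B": -2.0}

def payload_timestamp(e: Event) -> float:
    """Timestamp the producer writes into the payload, using its own clock."""
    return e.true_time + CLOCK_SKEW[e.producer]

def enqueue_time(e: Event) -> float:
    """Time the event reaches Event Hub, assuming Event Hub's clock is accurate."""
    return e.true_time + e.delay

def count_out_of_order(timestamps: List[float]) -> int:
    """Count timestamps that are earlier than one already seen."""
    count, max_seen = 0, float("-inf")
    for ts in timestamps:
        if ts < max_seen:
            count += 1
        max_seen = max(max_seen, ts)
    return count

events = [
    Event("A", 10.0, 0.1),
    Event("B", 10.5, 0.1),  # skewed clock stamps it 8.5
    Event("A", 11.0, 1.5),  # slow network: arrives after the next event
    Event("A", 11.5, 0.1),
]
arrival_order = sorted(events, key=enqueue_time)
ts_in_arrival_order = [payload_timestamp(e) for e in arrival_order]
print(ts_in_arrival_order)                      # [10.0, 8.5, 11.5, 11.0]
print(count_out_of_order(ts_in_arrival_order))  # 2
```

One event is out of order because of clock skew and another because of network delay, even though the enqueue times themselves are perfectly ordered.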
On the configuration tab, you will find the event ordering settings, where the out of order tolerance window defaults to 0 seconds.
Setting the out of order tolerance window to 0 seconds means you assert that all events are always in order. Given the three sources of disorder above, that is unlikely to be true. To let ASA correct the disorder, specify a non-zero out of order tolerance window. ASA buffers events for up to that window and reorders them by your chosen timestamp before applying the temporal transformation. You can start with a 3-second window and then tune the value to reduce the number of events whose time gets adjusted. The side effect of buffering is that output is delayed by the same amount of time, so you will need to tune the value to balance fewer out-of-order events against low latency.
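The buffering idea can be sketched as a reorder buffer keyed on timestamp. This is a simplified model, not ASA's implementation: the `reorder` function is hypothetical, an event is released once an event at least `window` seconds newer has arrived, and an event that arrives older than the last released timestamp is assumed to be adjusted to that timestamp (the exact adjustment ASA applies may differ).

```python
import heapq
from typing import List, Tuple

def reorder(timestamps: List[float], window: float) -> Tuple[List[float], int]:
    """Reorder arrival-ordered timestamps using a tolerance window.

    An event is held until an event at least `window` seconds newer has
    arrived, then emitted in timestamp order. An event older than the last
    emitted timestamp can no longer be placed in order; this sketch
    (hypothetically) adjusts it to that timestamp and counts it.
    Returns (emitted timestamps, number of adjusted events).
    """
    heap: List[float] = []
    out: List[float] = []
    adjusted = 0
    max_seen = float("-inf")
    last_emitted = float("-inf")
    for ts in timestamps:
        if ts < last_emitted:
            ts = last_emitted  # too late to reorder: adjust its time
            adjusted += 1
        heapq.heappush(heap, ts)
        max_seen = max(max_seen, ts)
        # Release everything at least `window` older than the newest event seen.
        while heap and heap[0] <= max_seen - window:
            last_emitted = heapq.heappop(heap)
            out.append(last_emitted)
    while heap:  # end of input: flush the remaining buffered events
        out.append(heapq.heappop(heap))
    return out, adjusted

arrivals = [10.0, 8.5, 11.5, 11.0]
for window in (0.0, 3.0):
    out, adjusted = reorder(arrivals, window)
    print(window, out, adjusted)
# 0.0 [10.0, 10.0, 11.5, 11.5] 2
# 3.0 [8.5, 10.0, 11.0, 11.5] 0
```

With a 0-second window every event is released immediately, so both late events get adjusted; with a 3-second window both are put back in order, but each event now waits until the stream has advanced about 3 seconds past it, which is the latency cost described above.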