Part 1: Assigning Timestamps to Events
After having used the blog mainly for announcements, we’d like to get down to the real business and start posting more technical articles that will explore many different aspects of the framework. Let’s start with time, since this is a central concept in any event processing system.
StreamInsight reasons about application time (the timestamps associated with events) rather than system time (the wall-clock time during the execution of a StreamInsight query). Everything depends on the timestamps in a stream. StreamInsight users need to consider three important tasks when it comes to these timestamps:
- assigning timestamps to events;
- advancing time in a stream; and
- synchronizing multiple streams in time.
In this post we will discuss the first task, and we will come back to the others in subsequent posts. Note that assigning timestamps is the responsibility of the input adapter author, but the query author must understand the adapter’s temporal behavior as well.
The author of an input adapter has to supply timestamps for the adapter’s events. We start with the simplest case, point events, where a single timestamp corresponding to the point at which the event occurred needs to be assigned. The events may come from a source that has already solved this problem, e.g., log entries or stock trade messages, which are usually stamped with a time label. In that case the label can simply become the start time of the event.
If the event does not have a native timestamp, the adapter author needs to reason about the semantics of the event stream. There are three choices here:
- events are known to occur periodically;
- events occur in a known sequence, i.e., only their relative order matters; or
- events are timestamped according to system time as they arrive, i.e., they happen “now”.
The first two cases are similar as far as their solution is concerned: maintain a counter and manufacture timestamps based on that counter, without involving the system clock. This guarantees monotonically increasing timestamps, but the adapter author must think of a way to synchronize the clocks of multiple input streams if this is a relevant problem.
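The counter-based approach can be sketched as follows. This is a minimal illustration, not part of the StreamInsight API: the class name `CounterClock`, the origin, and the period are all assumptions chosen for the example. For events where only the relative order matters, a period of a single tick (`TimeSpan.FromTicks(1)`) would serve the same purpose.

```csharp
using System;

// Hypothetical helper (not a StreamInsight class): derives application
// timestamps from a counter instead of reading the system clock.
public sealed class CounterClock
{
    private readonly DateTimeOffset origin;
    private readonly TimeSpan period;
    private long count;

    public CounterClock(DateTimeOffset origin, TimeSpan period)
    {
        if (period <= TimeSpan.Zero)
        {
            throw new ArgumentOutOfRangeException(nameof(period), "Period must be positive.");
        }

        this.origin = origin;
        this.period = period;
    }

    // Returns origin + n * period on the n-th call (n starts at 0), so
    // successive timestamps are strictly increasing.
    public DateTimeOffset Next()
    {
        DateTimeOffset stamp = origin + TimeSpan.FromTicks(period.Ticks * count);
        count++;
        return stamp;
    }
}
```

The sketch is not thread-safe; an adapter that produces events from several threads would need to guard the counter.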
If users would like to model events that occur “now”, they must use a system clock that provides timestamps. There are two choices:
- Use the System.DateTimeOffset.UtcNow property.
- Build a homegrown high resolution clock.
The clock exposed via System.DateTimeOffset.UtcNow is useful because it does not normally run backward. We tend to think of our computers’ clocks as always being correct, but in reality they drift, so computers periodically synchronize their clocks over the Internet with a server that has a more precise clock. This means that, at a small scale, there are “hiccups” in our computers’ clocks as the synchronization algorithms correct the drift. System.DateTimeOffset.UtcNow hides this problem from users and does its best to smooth out these variations. It is updated infrequently, however: at intervals of roughly 1 ms or 15 ms, depending on the version of Windows.
What this means is that if this clock is read in a loop it can generate (potentially many) duplicate DateTimeOffset values, even though the resolution of the data type is 100 ns. So, if a strictly increasing sequence of timestamps is desired, using this clock is not a good way to obtain it. The relative order of two events with duplicate timestamps is not preserved by StreamInsight.
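The effect is easy to observe with a small experiment. The sketch below is only illustrative: the class and method names are made up for this example, and the number it reports depends on the operating system, the runtime, and the hardware, so no particular output should be expected.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical diagnostic helper, not part of StreamInsight.
public static class UtcNowDuplicates
{
    // Reads DateTimeOffset.UtcNow 'samples' times in a tight loop and
    // returns how many distinct values were observed.
    public static int CountDistinct(int samples)
    {
        var seen = new HashSet<DateTimeOffset>();
        for (int i = 0; i < samples; i++)
        {
            seen.Add(DateTimeOffset.UtcNow);
        }

        return seen.Count;
    }
}
```

Calling `UtcNowDuplicates.CountDistinct(100000)` and comparing the result with the number of samples gives a rough idea of how many readings collapsed onto the same timestamp on a given machine.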
Where higher-precision timestamps are required but events are correlated based on their relative time only, there is a simple alternative to DateTimeOffset.UtcNow: derive the timestamps from the System.Diagnostics.Stopwatch.GetTimestamp() method.
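A minimal sketch of such a clock, under stated assumptions: the class name `StopwatchClock` is hypothetical, the wall clock is read once to obtain an approximate origin, and every later timestamp is computed from the Stopwatch counter relative to that origin. Only the differences between timestamps are meaningful at high precision; the absolute values are as accurate as the single UtcNow reading.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not a StreamInsight class): reads the wall clock once,
// then advances using the Stopwatch counter only.
public sealed class StopwatchClock
{
    // Field initializers run in textual order: wall-clock origin first,
    // then the matching Stopwatch reading.
    private readonly DateTimeOffset origin = DateTimeOffset.UtcNow;
    private readonly long startTimestamp = Stopwatch.GetTimestamp();

    public DateTimeOffset Now
    {
        get
        {
            long elapsed = Stopwatch.GetTimestamp() - startTimestamp;

            // Convert Stopwatch ticks (Stopwatch.Frequency per second) into
            // 100 ns DateTimeOffset ticks (TimeSpan.TicksPerSecond per second).
            double scale = (double)TimeSpan.TicksPerSecond / Stopwatch.Frequency;
            return origin.AddTicks((long)(elapsed * scale));
        }
    }
}
```

Two reads that fall within the same 100 ns tick can still return equal values, so an adapter that needs strictly increasing timestamps would have to add its own tie-breaking on top of this.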
Note that such a solution is effective only when all events are timestamped using the same mechanism on the same machine, because the Stopwatch.GetTimestamp() method returns the number of ticks counted by the underlying timer mechanism of that machine. Note also that on some hardware and operating system configurations, timestamps returned by this method are calculated using the system timer rather than the higher-resolution performance counters. See the Stopwatch.IsHighResolution field documentation for details.
Interval events, by their very nature, fall into one of two cases: either they have a known duration, which reduces the problem to the point event case, or their StartTime and EndTime are supplied by the data source. Edge events are in the same situation: either someone else supplies the values, or the events represent a signal, where the occurrence of an event means that an end edge must be supplied for the previous start. So in this case, again, we only need to solve the problem of generating a suitable single StartTime value.
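Both reductions can be sketched without reference to any particular API. The types and method names below are hypothetical and deliberately do not use StreamInsight classes; the (start, end) pairs simply stand in for whatever event representation the adapter produces.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, API-agnostic helpers for illustration only.
public static class EventTimes
{
    // Interval with a known duration: only the start time has to be
    // generated; the end time follows from it.
    public static Tuple<DateTimeOffset, DateTimeOffset> Interval(
        DateTimeOffset start, TimeSpan duration)
    {
        return Tuple.Create(start, start + duration);
    }

    // Signal-style edges: each new start time closes the previous edge.
    // The most recent edge is still open, so it is not returned.
    public static List<Tuple<DateTimeOffset, DateTimeOffset>> CloseEdges(
        IEnumerable<DateTimeOffset> starts)
    {
        var closed = new List<Tuple<DateTimeOffset, DateTimeOffset>>();
        DateTimeOffset? previous = null;

        foreach (DateTimeOffset start in starts)
        {
            if (previous.HasValue)
            {
                closed.Add(Tuple.Create(previous.Value, start));
            }

            previous = start;
        }

        return closed;
    }
}
```

In both helpers the only value that has to be manufactured is a start time, which can come from any of the point event techniques discussed earlier.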