One in a million redux

I mentioned my “one in a million is next Tuesday” post to my wife the other day, and she asked me, “So did you include the bit about the PC’s clock?”

And it hit me that I hadn’t.  Doh!  So here it is.  It’s kinda fascinating actually.

Time on a PC is kept by counting the number of clock interrupts that have occurred.  Every PC contains a crystal that drives a clock chip, which interrupts the CPU approximately every 10 milliseconds, and NT increments the system time by 10 milliseconds every time it receives one of those clock interrupts.
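
To make that concrete, here’s a minimal sketch (my own illustration in C, not the actual NT code) of what interrupt-driven timekeeping looks like: the system time is just a counter, and each clock interrupt adds a fixed 10 millisecond increment to it.

            /* A minimal sketch of interrupt-driven timekeeping -- illustrative only,
               not the actual NT kernel code.  The system time is just a counter that
               every clock interrupt bumps by a fixed increment. */

            typedef unsigned long long ULONG64;

            #define TICK_INCREMENT_100NS 100000ULL     /* 10 milliseconds in 100-ns units */

            static ULONG64 g_systemTime100ns;          /* time since boot, in 100-ns units */

            void OnClockInterrupt(void)                /* hypothetical interrupt handler */
            {
                g_systemTime100ns += TICK_INCREMENT_100NS;
            }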

But the problem is that the crystals used in the system have a failure rate as high as 100ppm – in other words, for as many as 100 out of every million clock ticks, the clock chip won’t actually generate an interrupt.  For most applications, this isn’t a significant problem: instead of the system context switching every 10 milliseconds, every once in a while it context switches after 20 milliseconds.

But for timekeeping, this is an utter disaster.  Given a 10 millisecond timer, there are 8,640,000 clock ticks per day.  If 100 out of every million clock ticks are missed, the system misses 864 clock ticks a day – at 10 milliseconds apiece, that’s about 8.64 seconds of lost time per day, or roughly a minute a week!
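
If you want to double-check the arithmetic, here’s a quick back-of-the-envelope program (mine, purely illustrative):

            /* Back-of-the-envelope drift calculation for a 100ppm miss rate. */
            #include <stdio.h>

            int main(void)
            {
                double ticksPerDay = 24.0 * 60 * 60 * 1000 / 10;     /* 10 ms ticks: 8,640,000 */
                double missedTicks = ticksPerDay * 100 / 1000000;    /* 100 ppm: 864 ticks */
                double secondsLost = missedTicks * 0.010;            /* 8.64 seconds per day */

                printf("%.0f ticks/day, %.0f missed, %.2f seconds lost per day\n",
                       ticksPerDay, missedTicks, secondsLost);
                return 0;
            }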

Now, in practice, the amount of drift is much lower than that, but it can still be quite significant.

So how does NT fix this?  Well, back in NT 3.1, once an hour, NT would interrogate the on-board real-time clock chip (the hardware that keeps your date and time up-to-date even when your computer is powered off).  If the system time differed from the time on the RTC, it would simply reset the system time to match.  Which meant that time could jump forwards or backwards significantly – so it was possible for the assert to fire in the following code:

            FILETIME time1, time2;

            GetSystemTimeAsFileTime(&time1);
            GetSystemTimeAsFileTime(&time2);
            ASSERT(CompareFileTime(&time1, &time2) <= 0);   // time1 must never be later than time2

Clearly this was an unacceptable situation, so something had to be done to fix it.  The fix (in NT 3.5) was to change how time was accounted for in the system.  In the old system, every clock interrupt bumped the time by 10 milliseconds.  With the change, when the system measured the time from the RTC, instead of applying the new time immediately, it calculated an adjustment to the 10 millisecond amount.  If the clock was behind, each tick might count as 11 or 12 milliseconds.  If the clock was ahead, each tick might count as 8 or 9 milliseconds.
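
Here’s a sketch of what that looks like (again my own illustration, not the real kernel code): when the RTC is sampled, the per-tick increment is recomputed so the error gets spread over the ticks until the next sample, and time slews smoothly instead of jumping.

            /* Sketch of per-tick adjustment -- illustrative, not the real NT code. */

            typedef unsigned long long ULONG64;

            static ULONG64   g_systemTime100ns;              /* running system time, 100-ns units */
            static long long g_tickIncrement100ns = 100000;  /* nominally 10 ms per tick */

            void OnClockInterrupt(void)
            {
                /* Every tick advances time by the *adjusted* increment. */
                g_systemTime100ns += (ULONG64)g_tickIncrement100ns;
            }

            /* Hypothetical helper, called when the RTC is sampled: spread the observed
               error over the ticks until the next sample instead of applying it at once. */
            void RecomputeIncrement(long long error100ns, long long ticksUntilNextSample)
            {
                g_tickIncrement100ns = 100000 + error100ns / ticksUntilNextSample;
            }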

This is actually pretty cool (ok, I think it’s amazingly clever), but again, there can be problems.  What if you’re comparing the current time against some high performance counter (like QueryPerformanceCounter)?  Then the clock adjustment will cause your system time measurements to be skewed relative to the performance counter measurements.  We actually ran into this problem in the SCP project – our clock tests were showing that the clock on the SCP chips was drifting, but we couldn’t see why it was happening.  It turned out that the SCP chip clock wasn’t drifting at all; it was the PC’s clock that was drifting.
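
If you suspect this is happening to you, one way to see it is to compare elapsed time as measured by QueryPerformanceCounter against elapsed time as measured by the system clock.  Something like this rough sketch (error handling omitted):

            /* Rough sketch: compare QueryPerformanceCounter elapsed time against
               system-clock elapsed time over a fixed interval (error handling omitted). */
            #include <windows.h>
            #include <stdio.h>

            int main(void)
            {
                LARGE_INTEGER freq, qpc1, qpc2;
                FILETIME ft1, ft2;
                ULARGE_INTEGER t1, t2;

                QueryPerformanceFrequency(&freq);
                QueryPerformanceCounter(&qpc1);
                GetSystemTimeAsFileTime(&ft1);

                Sleep(60 * 1000);                       /* measure over one minute */

                QueryPerformanceCounter(&qpc2);
                GetSystemTimeAsFileTime(&ft2);

                t1.LowPart = ft1.dwLowDateTime;  t1.HighPart = ft1.dwHighDateTime;
                t2.LowPart = ft2.dwLowDateTime;  t2.HighPart = ft2.dwHighDateTime;

                double qpcSeconds = (double)(qpc2.QuadPart - qpc1.QuadPart) / (double)freq.QuadPart;
                double sysSeconds = (double)(t2.QuadPart - t1.QuadPart) / 1e7;   /* 100-ns units */

                printf("QPC: %.6f s, system clock: %.6f s, skew: %.6f s\n",
                       qpcSeconds, sysSeconds, qpcSeconds - sysSeconds);
                return 0;
            }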

To allow people to compensate for this drift, a new API was added: GetSystemTimeAdjustment.  The GetSystemTimeAdjustment API lets you determine the interval between clock interrupts (that’s the lpTimeIncrement parameter) and the amount of time that’s added to the clock on each tick (that’s the lpTimeAdjustment parameter).
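
Here’s a minimal example of calling it; both values come back in 100-nanosecond units:

            /* Minimal sketch: query the clock interrupt interval and the per-tick
               adjustment.  Both values are reported in 100-nanosecond units. */
            #include <windows.h>
            #include <stdio.h>

            int main(void)
            {
                DWORD timeAdjustment, timeIncrement;
                BOOL  adjustmentDisabled;

                if (GetSystemTimeAdjustment(&timeAdjustment, &timeIncrement, &adjustmentDisabled))
                {
                    printf("Clock interrupt interval: %lu (%.2f ms)\n",
                           timeIncrement, timeIncrement / 10000.0);
                    printf("Time added per interrupt: %lu (%.2f ms)%s\n",
                           timeAdjustment, timeAdjustment / 10000.0,
                           adjustmentDisabled ? " -- periodic adjustment disabled" : "");
                }
                return 0;
            }

(There’s also a matching SetSystemTimeAdjustment API if you need to change the adjustment yourself, but that one requires the system-time privilege.)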

Edit: Fixed the result of CompareFileTime.