SSL and System Time

A few days ago I decided to upgrade my home machine from 1 GB of RAM to 2 GB. I've been running Vista at home since last summer and it occasionally gets cranky when it runs out of memory. After the usual problems of fiddling with hardware, everything seemed to be working, except that Windows Update couldn't find any updates and Outlook couldn't connect to my mailbox. Of course, there wasn't any explanation of what was wrong; they just seemed to have stopped working. I thought that it might be a network problem, but I could go to websites without any trouble.

This lasted for about 10 minutes while I was trying to find a solution, until I went to a website that required SSL. That was when I was surprised to see that the website's certificate was expired, and enlightened when I looked at the details and saw that the certificate was expired because it hadn't been issued yet. The problem wasn't with Windows Update, Outlook, or the website at all, of course. At some point during the hardware upgrade, the onboard clock had been reset to the factory default time of several years ago.
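The check that tripped here can be sketched in a few lines. This is only an illustration of comparing a certificate's validity window against the local clock, not how any particular SSL stack implements it; the function name check_validity and all of the dates are made up for the example.

```python
from datetime import datetime, timedelta, timezone


def check_validity(not_before, not_after, now):
    """Classify a certificate's validity window against the given clock reading."""
    if now < not_before:
        return "not yet valid"
    if now > not_after:
        return "expired"
    return "valid"


# Hypothetical certificate: issued mid-2007 and valid for 365 days.
not_before = datetime(2007, 6, 1, tzinfo=timezone.utc)
not_after = not_before + timedelta(days=365)

# A correct clock versus one reset to a factory default years in the past.
correct_clock = datetime(2008, 2, 1, tzinfo=timezone.utc)
reset_clock = datetime(2002, 1, 1, tzinfo=timezone.utc)

print(check_validity(not_before, not_after, correct_clock))  # valid
print(check_validity(not_before, not_after, reset_clock))    # not yet valid
```

The certificate is the same in both calls; only the clock reading changes the answer.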

This got me thinking about how fragile distributed systems are when clocks are involved. Many systems implicitly assume that network time synchronization will take care of the problem, but synchronization clearly doesn't happen fast enough to cover certain kinds of failure. Because of these failures, it is a bad assumption that the clock will be reasonably accurate between synchronizations.

Ten years ago I was reading a lot about clockless and unsynchronized systems that only required a steady flow of local time and not a globally coordinated time. This approach seems to have fallen out of favor with ubiquitous Internet connectivity and network time servers. It would be interesting to get an idea of how vulnerable distributed systems are to inadvertent or malicious clock skew on the client and server. I know that many of the common transfer protocols absolutely cannot deal with more than a few seconds of skew and even local behavior like timeouts can be affected by clock changes. Application behavior is also very dependent on accurate timestamps for processing messages. However, I don't think that people pay a lot of attention to this potential weak point in their operations.
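The point about timeouts is easy to demonstrate. Here is a minimal sketch, assuming a timeout only needs a steady flow of local time: a deadline measured with Python's time.monotonic is unaffected by someone resetting the wall clock. The Deadline and FakeClock names are invented for the example, and FakeClock stands in for a wall clock that gets stepped backwards.

```python
import time


class Deadline:
    """A timeout that depends only on the clock function it is given."""

    def __init__(self, seconds, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + seconds

    def remaining(self):
        # Seconds left before the deadline, never negative.
        return max(0.0, self._expires_at - self._clock())

    def expired(self):
        return self._clock() >= self._expires_at


class FakeClock:
    """Hypothetical clock for the demonstration; its reading can be set by hand."""

    def __init__(self, start):
        self.now = start

    def __call__(self):
        return self.now


# A 30-second timeout measured against a (fake) wall clock...
wall = FakeClock(1_200_000_000.0)
by_wall_clock = Deadline(30, clock=wall)

# ...which is then reset several years into the past.
wall.now = 1_000_000_000.0
print(by_wall_clock.expired())    # False
print(by_wall_clock.remaining())  # 200000030.0

# The default uses time.monotonic(), which cannot be set and never goes backwards.
by_monotonic_clock = Deadline(0.05)
time.sleep(0.1)
print(by_monotonic_clock.expired())  # True
```

After the reset, the wall-clock deadline believes it has more than six years left to wait. A monotonic clock avoids that, but it says nothing about the actual date, so it doesn't help with checks that compare timestamps across machines.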