Configuring the Time Service: Max[Pos/Neg]PhaseCorrection

In the last few months of the Windows Server 2008 development, a good friend of mine was discussing a problem they have been seeing with customers. The problem, lovingly titled as the "Large Time Jump" issue involves a machine in the domain (usually the PDC) making a large jump in time, either forward or backwards. Regardless of the direction of the jump, the results are equally catastrophic.

How it all happens

Lets look at how this can happen. Some of the causes are more likely than you think. Here is a quick list:

  • Hardware changes. It is quite common for a company to have a hardware failure for a DC, even the PDC. Assume a situation where the motherboard in the PDC fails. After the hardware is all installed, the technician boots the machine back up. As hoped, the machine looks to be running just fine. However, the part that the manufacturer shipped out is considered an "after market" part, so the BIOS isn't configured correctly. By default, the BIOS date is set to the manufacture date of the motherboard - sometime in the past. Of course, when the technician gets the new part, he sees that everything looks to be in order, but he neglects to notice that the date it wrong.
  • Bad external time source. Sometimes, the time source that you are syncing the PDC with (such as a network device, like a high-end router) can get a wild hair and jump to an invalid time.
  • Bad CMOS battery. Every once in a while, a BIOS battery in a computer will fails. After all, they don't last forever. If the PDC isn't configured to sync with an external time source, it will by default use its own internal clock, which is based on the BIOS clock. A computer cannot maintain it's time across a reboot, so it stores the current time in the BIOS. If the CMOS battery has failed, the time after the reboot will be incorrect.
  • User error. It is not outside of the realm of possibility for someone to log onto the root PDC and change the clock. It sounds unlikely, but it is still possible.

Fixing the glitch

The real problem here is that in a domain environment, domain controller completely and utterly trust the time that they get from another DC. The root PDC is a special case, but it still counts.

The solution is to do a "sanity check" on the time that any domain controller gets from anywhere. In this way, you are ensuring that if a domain controller gets out of whack, it will not spread that time to other DCs. This is done by setting the MaxPosPhaseCorrection and MaxNegPhaseCorrection values.

MaxPosPhaseCorrection and MaxNegPhaseCorrection limit the allowable offset taken from a time sample. When any instance of w32time polls another machine for the time, it will determine the offset between the time source and itself. This value is known as the "Sample Offset". Before the samples is used by the time service, it will be compared to the phase correction limits. If the sample offset is greater than the phase correction limit, then sample will be thrown out and a "TOO BIG" event will be generated. The event contains all of the information about the time sample, including who sent it. The purpose of doing this is to isolate domain controllers in the network who get into a bad time state. In this way, the other DCs will log and error about the time samples being too big rather than blindly accepting it.

Knowing your limits

The next question is: What is an acceptable limit of phase correction? After much analysis and debate, we are advising a value of 48 hours. If a domain controller receives a sample that says it is more than 48 hours off, either in the future or in the past, the domain controller will throw it out. However, every customer should evaluate their own situation to be sure.

This is advised for both limits, both forward and backwards.

A packaged solution

Here is an example of a registry entry that you can merge on-demand to apply a 48 hour limit to the phase correction:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time\Config]
"MaxNegPhaseCorrection"=dword:0002a300
"MaxPosPhaseCorrection"=dword:0002a300

As usual, If you have specific thoughts or questions about this post, please feel free to leave a comment. For general questions about w32time, especially if you have problems with your w32time setup, I encourage you to ask them on Directory Services section of the Microsoft Technet forums.