Keeping the Domain On Time

Windows Time Service on a domain (referred to as 'Domain Synchronization' or 'Domain Sync' for short) is a huge topic. I will do my best to cover its main aspects in this article, but some concepts won't be covered until a later date, and others are left out because they relate directly to the original RFC for NTP rather than to domain synchronization.

Background

As I stated in my previous post, the original reasons for developing w32time stemmed from the requirements imposed by Kerberos. In order for Kerberos to function securely, the time difference between the participating machines needs to be less than five minutes. In time, other components have come to rely on w32time, including Active Directory Replication and Windows Update. In a Windows domain, w32time needs to keep machines synchronized, and it needs to do so in a quick, efficient, and quiet manner.

Beyond NTP

The NTP protocol described in the RFC goes a long way toward designing a robust time synchronization solution. But in the end, what we are really interested in is just that: the solution. Keeping time synchronized between two machines is possible, but the solution needs to be more robust to deal with computers belonging to a domain. In particular, w32time works to answer these questions (just to name a few):

  • How do we ensure that in a large network of computers, an efficient chain of time sources is picked?
  • How do we auto-configure so that an administrator has to do a minimal amount of work to set it up?
  • How do we keep it secure and still auto-configurable?
  • How do we allow administrators to get a look at what is happening?
  • How do we alert the administrators when something goes wrong?

These questions are important, specifically in the domain scenario (as opposed to the home user scenario), since the needs of the home user and the needs of the domain user are quite different.

Designing Inside the Box

Because many components within Windows depend on w32time to keep the clock synchronized, w32time can take hardly any dependencies itself. If w32time relied on component X to do something fancy, and component X relied on Kerberos, then we would have a problem, since Kerberos relies on w32time. This would create a circular dependency and, well, that's a bad thing.

For this reason, w32time has a simplified mechanism to authenticate time syncs. More information on the authentication mechanism will be covered in a future post.

Intelligent Design 

The first issue to address is finding someone to synchronize with. Each machine needs to sync with another machine to get its time. To do this efficiently and automatically, w32time uses the hierarchy that already exists in the domain itself. In the simplest terms, a domain consists of the following distinct entities (aka computers):

  • Exactly one primary domain controller (or PDC-emulator)
  • Zero or more replica domain controllers (DCs)
  • Zero or more member computers (either servers or workstations)

Figure: Time Synchronization in Active Directory Hierarchy

The inner working of what a domain is and how it operates is beyond the scope of this post, but this should be enough to provide the groundwork for our discussion.

Time Source Selection

Each member of the domain follows a different set of requirements, based on its role. Let's take a look at those roles:

  • Primary Domain Controller - This machine is the authoritative time source for a domain. It will have the most accurate time available in the domain, and must sync with a DC or PDC in the parent domain (except in special cases).
  • Replica Domain Controller - This machine will act as a time source for clients and member servers in the domain. A DC can sync with the PDC of its own domain, or a DC or PDC in the parent domain.
  • Clients/Member Servers - These machines can sync with any DC or PDC of their own domain, or a DC or PDC in the parent domain.

These are the default rules of where a machine can go looking for a time source. Keep in mind that there are corner cases where the rules can be bent a little. A few additional rules:

  • A machine can only look for a time source in its own domain or the parent domain. A machine will never go to a domain on a parallel level, or a "skip-level" parent domain.
  • Within a domain, a machine cannot sync with its own kind. A DC cannot sync with another DC. A client cannot sync with another client.

Also, you may have noticed that a PDC can only sync from a DC or PDC in the parent domain. Well, what if you are in the parent domain already? This is a special case, which is detailed below in the section "Special Case: The Root PDC".
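The default rules above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this post, not how w32time is actually implemented; the `Machine` type, the `eligible_sources` function, and the machine names in the sample forest are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    PDC = "pdc"
    DC = "dc"
    MEMBER = "member"  # client or member server


@dataclass(frozen=True)
class Machine:
    name: str
    role: Role
    domain: str


def eligible_sources(me, machines, parent_of):
    """Return the machines `me` may sync with under the default rules.

    parent_of maps a domain name to its parent domain name; the root
    domain has no entry.
    """
    parent = parent_of.get(me.domain)  # None for the root domain
    eligible = []
    for m in machines:
        if m == me or m.role is Role.MEMBER:
            continue  # only DCs and PDCs act as time sources
        if parent is not None and m.domain == parent:
            eligible.append(m)  # any DC or PDC in the parent domain
        elif m.domain == me.domain:
            if me.role is Role.MEMBER:
                eligible.append(m)  # members: any DC or PDC of their own domain
            elif me.role is Role.DC and m.role is Role.PDC:
                eligible.append(m)  # DCs: only the PDC of their own domain
            # a PDC takes nothing from its own domain
    return eligible


# Hypothetical forest: one parent domain with two child domains.
PARENT_OF = {"left": "parent", "right": "parent"}
FOREST = [
    Machine("parent-pdc", Role.PDC, "parent"),
    Machine("parent-dc", Role.DC, "parent"),
    Machine("left-pdc", Role.PDC, "left"),
    Machine("left-dc", Role.DC, "left"),
    Machine("left-server", Role.MEMBER, "left"),
    Machine("right-pdc", Role.PDC, "right"),
    Machine("right-dc", Role.DC, "right"),
]

foo = Machine("foo", Role.MEMBER, "left")
print([m.name for m in eligible_sources(foo, FOREST, PARENT_OF)])
```

Note that the root PDC ends up with an empty list in this sketch, because its domain has no parent and a PDC cannot use sources in its own domain.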

The time source selection mechanism works great to enumerate the possible machines to sync from. The problem is that this usually leaves more than one machine as a possible partner. We need a way to pick the "best" one of the group, and that is what scoring does for us.

Score!

Each possible machine is given a score, based on certain criteria. Once all of the candidates have a score, w32time simply chooses the machine with the highest score. Here is what the scoring looks like:

  • 8 points if the machine is in-site
  • 4 points if the machine is "reliable"
  • 2 points if the machine is in the parent domain
  • 1 point if the machine is a PDC (or PDC emulator)

So why are these points given? Let's look at the rules individually.

  • Machines that are in the same site as the one in question have the best chance of providing us with good time. Machines that are out of site are probably physically separated in one way or another, and would likely introduce delay.
  • A machine that is "reliable" is pre-configured to be directly connected to a reliable time source, such as a GPS or atomic clock. These devices provide very accurate, very stable time samples. If a machine is configured to sync directly with one of these devices, a registry value can be changed to indicate that this machine will be a source of reliable time.
  • A machine in the parent domain is higher in the forest and closer to the root, and hence will have more accurate time than a machine in the current domain.
  • A PDC (or PDC-emulator) will be more accurate than a DC in the same domain because it is guaranteed to sync with a machine in the parent domain.

From this, we can derive a score for each machine, and then choose the machine with the highest score.
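As a minimal sketch, the scoring can be written as a weighted sum. The point values come from the list above; the `Candidate` type, the constant names, and the `score` and `best` functions are hypothetical names chosen for illustration, and this post does not say how w32time breaks ties, so the tie behavior here (first candidate wins) is an assumption.

```python
from dataclasses import dataclass

# Point values from the scoring list above.
IN_SITE = 8
RELIABLE = 4
PARENT_DOMAIN = 2
IS_PDC = 1


@dataclass(frozen=True)
class Candidate:
    name: str
    in_site: bool
    reliable: bool
    in_parent_domain: bool
    is_pdc: bool


def score(c):
    """Add up the points for every criterion the candidate meets."""
    return (
        IN_SITE * c.in_site
        + RELIABLE * c.reliable
        + PARENT_DOMAIN * c.in_parent_domain
        + IS_PDC * c.is_pdc
    )


def best(candidates):
    """Return the highest-scoring candidate (ties: first one listed)."""
    return max(candidates, key=score)


# Hypothetical candidates: an in-site DC versus an out-of-site parent PDC.
local_dc = Candidate("local-dc", True, False, False, False)
remote_pdc = Candidate("remote-parent-pdc", False, True, True, True)
print(best([local_dc, remote_pdc]).name)
```

Because each weight is larger than the sum of all the weights below it (8 > 4 + 2 + 1), a higher-ranked criterion always outweighs every lower-ranked one combined. In the usage above, the in-site DC scores 8 and beats the out-of-site parent PDC, which scores 7 even though it is reliable.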

Examples

When a machine boots up, it will go looking for a time source. Depending on its role, it will be required to choose from a subset of possible machines to sync with. But how do we prioritize between the available choices? Let's take a look at the following examples:

Example 1:

This example uses the graphic above. The domains will be referred to as the "Left Domain", the "Right Domain", and the "Parent Domain".

Computer foo has just been joined to the Left Domain as a regular client (not a DC), and it is booting up for the first time on a domain. First, we need to enumerate which machines are possible as partners to sync with. We will look at each machine to see if it is a possible sync partner.

  • "Domain Controller" [Left Domain]  is a DC in the same domain, so it is a valid choice
  • "PDC Emulator" [Left Domain] is a PDC in the same domain, so it is a valid choice
  • "Domain Controller" [Parent Domain] is a DC in the parent domain, so it is a valid choice
  • "PDC Emulator" [Parent Domain] is a PDC in the parent domain, so it is a valid choice

Which machines aren't valid? Let's take a look (and find out why):

  • "Workstation" [Left Domain] is not a DC, so it is not a valid choice
  • "Server" [Left Domain] is not a DC, so it is not a valid choice
  • "Workstation" [Parent Domain] is not a DC, so it is not a valid choice
  • "Server" [Parent Domain] is not a DC, so it is not a valid choice
  • Anything in the [Right Domain] is not in the same domain, and not in the parent domain, so it is not a valid choice

OK, so we have our possible choices, but now we need to prioritize them to pick the best one. To do this, we will use the scoring system. Assuming that our entire forest is in one site, and we don't have any machines configured as "reliable":

  • "Domain Controller" [Left Domain]  Score = 8
  • "PDC Emulator" [Left Domain] Score = 8 + 1 = 9
  • "Domain Controller" [Parent Domain] Score = 8 + 2 = 10
  • "PDC Emulator" [Parent Domain] Score = 8 + 2 + 1 = 11

So there we have it. The PDC in the parent domain will be our time source. But what if the [Left Domain] was put into a separate site?

Example 2:

Assume the same scenario as the above example, except that [Left Domain] exists in a different site from the rest of the forest. We will use the same logic applied above to determine a time source.

So the [Left Domain] is in a different site. Since the first part of time source selection does not take site location into consideration, we will get the same possible machines to sync with. However, the scoring system will provide us with a different machine when all is said and done. Let's look at how the scoring would now occur:

  • "Domain Controller" [Left Domain]  Score = 8
  • "PDC Emulator" [Left Domain] Score = 8 + 1 = 9
  • "Domain Controller" [Parent Domain] Score = 2
  • "PDC Emulator" [Parent Domain] Score = 2 + 1 = 3

Because the DC and PDC in the [Parent Domain] are in a different site, they don't get the +8 to their score. This leaves us with the PDC of the current domain, with a score of 9. But where does the PDC of the [Left Domain] itself get its time?

Example 3:

Assume the same scenario as Example 2, except that this time the machine looking for a time source is the PDC of the [Left Domain]. Again, we will use the same logic applied above to determine a time source.

With the left domain in a different site from the rest of the forest, and with the PDC of the [Left Domain] being the authoritative time source for the [Left Domain], we will need to go out of site for a time source - we have no other choice. So we will look at the scores for the various eligible time sources:

  • "Domain Controller" [Parent Domain] Score = 2
  • "PDC Emulator" [Parent Domain] Score = 2 + 1 = 3

We cannot sync with any time sources in our own domain, so we only have the time sources from the [Parent Domain]. The scoring will give us the PDC of the [Parent Domain].
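The arithmetic in the three examples can be checked with a short script. This is a sketch under the same assumptions as the examples (no machine is marked "reliable"); the site names "A" and "B", the `SOURCES` table, and the `scores` function are hypothetical and exist only for this illustration.

```python
# The four eligible sources from the examples: (label, domain, is_pdc).
SOURCES = [
    ("Domain Controller [Left Domain]", "left", False),
    ("PDC Emulator [Left Domain]", "left", True),
    ("Domain Controller [Parent Domain]", "parent", False),
    ("PDC Emulator [Parent Domain]", "parent", True),
]


def scores(own_domain_allowed, site_of, my_site):
    """Score each eligible source; no machine is marked "reliable" here."""
    table = {}
    for label, domain, is_pdc in SOURCES:
        if domain == "left" and not own_domain_allowed:
            continue  # a PDC cannot use sources in its own domain
        points = 0
        if site_of[domain] == my_site:
            points += 8  # in-site
        if domain == "parent":
            points += 2  # parent domain
        if is_pdc:
            points += 1  # PDC (or PDC emulator)
        table[label] = points
    return table


one_site = {"left": "A", "parent": "A"}   # Example 1: whole forest in one site
two_sites = {"left": "B", "parent": "A"}  # Examples 2 and 3: Left Domain in its own site

example1 = scores(True, one_site, "A")    # client foo, single site
example2 = scores(True, two_sites, "B")   # client foo, Left Domain in site B
example3 = scores(False, two_sites, "B")  # PDC of the Left Domain

for title, table in [("Example 1", example1), ("Example 2", example2), ("Example 3", example3)]:
    winner = max(table, key=table.get)
    print(title, table, "->", winner)
```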

Plan B: Fail over

So what happens when things don't go as planned? Windows Time Service has been built to handle fail over situations from the beginning. For a generic example, assume that a client is currently synchronizing with a time source. If the time source goes away for one reason or another, the client will need to go looking for another time source.

For this reason, we use the scoring system illustrated above. The client will reassess the available time sources, score each of them, and choose the best one. Since the previous time source (which was probably the best first choice) has gone away, w32time will pick the next highest scoring time source.
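A rough sketch of this fail over behavior, using the scores from Example 1: the function picks the highest-scoring source that is still reachable. The `choose_source` function and the `reachable` set are hypothetical; how w32time actually detects that a source has gone away is not covered in this post, and the sketch reuses precomputed scores instead of rescoring from scratch.

```python
def choose_source(scores, reachable):
    """Pick the highest-scoring source that is currently reachable, or None."""
    live = {name: pts for name, pts in scores.items() if name in reachable}
    return max(live, key=live.get) if live else None


# Scores from Example 1 (single site, no "reliable" machines).
scores = {
    "Domain Controller [Left Domain]": 8,
    "PDC Emulator [Left Domain]": 9,
    "Domain Controller [Parent Domain]": 10,
    "PDC Emulator [Parent Domain]": 11,
}

reachable = set(scores)
first = choose_source(scores, reachable)

# The first choice goes away, so the client reassesses what is left.
reachable.discard(first)
second = choose_source(scores, reachable)

print(first, "->", second)
```

In this usage the client starts with the PDC of the parent domain and falls back to the DC of the parent domain, which has the next highest score.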

Special Case: The Root PDC

The PDC for the domain at the root of the forest (the root PDC) poses a problem. Since it has no time sources that are more authoritative than it, it cannot choose a time source automatically. Thus, the administrator will need to set one up manually, or the domain will operate in a "standalone" mode. In the case of a standalone domain, the root PDC will still be the authoritative time source, but its time will come from its own clock.

Wrap Up

We have taken a look at how w32time operates in a domain at a very high level. Future posts will dive deeper into specific areas of w32time, and this post will provide the groundwork for those articles. If you have specific thoughts or questions about this post, please feel free to leave a comment. For general questions about w32time, especially if you have problems with your w32time setup, I encourage you to ask them in the Windows Vista Applications section of the Microsoft TechNet forums. One way or another, questions posted there should make their way to my inbox, and I will do my very best to answer them.
