Double-hop authentication: Why NTLM fails and Kerberos works

A common scenario in SharePoint is the need to retrieve data from back-end data sources – external databases, Web services, reports via SQL Server Reporting Services (SSRS), and data cubes via Analysis Services are some common examples. Often these data sources need to be accessed as the original user so that proper authorization rules can be applied. That is, the identity of the user accessing the SharePoint site is the identity that should be used to access the next hop – the back-end data sources. What we need, then, is a mechanism that lets the front-end server authenticate to the next hop as the client (“delegation of authentication”). This need has become a common pain point for SharePoint users (and other IIS users) because NTLM, the default authentication method for IIS and SharePoint, can’t support delegation of authentication.

Let’s step back a moment and look at the various ways a process can relate to the user calling it. These levels are used across Windows (see for example the System.Security.Principal namespace). They are:

  • Anonymous – The process has no awareness of who’s calling it.
  • Identification – The process knows who’s calling it, but can nevertheless only act as its own identity. So the process would know the user’s name, but not be able to act as the user.
  • Impersonation – The process can take on the user’s identity and access resources using that user’s authorization. Note that once the needed resources are off the process’s own machine, the process will need to prove to the next machine that it is the original client, and the process on the next machine will again need to be able to impersonate the original user. Effectively, this means that impersonation is only relevant on the local machine of the process. To authenticate as the user to another process requires the next level:
  • Delegation – The process can identify as the user to network resources, allowing external processes to impersonate (or perhaps even further delegate) the original user.
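
To make the ordering of these levels concrete, here is a minimal sketch in Python. The enum and the three helper functions are hypothetical names invented for illustration; they only mirror the list above and are not the actual Windows or .NET API.

```python
from enum import IntEnum


class ImpersonationLevel(IntEnum):
    """Hypothetical model of the four levels, ordered least to most capable."""
    ANONYMOUS = 0
    IDENTIFICATION = 1
    IMPERSONATION = 2
    DELEGATION = 3


def knows_caller(level: ImpersonationLevel) -> bool:
    # The process can learn who is calling it.
    return level >= ImpersonationLevel.IDENTIFICATION


def can_act_locally(level: ImpersonationLevel) -> bool:
    # The process can access resources on its own machine as the caller.
    return level >= ImpersonationLevel.IMPERSONATION


def can_act_remotely(level: ImpersonationLevel) -> bool:
    # The process can authenticate to another machine as the caller.
    return level >= ImpersonationLevel.DELEGATION


for level in ImpersonationLevel:
    print(f"{level.name:<15} knows={knows_caller(level)} "
          f"local={can_act_locally(level)} remote={can_act_remotely(level)}")
```

The double-hop problem lives in the gap between the last two rows: at the Impersonation level the process can act as the caller locally but not remotely.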

Why NTLM fails

Unfortunately, when a client authenticates using NTLM, the front-end server cannot authenticate as the client to another (next-hop) server. This is because of how NTLM authentication works: the server sends the client a challenge, which the client combines with its password hash, computing another hash based on this combination and sending it back to the server. The only way to verify that the client’s returned value is correct and thereby authenticate it is by performing the same calculation on the server side, which requires access to the client’s real password (well, a hash of it, but for our purposes it’s the same thing). For security purposes the domain controller doesn’t hand out this password, which means the front-end server has to pass along the client’s authentication hash to the DC for verification. The DC then lets the front-end server know that all is okay with the client’s authentication.
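
The exchange just described can be sketched roughly as follows. This is a toy model, not the real NTLM algorithm: SHA-256 and HMAC stand in for NTLM's actual hash functions, and the user name, password, and function names are all made up for illustration.

```python
import hashlib
import hmac
import os

# Toy stand-in for the DC's account database: user name -> password hash.
# SHA-256 is used only for illustration; it is not what NTLM uses.
DC_PASSWORD_HASHES = {"alice": hashlib.sha256(b"correct horse").digest()}


def compute_response(password_hash: bytes, challenge: bytes) -> bytes:
    # Combine the challenge with the password hash (here via HMAC).
    return hmac.new(password_hash, challenge, hashlib.sha256).digest()


def client_answer(password: bytes, challenge: bytes) -> bytes:
    # The client derives its password hash locally and answers the challenge.
    return compute_response(hashlib.sha256(password).digest(), challenge)


def dc_verify(user: str, challenge: bytes, response: bytes) -> bool:
    # Only the DC holds the hash, so only it can redo the calculation.
    stored = DC_PASSWORD_HASHES.get(user)
    if stored is None:
        return False
    return hmac.compare_digest(compute_response(stored, challenge), response)


def front_end_authenticate(user: str, password: bytes) -> bool:
    challenge = os.urandom(8)                      # server -> client
    response = client_answer(password, challenge)  # client -> server
    # The front end cannot check the response itself; it asks the DC.
    return dc_verify(user, challenge, response)


print(front_end_authenticate("alice", b"correct horse"))
print(front_end_authenticate("alice", b"wrong guess"))
```

Notice that front_end_authenticate never touches DC_PASSWORD_HASHES directly. Every verification goes through dc_verify, which is the round trip to the DC.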

(N.B.: NTLM needs a round trip to the DC for authentication, something Kerberos doesn’t require.)

The key point above is that verifying the client’s response requires access to the client’s real password. NTLM depends directly on the base password of the client for each authentication, and we can’t (or at least won’t) just pass that around to servers for their own use. This is why NTLM can’t be used to give a process the power of delegation of user identity – the process cannot authenticate to the next hop without the user’s original password, something which it can’t get. As a result, the process can only authenticate to the next tier as itself or NULL (anonymous).
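
Here is a small sketch of why the second hop fails. As before, this is a toy model under stated assumptions: HMAC-SHA-256 stands in for NTLM's real response calculation, and the password and variable names are invented for illustration.

```python
import hashlib
import hmac
import os

# Known to the client and the DC only, never to the front-end server.
password_hash = hashlib.sha256(b"correct horse").digest()


def respond(key: bytes, challenge: bytes) -> bytes:
    # Toy response calculation: key the challenge with the password hash.
    return hmac.new(key, challenge, hashlib.sha256).digest()


# Hop 1: the client answers the front-end server's challenge.
front_end_challenge = os.urandom(8)
client_response = respond(password_hash, front_end_challenge)

# Hop 2: the back-end server issues its own, different challenge.
back_end_challenge = os.urandom(8)
expected = respond(password_hash, back_end_challenge)  # what the DC would compute

# All the front end holds is the response from hop 1. Replaying it fails ...
replay_ok = hmac.compare_digest(client_response, expected)

# ... and treating that response as if it were the password hash fails too.
derived_ok = hmac.compare_digest(
    respond(client_response, back_end_challenge), expected
)

print(replay_ok, derived_ok)
```

Both checks come out False: the response from the first hop is bound to the first challenge, and nothing the front end holds lets it answer the second one.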

There are a couple of ways to improve this situation and give the server process a way to authenticate to the next hop as the original user. One way would be to give the process the actual username and password it needs – this is the approach of the Single Sign-On service (a topic for another day perhaps). Another common approach is using Kerberos instead of NTLM for Windows authentication.

Why (and how) Kerberos succeeds

What does Kerberos offer over NTLM? Well, most importantly for our discussion, it doesn’t depend on the original user password for authentication. The only time the original password is necessary is when the user first logs on to the Kerberos realm (AD domain) and authenticates to the Authentication Server (a role played by the Domain Controller). After that point, authentication to other services is based on the user possessing session-specific keys which it would only know if it originally authenticated properly to the Authentication Server. There’s not enough space to go into the specifics of Kerberos here, but if you understand how Kerberos works, this should be clicking.

To boil it down, when a client authenticates to a front-end server using Kerberos, the server doesn’t need to contact a DC for verification. The fact that the client knows the session-specific key for this service shows that it must already have proven its identity to the DC, and that’s all the server needs to know. (N.B.: No DC round trip necessary!)
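
A rough sketch of that local check follows. It is heavily simplified and makes several assumptions: seal and unseal are hypothetical helpers built on a toy cipher (not real Kerberos encryption), the ticket and authenticator carry only a client name, and real authenticators also include timestamps, which are omitted here.

```python
import hashlib
import hmac
import json
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256 in counter mode. Illustration only.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def seal(key: bytes, payload: dict) -> bytes:
    # Encrypt, then add a MAC, so only a holder of `key` can read or forge it.
    plain = json.dumps(payload).encode()
    nonce = os.urandom(16)
    cipher = bytes(a ^ b for a, b in zip(plain, _keystream(key, nonce, len(plain))))
    tag = hmac.new(key, nonce + cipher, hashlib.sha256).digest()
    return nonce + tag + cipher


def unseal(key: bytes, blob: bytes) -> dict:
    nonce, tag, cipher = blob[:16], blob[16:48], blob[48:]
    good = hmac.new(key, nonce + cipher, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, good):
        raise ValueError("wrong key or tampered data")
    plain = bytes(a ^ b for a, b in zip(cipher, _keystream(key, nonce, len(cipher))))
    return json.loads(plain)


# Long-term key shared by the KDC (on the DC) and the front-end service.
service_key = os.urandom(32)

# KDC side: issue a ticket for alice. The session key also goes to the client,
# protected by a key the client already shares with the KDC (not shown).
session_key = os.urandom(32)
ticket = seal(service_key, {"client": "alice", "session_key": session_key.hex()})

# Client side: prove knowledge of the session key with an authenticator.
authenticator = seal(session_key, {"client": "alice"})


def service_accepts(ticket: bytes, authenticator: bytes) -> str:
    # Service side: no call to the DC, only the service's own key is needed.
    contents = unseal(service_key, ticket)
    proof = unseal(bytes.fromhex(contents["session_key"]), authenticator)
    if proof["client"] != contents["client"]:
        raise ValueError("authenticator does not match ticket")
    return contents["client"]


print(service_accepts(ticket, authenticator))
```

Compare this with the NTLM case: service_accepts uses nothing but the service's own long-term key and what the client sent, so there is no lookup against the DC and no password hash anywhere in the check.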

So why does this make authentication for delegation easier? Well, for the server process to authenticate to the next-hop server process, it no longer needs access to the client’s real password! All it needs is the session-specific key for the session with the next server, something we can feel a little more comfortable passing along. The original client forwards this key, along with a ticket for the back-end service, to the front-end server, which can then use it for the next-hop authentication, the service now presenting itself to the next-hop service as the original client.

(Technical clarification: The client typically forwards its TGT and TGS session key to the service, and the service acts as the client to request a session ticket and key from the KDC for the next hop. This relies on a forwardable TGT, and is what is used in SharePoint when delegation is enabled. The scenario described in the previous paragraph is that of a proxiable session ticket. The concepts relevant to our discussion, however – mainly the fact that passwords aren’t used – are the same, and, for those of you who might be grumbling at my jumbling of these two concepts, technically a forwardable TGT is a special case of a proxiable session ticket for the TGS.)
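
The forwarded-TGT flow can be sketched as a small simulation. Everything here is an assumption made for illustration: the Sealed class stands in for encryption without doing any real cryptography, the KDC class and the "sql-backend" service name are invented, and the sketch simply reuses the client's TGT where a real client would obtain a separate forwarded one.

```python
import os
from dataclasses import dataclass


@dataclass
class Sealed:
    """Stand-in for encryption: the payload is readable only with `key`."""
    key: bytes
    payload: dict

    def open(self, key: bytes) -> dict:
        if key != self.key:
            raise PermissionError("wrong key")
        return self.payload


class KDC:
    def __init__(self):
        self.tgs_key = os.urandom(16)   # known only to the KDC
        self.service_keys = {}          # service name -> long-term key

    def register_service(self, name: str) -> bytes:
        self.service_keys[name] = os.urandom(16)
        return self.service_keys[name]

    def log_on(self, user: str, password_ok: bool):
        # The only step where the user's password matters.
        if not password_ok:
            raise PermissionError("bad password")
        tgt_session_key = os.urandom(16)
        tgt = Sealed(self.tgs_key, {"client": user, "session_key": tgt_session_key})
        return tgt, tgt_session_key

    def request_ticket(self, tgt: Sealed, authenticator: Sealed, service: str):
        # Whoever holds the TGT and its session key can obtain service tickets.
        contents = tgt.open(self.tgs_key)
        authenticator.open(contents["session_key"])
        session_key = os.urandom(16)
        ticket = Sealed(self.service_keys[service],
                        {"client": contents["client"], "session_key": session_key})
        return ticket, session_key


kdc = KDC()
sql_key = kdc.register_service("sql-backend")

# Alice logs on once and gets a TGT plus its session key.
tgt, tgt_key = kdc.log_on("alice", password_ok=True)

# Alice forwards both to the front end (simplified: the same objects are reused).
forwarded_tgt, forwarded_key = tgt, tgt_key

# The front end, never having seen Alice's password, asks for a back-end ticket.
ticket, _ = kdc.request_ticket(forwarded_tgt,
                               Sealed(forwarded_key, {"client": "alice"}),
                               "sql-backend")

# The back end opens the ticket with its own key and sees Alice as the caller.
print(ticket.open(sql_key)["client"])
```

The password appears only in log_on. Everything after that is driven by the TGT and its session key, which is what makes handing them to the front end tolerable.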

So to sum up, the basic nature of NTLM – using the base password directly – prevents a straightforward way to delegate authentication rights to services. Because Kerberos doesn’t require the base password for authentication, there is less vulnerability in temporarily delegating the ability to authenticate to services acting on the user’s behalf. And that is why Kerberos works for delegation and NTLM does not.