There and back again.. the journey of a bug in ADFS

Let's look at a bug fix.. end to end.

So back in November, my friend Jim Simonet had posted a question about a problem with ADFS using ADAM as the auth store and specifying that it connect via  LDAP over SSL.

He could connect to ADAM via LDP on 636, so we knew ADAM and the certificate validation\chaining process seemed OK.
With SSL it was failing and the logs show:

 

2008-10-28T17:11:41 [INFO] Client is unauthenticated. Attempting to collect credentials.

2008-10-28T17:11:41 [INFO] Requesting token for https://adfsweb.treyresearch.net:8081/claimapp/ with username adam@treyresearch.net.

2008-10-28T17:11:41 [INFO] InternalRST: target = https://adfsweb.treyresearch.net:8081/claimapp/, credtype = urn:oasis:names:tc:SAML:1.0:am:password, userhint = dam@treyresearch.net, store =

2008-10-28T17:11:41 [INFO] GetClaimsForUserNameWorker (LDAP): called for user adam@treyresearch.net

2008-10-28T17:11:41 [VERBOSE] GetClaimsForUserNameWorker (LDAP): Got COMException 18446744071562534970: The server is not operational.

2008-10-28T17:11:41 [INFO] AccountStoreCollection.InternalGetClaimsForUser: User adam@treyresearch.net logon handled non-authoritatively with LdapFailed by selected store ldap://adfsweb.treyresearch.net

2008-10-28T17:11:41 [VERBOSE] Processing FS response: policy version is d5a48817-e9fc-4ab6-b7ef-2d99642bbe6b - 208

2008-10-28T17:11:41 [VERBOSE] Creds verification: AccountStoreDisplayName = ADAM Account Store

2008-10-28T17:11:41 [VERBOSE] AccountStoreType = LdapDirectoryType

2008-10-28T17:11:41 [VERBOSE] AccountStoreTypeDisplay = AD/AM Directory

2008-10-28T17:11:41 [VERBOSE] AccountStoreUriStr = ldap://adfsweb.treyresearch.net/

2008-10-28T17:11:41 [VERBOSE] User Validation Info: ErrorCode = -2147016646

2008-10-28T17:11:41 [INFO] Token issuance request to FS failed: ValidationFailure

 

Note the highlighted section above.

Logging is always a good place to start, if you have the source code of course.

All of the web.security stuff  which includes GetClaimsForUserNameWorker  is in managed code.  I am still making my way through debugging managed code and am not entirely comfortable with it ( in other words I have not really done enough of it ),  I know folks who swear it is easier but if you ask me, doing this :

1.       !name2ee [Assembly name (including extension] [Class Full Namespace]. For example: !name2ee SyncBlkDeadLock.exe SyncBlkDeadLock.Form1. That is the class on which we want to place a breakpoint in one of its methods. The output will look like this.

2.       !dumpmt -md [MethodTable handle that we got from the previous command]. For Example: !dumpmt -md 0×00a8543c. The output will look like this.

3.       !dumpmd [MethodDesc handle that we got from the previous command]. For Example: !dumpmd 0×00a853d8. This is the handle for the method
SyncBlkDeadLock.Form1.Thread1Handler().The output will look like this.

4.       In the field “Method VA” we now have th method Virtual Address and we can set a breakpoint on that address.

Is not easier than this:

1.       bp foo!func

Maybe that's why I have not really dug into it much. But, to each their own. (  I know I can no longer avoid it either via debugging it, or writing it -- , much like I can no longer avoid PowerShell )  So, due to my lack of time, and general propensity to fall back on familiar tools and ways.. I figured that something is eventually going to call into wldap32.dll – either directly, or  via the slightly higher ADSI functions.

I tend to try to nail things at the lowest level. What I mean is, if I know some hunk of managed code is calling to ADSI, which is calling to WLDAP32, which is calling to the Network.. I may want to start at the very lowest level in order to at least get an idea of where things are coming from. Does that make sense? On we go.

I knew I didn’t need to catch it at the network layer, a little overkill there. So I decided on wldap32 – specifically wldap32!ldap_connect  , since this is where we connect.

This eventually returned 0x51 LDAP_SERVER_DOWN , but if I dug deeper it returned 0x35 LDAP_UNWILLING_TO_PERFORM

err.exe shows us ( https://www.microsoft.com/downloads/details.aspx?FamilyId=BE596899-7BB8-4208-B7FC-09E02A13696C&displaylang=en  )

 

C:\TEMP\Err>err 35

# for decimal 35 / hex 0x23 :

  BTH_ERROR_LMP_TRANSACTION_COLLISION bthdef.h

  FAT_FILE_SYSTEM bugcodes.h

# Disable or uninstall any anti-virus, disk defragmentation

<snip>

  ERROR_BAD_NETPATH winerror.h

# The network path was not found.

  LDAP_UNWILLING_TO_PERFORM winldap.h

 

So you see – the lower ldap levels set LDAP_UNWILLING_TO_PERFORM , which got set to LDAP_SERVER_DOWN which the  ADFS logs translated into "The server is not operational".  This has long been a pet peeve of mine.  If we could only consistently maintain the REAL error returned, I think it would help a lot of folks ( IMO ) . But, there are so many layers and layers of code that this is nearly impossible and is the ROI really worth it?  How could it be implemented across so many layers?

Anyway.. OK we know that 0x35h is not a good thing, so how to find where this was set?

This take s little bit of digging, but  the windbg command “pc”  (step to next call ) help a bit and just keep an eye on error returns. Narrowed it to here..

 

ChildEBP RetAddr
01c7e0e8 76f24d0c wldap32!LdapSetupSslSession
01c7e138 76f1786f wldap32!OpenLdapServer+0x700
01c7e158 76f1834c wldap32!LdapConnect+0x169
01c7e178 76dc457b wldap32!ldap_connect+0x26
01c7e19c 76dc43fb adsldpc!LdapOpen+0x1cc
01c7e1cc 76dc42fe adsldpc!LdapOpenBindWithDefaultCredentials+0x10e
01c7e634 712d29aa adsldpc!LdapOpenObject2+0x128
01c7e8b4 712d284c adsldp!GetServerBasedObject+0x18f
01c7ed0c 712d52f3 adsldp!GetObjectW+0x69
01c7ed38 76df1bbb adsldp!CLDAPNamespace::OpenDSObject+0x34
01c7ed90 01cdbc00 activeds!ADsOpenObject+0xb2
*** WARNING: Unable to verify checksum for system.directoryservices.ni.dll
01c7ee30 67170eb3 CLRStub[StubLinkStub]@1cdbc00
01c7ee8c 67166575 system_directoryservices_ni!System.DirectoryServices.Interop.UnsafeNativeMethods.ADsOpenObject(System.String, System.String, System.String, Int32, System.Guid ByRef, System.Object ByRef)+0x1f
01c7eee0 67166405 system_directoryservices_ni!System.DirectoryServices.DirectoryEntry.Bind(Boolean)+0x14d
01c7eef0 67167316 system_directoryservices_ni!System.DirectoryServices.DirectoryEntry.Bind()+0x25
01c7ef2c 04867375 system_directoryservices_ni!System.DirectoryServices.DirectoryEntry.RefreshCache(System.String[])+0x2e
Unable to load image c:\windows\assembly\gac_msil\system.web.security.singlesignon\1.0.0.0__31bf3856ad364e35\system.web.security.singlesignon.dll, Win32 error 0n2
*** WARNING: Unable to verify checksum for system.web.security.singlesignon.dll
*** ERROR: Module load completed but symbols could not be loaded for system.web.security.singlesignon.dll
01c7f240 04866d38 system_web_security_singlesignon!System.Web.Security.SingleSignOn.LdapDirectoryAccountStore.GetClaimsForUserNameWorker(System.String, System.String, System.Web.Security.SingleSignOn.TrustingRealm, System.Web.Security.SingleSignOn.ClaimContext, System.Web.Security.SingleSignOn.UserValidationInfo ByRef, Boolean ByRef, System.String ByRef)+0x5f5
01c7f2b4 048669af system_web_security_singlesignon!System.Web.Security.SingleSignOn.LdapDirectoryAccountStore.InternalGetClaimsForUser(System.Web.Security.SingleSignOn.ClientCredentialInfo, System.Web.Security.SingleSignOn.TrustingRealm, System.Web.Security.SingleSignOn.ClaimContext, System.Web.Security.SingleSignOn.UserValidationInfo ByRef, Boolean ByRef, System.String ByRef)+0x100
01c7f2d0 04865f4e system_web_security_singlesignon!System.Web.Security.SingleSignOn.AccountStoreCollection.InternalGetClaimsForUser(System.Web.Security.SingleSignOn.ClientCredentialInfo, System.Web.Security.SingleSignOn.TrustingRealm, System.Web.Security.SingleSignOn.ClaimContext)+0x177
01c7f3b8 0486576a system_web_security_singlesignon!System.Web.Security.SingleSignOn.FederationServerService.InternalRST(System.String, System.String, System.String[], System.String, Byte[], System.String, System.Web.Security.SingleSignOn.RSTRResult ByRef)+0x79e
<snip>

 

0:002> r

eax=00000000 ebx=00000000 ecx=7ffab000 edx=01742428 esi=01742338 edi=01742428

eip=76f306df esp=01c7e0dc ebp=01c7e0e8 iopl=0 nv up ei pl nz na po nc

cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202

wldap32!LdapSetupSslSession+0xf3:

76f306df 6a35 push 35h

 

Anywho – once I found where it was being set, I can see that it was examining the flags passed to it and failing based on them.

The ldap options included one or both ( I can’t  recall exactly ) of the following flags:

LDAP_OPT_SIGN
0x95

Determines the Kerberos signing state or enables Kerberos signing. The LDAP_OPT_SIGN session option should be enabled prior to binding using the LDAP_AUTH_NEGOTIATE flag. Cannot be used over an SSL connection. When used with Windows XP and Windows Server 2003, NTLM signing is also supported.

LDAP_OPT_ENCRYPT
0x96

Enables/disables Kerberos encryption prior to binding using the LDAP_AUTH_NEGOTIATE flag. Cannot be used over an SSL connection.
When used with Windows XP and Windows Server 2003, NTLM encryption is also supported.

The key of course, is the phrase "Cannot be used over an SSL connection". So , off Jim goes to file a bug,,,

In between then and now, lots of hand waving :

Then there is the PM, Dev and Test reviews, possible security reviews and much talk about WAR and customer impact etc.. but eventually we get there.

The public KB is now available:

AD FS cannot connect to an ADAM store over an SSL connection on a Windows Server 2003 R2-based server

https://support.microsoft.com/?kbid=959923

And that, is the story of that hotfix

Spat