MaxConcurrentAPI - or, how fast can you authenticate users?

 

MaxConcurrentAPI.. what is it?

 

It is referenced a few places:

 

Netlogon reg entries

IAS Best Practices

How to configure your ISA Server for a very large number of authentication requests

 

 

Edit 11/21/2011 NOTE: Please see https://support.microsoft.com/kb/975363 for updated informaton

Background:
ISA Servers were very very slow when processing web requests only when domain authentication was turned on.

 

 

 

First, enable the netlogon log  ( can you tell from previous posts, that I really like his log? )  on the ISA server and examine the log during the time the ISA response was slow.

 

Second - parse this log for interesting data during the time of failure(s)

 

Y:\data>findstr /i NlAllocateClientApi Server02Netlogon.log > AllocateAPI.txt

 

AllocateAPI.txt shows:

 

12/13 16:53:58 [CRITICAL] SomeCompany: NlAllocateClientApi timed out: 0 258

12/13 16:53:58 [CRITICAL] SomeCompany: NlAllocateClientApi timed out: 0 258

12/13 16:53:58 [CRITICAL] SomeCompany: NlAllocateClientApi timed out: 0 258

12/13 16:53:59 [CRITICAL] SomeCompany: NlAllocateClientApi timed out: 0 258

12/13 16:53:59 [CRITICAL] SomeCompany: NlAllocateClientApi timed out: 0 258

 

 

We see a repeated pattern here:

 

12/13 16:53:58 [CRITICAL] SomeCompany: NlAllocateClientApi timed out: 0 258

12/13 16:53:58 [CRITICAL] SomeCompany: NlpUserValidateHigher: Can't allocate Client API slot.

12/13 16:53:58 [LOGON] SamLogon: Network logon of SomeCompany\User7161 from ClientMachine16151 Returns 0xC000005E

12/13 16:53:58 [LOGON] SamLogon: Network logon of SomeCompany\ User233  from ClientMachine16333 Entered

12/13 16:53:58 [LOGON] SamLogon: Network logon of SomeCompany\ User3433 from ClientMachine16123 Returns 0x0

 

 

12/13 16:53:58 [LOGON] SamLogon: Network logon of SomeCompany\User23423 from ClientMachine435 Entered

12/13 16:53:59 [LOGON] SamLogon: Network logon of SomeCompany\User546456 from ClientMachine453 Returns 0x0

12/13 16:53:59 [LOGON] SamLogon: Network logon of SomeCompany\User54642 from ClientMachine565  Entered

12/13 16:53:59 [CRITICAL] SomeCompany: NlAllocateClientApi timed out: 0 258

12/13 16:53:59 [CRITICAL] SomeCompany: NlpUserValidateHigher: Can't allocate Client API slot.

12/13 16:53:59 [LOGON] SamLogon: Network logon of SomeCompany\User23423 from ClientMachine435 Returns 0xC000005E

 

 

err.exe shows us ( https://www.microsoft.com/downloads/details.aspx?FamilyId=BE596899-7BB8-4208-B7FC-09E02A13696C&displaylang=en  )

 

F:\tools\err.exe\i386>err.exe 0xC000005E

# for hex 0xc000005e / decimal -1073741730 :

  STATUS_NO_LOGON_SERVERS                                   ntstatus.h

# There are currently no logon servers available to service

# the logon request.

# 1 matches found for "0xC000005E"

 

 

A review of this area in netlogon shows us we have a standing timeout period of 45000 ms   before we will post the first error ( and subsequent error for NlpUserValidateHigher  and the return error 0xC000005E)

 

The number of outstanding RPC calls to the DC is throttled via the MaxConcurrentApi setting.

We use a semaphore to manage this queue, so when the netlogon log shows us that the API timed out – it was waiting longer than 45 sec for a slot to open up, and finally gave up and quit.

 

They had already increased the setting for MaxConcurrentAPI per q326040, to 10.

The theory here, was that the DC simply was not responding fast enough to satisfy the queued requests,  even with this value increased.

 

 

We took a userdumpof LSASS on the ISA server and the DC which the ISA server held a secure channel to ( found via Nltest) .

This was important because we need to see what ISA is doing at this time, as well as what the DC was doing at this time.

 

 

The ISA LSA dump confirms that we are very busy with 283 threads in LSA - about 115 of them waiting for one of these slots to open.

 

What's the DC doing? It had 118 threads . 

 

Some are processing the logon requests which takes in the data from netlogon on the DC, sends the data to the NTLM SSP (msv1_0.dll) and eventually calls into  the Authentication directory .

 

But, most ( if not all ) of the busy threads were waiting.

 

So what are all the DC threads waiting on? A critical section which is needed when we allocate memory from the heap.  Normally this would not be an issue, but someone had  set gflags on LSASS on this machine.

 

 

0:037> !gflag

Current NtGlobalFlag contents: 0x00001040

    hpc - Enable heap parameter checking

    ust - Create user mode stack trace DB

UST  should not cause a huge perf hit, but HPC will.

 

Disabled  via running: "gflags -i lsass.exe -hpc -ust"

After disabling and rebooting the DC – things ran smoothly.

 

PS: The netlogon log is really really cool...

 

 

Spat