Are my actual worker threads exceeding the sp_configure 'max worker threads' value?

Tom Stringer (@SQLife) was working on some HADR testing for a customer to simulate many availability groups and introduce significant load into the system to measure overhead and such. In his quest to do that he was seeing behavior that he couldn’t really explain and so worked with him to uncover what was happening under the covers.

For his test environment he was using a single CPU VM with the default worker thread setting on X64 which means it would result in 512 worker threads. Yet he was showing far more than 512 worker threads on his system which needed to be explained.

 

To give you context, in this test there were 40 Availability groups

select count(*)as NumAvailabilityGroups from sys.availability_groups      

clip_image002    

Sys.dm_os_schedulers showed a huge amount of workers, more than the max of 512 expected.

 

select scheduler_id,current_tasks_count,

current_workers_count,active_workers_count,work_queue_count    

from sys.dm_os_schedulers    

where status = 'Visible Online'      

 

clip_image004

  

Sp_server_diagnostics backed this up as well:    

 <queryProcessing maxWorkers="512" workersCreated="923"    

 

We then looked at sys.dm_os_workers to see if that would shed any light, and it did give us a clue. All the WAITFOR threads here were from the induced workload ( User queries), but we saw a ton of HADR type last_wait_types. We can see at least 1 thread per AG being the ‘preemptive Lease mechanism’ thread, and a bunch of other HADR type waits and also Broker and some others – but how do we deterministically figure out “what threads” are counted or not towards max worker thread limit?

  

select is_preemptive,state,last_wait_type,count(*) as NumWorkers from sys.dm_os_workers    

Group by state,last_wait_type,is_preemptive    

order by count(*) desc      

 

clip_image006

  

So then the bigger question as to whether these threads are “counted” towards the Max worker threads or not. Bob Dorr had posted about the HADR Worker Pool earlier and from an AlwaysON  Availability Groups perspective there is a pool of workers that are capped to a number that is [ Max worker threads – 40 ] but what we discovered is that these seemingly do not count towards the Max worker thread configured value and we had to prove that.  

Looking at source,  discovered that when we spawn a new worker thread, we calculate an IdealWorkerLimit and compare that with Max worker threads. In that calculation, if a task that is bound to a worker is marked as a “Permanent Task”, these are not “counted” towards the worker thread limit. So there are a bunch of System threads that spawn of Permanent tasks and as such as not counted towards the configured ‘max worker thread’ configured limit including but not limited to:  

- Always On ( HADR Notification, HADR Controller etc )  

- Lock Manager  

- Log Shipping worker  

- Some Fulltext  

- SQL Trace  

- Some of the Transport Threads such as Service Broker  

- Some of the Recovery threads  

Currently in sys.dm_os_tasks or in sys.dm_os_schedulers we don’t expose a column that exposes if a task is a “Permanent Task” if you will. You could roughly extrapolate it somewhat from the query on workers posted earlier, or looking at Exec requests as below.  

select last_wait_type, count(*) as NumRequests from sys.dm_exec_requests    

group by last_wait_type    

order by count(*) desc      

clip_image008  

  Luckily sys.dm_exec_sessions has a column is_user_process that is exposed. So we can see that there are 385 requests that are “System” and 502 that are user  

select  is_user_process,count(*) as RequestCount from sys.dm_exec_sessions s    

inner join sys.dm_exec_requests r    

on s.session_id = r.session_id    

group by is_user_process      

 

clip_image010

 

Knowing this we can further get additional details including what type of command it is running, and many other details by modifying the query below.  The query below is only listing threads that are “not” being counted against the ‘max worker thread limit’

;with cte as    

(    

       select    

              s.is_user_process,    

  w.worker_address,    

              w.is_preemptive,    

              w.state,    

              r.status,    

              t.task_state,    

              r.command,    

              w.last_wait_type,    

              t.session_id,    

              t.exec_context_id,    

              t.request_id    

       from dm_exec_sessions s    

       inner join dm_exec_requests r    

       on s.session_id = r.session_id    

       inner join dm_os_tasks t    

       on r.task_address = t.task_address    

       inner join dm_os_workers w    

       on t.worker_address = w.worker_address    

       where s.is_user_process = 0    

)    

select    

       is_user_process,command,    

       last_wait_type,    

       count(*) as cmd_cnt    

from cte    

group by is_user_process,command, last_wait_type    

order by cmd_cnt desc      

 

clip_image012

 

Moral of the story:  Max worker threads limit does NOT have a hard cap in terms of accounting for all the System Tasks but we have demonstrated a way to approximate which tasks those are.

 

As a follow up, Tom will be posting another article with more details on monitoring HADR thread usage through a few other avenues.

 

Denzil Ribeiro – Sr. Premier Field Engineer