Why Do I see more threads per SPID in sysprocesses than MAXDOP?

Most of the Customers using sql server 2000,2005 and 2008 have this doubt in their mind when using parallelism in sql server.

Why Do I see more threads per SPID in sysprocesses than MAXDOP?

The above question is being addressed in the following blog by Bob Dorr from SQL Server Escalation Services,

https://blogs.msdn.com/psssql/archive/2008/02/13/how-it-works-sql-server-per-query-degree-of-parallelism-worker-count-s.aspx

But we will try to provide some more clarification on this issue, in this blog by explaining the behaviour of query processor when dealing with parallelism with the help of following example.

We have used the following script from the book "Inside Sql server 2005 Query Tuning and Optimization" by Kalen Delaney, to induce parallelism in the execution plan,

create table [HugeTable1]

(

[Key] int,

[Data] int,

[Pad] char(200),

Constraint [PK1] PRIMARY KEY ([Key])

)

SET NOCOUNT ON

DECLARE @i int

BEGIN TRAN

set @i = 0

WHILE @i < 250000

BEGIN

   INSERT [HugeTable1] Values (@i,@i,NULL)

   SET @i = @i + 1

   if @i % 1000 = 0

   BEGIN

      COMMIT TRAN

      BEGIN TRAN

   END

END

COMMIT TRAN

SELECT [KEY],[DATA],[PAD] INTO [HugeTable2] FROM HugeTable1

ALTER TABLE [HugeTable2] ADD CONSTRAINT [PK2] PRIMARY KEY ([Key])

We use the following query which displays parallelism in it’s execution plan

set statistics profile on

select T1.[Key],T1.[Data],T2.[Data] From HugeTable1 T1 Join [HugeTable2] T2 ON T1.[Key] =T2.[Key] where T1.Data < 100 OPTION (MAXDOP 2)

 

Execution Plan

|--Parallelism(Gather Streams)          

  |--Nested Loops(Inner Join, OUTER REFERENCES:([T1].[Key]))

    |--Parallelism(Repartition Streams, RoundRobin Partitioning)

    | |--Clustered Index Scan(OBJECT:([AdventureWorks2008].[dbo].[HugeTable1].[PK1] AS [T1])

                                                               WHERE:([AdventureWorks2008].[dbo].[HugeTable1].[Data] as [T1].[Data]<(100)))

  |--Clustered Index Seek(OBJECT:([AdventureWorks2008].[dbo].[HugeTable2].[PK2] AS [T2]),

                                                       SEEK:([T2].[KEY]=[AdventureWorks2008].[dbo].[HugeTable1].[Key] as [T1].[Key]) ORDERED FORWARD)

We ran the above query 100 times in a loop,identified the SPID while running the above query and queried the sysprocesses for that spid from another query window at the same time

select * from sys.sysprocesses where spid=56

spid   kpid blocked waittype waittime lastwaittype cpu physical_io ecid status

56     5640 0 0x00BB 3 CXPACKET 66653 20605 0 suspended

56     5936 0 0x00BB 3 CXPACKET 2147483647 0 1 suspended

56     1252 0 0x00BB 1 CXPACKET 2147483647 0 2 suspended

56     3508 56 0x0024 0 LATCH_EX 2147483647 0 3 suspended

56     3580 0 0x0000 0 LATCH_EX 2147483647 0 4 runnable 

As we can see here we have 5 threads which are spawned to execute the query, however while executing the query we limited the DOP to 2 by specifying the MAXDOP hint to 2.

The reason is “MAXDOP is always specified per operator in the execution plan and not per execution plan”.

So in the above execution plan we have 3 operators in the execution plan viz Clustered Index Scan, Clustered Index Seek and Nested Loops.

So we have

- 2 threads (as per the MAXDOP hint) which are used to perform Clustered Index Scan,

- 2 threads (as per the MAXDOP hint) which are used for Nested Loop Join which also performs Clustered Index Seek to join the rows from outer Clustered Index Scan.  In other words we do not have separate threads allocated for Clustered Index Seek opearor,

- 1 thread is used for parallel gather streams operator to collect and converge the parallel thread output from the Nested Loop join operator. This thread is basically a synchronizing thread (also referred as Thread 0 IN xml plan) which is used when there is parallelism in the execution plan of the query.

To justify the above explanation, we can observe the XML execution plan which also show the runtime behaviour of the plan

2 threads used to perform Clustered Index Scan

<RelOp NodeId="3" PhysicalOp="Clustered Index Scan" LogicalOp="Clustered Index Scan"….>

<RunTimeInformation>

<RunTimeCountersPerThread Thread="2" ActualRows="100" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="1" ActualRows="0" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="0" ActualRows="0" ActualEndOfScans="0" ActualExecutions="0" />

</RunTimeInformation>

2 threads used to perform Nested Loop as well the clustered index seek

<RelOp NodeId="1" PhysicalOp="Nested Loops" LogicalOp="Inner Join" ….>

<RunTimeInformation>

<RunTimeCountersPerThread Thread = " 2 " ActualRows = " 50 " ActualEndOfScans = " 1 " ActualExecutions="1" />

<RunTimeCountersPerThread Thread="1" ActualRows="50" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="0" ActualRows="0" ActualEndOfScans="0" ActualExecutions="0" />

</RunTimeInformation>

Same threads used to perform Clustered Index Seek on inner query

<RelOp NodeId="4" PhysicalOp="Clustered Index Seek" LogicalOp="Clustered Index Seek"…>

<RunTimeInformation>

<RunTimeCountersPerThread Thread = " 2 " ActualRows = " 50 " ActualEndOfScans = " 0 " ActualExecutions="50" />

<RunTimeCountersPerThread Thread="1" ActualRows="50" ActualEndOfScans="0" ActualExecutions="50" />

<RunTimeCountersPerThread Thread="0" ActualRows="0" ActualEndOfScans="0" ActualExecutions="0" />

</RunTimeInformation>

Finally Thread 0 which is shown in all the operators is used to gather streams from all the parallel threads and to provide the final output to the client

<RelOp NodeId="0" PhysicalOp="Parallelism" LogicalOp="Gather Streams"….>

<RunTimeInformation>

<RunTimeCountersPerThread Thread="0" ActualRows="100" ActualEndOfScans="1" ActualExecutions="1" />

</RunTimeInformation>

To conclude the MAXDOP setting is used to limit the number of threads per operation in the execution plan and does not limit the number of threads used to execute the query. So it is normal to see threads per SPID in sysprocesses greater than MAXDOP setting.

However if you see lot of waits for CX Packets in sys.dm_os_wait_stats and High CPU being consumed by sql server, you can consider reducing the MAXDOP setting or identify the missing indexes which can expedite the processing of the query.

 

 

Parikshit Savjani,

SE, Microsoft SQL Server,