Parallel Index Creation performance issue

Both SQL Server 2005 and 2008 allow you to do parallel index creation.  In fact you can specify MAXDOP in create index statement.   When you use MAXDOP (let's say 8), you would assume that 8 threads will do equal amount of work and finish the work  at the same time.

But this is not always the case.   We have recently investigated an issue from a customer who complained about index creation being slow.   They have a need to create non-clustered indexes every night on a big table that they import data daily.   What we observed is that there were 8 threads initially doing work.  After a while, we just saw one thread (in addition to the main thread) consuming 100% CPU for a long time.    This is definitely unusual.  

As it turns out that the table has really skewed data.    Out of 250 million rows, over 150 millions rows contain empty strings for the column they are trying to create index.  So one thread ended up doing most of the work. 

Here is discovery process,
We got  xml  plan (set statistics xml on).   In the XML plan, you can see actual rows processed by each thread for each operator.  For example, in this output, we see that thread 1 ended up sorting over 152 million rows.  Rest of the threads didn't do too much work.
<RelOp NodeId="2" PhysicalOp="Sort" LogicalOp="Sort" EstimateRows="2.57398e+008" EstimateIO="2744.91" EstimateCPU="1958.46" AvgRowSize="35" EstimatedTotalSubtreeCost="7260.16" Parallel="1" EstimateRebinds="0" EstimateRewinds="0">

- <OutputList>

<ColumnReference Column="RowRefSrc1011" />

</OutputList>

<MemoryFractions Input="1" Output="1" />

- <RunTimeInformation>

<RunTimeCountersPerThread Thread="7" ActualRows="16059608" ActualRebinds="1" ActualRewinds="0" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="6" ActualRows="15888692" ActualRebinds="1" ActualRewinds="0" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="8" ActualRows="8269991" ActualRebinds="1" ActualRewinds="0" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="5" ActualRows="16257841" ActualRebinds="1" ActualRewinds="0" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="4" ActualRows="16733855" ActualRebinds="1" ActualRewinds="0" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="3" ActualRows="15253016" ActualRebinds="1" ActualRewinds="0" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="1" ActualRows="152841862" ActualRebinds="1" ActualRewinds="0" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="2" ActualRows="16092988" ActualRebinds="1" ActualRewinds="0" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="0" ActualRows="0" ActualRebinds="0" ActualRewinds="0" ActualEndOfScans="0" ActualExecutions="0" />

</RunTimeInformation>

You wouldn't see anything in regular statistics profile.  All you see is over 250million rows processed.

Rows                 Executes             StmtText                                                                                                                                                        
-------------------- -------------------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------
0                    1                    insert [schema1].[table1] select *, %%bmk%% from [schema1].[table1]                                                                                 
0                    1                      |--Parallelism(Gather Streams)                                                                                                                                
0                    8                           |--Index Insert(OBJECT:([db2].[schema1].[table1].[IX_table1_column1]))                                                                
257397853            8                                |--Sort(ORDER BY:([db1].[schema1].[table1].[column1] ASC, [Bmk1000] ASC) PARTITION ID:([db1].[schema1].[table1].[column1])) 
257397853            8                                     |--Table Scan(OBJECT:([db1].[schema1].[table1]))  

At this point, we obtained show statistics output from customer after index creation to further confirm data distribution. From histogram, we confirmed that  the contaim has over 140 million rows of empty strings out of 250 million rows.

RANGE_HI_KEY             RANGE_ROWS    EQ_ROWS       DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
------------------------ ------------- ------------- -------------------- --------------
                         0             1,426618E+08  0                    1
01....             114197        19829         39314                2,904741
0313...               493116        12894         97814                5,041364
0316...               451782        16061         82636                5,467133
.....

Because of the skewed data distribution, sort can't be truely parallelized.    Currently SQL can't divide a single value for multiple threads to handle. So one thread ended up doing more work.

So if you have skewed data on a column you are trying to index, it will not achieve true parallelism at sort phase.

For this particular case, you can potentially do filter index on NULL, or blanks to avoid the sort inbalance.  But if it's a meaningful value to you that you won't want to filter out, you can end up with one thread doing most of the work during the sort phase. 

For this particular customer, there are other tuning opportunities specific to their requirement which are not relavant to this post.   So I will not add confusion.

 

Jack Li | Senior Escalation Engineer |Microsoft SQL Server Escalation Service