Parallel Index Creation performance issue

Both SQL Server 2005 and 2008 allow you to do parallel index creation.  In fact you can specify MAXDOP in create index statement.   When you use MAXDOP (let's say 8), you would assume that 8 threads will do equal amount of work and finish the work  at the same time.

But this is not always the case.   We have recently investigated an issue from a customer who complained about index creation being slow.   They have a need to create non-clustered indexes every night on a big table that they import data daily.   What we observed is that there were 8 threads initially doing work.  After a while, we just saw one thread (in addition to the main thread) consuming 100% CPU for a long time.    This is definitely unusual.  

As it turns out that the table has really skewed data.    Out of 250 million rows, over 150 millions rows contain empty strings for the column they are trying to create index.  So one thread ended up doing most of the work. 

Here is discovery process,
We got  xml  plan (set statistics xml on).   In the XML plan, you can see actual rows processed by each thread for each operator.  For example, in this output, we see that thread 1 ended up sorting over 152 million rows.  Rest of the threads didn't do too much work.
<RelOp NodeId="2" PhysicalOp="Sort" LogicalOp="Sort" EstimateRows="2.57398e+008" EstimateIO="2744.91" EstimateCPU="1958.46" AvgRowSize="35" EstimatedTotalSubtreeCost="7260.16" Parallel="1" EstimateRebinds="0" EstimateRewinds="0">

- <OutputList>
  <ColumnReference Column="RowRefSrc1011" />
  <MemoryFractions Input="1" Output="1" />
- <RunTimeInformation>
  <RunTimeCountersPerThread Thread="7" ActualRows="16059608" ActualRebinds="1" ActualRewinds="0" ActualEndOfScans="1" ActualExecutions="1" />
  <RunTimeCountersPerThread Thread="6" ActualRows="15888692" ActualRebinds="1" ActualRewinds="0" ActualEndOfScans="1" ActualExecutions="1" />
  <RunTimeCountersPerThread Thread="8" ActualRows="8269991" ActualRebinds="1" ActualRewinds="0" ActualEndOfScans="1" ActualExecutions="1" />
  <RunTimeCountersPerThread Thread="5" ActualRows="16257841" ActualRebinds="1" ActualRewinds="0" ActualEndOfScans="1" ActualExecutions="1" />
  <RunTimeCountersPerThread Thread="4" ActualRows="16733855" ActualRebinds="1" ActualRewinds="0" ActualEndOfScans="1" ActualExecutions="1" />
  <RunTimeCountersPerThread Thread="3" ActualRows="15253016" ActualRebinds="1" ActualRewinds="0" ActualEndOfScans="1" ActualExecutions="1" />
  <RunTimeCountersPerThread Thread="1" ActualRows="152841862" ActualRebinds="1" ActualRewinds="0" ActualEndOfScans="1" ActualExecutions="1" />
  <RunTimeCountersPerThread Thread="2" ActualRows="16092988" ActualRebinds="1" ActualRewinds="0" ActualEndOfScans="1" ActualExecutions="1" />
  <RunTimeCountersPerThread Thread="0" ActualRows="0" ActualRebinds="0" ActualRewinds="0" ActualEndOfScans="0" ActualExecutions="0" />

You wouldn't see anything in regular statistics profile.  All you see is over 250million rows processed.

Rows                 Executes             StmtText                                                                                                                                                        
-------------------- -------------------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------
0                    1                    insert [schema1].[table1] select *, %%bmk%% from [schema1].[table1]                                                                                 
0                    1                      |--Parallelism(Gather Streams)                                                                                                                                
0                    8                           |--Index Insert(OBJECT:([db2].[schema1].[table1].[IX_table1_column1]))                                                                
257397853            8                                |--Sort(ORDER BY:([db1].[schema1].[table1].[column1] ASC, [Bmk1000] ASC) PARTITION ID:([db1].[schema1].[table1].[column1])) 
257397853            8                                     |--Table Scan(OBJECT:([db1].[schema1].[table1]))  

At this point, we obtained show statistics output from customer after index creation to further confirm data distribution. From histogram, we confirmed that  the contaim has over 140 million rows of empty strings out of 250 million rows.

------------------------ ------------- ------------- -------------------- --------------
                         0             1,426618E+08  0                    1
01....             114197        19829         39314                2,904741
0313...               493116        12894         97814                5,041364
0316...               451782        16061         82636                5,467133

Because of the skewed data distribution, sort can't be truely parallelized.    Currently SQL can't divide a single value for multiple threads to handle. So one thread ended up doing more work.

So if you have skewed data on a column you are trying to index, it will not achieve true parallelism at sort phase.

For this particular case, you can potentially do filter index on NULL, or blanks to avoid the sort inbalance.  But if it's a meaningful value to you that you won't want to filter out, you can end up with one thread doing most of the work during the sort phase. 

For this particular customer, there are other tuning opportunities specific to their requirement which are not relavant to this post.   So I will not add confusion.


Jack Li | Senior Escalation Engineer |Microsoft SQL Server Escalation Service


Comments (5)
  1. Bolik says:

    Introducing Web Friendly Feeds aka Friendly Feeds Parallel Index Creation performance issue Integration

  2. Bolik says:


  3. Grumpy Old DBA says:

    This has to be a prime case for a filtered index ( SQL 2008 ) when presumably this problem would not then arise?

  4. Iain Barnetson says:

    Jack, Hi, Great article.

    I was wondering if you could shed some light on the parallel index operation process.

    Where do the stats that are used to calculate parallelism for the index operation come from, is it the existing stats, or as stated here: "…/ms191292.aspx", a separate set of stats calculated specifically for index operation ?

    Many thanks,


  5. Bhavya says:

    For this particular customer, there are other tuning opportunities specific to their requirement which are not relavant to this post.
    What other options we have pls

Comments are closed.

Skip to main content