How It Works: Maximizing Max Degree Of Parallelism (MAXDOP)

I was working on an index build issue for an 80 CPU system and kept seeing that only 64 CPUs were getting used. I had carefully studied sys.dm_os_spinlock_stats and sys.dm_os_wait_stats along with performance counters, memory usage pattern, and I/O activities.   In fact, I had an 80 CPU, 2TB RAM, 4TB SSD system so I was convinced the SQL Server was CPU bound and adding more CPUs for the index build could be a benefit.

Note: I must caution you that adding more CPUs can lead to reduced performance because a bottleneck such as memory or I/O can become a larger problem.  

I like to think of it like pumping gas at my favorite filling station.  The pump at the storage tank can only move so much liquid through a finite pipe size.   Adding more filling outlets for patrons to use does not mean the overall flow of gas (gallons/sec) increases.  In fact, I like filling when no one else is filling because I maximize my flow and reduce my overall time at the pump. 

TEST any MAXDOP setting well as you might find less goes faster.

After looking at various SQL Server Books Online references and then stepping though the code I realized our documentation is not as accurate as it could be.   I hope I this post can reduce some of the confusion.

There are plenty of references for tuning MAXDOP to allow queries to run at their best while reducing the overhead of the parallelism.  You have all seen the references for capping MAXDOP at 8, or number of schedulers of the NUMA node if smaller, or ….   The fact is, these are all great and recommended best practices.

This post is not intended to contradict any of the current recommendations.  This blog is solely focused on a specific maintenance target.    The reset of my system has to be idle so I can safely consume the majority of schedulers.  It is probably the middle of the night, users are sleeping and you want to schedule a job that can take full advantage of the overall system.  I have reviewed various performance points and I believe using a high level of parallelism could allow my index build to complete quickly.

Warning: Index fragmentation with increased levels or parallelism: https://blogs.msdn.com/b/psssql/archive/2012/09/05/how-it-works-online-index-rebuild-can-cause-increased-fragmentation.aspx

There are several stages to determining the degree of parallelism (MAXDOP) a query can utilize.

Stage 1 – Compile

During complication SQL Server considers the hints, sp_configure and resource workgroup settings to see if a parallel plan should even be considered.  Only if the query operations allow parallel execution:

If hint is present and > 1 then build a parallel plan

else if no hint or hint (MAXDOP = 0)

          if sp_configure setting is 1 but workload group > 1 then build a parallel plan

           else if sp_configure setting is 0 or > 1 then build parallel plan

Stage 2 – Query Execution

When the query begins execution the runtime, degree of parallelism is determined.  This involves many factors, already outlined in SQL Server Books Online: https://technet.microsoft.com/en-US/library/ms178065(v=SQL.105).aspx 

Before SQL Server looks at the idle workers and other factors it determines the target for the degree of parallelism.

if sp_configure or query hint forcing serial plan use (1)

else if resource workgroup set

    if query hint present use min(hint, resource workgroup)

     else use resource workgroup

If still 0 after the detailed calculations it is set to 64  (default max for SQL Server as documented in Books Online.)   This fooled me some because on the 80 CPU system it has 2 Windows scheduler groups x 40 CPUs.   I might have expected a 40 CPU cap to avoid crossing over a Windows scheduler group.   This is not the case, SQL Server hard codes the 64 CPU target when the runtime target of MAXDOP is still 0 (default.)

The MAXDOP target is now adjusted for:

  • Actual CPU count (affinity settings from sp_configure and the resource pool). 

  • Certain query types (index build for example) look at the partitions

  • Other query type limitations that may exist

Now SQL Server takes a look at the available workers (free workers for query execution.) You can loosely calculate the free worker count on a scheduler using (Free workers = Current_workers_count – current_tasks_count) from sys.dm_os_schedulers

Once the target is calculated the actual is determined by looking at the available resources to support a parallel execution. This involves determining the node(s) and CPUs with available workers.

Older versions of SQL Server used a polling mechanism every ~1 second to determine the node with the most free workers to target. This meant you could encounter race conditions from multiple queries, both going parallel on the same node when going parallel on separate nodes would have resulted in better CPU usage.

Newer builds of SQL Server actively track the free workers. This significantly reduces the possibility of assigning parallel queries to the same set of schedulers.

Trace flag 2466 - Force older version logic to determine number of available resources.

 <p class="MsoNormal" style="margin: 0in 0in 0pt;"><span style='font-family: "Segoe UI","sans-serif"; font-size: 9pt; mso-bidi-font-size: 11.0pt; mso-bidi-font-family: "Times New Roman"; mso-bidi-theme-font: minor-bidi;'><p><font color="#424242" size="2">The worker location information is then used to target an appropriate set of CPUs to assign the parallel task to. </font></p></span></p>    <p class="MsoNormal" style="margin: 0in 0in 0pt;"><span style='font-family: "Segoe UI","sans-serif"; font-size: 9pt; mso-bidi-font-size: 11.0pt; mso-bidi-font-family: "Times New Roman"; mso-bidi-theme-font: minor-bidi;'><p></p></span></p>  <table><tr><td></td></tr></table>    <p class="MsoNormal" style="margin: 0in 0in 0pt;"><span style='font-family: "Segoe UI","sans-serif"; font-size: 9pt; mso-bidi-font-size: 11.0pt; mso-bidi-font-family: "Times New Roman"; mso-bidi-theme-font: minor-bidi;'></span><font size="2">In general the placement decisions are:</font></p>  <table><tr><td></td></tr></table>    <ul>     <li>       <div class="MsoNormal" style="margin: 0in 0in 0pt;"><span style='font-family: "Segoe UI","sans-serif"; font-size: 9pt; mso-bidi-font-size: 11.0pt; mso-bidi-font-family: "Times New Roman"; mso-bidi-theme-font: minor-bidi;'><p><font size="2"><strong>SMP (FPlaceThreadsOneNodeSystem):</strong> If a single node system treat as SMP and only enqueue to a single node. SOFT NUMA and affinity may cause SQL to treat as SMP                 <br></font></p></span></div>     </li>      <li>       <div class="MsoNormal" style="margin: 0in 0in 0pt;"><span style='font-family: "Segoe UI","sans-serif"; font-size: 9pt; mso-bidi-font-size: 11.0pt; mso-bidi-font-family: "Times New Roman"; mso-bidi-theme-font: minor-bidi;'><p><font size="2"><strong>CONNECTION (FPlaceThreadsOneNodeSystem): </strong>If trace flag 2479 is enabled force all parallel decisions to be limited to the node the connection is associated with.&nbsp; This may be helpful when using SOFT Numa or connection, node affinity.                 <br></font></p></span></div>     </li>      <li>       <div class="MsoNormal" style="margin: 0in 0in 0pt;"><font size="2"><strong>FULL (FPlaceThreadsAllNodes): </strong>If MAXDOP target is equal to all schedulers enqueue work to all schedulers             <br></font></div>     </li>      <li>       <div class="MsoNormal" style="margin: 0in 0in 0pt;"><span style='font-family: "Segoe UI","sans-serif"; font-size: 9pt; mso-bidi-font-size: 11.0pt; mso-bidi-font-family: "Times New Roman"; mso-bidi-theme-font: minor-bidi;'><p><font size="2"><strong>LEAST (FPlaceThreadsWithinLeastLoadedNode): </strong>If target MAXDOP target is less than a single node can provide and if trace flag 2467 is enabled attempt to locate least loaded node&nbsp; <br></font></p></span></div>     </li>      <li>       <div class="MsoNormal" style="margin: 0in 0in 0pt;"><span style='font-family: "Segoe UI","sans-serif"; font-size: 9pt; mso-bidi-font-size: 11.0pt; mso-bidi-font-family: "Times New Roman"; mso-bidi-theme-font: minor-bidi;'><p><font size="2"><strong>SPREAD (FPlaceThreadsMultipleNodes): </strong>The load is spread across any available node<span style='font-family: "Segoe UI","sans-serif"; font-size: 9pt; mso-bidi-font-size: 11.0pt; mso-bidi-font-family: "Times New Roman"; mso-bidi-theme-font: minor-bidi;'><p></p></span></font></p></span></div>     </li>   </ul>    <p class="MsoNormal" style="margin: 0in 0in 0pt;">&nbsp;</p>  <table><tr><td></td></tr></table>   <p class="MsoNormal" style="margin: 0in 0in 0pt;"><font size="2">Using XEvents you can monitor the MAXDOP decision logic.&nbsp; For example:</font></p> <table><tr><td></td></tr></table>    <p class="MsoNormal" style="margin: 0in 0in 0pt;"><font size="2"></font></p>    <ul>     <li>       <div class="MsoNormal" style="margin: 0in 0in 0pt;"><span style='font-family: "Segoe UI","sans-serif"; font-size: 9pt; mso-bidi-font-size: 11.0pt; mso-bidi-font-family: "Times New Roman"; mso-bidi-theme-font: minor-bidi; mso-fareast-font-family: calibri; mso-fareast-theme-font: minor-latin; mso-ansi-language: en-us; mso-fareast-language: en-us; mso-bidi-language: ar-sa;'><font color="#000000" size="2">XeSqlPkg::calculate_dop_begin</font></span></div>     </li>      <li>       <div class="MsoNormal" style="margin: 0in 0in 0pt;"><font size="2">XeSqlPkg::calculate_dop</font></div>     </li>   </ul>    <p class="MsoNormal" style="margin: 0in 0in 0pt;"><font size="2"></font></p>  <table><tr><td></td></tr></table>   <p class="MsoNormal" style="margin: 0in 0in 0pt;"><font size="2">Back to trying to get my index build to use all 80 CPUs.&nbsp; I can do several things:</font></p> <table><tr><td></td></tr></table>    <blockquote>     <p class="MsoNormal" style="margin: 0in 0in 0pt;"><font size="2">1. Use MAXDOP=80 query hint</font></p>      <p class="MsoNormal" style="margin: 0in 0in 0pt;"><font size="2">2. Set sp_configure &lsquo;max degree of parallelism&rsquo;, 80&nbsp;&nbsp;&nbsp; -- Warning this applies to any query</font></p>      <p class="MsoNormal" style="margin: 0in 0in 0pt;"><font size="2">3. Create resource pool/workload group and set MAXDOP=80 and assign only the index build connection to it using a resource governor classifier.</font></p>   </blockquote>    <p class="MsoNormal" style="margin: 0in 0in 0pt;"><font size="2"></font></p>    <p class="MsoNormal" style="margin: 0in 0in 0pt;"><font size="2"><strong>Testing Results</strong></font></p>    <p class="MsoNormal" style="margin: 0in 0in 0pt;"><font size="2"></font></p>  <table><tr><td></td></tr></table>   <p class="MsoNormal" style="margin: 0in 0in 0pt;"><font size="2">Here are the MAXDOP results on my 80 CPU system at different setting levels.</font></p> <table><tr><td></td></tr></table>    <p class="MsoNormal" style="margin: 0in 0in 0pt;"><font size="2"></font></p>   <strong></strong><strong></strong><strong></strong>    <table cellspacing="0" cellpadding="2" width="363" border="1"><tbody>       <tr>         <td valign="top" width="96"><strong>Query Hint</strong></td>          <td valign="top" width="102"><strong>sp_configure</strong></td>          <td valign="top" width="100"><strong>Workgroup</strong></td>          <td valign="top" width="63"><strong>RUNTIME</strong></td>       </tr>        <tr>         <td valign="top" width="94">0</td>          <td valign="top" width="103">0</td>          <td valign="top" width="100">80</td>          <td valign="top" width="64"><strong>80</strong></td>       </tr>        <tr>         <td valign="top" width="94">0</td>          <td valign="top" width="103">0</td>          <td valign="top" width="99">0</td>          <td valign="top" width="65"><strong>64</strong></td>       </tr>        <tr>         <td valign="top" width="93">1</td>          <td valign="top" width="103">0</td>          <td valign="top" width="99">80</td>          <td valign="top" width="65"><strong>1</strong></td>       </tr>        <tr>         <td valign="top" width="93">2</td>          <td valign="top" width="103">0</td>          <td valign="top" width="99">80</td>          <td valign="top" width="65"><strong>2</strong></td>       </tr>        <tr>         <td valign="top" width="93">0</td>          <td valign="top" width="103">80</td>          <td valign="top" width="99">0</td>          <td valign="top" width="65"><strong>80</strong></td>       </tr>        <tr>         <td valign="top" width="93">0</td>          <td valign="top" width="103">1</td>          <td valign="top" width="99">0</td>          <td valign="top" width="65"><strong>1</strong></td>       </tr>        <tr>         <td valign="top" width="93">0</td>          <td valign="top" width="103">2</td>          <td valign="top" width="99">0</td>          <td valign="top" width="65"><strong>2</strong></td>       </tr>        <tr>         <td valign="top" width="93">80</td>          <td valign="top" width="103">2</td>          <td valign="top" width="99">0</td>          <td valign="top" width="65"><strong>80</strong></td>       </tr>        <tr>         <td valign="top" width="93">80</td>          <td valign="top" width="104">10</td>          <td valign="top" width="100">2</td>          <td valign="top" width="66"><strong>2</strong></td>       </tr>     </tbody></table>    <p class="MsoNormal" style="margin: 0in 0in 0pt;"><font size="2"></font></p>  <table><tr><td></td></tr></table>   <p class="MsoNormal" style="margin: 0in 0in 0pt;"><font size="2">You can monitor the number of parallel workers by querying:</font></p>   <font size="2">     <p><font color="#008000" size="2" face="Consolas"><font color="#008000" size="2" face="Consolas"><font color="#008000" size="2" face="Consolas">sys</font></font></font><font color="#808080" size="2" face="Consolas"><font color="#808080" size="2" face="Consolas"><font color="#808080" size="2" face="Consolas">.</font></font></font><font color="#008000" size="2" face="Consolas"><font color="#008000" size="2" face="Consolas"><font color="#008000" size="2" face="Consolas">dm_os_tasks</font></font></font></p>   </font>    <p><font size="2"><strong>Note:</strong>&nbsp; Some configuration changes may require a flush of procedure cache (dbcc freeproccache) or a disconnect/connect pairing to take affect.</font></p>  </span><p class="MsoNormal" style="margin: 0in 0in 0pt;">&nbsp;

Bob Dorr - Principal SQL Server Escalation Engineer