SQL Server 2005 and 2008 versions may not detect all available processors on a machine with more than 64 logical processors

Scenario: I came across this scenario recently where a SQL Server 2005 instance installed on Windows Server 2008 R2, wasn’t detecting all available logical processors after Hyperthreading was enabled on the server. The machine in question had 4 sockets with each socket representing a NUMA node and each socket had 10 cores on it - so there were a total of 40 cores on the server. Without Hyperthreading turned on, SQL Server detected all 40 logical processors. However, after Hyperthreading was enabled, SQL server instance detected only 20 logical processors, although there were a total of 80 logical processors available on the server. Here is what we noticed in the SQL Server error logs before and after turning on Hyperthreading -

With Hyperthreading turned OFF

2011-10-27 12:37:32.74 Server      Detected 40 CPUs. This is an informational message; no user action is required.

2011-10-27 12:37:32.94 Server      Large Page Extensions enabled.

2011-10-27 12:37:32.94 Server      Large Page Granularity: 2097152

2011-10-27 12:37:32.98 Server      Large Page Allocated: 32MB

2011-10-27 12:37:33.03 Server      Large Page Allocated: 32MB

2011-10-27 12:37:33.42 Server      Using locked pages for buffer pool.

2011-10-27 12:37:33.79 Server      Large Page Allocated: 32MB

2011-10-27 12:37:33.90 Server      Large Page Allocated: 32MB

2011-10-27 12:37:34.17 Server      Using dynamic lock allocation.  Initial allocation of 2500 Lock blocks and 5000 Lock Owner blocks per node.  This is an informational message only.  No user action is required.

2011-10-27 12:37:34.19 Server      Lock partitioning is enabled.  This is an informational message only. No user action is required.

2011-10-27 12:37:34.53 Server      Multinode configuration: node 0: CPU mask: 0x00000000000ffc00 Active CPU mask: 0x00000000000ffc00. This message provides a description of the NUMA configuration for this computer. This is an informational message only. No user action is required.

2011-10-27 12:37:34.60 Server      Multinode configuration: node 1: CPU mask: 0x00000000000003ff Active CPU mask: 0x00000000000003ff. This message provides a description of the NUMA configuration for this computer. This is an informational message only. No user action is required.

2011-10-27 12:37:34.65 Server      Multinode configuration: node 2: CPU mask: 0x000000003ff00000 Active CPU mask: 0x000000003ff00000. This message provides a description of the NUMA configuration for this computer. This is an informational message only. No user action is required.

2011-10-27 12:37:34.70 Server      Multinode configuration: node 3: CPU mask: 0x000000ffc0000000 Active CPU mask: 0x000000ffc0000000. This message provides a description of the NUMA configuration for this computer. This is an informational message only. No user action is required

With Hyperthreading turned ON

2011-10-27 13:13:50.31 Server     Detected 20 CPUs. This is an informational message; no user action is required.

2011-10-27 13:13:50.48 Server      Large Page Extensions enabled.

2011-10-27 13:13:50.48 Server      Large Page Granularity: 2097152

2011-10-27 13:14:04.68 Server      Large Page Allocated: 32MB

2011-10-27 13:14:07.87 Server      Using locked pages for buffer pool.

2011-10-27 13:14:12.25 Server      Using dynamic lock allocation.  Initial allocation of 2500 Lock blocks and 5000 Lock Owner blocks per node.  This is an informational message only.  No user action is required.

2011-10-27 13:14:12.28 Server      Lock partitioning is enabled.  This is an informational message only. No user action is required.

Cause and Explanation: A new feature introduced in Windows Server 2008 R2 called “Processor Groups” was responsible for this behavior. Existing applications such as SQL Server 2005 and 2008, which were released prior to the release of Windows Server 2008 R2, can’t take advantage of this feature thus causing the issue

What is a Processor group? In Windows Server 2008 R2, support for systems that have more than 64 logical processors is based on the concept of a processor group, which is a static set of up to 64 logical processors that is treated as a single scheduling entity. Processor groups are numbered starting with 0. Systems with fewer than 64 logical processors always have a single group, Group 0. The operating system minimizes the number of groups in a system. For example, a system with 128 logical processors would have two processor groups with 64 processors in each group, not four groups with 32 logical processors in each group. Processor Group is a new concept in Windows Server 2008 R2 and is not supported by Windows Server 2008, Windows Vista and Windows Server 2003.

Why does Processor Groups affect SQL Server 2005 and 2008 instances? Following three factors combined together would cause Processor Groups to affect SQL Server 2005 and 2008 instances -

  • Both SQL Server 2005 and SQL Server 2008 were released prior to the release of Windows Server 2008 R2 and therefore these versions do not utilize the new APIs that were released to take advantage of Processor Groups in Windows Server 2008 R2.
  • Windows Server 2008 R2 creates uneven Processor Groups on machines that has > 64 logical processors. This is because the algorithms implemented in Windows 2008 R2 originally were looking to create as little as possible processor groups and keep the individual processor groups as large as possible. Since there is only one Processor Group on servers that has <= 64 processors, no problems are expected on such machines. However, if a machine has more than 64 logical processors, Windows Server 2008 R2 will create more than one processor group and SQL Server 2005/2008 will be bound to any of the groups on service restart. In our scenario, after Hyperthreading was turned on, the machine had 80 Logical processors and 4 NUMA nodes, which were divided into 2 groups by the Windows Server 2008 R2 operating system. The first group had 3 NUMA nodes and hence 60 logical processors in it while the second groups had 1 NUMA node with 20 logical processors in it. When SQL instance was restarted, it was bound to the first group with 20 logical processors in it.
  • Windows Server 2008 R2 will randomly assign such applications that are not aware of Processor Groups to one of the existing Processor Groups.

Workaround/Fix: A couple of things to consider -

  • How to utilize > 64 logical processors for SQL instances? SQL Server 2005/2008 versions cannot utilize more than 64 logical processors and therefore if you would like your SQL server instance to be able to utilize more than 64 processors, the only option is to upgrade to SQL Server 2008 R2.
  • How to force a SQL Server instance to use the Processor Group with maximum number of logical processors in it? For example, on a machine with 96 CPUs you could have two processor groups, Group # 1 with 64 CPUS and Group # 2 with 32 CPUs.  If you have SQL Server 2005/2008 Enterprise installed it can use up to 64 CPUs.  In this scenario, you don’t want  your SQL Server instance to be associated with Group # 2 that has 32 CPUs. If you would like your SQL Server 2005 and 2008 instances to be able to utilize the processor group that has maximum available processors in it, you can use the sc.exe utility as described in this blog article
  • How to resolve issues with uneven processor group assignment? Apply Windows Server 2008 R2 hotfix as described in KB 2510206
  • How to manually assign processor groups to specific NUMA nodes? Follow instructions in KB 2506384

Additional References:

Would love to hear thoughts from folks who have experienced this issue firsthand. Please leave your feedback below if you have anything to share

Cheers!