Uneven Windows Processor Groups

The topic discussed in this blog applies only to hardware with more than 64 Logical Processors. Hardware that exposes 64 or fewer Logical Processors to the Operating System is not affected by the issues described in this article.

In these blog articles: https://blogs.msdn.com/b/saponsqlserver/archive/2010/09/28/windows-2008-r2-groups-processors-sockets-cores-threads-numa-nodes-what-is-all-this.aspx

https://blogs.msdn.com/b/saponsqlserver/archive/2011/04/18/how-many-logical-processors-does-sql-server-2008-r2-enterprise-edition-support.aspx

https://blogs.msdn.com/b/saponsqlserver/archive/2011/04/20/changes-in-affinity-settings-of-sql-server-2008-r2-to-support-gt-64-logical-processors.aspx

we talked about Windows Server 2008 R2 and the fact that it supports more than 64 Logical Processors. At the time we wrote the first of these articles, everything looked straightforward and easy to understand. The hardware available had a maximum of 8 cores per physical processor (socket), so even with Hyperthreading a socket exposed 16 Logical Processors. Each socket formed one or two NUMA nodes, and four or eight of those NUMA nodes in turn formed one Processor Group. The next larger hardware beyond a 4-socket server was an 8-socket server, which ended up with two Processor Groups of 64 Logical Processors each.

The assignment of Logical Processors to Processor Groups is done when the server starts up. Windows Server 2008 R2 and later Windows releases look at the physical hardware architecture in order to align Processor Groups with the NUMA nodes, and check memory latency in order to decide which Logical Processors to combine into one Processor Group. Once this assignment is done, it cannot be changed dynamically. It only takes place when there are more than 64 Logical Processors. On typical 8-socket servers, CPU resources and memory were distributed evenly between the Processor Groups, with each group usually covering 64 Logical Processors (the exception being some more exotic hardware with 96 Logical Processors that came onto the market in 2009/2010).

What we didn’t explain in more detail is how older software releases deal with the new concept of Processor Groups. Or, the other way round, how Windows deals with software that to some degree cares about which Logical Processors it runs on, or that even allows its affinity to specific Logical Processors to be controlled. In other words: what happens to an application that was developed before we released the Processor Group concept and hence only anticipated a maximum of 64 Logical Processors to choose from?

The fact is that Windows assigns such an application to one of the Processor Groups on the server when the application starts. The application sees only its window of up to 64 Logical Processors of the hardware it is running on, at least in terms of compute (CPU) resources. It does, however, see all of the server’s memory resources. Typical applications that are scheduled on one Processor Group and live within the boundaries of that group are SQL Server 2005, SQL Server 2008 and even the SAP ABAP or Java stack, as well as any other software that was not changed to use the Win32 API changes we made in Windows Server 2008 R2 to support Processor Groups.
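For comparison, this is roughly what it looks like when a program asks Windows for the Processor Group layout explicitly instead of assuming a single set of at most 64 Logical Processors. It is a minimal sketch, assuming Python with ctypes on Windows 7 / Windows Server 2008 R2 or later. GetActiveProcessorGroupCount and GetActiveProcessorCount are the kernel32 functions; the helper name processor_group_sizes is made up for this illustration, and on any other platform the helper simply returns an empty list.

```python
import ctypes
import sys


def processor_group_sizes():
    """Return the number of active Logical Processors in each Processor Group.

    Illustrative helper (not part of any Microsoft tool). Returns an empty
    list when not running on Windows.
    """
    if sys.platform != "win32":
        return []

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)

    # WORD GetActiveProcessorGroupCount(void);
    kernel32.GetActiveProcessorGroupCount.argtypes = []
    kernel32.GetActiveProcessorGroupCount.restype = ctypes.c_ushort

    # DWORD GetActiveProcessorCount(WORD GroupNumber);
    kernel32.GetActiveProcessorCount.argtypes = [ctypes.c_ushort]
    kernel32.GetActiveProcessorCount.restype = ctypes.c_uint32

    group_count = kernel32.GetActiveProcessorGroupCount()
    return [kernel32.GetActiveProcessorCount(group) for group in range(group_count)]


if __name__ == "__main__":
    for group, size in enumerate(processor_group_sizes()):
        print(f"Group {group}: {size} Logical Processors")
```

On a server with 64 or fewer Logical Processors this reports a single group.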

As long as the Processor Groups are evenly sized and software that is not Processor Group aware does not depend on having particular NUMA nodes available to it, everything is fine.

However, this nice balance was disturbed a bit when Intel released the latest generation of its Intel Xeon E7 processor family with 10 cores and 20 Logical Processors per socket. These are obviously core and Logical Processor counts that do not divide evenly into groups of 64. In the third blog listed at the top I already talked about the impact of these processors on the SQL Server affinity mask settings.

What we have not covered so far is how Windows Server 2008 R2 deals with the fact that a 4-socket server suddenly shows 80 Logical Processors and an 8-socket server 160. The algorithm originally implemented in Windows Server 2008 R2 tried to create as few Processor Groups as possible and to keep the individual Processor Groups as large as possible. Hence we ended up with uneven Processor Groups on servers with these new 10-core processors. Let’s see what happened.

Detection of current processor group information

The exact Processor Group configuration in Windows Server 2008 R2 can only be observed on hardware that exposes more than 64 Logical Processors. The tool to perform the check with is called coreinfo and can be downloaded here: https://technet.microsoft.com/en-us/sysinternals/cc835722.aspx

Please download coreinfo.exe and run it in a command window. It is best to redirect the output into a text file that can be opened with Notepad, e.g. coreinfo > structure.txt. The program should take only a few seconds to run.

In the file, look for the section called ‘Logical Processor to Group Map’, which is usually the last section of the output. On a server with 80 Logical Processors the result might look like this:

Logical Processor to Group Map:

Group 0:

************************************************************
--------------------

Group 1:

------------------------------------------------------------
********************

The symbol ‘*’ marks a Logical Processor that is part of the Processor Group. The symbol ‘-’ marks a Logical Processor that is not part of that Processor Group. Hence the display above tells us that Processor Group 0 covers the first 60 Logical Processors of the hardware, whereas Processor Group 1 covers just the last 20. Understandably, it makes a large difference in available CPU resources whether an application is assigned to the first or the second Processor Group. We have also seen the result the other way round, with the first Processor Group covering the first 20 Logical Processors and the second Processor Group covering the remaining 60.
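If you need to check many servers, counting the ‘*’ symbols by hand gets tedious. The following is a minimal sketch in Python that counts them per group. It assumes the section looks like the output shown in this article (a ‘Group N:’ line followed by one or more lines of ‘*’ and ‘-’); the function name group_sizes and the variable sample_80 are made up for this example, and other coreinfo versions may format the section differently.

```python
import re


def group_sizes(coreinfo_text):
    """Return {group number: Logical Processor count} parsed from the
    'Logical Processor to Group Map' section of coreinfo output."""
    sizes = {}
    current = None
    in_section = False
    for raw in coreinfo_text.splitlines():
        line = raw.strip()
        if line.startswith("Logical Processor to Group Map"):
            in_section = True
            continue
        if not in_section or not line:
            continue
        header = re.fullmatch(r"Group\s+(\d+):", line)
        if header:
            current = int(header.group(1))
            sizes[current] = 0
        elif current is not None and set(line) <= {"*", "-"}:
            # Each '*' is one Logical Processor that belongs to this group.
            sizes[current] += line.count("*")
        else:
            # Any other text means the map section is over.
            in_section = False
            current = None
    return sizes


# The 80 Logical Processor example from above, rebuilt line by line.
sample_80 = "\n".join([
    "Logical Processor to Group Map:",
    "Group 0:",
    "*" * 60,
    "-" * 20,
    "Group 1:",
    "-" * 60,
    "*" * 20,
])

print(group_sizes(sample_80))  # {0: 60, 1: 20}
```

To use it on a real server, read structure.txt and pass its contents to group_sizes. If the values differ from each other, the Processor Groups are uneven.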

On hardware with 160 Logical Processors, this section could look like this:

Logical Processor to Group Map:

Group 0:

************************************************************
------------------------------------------------------------
----------------------------------------

Group 1:

------------------------------------------------------------
************************************************************
----------------------------------------

Group 2:

------------------------------------------------------------
------------------------------------------------------------
****************************************

This means we are looking at 3 Processor Groups: two with 60 Logical Processors each and a last one with 40 Logical Processors.
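The 60/20 and 60/60/40 splits are what you get when whole sockets are packed into as few groups as possible, filling each group as far as the limit of 64 allows. The following Python sketch reproduces the arithmetic. It is only an illustration of the "few and large groups" strategy described earlier, not the actual Windows implementation: the function name pack_greedy is made up, sockets are treated as indivisible units, and neither the memory latency checks nor the occasionally reversed order (20/60) are modeled.

```python
MAX_GROUP_SIZE = 64  # a Processor Group holds at most 64 Logical Processors


def pack_greedy(sockets, lps_per_socket, max_group=MAX_GROUP_SIZE):
    """Fill each group with whole sockets until the next socket no longer fits.

    Assumes lps_per_socket <= max_group.
    """
    groups = []
    current = 0
    for _ in range(sockets):
        if current + lps_per_socket > max_group:
            # The next socket would overflow the group: close it, start a new one.
            groups.append(current)
            current = 0
        current += lps_per_socket
    if current:
        groups.append(current)
    return groups


print(pack_greedy(4, 20))  # [60, 20]
print(pack_greedy(8, 20))  # [60, 60, 40]
print(pack_greedy(8, 16))  # [64, 64]
```

The last call shows why the problem did not appear with the older 8-core processors: 16 Logical Processors per socket fit into 64 exactly, 20 do not.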

Problem with uneven Processor Groups

Processor Groups that are uneven in terms of CPU resources, combined with applications that are not Processor Group aware and Windows’ random assignment of such applications to one of the Processor Groups, can cause non-deterministic behavior in those applications. Restarting such an application might get it assigned to a different Processor Group with more or fewer CPU resources than before, and hence cause it to behave differently under the same workload.

In order to get predictable performance for SAP applications and SQL Server, we recommend configuring Windows Server 2008 R2 to create Processor Groups with the same number of Logical Processors each. To make this possible, Windows Development changed the algorithm that groups CPU resources into Processor Groups and released the change as a QFE.

Getting to an even size of processor groups

The easiest way to get Processor Groups with the same number of Logical Processors is to apply a QFE that Microsoft released in March 2011. The QFE and the related Microsoft Knowledge Base article can be found here: https://support.microsoft.com/kb/2510206

This QFE changes the strategy for building Processor Groups: the goal is now to build Processor Groups of the same size. E.g. on 4-socket and 8-socket hardware with the Intel Xeon E7 family with 10 cores and Hyperthreading, we would get 2 or 4 Processor Groups respectively, with 40 Logical Processors each. The next release of Windows will use this new strategy by default.
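The numbers above can be checked with a small Python sketch. It assumes that "same size" means every group receives the same number of whole sockets and that the smallest group count satisfying this is chosen. That rule is my reading of the behavior described here, not the documented algorithm of the QFE, and the function name pack_even is made up for the example.

```python
import math


def pack_even(sockets, lps_per_socket, max_group=64):
    """Split whole sockets into the fewest equally sized groups of at most
    max_group Logical Processors each."""
    total = sockets * lps_per_socket
    min_groups = math.ceil(total / max_group)
    for group_count in range(min_groups, sockets + 1):
        if sockets % group_count != 0:
            continue  # sockets cannot be shared out equally
        size = (sockets // group_count) * lps_per_socket
        if size <= max_group:
            return [size] * group_count
    raise ValueError("a single socket exceeds the maximum group size")


print(pack_even(4, 20))  # [40, 40]
print(pack_even(8, 20))  # [40, 40, 40, 40]
print(pack_even(8, 16))  # [64, 64]
```

For 160 Logical Processors three groups would be enough by count, but 8 sockets cannot be divided evenly by 3, so the sketch moves on to 4 groups of 40.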

Please keep this in mind when dealing with new hardware where the Logical Processor count goes beyond 64.

You can also find more detailed explanations in OSS note #1635387 – Windows Processor Groups.