Changes in affinity settings of SQL Server 2008 R2 to support > 64 Logical Processors

In earlier articles on this blog we talked about the fact that SQL Server 2008 R2 Enterprise Edition being able to address more than 64 logical processors. In order to give customers the possibility to affinitze SQL Server to certain CPUs in situations where multiple SQL Server instances get consolidated on the same hardware, it was necessary to change the way how SQL Server processor affinity is set. So far the hardware used with most customers was not going beyond 64 logical processors anyway. However in early summer 2011, we will see 4-processor (socket) hardware getting onto the market which will have 80 logical CPUs. So the changes we made and which I describe here might become interesting, specifically for DBMS consolidation scenarios in the SAP space.

Before reading further it might make sense reading this article Cameron wrote since the article is defining what Numa nodes and processor groups are. Terms I will use in the following sections rather often.

New way to set processor affinity in SQL Server 2008 R2

Looking in the way how we extended the processor affinity configuration (with sp_configure) when going from 32 logical processors to 64 logical processors somewhere in the year 2002, it became clear that we can’t continue down that road. Hence we introduced a new command which is called:

‘alter server configuration’ . The BOL article can be found online here. The command allows configuring processor affinity way easier than before where we had to go through the bitmaps and in a lot of cases made some little mistakes here and there. Unfortunately we only released this command for processor affinity settings but not yet for I/O affinity settings. Shouldn’t be a big deal since we mostly recommend leaving I/O affinity on the default anyway.

In order to test with the new command I used a new 4-processor server with new Intel Xeon E7 processor family. Each of those processors has 10-cores. With Hyperthreading enabled we look at 80 logical CPUs. Each of the processors/sockets is a NUMA node in itself. Hence the hardware has 4 NUMA nodes. In order to reach beyond 64 logical processors, the hardware needs to be divided into 2 processor groups. When starting, Windows Server 2008 R2 segments the hardware in one processor group with one processor/socket and 20 logical processors (first processor group) and a second processor group with 3 NUMA nodes and 3 processors/sockets.

I checked the # of CPUs used by SQL Server after changing the affinity with this command:

select COUNT(*) from sys.dm_os_schedulers where status = 'VISIBLE ONLINE';

Additionally I checked the errorlog for the settings it recorded when starting.

I used the new command to change the processor affitinity the following way

ALTER SERVER CONFIGURATION SET PROCESS AFFINITY NUMANODE=0;

Then I used the command which counted the schedulers which were visible and online. I indeed got the expected number of 20. Even without restarting SQL Server. Based on the fact that we just need to toggle the status of the schedulers, a restart of SQL Server is not required when you make a change in the processor affinity mask. Subsequently I issued this command:

ALTER SERVER CONFIGURATION SET PROCESS AFFINITY NUMANODE=0,1;

And it wasn’t exactly a surprise that our query counting the schedulers returned 40 schedulers and to finish the test, I included NUMANODE 2 and 3 as well, just to end up with 60 and 80 schedulers.

If you read the BOL article, you will realize that you can use the command as well to specify exact CPUs or ranges of CPUs. Trying to do that however ended a bit in a surprise. Issuing this command:

ALTER SERVER CONFIGURATION SET PROCESS AFFINITY CPU=20 to 39;

Which I wanted to use to restrict SQL Server to the second NUMA node or second processor/socket, ended up in this error message:

Msg 5861, Level 16, State 1, Line 1
A CPU with id 20 does not exist on this system. Use sys.dm_os_schedulers to locate valid CPUs for this system.

As recommended in the error message, I used the following query to get behind the counting schema we applied to the CPUs with this query:

select * from sys.dm_os_schedulers where status = 'VISIBLE ONLINE' order by cpu_id, parent_node_id;

The result, to a degree surprising, was that we had a range of CPUs with IDs from 0 to 19. From there on we then had a gap and started to count again with the ID of 64 to 123. This basically is caused by the fact that this new 4-socket hardware has two processor groups of which the first one has 20 logical processors and the second one has 60 logical processors. However SQL Server’s counting scheme for CPU IDs is reserving 64 CPU IDs for each processor group. If we face processor groups which contain less logical processors than 64, we will have gaps in the counting which need to be taken into account when trying to set the affinity with the new command. Hence achieving what I wanted to achieve, establishing the processor mask to the second hardware NUMA node, I would have used this command:

ALTER SERVER CONFIGURATION SET PROCESS AFFINITY CPU=64 to 83;

Or

ALTER SERVER CONFIGURATION SET PROCESS AFFINITY NUMANODE=1;

Latter one being easiest.

What does the SQL Server errorlog tell us about the processor affinity we’ve got

As usual, SQL Server shows the processor affinity mask and the SQL Server NUMA node configuration at startup. Having SQL Server in its default configuration, the hardware described above leads to the following SQL Server 2008 R2 EE NUMA node configuration as documented in the SQL Server errorlog:

Node configuration: node 0: CPU mask: 0x00000000000fffff:1 Active CPU mask: 0x00000000000fffff:1.
Node configuration: node 1: CPU mask: 0x00000000000fffff:0 Active CPU mask: 0x00000000000fffff:0.
Node configuration: node 2: CPU mask: 0x000000fffff00000:1 Active CPU mask: 0x000000fffff00000:1.
Node configuration: node 3: CPU mask: 0x0fffff0000000000:1 Active CPU mask: 0x0fffff0000000000:1.

As we see SQL Server’s first NUMA node and second NUMA node are exchanged. Something SQL Server is doing all the time to avoid having SQL Server’s first NUMA node not on the first NUMA node of the hardware and the OS. However please note the additional digits I marked. These are indicating the processor group and as such were not existing in SQL Server 2005 and 2008. As you see NUMA node 0 and Numa node 1 seem to cover the same processors according to the processor mask. However please remark that SQL Server’s Numa node 0 is covering the first 20 logical processors in the second processor group. Whereas SQL Server’s Numa node 1 is covering the first 20 logical processors in the first processor group. Also visible above is that SQL Server is active on all the logical processors on this hardware (in this case 80).

Setting the processor affinity to the first NUMA node on the specific server (logical processors 0 to 19) with this command:

ALTER SERVER CONFIGURATION SET PROCESS AFFINITY NUMANODE=0;

the errorlog entries after restart would look like:

Processor affinity turned on: node 0, processor mask 0x00000000000fffff.
Processor affinity turned on: node 1, processor mask 0x0000000000000000.
Processor affinity turned on: node 2, processor mask 0x0000000000000000.
Processor affinity turned on: node 3, processor mask 0x0000000000000000.

Node configuration: node 0: CPU mask: 0x00000000000fffff:0 Active CPU mask: 0x00000000000fffff:0.
Node configuration: node 1: CPU mask: 0x00000000000fffff:1 Active CPU mask: 0x0000000000000000:1.
Node configuration: node 2: CPU mask: 0x000000fffff00000:1 Active CPU mask: 0x0000000000000000:1.
Node configuration: node 3: CPU mask: 0x0fffff0000000000:1 Active CPU mask: 0x0000000000000000:1.

4 new lines in the errorlog are added in this case telling that we restricted the number of logical processors for SQL Server. It basically is telling that SQL Server is restricted on the first20 logical processors. Unfortunately it isn’t telling on which of the processor groups. This question is answered in the next 4 line section where it tells that the Active CPU mask of SQL Server is on the first 20 logical processors of the first processor group.

Please note the subtle difference when setting the affinity mask to the second NUMA node with this command:

ALTER SERVER CONFIGURATION SET PROCESS AFFINITY NUMANODE=1;

Processor affinity turned on: node 0, processor mask 0x00000000000fffff.
Processor affinity turned on: node 1, processor mask 0x0000000000000000.
Processor affinity turned on: node 2, processor mask 0x0000000000000000.
Processor affinity turned on: node 3, processor mask 0x0000000000000000.

Node configuration: node 0: CPU mask: 0x00000000000fffff:1 Active CPU mask: 0x00000000000fffff:1.
Node configuration: node 1: CPU mask: 0x00000000000fffff:0 Active CPU mask: 0x0000000000000000:0.
Node configuration: node 2: CPU mask: 0x000000fffff00000:1 Active CPU mask: 0x0000000000000000:1.
Node configuration: node 3: CPU mask: 0x0fffff0000000000:1 Active CPU mask: 0x0000000000000000:1.

You see that it is worth to look closely when using processor affinity on the new hardware generation which will arrive in the market in early summer 2011.