SQL 2016 - It Just Runs Faster: Automatic Soft NUMA


As hardware continues to expand and evolve SQL Server testing and customer reports have highlighted the need to partition activities for optimal scaling.  Partitioning based designs are common ways to localize activities and improve performance and scalability.    An example of how SQL Server leverages partitioning is the CMemThread object.

 

For thread safety various synchronization mechanisms are utilized (spinlock, latch, mutex, semaphore, …)   For discussion purposes let's focus on a spinlock with imperial, computer industry testing results.   A highly contended spinlock does not scale well beyond 8 CPUs.  The CPU is utilized and N-1 spinners are unable to acquire the lock, only one owner is possible.

 

The SQL Server product team studies and tracks internal structures and partitioned designs.   Many of the SQL Server, common structures are designed with various partitioning schemes and rooted around the NUMA layout of the machine.  

 

The core inception of NUMA partitioned based designs evolved in SQL Server 2000 and more so in SQL Server 2005.   As the SMP, single node machines, advanced beyond 8 CPUs the scalability issues were uncovered and design changes made to address the issues.   To combat the scalability NUMA partitioning was a standardized choice.  During SQL Server 2000 and 2005 development 8 CPUs per NUMA node was a high-end system.    Hardware advancements have 18 cores in a single NUMA node today and expose SMP like scalability issues within a single NUMA node.

 

Even today, Soft NUMA can be used to divide a physical node into multiple logical nodes presenting a different layout to the entire SQL Server and adjusting the partitioning to optimize scalability and performance.  Microsoft recommends use of Soft NUMA on the newer, large CPU NUMA system deployments to increase performance.

 

During startup, SQL Server 2016 interrogates the hardware layout and automatically configures Soft NUMA on systems reporting > 8 CPUs per NUMA node.   The partitioning triggers various adjustments throughout the database engine improving scalability and performance.  The Automatic Soft NUMA logic considers logical CPU ratios, total CPU counts and other factors, attempting to create soft, logical nodes containing  8 or fewer CPUs each.

 

clip_image001[4]

 

  • SQL Error log: Automatic soft-NUMA was enabled because SQL Server has detected hardware NUMA nodes with greater than 8 logical processors.
  • DMV: The softnuma_configuration_desc column in sys.dm_os_sys_info can have one of the three values: OFF / ON / MANUAL

 

Your mileage may vary but, here is a testing results from the SQL Server 2016 test harness:  "With HT aware auto soft-NUMA, we get up-to 30% gain in query performance when DOP is set to the number of physical cores on a socket (12 in this case) using Automatic Soft NUMA."

 

The automatic, soft NUMA behavior is Hyperthread (HT/logical processor) aware.   When determining the optimal node layout the logical CPU information is queried and used to prevent groupings of logical only and physical only nodes which could lead to performance variations across the nodes.

 

Furthermore, many of the background processes are created within each node.   The partitioning and creation of additional nodes scales background processing.   For example, each node contains a worker to listen for network activity and performs encryption activities.   The additional nodes created with a soft NUMA configuration increases the number of listeners, scaling and network and encryption capabilities.

 

'It Just Runs Faster' - Apply SQL Server 2016 and SQL Server internally leverages SOFT NUMA partitioning to achieve double digit performance gains.

 

Nitin Verma  - Principal SQL Server Developer

Bob Dorr - Principal SQL Server Escalation Engineer

Comments (4)

  1. Balaji Ram says:

    Does configuring soft NUMA in SQL 2012 also improve performance(by about 30%) or is this only applicable for SQL 2016? We have a 60Core box with 15Cores per NUMA Node running SQL 2012.

    1. RDORR says:

      It can help but I don't think you will see as much of a gain in SQL 2012. Other work done on CMemThread (reference -T8048) and such activities combine to make SQL 16 faster.

  2. Glenn Berry says:

    The current Intel Xeon E5-2600 v4 and E5-4600 v4 families have up to 22 physical cores per processor. The current Intel Xeon E7-8800 v4 family has up to 24 physical cores per processor.

  3. Neeraj Mittal says:

    Optimized utilization of CPU cores in enhancing substantial performance without investing a penny extra Hardware in SQL 2016 !!!!

Skip to main content