This blog provides an example of turning a customer escalation to Tiger team into a successful engagement and using the learnings to deliver improvements in the SQL Server product to scale on modern hardware for specific workload.
As hardware trends change over the years, customer environments move to newer hardware with larger number of cores per socket and TBs of memory, performance issues related to NUMA memory access and database lock contention become predominant. In addition to this, the democratization of storage has resulted in superfast IO technologies such as FusionIO, SSD etc that perform 5-10x faster than a SAN at a fraction of the cost. What this means is IO is no longer a bottleneck. With scale-up hardware, customer expectations increase with the assumption that SQL Server applications will perform better and faster with more transaction throughput.
A major class of contention issues in SQL Server are attributed to excessive waits on thread safe memory objects (CMEMTHREAD) which many workloads experience on modern hardware. The next section describes the problem as well as the solution that addresses this in SQL Server.
Dynamic partitioning of thread safe PMOs
Heap allocators, called memory objects in SQL Server, allow to allocate memory from the heap. CMemThread is a thread-safe memory object allowing for concurrent memory allocations from multiple threads. For correct bookkeeping, CMemThread objects rely on synchronization constructs, a mutex in that case, to ensure only a single thread is updating critical pieces of information at a time.
A major disadvantage, however, is that the use of mutexes can lead to contention if many threads are allocating from the same memory object in a highly concurrent fashion. To that end, SQL Server introduced the concept of a partitioned memory object where each partition is represented by a single CMemThread object. The partitioning of a memory object as described above is statically defined in code and cannot be changed after creation. As memory allocation patterns vary widely based on aspects like hardware and memory usage, it is impossible to come up with the perfect partitioning pattern upfront. In the vast majority of cases no partitioning (single partition) will suffice but some scenarios may lead to contention (see the “old experience” on the left side of the illustration below) that can be prevented only with a highly partitioned memory object. It is not desirable to partition each memory object as more partitions result in inefficiencies and increase memory fragmentation.
Dynamic promotion of memory objects improvement introduces a new mechanism (see the “new experience” on the right side of the illustration below) that is able to
1. Determine bottlenecks caused by memory object contention at runtime, i.e., detect contention, and
2. Dynamically increase number of partitions at runtime. The partitioning scheme first creates as many CMEMTHREAD objects as the number of nodes and if the contention is still not reduced, SQL Server will dynamically increase the number of CMEMTHREAD objects to match number of logical cores the instance is using.
In tests done in a lab environment, the performance monitor chart below shows up to 3x improvement in throughput from 3000 batch requests/sec to 8000 batch requests/sec (green line) and 60x reduction in waits from an average of 57000 waits/sec to 0 waits/sec (red line).
Detecting contention on CMEMTHREAD
The sys.dm_os_memory_objects DMV shows detailed information about various memory objects that are currently allocated by SQL Server.
In the example below, you can see that MEMOBJ_XP memory object (used for memory allocation by XPs) is partitionable (partition_type=1) but no contention on the object has occurred yet. We have added three new columns (contention_factor, waiting_tasks_count, exclusive_access_count) to this DMV to indicate if there is contention and how many threads are waiting. Since the partition_type is 1, it indicates that though multiple copies of this can be created, only one is created by default for the entire instance.
In addition you can query sys.dm_os_waitstats for CMEMTHREAD wait_type to see how many tasks are waiting on this mutex.
select * from sys.dm_os_wait_stats where wait_type = 'CMEMTHREAD'
As the workload increases, you can see that the output of both queries now show heavy contention(1000s of threads waiting) on MEMOBJ_XP as every allocation request is waiting to acquire CMEMTHREAD mutex. This is because all the threads in the entire instance have to share the single copy of the mutex.
With dynamic partitioning of memory objects in SQL Server 2016, and in SQL Server 2014 Service Pack 2 (SP2), the contention is greatly reduced as SQL Server engine detects this condition and automatically partitions (i.e., creates multiple copies of the memory object as there are number of NUMA nodes first and if that is not sufficient as many copies as number of logical cores are created). With this you can see the contention is greatly reduced. partition_type 3 indicates object is partitioned by number of logical cores.
Detecting Dynamic Memory Object Partitioning
A new extended event sqlos.pmo_promotion is available to help with detecting if dynamic memory object partitioning is occurring. This event will fire when
1. an un-partitioned CMEMTHREAD object (i.e. only one copy exists at the instance level when it starts) is partitioned by node after contention is detected on the object
2. a node partitioned CMEMTHREAD object (i.e. as many copies as number of nodes the instance is using) is partitioned by cores after contention is detected on any node.
Sample Extended Event session:
The output from the extended event sessions shows when the contention was detected, which node and scheduler it was on to trigger partitioning of the memory object by node. Since the contention continued to occur with with node partitioning, SQL Server automatically partitioned the object by CPU (logical cores) and the contention was greatly reduced.
|pmo_promotion||6/25/16 8:52 AM||2.187220097||2.187220097||PartitionedByNode||MEMOBJ_XP||3||0|
|pmo_promotion||6/25/16 8:52 AM||0.981935024||1.963870049||PartitionedByCpu||MEMOBJ_XP||36||1|
Ajay Jagannathan (@ajayMSFT)
Principal Program Manager