Fewer Hardware Locks and Greater Parallelism

Windows Server 2008 R2, which shares its kernel with Windows 7, has been optimized to run more efficiently on modern CPU architectures. Minimizing resource contention and maximizing concurrency reduces overall system latency and increases server performance. For resource locks, several strategies exist, including eliminating a lock, partitioning the resource it protects, or designing locks that are faster to acquire and release. One significant improvement of this kind is the removal of the kernel dispatcher lock.

The kernel is responsible for scheduling threads on multiple processors, managing the system's underlying synchronization facilities (e.g., events, semaphores, and completion ports), managing timers and deferred procedure calls, and coordinating with memory management to swap thread kernel stacks and process memory in and out. To manage the state transitions associated with this diverse set of functionality, the kernel has relied on a variety of internal synchronization techniques, including lock-free algorithms and an assortment of spinlocks. In the early days of NT, those spinlocks protected relatively coarse-grained sections of code. This was acceptable because it kept the kernel design simple and caused minimal contention, since even high-end server configurations had few processors. As the scalability demands on the NT kernel increased, many improvements were made to synchronization granularity in the kernel. However, the kernel dispatcher lock persisted from NT's inception in order to coordinate the activities outlined above. Because it was fairly large in scope, the dispatcher lock was the most contended spinlock in the kernel and, depending upon the workload, a significant scalability consideration on systems with more than 16 processors.

With Windows 7, the dispatcher lock is replaced with several finer-grained synchronization techniques, which distributes contention across many smaller locks instead of concentrating it on one. The main benefits for applications are increased system performance and more efficient use of available hardware resources.
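The general idea of trading one coarse lock for several finer-grained ones can be sketched in a few lines of Python. This is only an illustration of the pattern, not how the Windows kernel implements it: the class names CoarseTable and FineGrainedTable, the signal() operation, and the run_workers() helper are all hypothetical. Note also that CPython's global interpreter lock means this sketch shows the structure of the change rather than a measurable speedup.

```python
import threading


class CoarseTable:
    """Hypothetical: one lock guards every object, like a single global lock."""

    def __init__(self, n_objects):
        self._lock = threading.Lock()
        self._counts = [0] * n_objects

    def signal(self, obj_id):
        # Every caller serializes here, even when touching different objects.
        with self._lock:
            self._counts[obj_id] += 1

    def total(self):
        with self._lock:
            return sum(self._counts)


class FineGrainedTable:
    """Hypothetical: each object has its own lock, so contention is spread out."""

    def __init__(self, n_objects):
        self._locks = [threading.Lock() for _ in range(n_objects)]
        self._counts = [0] * n_objects

    def signal(self, obj_id):
        # Only callers touching the same object contend with each other.
        with self._locks[obj_id]:
            self._counts[obj_id] += 1

    def total(self):
        # Reads each slot under its own lock; this is not an atomic snapshot
        # if writers are still running.
        total = 0
        for i, lock in enumerate(self._locks):
            with lock:
                total += self._counts[i]
        return total


def run_workers(table, n_threads, n_objects, ops_per_thread):
    """Start n_threads workers that signal objects round-robin, then sum up."""

    def worker(start):
        for k in range(ops_per_thread):
            table.signal((start + k) % n_objects)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return table.total()


if __name__ == "__main__":
    for cls in (CoarseTable, FineGrainedTable):
        result = run_workers(cls(8), n_threads=4, n_objects=8, ops_per_thread=1000)
        print(cls.__name__, result)
```

Both variants produce the same totals; the difference is that in the fine-grained version two threads block each other only when they operate on the same object, which is the property that matters as the processor count grows.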