I know this is a slightly more esoteric topic, even for me, but I want to address ccNUMA platforms, and how they matter to Windows and Windows applications. What is NUMA, you ask? NUMA stands for Non-Uniform Memory Access. (The cc stands for Cache Coherent, by the way, because non-cache-coherent NUMA exists as well, but I won't address that here since none of the platforms Windows supports are non-cache-coherent.)
To understand why NUMA exists, we need to look at Symmetric Multiprocessing (SMP). SMP is built around a few core principles, and one is that every CPU has an identical view of the system: memory, the I/O subsystem, and the other CPUs can all be treated the same by software. The problem comes when this assumption is no longer true. As you scale up the size of a system, it becomes harder and harder to keep everything close together, literally. The more switches and buses your data flows through, the longer it takes.
This means that in order to squeeze the maximum performance out of the system, it behooves the OS, as well as the programmer, to try to keep data as close as possible to the place where it's needed. By keeping track of which pages of memory and which CPUs have the best locality to each other, decisions can be made when threads are scheduled and memory is allocated that will squeeze that extra little bit out of the system.
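To make this concrete, here is a minimal sketch of what "being NUMA-aware" can look like from user mode, using the Win32 NUMA APIs introduced in Windows Server 2003 (GetNumaHighestNodeNumber, GetNumaNodeProcessorMask, GetNumaAvailableMemoryNode). It queries the node topology and then pins the current thread to one node's processors; the specific policy shown (pinning to node 0) is just an illustration, not a recommendation:

```c
/* Sketch: query NUMA topology and pin a thread to one node.
   Requires Windows Server 2003 or later; on a non-NUMA machine
   the system reports a single node (node 0). */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    ULONG highestNode = 0;

    /* Highest NUMA node number; 0 means the whole machine is
       effectively one node, i.e. plain SMP. */
    if (!GetNumaHighestNodeNumber(&highestNode)) {
        printf("NUMA information not available on this system\n");
        return 1;
    }

    for (ULONG node = 0; node <= highestNode; node++) {
        ULONGLONG procMask  = 0;
        ULONGLONG freeBytes = 0;

        /* Which processors belong to this node, and how much of
           its memory is currently free? */
        GetNumaNodeProcessorMask((UCHAR)node, &procMask);
        GetNumaAvailableMemoryNode((UCHAR)node, &freeBytes);

        printf("Node %lu: processor mask 0x%016I64x, %I64u bytes free\n",
               node, procMask, freeBytes);
    }

    /* Restrict this thread to node 0's processors. Because Windows
       prefers to satisfy allocations from the node the thread is
       running on, memory the thread touches afterwards will tend
       to be local to node 0. */
    ULONGLONG node0Mask = 0;
    if (GetNumaNodeProcessorMask(0, &node0Mask) && node0Mask != 0)
        SetThreadAffinityMask(GetCurrentThread(), (DWORD_PTR)node0Mask);

    return 0;
}
```

The key point is the last step: the scheduler and the memory manager cooperate, so simply keeping a thread on one node's processors is often enough to keep its working set on that node's memory too.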
Until only a few years ago, this was exclusively the realm of large, mainframe-style computers, not the PC world. But with the introduction of the Unisys ES7000 in 2000, the PC suddenly had something to gain by being NUMA-aware. Even then, this was something that mostly concerned large scale-up server implementations, not the average user or programmer. That is, until AMD announced the unique implementation behind its new Opteron and Athlon 64 processors. Suddenly, any system with more than one of those CPUs could potentially benefit from NUMA optimizations. I'll go into why in the next entry.