An in-depth look at SQL Server Memory–Part 1

I know that memory management in SQL Server is one area that’s a bit of an enigma for a lot of people, and most of us only tend to know as much about memory as is related to our day-to-day activities. In this post(and others in this series), I shall seek to do a deep dive into SQL Server memory management, and give you as complete a picture as possible.

Let’s start off by understanding a few terms:

VAS

Virtual Address Space. Windows uses Virtual addresses to allocate memory to a process, and the virtual address to physical address mapping is taken care of by the OS. For details on the need for using Virtual Addresses, please refer to the following technet article:

https://technet.microsoft.com/en-us/library/cc767886.aspx

   

On a 32 bit system, the max address that can be referenced is 2^32 (since each bit can reflect can reflect either a "set" state or a "reset state"), which amounts to ~4 GB. Thus, the VAS on a 32 bit system is 4 GB, of which 2 GB is for the OS Kernel, and 2 GB is allocated to each process. This means that each process can potentially grow up to 2 GB in terms of VAS usage.

 

/PAE

Stands for Physical Address Extension. Basically, on 32 bit systems, it enables the use of 36 bit pointers (instead of the default 32 bit ones) by utilizing the underlying hardware. Using 36 bit pointers means that we can now use 36 bit addresses as opposed to 32 bit ones, thereby increasing the max memory the OS can "see" to 64 GB (2^36). If you want to utilize more than 4 GB of RAM on a 32 bit server, then you have to use the /PAE switch in the boot.ini OS file.

 

/3GB

The /3GB switch changes the default break-up of the VAS, giving 3 GB to applications (such as SQL) which are Large Address Aware, and leaving 1 GB for the OS kernel. Keep in mind that setting the /3GB switch means that the OS Kernel can then only "see" up to 16 GB of physical memory. More on the /3GB switch here.

 

/USERVA

This switch is used to fine tune the VAS usage by applications to between 2 and 3 GB, and is added in the boot.ini as well.

 

Bpool

Short for Buffer Pool. SQL memory can be divided into 2 parts, BPool and MTL/Non-BPool. The BPool area caters to all memory requests upto 8 KB in size. Since the size of a page in SQL is 8KB, this basically means that all data and index page allocation requests are catered to from the BPool, as are Large Pages. The Max Server Memory setting up to SQL 2008 R2 caps only the BPool area.

 

MTL/Non-BPool

All requests for memory greater than 8KB are catered to from the MTL/Non-BPool area. This area also includes memory used for COM Objects, CLR Code, Extended Stored Procedures, Large cached plans, etc. Leaks by these non-SQL components can also cause SQL memory usage to bloat and eventually lead to an OOM (Out Of Memory) condition.

 

AWE

Stands for Address Windowing Extensions. There's a specific set of AWE API's used to allocate AWE memory. This feature has different uses in 32 and 64 bit. AWE can only be used if the account under which SQL Service is running (the "Service Account") has the "Lock Pages in Memory" privilege granted to it in gpedit.msc.

32 bit: In 32 bit systems, enabling AWE basically helps you take advantage of the fact that "fetching from RAM is faster than fetching from Disk". Only if the RAM on the server is greater than the VAS (4 GB) shall SQL be able to utilize AWE. Using the AWE API's, SQL allocates memory, fetches pages (data and index pages only) into RAM, and then maps/unmaps them into the BPool as needed. To put it simply, we create a "window" in the BPool VAS which is used to map/unmap data and index pages stored in the AWE allocated region.

 

64 bit: If SQL has the Lock Pages in Memory privilege, then it will try and allocate some amount of memory through AWE API's. The benefit is that this memory cannot be paged out by the Operating System as part of a working set trim operation.

The VAS windowing concept does not come into picture here because on 64 bit, we have virtually unlimited VAS.

Please note that the AWE memory is not part of the working set, which is why it will not be a candidate for "working set trimming" by the OS in case of server level memory pressure. This is true for both 32 bit and 64 bit environments. 

Memory Architecture

Now we get to the interesting stuff. Let's understand the major components in the SQL Memory architecture:

Memory Node: A memory node is a logical division of memory, mapped on top of a NUMA node. In English, this means that if you have 2 NUMA nodes on your server, there will be 2 memory nodes as well. If you do not have NUMA, then there will be just one memory node.

Memory Allocator: All memory allocation on the memory nodes have to go through memory allocator routines tied to the Memory Nodes. Basically, memory requests to a Memory Node will have to land up with the Memory Allocators in order to be honored. This is because the it's the Memory Allocator routines that know the various types of Windows API's to be called for different kinds of allocation requests. The allocator routines have code for allocating Pages (used for single, multi and large page requests), Virtual allocator, and Shared memory allocator.

The virtual allocator uses VirtualAlloc() and AWE API’s to allocate memory. More about these later in this post. The multi-page allocator also uses the Virtual Allocator to honor requests for multiple pages. 

Memory Clerks: The most crucial in the memory architecture perhaps, is the Memory Clerks component. The major memory consumers in SQL have their own memory clerks, and we use the Memory Clerks to track memory usage by component. The memory clerks can be divided into the following categories, based on the larger structures that house them in memory:

  • Generic: Includes the Buffer Pool, CLR, Optimizer and Xevent Clerks. The generic clerks do not use the SQL OS caching infrastructure, but still have the ability to respond to memory pressure.
  • Cache Store : The Procedure Cache and System Rowset clerks come under this bucket. Cache store clerks use multiple hash tables for lookup. So for example, if you're searching on multiple criteria, having multiple hash tables for lookup helps boost performance. These clerks also use the clock algorithm (based on LRU policy) to control the lifetime and visibility of entries. This clock algorithm enables these clerks to respond efficiently to memory pressure.
  • User Store: Includes the Token Perm and Metadata clerks. User store clerks are similar to Cache Store, but they do not use Hash Tables for lookup. The user store requires cache developer users to leverage the framework to implement their own storage semantics, i.e. they need to build their own custom logic for lookup. In a cache store, the lifetime is fully controlled by SQLOS’s caching framework. In a user store, the entry’s lifetime is only partially controlled by a store. Since the user store implements its own storage, it also participates in lifetime control. In plain English, this means that for user store clerks, the developers can develop their own logic to manage the lifetime of an entry (and hence also the response to memory pressure). They can leave the lifetime management to the caching infrastructure, or they can develop their own way to manage it.
  • Object Store/Memory Pool: Includes clerks like Locks and SNI Network Packets. The ObjectStore/Memory pool is a cache of homogenous objects (unlike user and Cache stores, which can hold heterogeneous objects). These do not have hash tables (for lookup or clock algorithms for lifetime management.

 

When a thread wants memory in SQL, it has to go to the Memory clerks to request for the same. The clerk, in turn, requests the Memory Allocators for memory (it’s not possible for a thread to interface directly with the Allocators). Another thing is that the clerks have functionality built in for responding to memory pressure. The memory allocation can be from:

 

  1. Heap/Memory Object: Used when the requirement is for a very small size (say, a few hundred bytes). Heap allocation API’s are used in some rare scenarios by SQL Server.
  2. Virtual Alloc: This is the most commonly used method of allocating memory in SQL, and involves the use of the VirtualAlloc() windows API. The primary reason for the extensive use of VirtualAlloc() is that it gives us the flexibility to manage memory in our own way. It has the capability to reserve and/or commit memory, as well as specification of access control for the pages involved. VirtualAlloc is used for honoring both single and multi-page requests.
  3. AWE: AWE API's are also used to allocate memory by SQL, as long as the Lock Pages in Memory privilege has been granted to the service account under which SQL Server is running. I've specified the uses of AWE API's on both 32 bit and 64 bit systems above.

 

Let's talk about some specific consumers here:

  1. Database Page Cache: The database page cache requests for memory from Buffer Pool, and the Buffer Pool, in turn, calls the Virtual Allocator (which, in turn, uses the VirtualAlloc() and AWE API's to try and honor the request).
  2. Backup Buffers: The backups request for memory from the SQLUtilities memory clerk, which in turn, calls the Virtual Allocator to allocate memory.
  3. Plan Cache: The plan cache requests memory from the Memory Object (which is like a heap), which in turn requests mostly for single pages using a memory clerk called SQLQUERYPLAN. The interesting thing is that from SQL 2005 onwards, all single page requests go through the Buffer Pool, which is basically code optimized for providing 8K pages. The Buffer Pool, in turn, uses the Virtual Allocator to honor the request.
    If the plan cache needs multiple pages (i.e. requests memory > 8K), then the memory clerk will directly invoke the Multi-page allocator. The multi-page allocator, in turn, uses the same VirtualAlloc() and AWE API's to allocate memory.
  4. Optimizer: The Optimizer requests memory from a mark/shrink heap (as it just uses and then releases memory), and this is tracked by a memory clerk called SQLOPTIMIZER.

 

The Buffer Pool acts as both a memory clerk and consumer because it's optimized for allocating 8K pages as well as managing a cache of 8K pages. What this means is that the Buffer Pool is good at tracking it's own memory consumption, as well as providing single pages to other consumers such as plan cache on demand. It also keeps a track of the pages it provides to other consumers (which shows up as "stolen pages" in DBCC Memorystatus).

 

So, to sum up, this is what the picture looks like:

 

 

Additional Information:

Here are some DMV's that you can use to track the memory architecture components explained above:
sys.dm_os_memory_clerks

sys.dm_os_memory_objects

sys.dm_os_memory_nodes

sys.dm_os_memory_pools

 

Please feel free to play around with these. Do refer to books online for more details on these DMV's.

 

Hope this post helps. Any comments, suggestions or feedback is welcome.