An in-depth look at SQL Server Memory–Part 1


I know that memory management in SQL Server is one area that’s a bit of an enigma for a lot of people, and most of us only tend to know as much about memory as is related to our day-to-day activities. In this post(and others in this series), I shall seek to do a deep dive into SQL Server memory management, and give you as complete a picture as possible.

Let’s start off by understanding a few terms:


VAS

Virtual Address Space. Windows uses Virtual addresses to allocate memory to a process, and the virtual address to physical address mapping is taken care of by the OS. For details on the need for using Virtual Addresses, please refer to the following technet article:

http://technet.microsoft.com/en-us/library/cc767886.aspx

   

On a 32 bit system, the max address that can be referenced is 2^32 (since each bit can reflect can reflect either a “set” state or a “reset state”), which amounts to ~4 GB. Thus, the VAS on a 32 bit system is 4 GB, of which 2 GB is for the OS Kernel, and 2 GB is allocated to each process. This means that each process can potentially grow up to 2 GB in terms of VAS usage.

 

/PAE

Stands for Physical Address Extension. Basically, on 32 bit systems, it enables the use of 36 bit pointers (instead of the default 32 bit ones) by utilizing the underlying hardware. Using 36 bit pointers means that we can now use 36 bit addresses as opposed to 32 bit ones, thereby increasing the max memory the OS can “see” to 64 GB (2^36). If you want to utilize more than 4 GB of RAM on a 32 bit server, then you have to use the /PAE switch in the boot.ini OS file.

 

/3GB

The /3GB switch changes the default break-up of the VAS, giving 3 GB to applications (such as SQL) which are Large Address Aware, and leaving 1 GB for the OS kernel. Keep in mind that setting the /3GB switch means that the OS Kernel can then only “see” up to 16 GB of physical memory. More on the /3GB switch here.

 

/USERVA

This switch is used to fine tune the VAS usage by applications to between 2 and 3 GB, and is added in the boot.ini as well.

 

Bpool

Short for Buffer Pool. SQL memory can be divided into 2 parts, BPool and MTL/Non-BPool. The BPool area caters to all memory requests upto 8 KB in size. Since the size of a page in SQL is 8KB, this basically means that all data and index page allocation requests are catered to from the BPool, as are Large Pages. The Max Server Memory setting up to SQL 2008 R2 caps only the BPool area.

 

MTL/Non-BPool

All requests for memory greater than 8KB are catered to from the MTL/Non-BPool area. This area also includes memory used for COM Objects, CLR Code, Extended Stored Procedures, Large cached plans, etc. Leaks by these non-SQL components can also cause SQL memory usage to bloat and eventually lead to an OOM (Out Of Memory) condition.

 

AWE

Stands for Address Windowing Extensions. There’s a specific set of AWE API’s used to allocate AWE memory. This feature has different uses in 32 and 64 bit. AWE can only be used if the account under which SQL Service is running (the “Service Account”) has the “Lock Pages in Memory” privilege granted to it in gpedit.msc.

32 bit: In 32 bit systems, enabling AWE basically helps you take advantage of the fact that “fetching from RAM is faster than fetching from Disk”. Only if the RAM on the server is greater than the VAS (4 GB) shall SQL be able to utilize AWE. Using the AWE API’s, SQL allocates memory, fetches pages (data and index pages only) into RAM, and then maps/unmaps them into the BPool as needed. To put it simply, we create a “window” in the BPool VAS which is used to map/unmap data and index pages stored in the AWE allocated region.

 

64 bit: If SQL has the Lock Pages in Memory privilege, then it will try and allocate some amount of memory through AWE API’s. The benefit is that this memory cannot be paged out by the Operating System as part of a working set trim operation.

The VAS windowing concept does not come into picture here because on 64 bit, we have virtually unlimited VAS.


Please note that the AWE memory is not part of the working set, which is why it will not be a candidate for “working set trimming” by the OS in case of server level memory pressure. This is true for both 32 bit and 64 bit environments.
 


Memory Architecture

Now we get to the interesting stuff. Let’s understand the major components in the SQL Memory architecture:

Memory Node: A memory node is a logical division of memory, mapped on top of a NUMA node. In English, this means that if you have 2 NUMA nodes on your server, there will be 2 memory nodes as well. If you do not have NUMA, then there will be just one memory node.

Memory Allocator: All memory allocation on the memory nodes have to go through memory allocator routines tied to the Memory Nodes. Basically, memory requests to a Memory Node will have to land up with the Memory Allocators in order to be honored. This is because the it’s the Memory Allocator routines that know the various types of Windows API’s to be called for different kinds of allocation requests. The allocator routines have code for allocating Pages (used for single, multi and large page requests), Virtual allocator, and Shared memory allocator.

The virtual allocator uses VirtualAlloc() and AWE API’s to allocate memory. More about these later in this post. The multi-page allocator also uses the Virtual Allocator to honor requests for multiple pages. 

Memory Clerks: The most crucial in the memory architecture perhaps, is the Memory Clerks component. The major memory consumers in SQL have their own memory clerks, and we use the Memory Clerks to track memory usage by component. The memory clerks can be divided into the following categories, based on the larger structures that house them in memory:

  • Generic: Includes the Buffer Pool, CLR, Optimizer and Xevent Clerks. The generic clerks do not use the SQL OS caching infrastructure, but still have the ability to respond to memory pressure.
  • Cache Store : The Procedure Cache and System Rowset clerks come under this bucket. Cache store clerks use multiple hash tables for lookup. So for example, if you’re searching on multiple criteria, having multiple hash tables for lookup helps boost performance. These clerks also use the clock algorithm (based on LRU policy) to control the lifetime and visibility of entries. This clock algorithm enables these clerks to respond efficiently to memory pressure.
  • User Store: Includes the Token Perm and Metadata clerks. User store clerks are similar to Cache Store, but they do not use Hash Tables for lookup. The user store requires cache developer users to leverage the framework to implement their own storage semantics, i.e. they need to build their own custom logic for lookup. In a cache store, the lifetime is fully controlled by SQLOS’s caching framework. In a user store, the entry’s lifetime is only partially controlled by a store. Since the user store implements its own storage, it also participates in lifetime control. In plain English, this means that for user store clerks, the developers can develop their own logic to manage the lifetime of an entry (and hence also the response to memory pressure). They can leave the lifetime management to the caching infrastructure, or they can develop their own way to manage it.
  • Object Store/Memory Pool: Includes clerks like Locks and SNI Network Packets. The ObjectStore/Memory pool is a cache of homogenous objects (unlike user and Cache stores, which can hold heterogeneous objects). These do not have hash tables (for lookup or clock algorithms for lifetime management.

 

When a thread wants memory in SQL, it has to go to the Memory clerks to request for the same. The clerk, in turn, requests the Memory Allocators for memory (it’s not possible for a thread to interface directly with the Allocators). Another thing is that the clerks have functionality built in for responding to memory pressure. The memory allocation can be from:

 

  1. Heap/Memory Object: Used when the requirement is for a very small size (say, a few hundred bytes). Heap allocation API’s are used in some rare scenarios by SQL Server.
  2. Virtual Alloc: This is the most commonly used method of allocating memory in SQL, and involves the use of the VirtualAlloc() windows API. The primary reason for the extensive use of VirtualAlloc() is that it gives us the flexibility to manage memory in our own way. It has the capability to reserve and/or commit memory, as well as specification of access control for the pages involved. VirtualAlloc is used for honoring both single and multi-page requests.
  3. AWE: AWE API’s are also used to allocate memory by SQL, as long as the Lock Pages in Memory privilege has been granted to the service account under which SQL Server is running. I’ve specified the uses of AWE API’s on both 32 bit and 64 bit systems above.

 

Let’s talk about some specific consumers here:

  1. Database Page Cache: The database page cache requests for memory from Buffer Pool, and the Buffer Pool, in turn, calls the Virtual Allocator (which, in turn, uses the VirtualAlloc() and AWE API’s to try and honor the request).
  2. Backup Buffers: The backups request for memory from the SQLUtilities memory clerk, which in turn, calls the Virtual Allocator to allocate memory.
  3. Plan Cache: The plan cache requests memory from the Memory Object (which is like a heap), which in turn requests mostly for single pages using a memory clerk called SQLQUERYPLAN. The interesting thing is that from SQL 2005 onwards, all single page requests go through the Buffer Pool, which is basically code optimized for providing 8K pages. The Buffer Pool, in turn, uses the Virtual Allocator to honor the request.
    If the plan cache needs multiple pages (i.e. requests memory > 8K), then the memory clerk will directly invoke the Multi-page allocator. The multi-page allocator, in turn, uses the same VirtualAlloc() and AWE API’s to allocate memory.
  4. Optimizer: The Optimizer requests memory from a mark/shrink heap (as it just uses and then releases memory), and this is tracked by a memory clerk called SQLOPTIMIZER. 

 

The Buffer Pool acts as both a memory clerk and consumer because it’s optimized for allocating 8K pages as well as managing a cache of 8K pages. What this means is that the Buffer Pool is good at tracking it’s own memory consumption, as well as providing single pages to other consumers such as plan cache on demand. It also keeps a track of the pages it provides to other consumers (which shows up as “stolen pages” in DBCC Memorystatus).

 

So, to sum up, this is what the picture looks like:

 

SQL Memory architecture by HarshDeep

 

Additional Information:

Here are some DMV’s that you can use to track the memory architecture components explained above:
sys.dm_os_memory_clerks

sys.dm_os_memory_objects

sys.dm_os_memory_nodes

sys.dm_os_memory_pools

 

Please feel free to play around with these. Do refer to books online for more details on these DMV’s.

 

Hope this post helps. Any comments, suggestions or feedback is welcome.

 

 

 

Comments (25)

  1. Nivedita says:

    Excellent work Harsh!!! Am sure this will help many folks who work with SQL Server, it covers quite a lot of information to be picked up…!!! Cheers…!!

  2. Lonny Niederstadt says:

    Great article!  One aspect of SQL Server memory management which I've found very scant information on is disfavoring of database pages in the buffer pool.  For dbcc checkdb to disfavor its pages seems fairly straightforward, but for scans consuming more than 10% of the database cache, this doesn't seem very straightforward.  For example, does the decision to disfavor happen before the first reads for the scan?  Or are pages 'retro-disfavored' after the 10% threshold is reached?  It seems that the 10% threshold is probably of the database cache for the NUMA node local to the scan itself – is that correct?

    Thanks!

  3. Hi Lonny,

    Thanks for appreciating. Can you please elaborate on what you mean by "disfavouring" ? I will be more than happy to research and get back to you on this, as long as I know what you're looking for.

    Regards,

    Harsh

  4. Vijay Kondapally says:

    Awesome!! So many questions answered in a cogent manner. The visual representation of the memory allocation/usage is very useful.

  5. Hi Vijay,

    Many thanks for appreciating the effort. Glad to know you found it useful.

    Regards,

    Harsh

  6. Gordon Feeney says:

    Very good article. Was struggling with the concept of Memory Clerks previously.

    Regards,

    Gordon.

  7. Thanks for appreciating Gordon…!!!

  8. Hello Harsh,

    In order to prevent large table/index handling from monopolizing the bpool database cache, some operations disfavor the database pages they read, so the they are more quickly retired than other pages. Dbcc checkdb for example disfavours the pages it reads, and rumor has it full scans of tables/indexes larger than 10% of db cache size disfavour (or maybe its 10% of bpool size, or maybe db cache size/bpool size on the individual NUMA node).

    Thanks!

  9. Hi Lonny,

    Thanks for the response. I can confirm that the DBCC Checkdb does not retain pages in the Bpool when its running. This makes sense, because otherwise we'd end up fetching the entire database into memory (the checkdb checks all the pages of the db). So Checkdb fetches pages into memory, performs consistency check operations on them, and then discards them.

    As for the other operations, let me check and get back to you.

    Cheers,

    Harsh

  10. HarshDeep_Singh says:

    Hi Lonny,

    I've done some research on the topic, and I can confirm the following:

    When SQL needs to load a page into memory, it checks the available memory. If there's enough memory available, it will load the entire extent into memory (which makes it more efficient). If however, you do an operation which requires, say, a full table scan on a table for which sufficient space is not available in memory, SQL will need to discard/remove some pages(read buffers) from memory. To do this, it will take into account 2 things: one, the number of requests a page has been referenced by in the "recent past", and two, the timestamp when a page was last referenced.

    Hope this helps.

    Regards,

    Harsh

  11. Guest-New SQL DBA says:

    Hi Harsh,

    Thanks for such a wonderful article. Highly Appreciated.

    Kindly can you please resolve some of my following doubts:-

    It is being said that Buffer Pool only honors single page requests that are <=8 kb. For more than 8 kb MTL is used.

    As per the above article it says that dB page cache request for memory from buffer pool which in turn requests from virtual allocator.

    So my question is if bpool adheres requests for requests < 8kb then it should be using SIngle Page Allocator, isnt it?

    Next doubt

    if memoryclerk_sqlbufferpool is our buffer pool, then it should only be having single pages_kb column.

    what is the exact need of multi pages_kb , vm reserved and vm committed.

    Awaiting Response

    Have a good day:)

  12. HarshDeep_Singh says:

    Hi Guest,

    Thanks for showing interest in the blog.

    For your first question, up to SQL 2008 R2, Bpool "is" the "actual" single page allocator. All single page requests will have to go to the buffer pool to be honored, which is why the single page allocator actually goes to the Bpool, to that the request can be honored.

    As for your second question, i believe your questions will be answered by this KB Article:

    support.microsoft.com/…/907877

    Please email me on harshdeep_singh@hotmail.com if you'd like to take this further.

    Thanks,

    Harsh

  13. Rajesh_Devabakthini says:

    Awesome article sir.. really helps..

  14. HarshDeep_Singh says:

    Thanks a lot Rajesh for appreciating.

  15. pituach says:

    Awesome job!

  16. HarshDeep_Singh says:

    Thanks a lot Pituach…!!!

  17. hi HarchDeep says:

    we have a server with 8 NUMA nodes with * SQL instances running. Each SQL server is bound to one NUMA node using TCP port binding. how can you check if the buffer pool memory allocations comes from the corresponding NUMA node it's suppose to be bound to? we are running SQL 2008 R on Windows 2008 R2.When I run

    select * from sys.dm_os_memory_clerks where type='MEMORYCLERK_SQLBUFFERPOOL' on those 8 instances all comes up with virtual_memory_commited_KB values on memlory_node_id=1.

  18. HarshDeep_Singh says:

    hi…thanks for showing interest in the blog. You can use the MSSQL$<instancename>:Memory node group of perfmon counters to find the memory allocated locally for each NUMA node. Hope this helps.

  19. Gopalakrishnan Arthanarisamy says:

    Excellent Article, Harsh Deep Singh. Hats Off to You.

  20. HarshDeep_Singh says:

    Thanks Gopalakrishnan for your kind words. I am glad to hear you liked the blog. Cheers.

  21. Vivek says:

    Good Article , I had  search the lot of article in google but i could not got this type of explanation how memory is work with respect to sql server .  You doing as awsome work

  22. HarshDeep_Singh says:

    Thanks a lot Vivek for appreciating…!!! Cheers.

  23. Will Smith says:

    Very nice article, I keep visiting your memory related articles frequently. Is it possible for you to update this excellent information for SQL Server 2012 and SQL Server 2014 ? As we have some major changes in memory management module from SQL 2012 onward and it would be great if these articles can be updated.

  24. HarshDeep_Singh says:

    Thank you so much Will for your kind words and for your feedback. I will try to publish another set of blog posts for SQL 2012 and SQL 2014. Thanks.

  25. Senthamizhmani.S says:

    Thank you  so much.. very nice article ,This is possible for  you to update this excellent information about sqlserver memory management concepts.