Understanding SQL server memory grant

This article describes how query memory grant works in Microsoft SQL Server. It applies to both SQL2005 and SQL2008. Written by Jay Choe, Microsoft SQL Server Engine.

--------------------------------------------------------------------------------

Query memory grant (a.k.a. query work buffer) is a part of server memory used to store temporary row data while sorting and joining rows. It is called "grant" because the server requires those queries to "reserve" before actually using memory. This reservation improves query reliability under server load, because a query with reserved memory is less likely to hit out-of-memory while running, and the server prevents one query from dominating entire server memory. In the following sections, I will describe how SQL server estimates query memory requirement, and how it throttles memory grants when multiple queries are competing.

When SQL server receives a user query, it follows well-defined steps to produce a result for user. First, it creates a compiled plan, which is a set of logical instructions such as how to join rows. Next, it creates an execution plan based on the compiled plan. This execution plan contains instructions with all logical references translated to actual objects, and tracking mechanisms for query execution. Finally, the server starts execution from the top of instruction tree. Creating a compiled plan is expensive because the server needs to find the optimal plan out of hundreds of potential candidates. Distinguishing between compile and execution helps overall server performance because each compiled plan can be cached and shared among multiple execution plans. The memory grant estimate follows the same overall sequence. It has parameters saved in compiled plan, and a mechanism to calculate actual grant size at execution time.

Memory consumers
Now that I briefly covered the lifetime of query execution, I would like to show where query memory grant fits in overall server memory usage. As hinted in the previous paragraph, a successful query execution involves 3 major memory consumers: compile, cache, and memory grant.

  • Compile (query optimization): Building and searching the most optimal plan out of hundreds candidates typically requires significant amount of memory. The lifetime of this usage is typically short because optimizer releases memory as soon as optimal plan is found. Lack of available memory would cause delays in compile, and potentially inefficient (slow) plans.
  • Cache: Finding optimal plan is costly in terms of CPU and memory usage. SQL server tries to store compiled plans in caches for later reuse. Lifetime of this memory usage is long-term. Lack of cache memory would cause more unnecessary re-compiles.
  • Memory grant: This memory is used to store temporary rows for sort and hash join. The lifetime of memory grant is the same as the lifetime of query. Lack of available memory grant causes a query to use hard disk, which affects query performance.

SQL Server maintains the balance between these 3 consumers with internal facility called "memory broker". Based on the usage and physical memory available, the memory broker sets the usage limit and tells each component to trim its memory usage if it anticipates a shortage. Generally, a well behaving server would have about the same contributions from these consumers.

Grant parameters
When SQL server creates a compiled plan, it calculates two memory grant parameters called "required memory" and "additional memory".

  • Required memory: Minimum memory needed to run sort and hash join. It is called required because a query would not start without this memory available. SQL server uses this memory to create internal data structures to handle sort and hash join.
  • Additional memory: Amount needed to store all temporary rows in memory. This depends on the cardinality estimate (expected number rows and average size of row). This is called additional because a query can survive lack of such memory by storing part of temporary rows on hard disk. A query is not guaranteed to have the full amount if the total exceeds preset limit.

For example, let's consider a simple query which needs to sort 1 million rows of 10 bytes each in size. The required memory for this query is 512KB because this is the minimum amount SQL server needs to construct internal data structures to handle one sort. Since it would take 10MB to store all rows, the additional memory would be 10MB (slightly higher when overhead is included). This calculation becomes complicated if the compiled plan has multiple sorts and joins because SQL server also considers the lifetime of each operator for more efficient memory usage. You would generally see smaller estimate than the sum of all sorts and joins. If you are interested in the relative memory usages among operators, you can check out <MemoryFractions> tag in Showplan XML. The following sections show how these parameters are used when calculating grant size at runtime.

DOP dependence
If SQL server has more than 1 CPU, it can run a query in parallel mode for improved performance by sharing work among parallel workers. These workers run independent of each other, and use "parallelism operator (a.k.a. exchange)" to transfer processed rows. This parallel mode increases memory usage because each worker needs its own copy of sort or hash join, and the parallelism operator needs buffers for temporary storage of transferred rows. Since DOP N would use N parallel workers, the query would need N times more required memory. On the other hand, the total number of rows to handle (and memory cost to store them) does not change with DOP. This means that the additional memory would stay the same regardless of DOP setting. Starting with SQL 2008, the buffer memory used by parallelism operator is also counted as a required part of memory grant.

Memory grant process
In the previous section, we discussed how parallelism affects the query memory requirement. In this section, we will discuss how SQL server takes server memory and number of concurrent queries into considerations. The server needs to consider such dynamic factors to avoid committing memory beyond its physical limit. This is done in 2 distinct steps. First, the server calculates how much memory to grant for given query. Then it uses the internal facility called Resource Semaphore to reserve actual memory, or throttle if too many queries ask for memory. First, the following steps show how the request size is determined.

  • The server decides parallelism (DOP) based on the plan and the server state.
  • The server checks if memory grant is needed not. If not needed, the server can start the query immediately. For example, a simple serial query without "ORDER BY" or "GROUP BY" may not need memory grant.
  • The server calculates the memory limit for one query. By default, this is 25% (20% on 32bit SQL 2005) of total query memory (which is set by memory broker as about 90% of server memory). This per-query limit helps to prevent one query from dominating the whole server. This percentage is configurable on SQL 2008.
  • The server calculates the ideal query memory as required*DOP + additional (+ exchange on SQL 2008). This is amount a query would like to have, based on its cardinality estimate.
  • The server checks if the ideal memory exceeds the per-query limit. If it does, then the server reduces the additional memory until the total fits within the limit. This revised size is called requested memory.The server asks Resource Semaphore to grant the requested memory.

Resource Semaphore
Resource Semaphore is responsible for satisfying memory grant requests while keeping overall memory grant usages within the server limit.

  • Resource Semaphore allows a query to reserve memory only if there is enough free memory available. Otherwise, a requesting query is forced to wait in a queue.
  • When Resource Semaphore receives a new request, it first checks if any query is waiting. If it finds one, it puts the new query into a queue for fairness because the wait queue is designed as first-come-first-served basis.
  • Resource Semaphore checks for waiting query in its queue. If it finds one, it puts the new query into a wait queue for fairness, because the wait queue is designed as first-come-first-served basis with small weight to favor small queries.
  • Resource Semaphore makes a grant attempt if it does not find waiting query, or when existing query returns memory.
  • Resource Semaphore attempts to grant memory when there is no waiting query, or when a query returns memory.
  • Grant attempt is made when there is no waiting query, or when an existing query returns reserved memory.
  • If it finds a waiting query, it puts the current query into a waiting queue for fairness.
  • If it does not find any waiting query, it then checks available free memory.
  • If it finds enough free memory, then the requested memory is granted and the query can start running.
  • If it does not find enough free memory, then it puts the current query into the waiting queue.
  • Resource Semaphore wakes up queries in the waiting queue when enough free memory becomes available.

Debugging memory grant related issues
SQL server supplies a few dynamic management views (DMV) to help investigate memory grant related issues. Please refer BOL (books-on-line) for detailed information on DMVs discussed below.

sys.dm_exec_query_resource_semaphores
This DMV shows the current status of Resource Semaphore mentioned in the previous section. On SQL 2005, you will find 2 of them. One with non-null max_target_memory_kb column is called Regular Resource Semaphore, and the other is called Small Resource Semaphore. As its name implies, Regular Resource Semaphore is used by all queries under normal condition, while Small Resource Semaphore is used by small size queries (less than 5 MB) when they have to wait (see step 6 in the previous section). This would improve response time of small queries which are expected to finish very fast. Note max_target_memory_kb column shows the server memory limit used in step 2 of previous section.

sys.dm_exec_query_memory_grants
This DMV shows all queries consuming memory grants including those waiting in Resource Semaphore queue. Waiting queries would have null grant_time column. Resource Semaphore uses internal query cost estimate to prioritize memory grants, and is_next_candidate column shows which query to wake up when memory is available.

sys.dm_os_wait_stats
This DMV shows wait statistics of all server objects. Memory grant uses "RESOURCE_SEMAHORE" wait type. If you see significant waits on this wait type, you may have an issue with big queries.

Sample queries
The following sample queries show how memory grant DMV are used.

Find all queries waiting in the memory queue:

SELECT * FROM sys.dm_exec_query_memory_grants where grant_time is null

Find who uses the most query memory grant:

SELECT mg.granted_memory_kb, mg.session_id, t.text, qp.query_plan
FROM sys.dm_exec_query_memory_grants AS mg
CROSS APPLY sys.dm_exec_sql_text(mg.sql_handle) AS t
CROSS APPLY sys.dm_exec_query_plan(mg.plan_handle) AS qp
ORDER BY 1 DESC OPTION (MAXDOP 1)

Search cache for queries with memory grants:

SELECT t.text, cp.objtype,qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
JOIN sys.dm_exec_query_stats AS qs ON cp.plan_handle = qs.plan_handle
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS t
WHERE qp.query_plan.exist('declare namespace n="https://schemas.microsoft.com/sqlserver/2004/07/showplan"; //n:MemoryFractions') = 1