I’ve received a question about parallel execution plans from a customer.
(http://technet.microsoft.com/en-gb/library/cc966425.aspx) Execution contexts for parallel plans are not cached. A necessary condition for SQL Server to compile a parallel query plan is that the minimum of the number of processors that have survived the processor affinity mask and the value of the "max degree of parallelism" server-wide option (possibly set using the "sp_configure" stored procedure) is greater than 1. Even if a parallel query plan is compiled, SQL Server’s "Query Execution" component may generate a serial execution context out of it. Any execution contexts derived out of a parallel plan — serial or parallel — are not cached. A parallel query plan, however, is cached. Why is the execution context is not cached?
The white paper referenced is a bit old but the core tenets still hold for the current versions of SQL Server as well.
When SQL Server compiles a query “plan”, it stores a structure in the procedure cache. This structure represents the recipe of how SQL Server should perform work to execute a query. It doesn’t actually execute the query. When the query is actually executed, the read-only, static plan is instantiated into a runtime plan. This runtime plan allocates memory, temp tables, and any other resources needed to execute the query. So, if you compile a plan without executing it, you get the first but not the second structure. If you run the same query twice in parallel from two different connections, you have 1 runtime instantiation for each parallel user request. They can’t use the same memory and temp tables, obviously. They just use the same recipe for how to execute the plan.
SQL Server also has the ability to generate parallel query plans. The read-only plan will include instructions to try to run a plan at a particular degree of parallelism (DOP). When the runtime plan is instantiated, the system will try to allocate all of the needed resources to execute the runtime plan at the DOP requested. Parallel query plans can run faster but consume more resources (threads, memory, cpu schedulers) than serial plans. At the time that the runtime plan is instantiated, the system tries to reserve these resources before executing the query (SQL has a complex system to manage resources beyond what the OS or a standard C++ library would provide that keeps the system from running out of memory 90% of the way through a query and also happens to improve throughput substantially). If all of the resources requested for the runtime plan are available at the time that the runtime plan is instantiated, then the runtime resources are reserved and the query is executed at the DOP specified. If there is resource contention in the system, however, it may not be possible to run the query with all resources. SQL Server can sometimes degrade the parallelism request to require fewer resources and start executing the query without waiting. This can degrade all of the way down to a serial plan. When the query finishes executing, the resources are released for other queries to be able to consume them.
Now back to the user question – “why” is this runtime plan not cached for parallel query plans? Well, they consume a lot of resources. It’s a heuristic choice made to keep the system from having to go scour for resource all the time – it’s faster for the vast majority of workloads to just get those resources when it is time to execute and release them when done. I’m sure that someone could identify a workload where things would be faster if only we had cached X at spot Y, but this isn’t a common issue we see in the field. (Often it is actually far more interesting to go after the algorithmic problem to speed up workloads by orders of magnitude – we have some features in SQL 11 that fall into this category).