Max worker threads

Saw an article online the other day that compelled me to respond.  The author was suggesting that SQL Server instances with high numbers of user connections would benefit from having their max worker threads setting increased.  He tossed around several UMS-specific terms, and newbies might actually think his contention was right, but as I read the article, it was evident that he was confusing 'user' and 'worker'.  Because I don't want to embarrass him, I won't link the article, but I do feel I should deconstruct this a bit.

First, a UMS worker and a user are two different things.  As I've written in my books and in various online articles, a single worker can service multiple users.  That is, in fact, why UMS exists.  It reduces thread context switching in the OS kernel by keeping the execution in user mode as long as possible.  It does this by having a single worker (usually a thread) service multiple end user connections (users).  To understand how this works, let's discuss UMS' architecture a bit.

A UMS scheduler object is a logical abstraction of a CPU.  SQL Server creates a scheduler object for each processor to which it has access.  If you haven't configured a specific processor affinity, this is all the processors on the box. These scheduler objects are not affinitized to specific CPUs, but, due to the way that Windows spreads a process’s workload across processors, each scheduler usually ends up being ‘soft-affinitized’ to a different CPU.

SQL Server assigns each user to a scheduler when it first connects using a simple round-robin algorithm.  On SS2K and 7.0, a user remains associated with its scheduler until it disconnects.

As work comes in from a user, it is added to the scheduler's work request queue, then de-queued and carried out by a worker.  UMS is designed such that it's typical for just one worker to be active on each scheduler at a time.  This means that it's common for one worker to carry out the work of many users simultaneously.  How does this happen?  Believe it or not, even on high volume servers, it's unusual for multiple users to submit work at exactly the same time, particularly when those users are spread across multiple processors.  As long as it's possible for one worker to carry out the work of multiple users efficiently, this is preferable because it helps prevent multiple workers (again, typically threads) from competing with one another for the same CPU.  This competition can lead to thread context switches, the very thing UMS was designed to minimize.

Second, you shouldn't increase max worker threads unless someone from Microsoft or one of its partners has advised you to.  And, even then, you might want to get a second opinion -- depending on who's talking.  Any consideration of increasing max worker threads on releases of SQL Server prior to Yukon should be based on data, not random speculation about what might make your system run faster or confusion about the difference between a user and a worker.  As I’ve written before, things like DBCC SQLPERF(umsstats) should give a strong indication that you are running out of workers before you even think about something like this.  Further, the fact that you're running out of workers doesn't necessarily mean that you should increase max worker threads.  If you were regularly running out of virtual memory, would your first reaction be to stick more RAM in the box?  Of course not.  You'd first try to find out why you're running low.  Once you had eliminated things like leaks and garden-variety waste, you might then look at more extreme options such as adding memory to the box, but that would be way down the list.  The same should be true of increasing max worker threads.  If you are truly running out of workers, it makes good sense to find out why before you begin tweaking the system configuration.

Understand that increasing max worker threads has some negative side effects.  For one thing, SQL Server uses max worker threads to calculate the size of the MemToLeave region on startup because it must set aside thread stack space for workers.  On SQL Server 2000, this means that, besides the 256 MB that is set aside for MTL by default, an additional 127.5 MB is set aside for worker thread stacks (255 workers * .5MB stack space per thread). This ~384 MB is reserved, then freed at startup in order to set it aside for later use. This causes it to be ignored by the buffer pool, SQL Server’s primary memory cache and the source for most memory allocation within the server. If you increase max worker threads, you drive up the amount of virtual memory that must be set aside for thread stacks. And when you do that, you deprive the bpool of memory it could otherwise use for things you might need – caching data and index pages, for example. If those extra workers aren’t actually needed (and they almost always aren’t), you’re effectively constraining the bpool for no good reason. So, not only is increasing max worker threads not helping performance, it’s actually hindering it – and doing so in a subtle way that may not be easy to detect.

The moral of the story is this: don’t mess with max worker threads unless Microsoft or one of its partners advises you to. Even then, do your homework, get a second opinion, and be sure it’s really the best way to use your finite system resources. Normally, there will be many, many things ahead of tweaking max worker threads on your SQL Server performance checklist.