So you need a worker thread pool...

And, for whatever reason, NT’s built-in thread pool API doesn’t work for you.

Most people would write something like the following (error checking removed to reduce typing (and increase clarity)):

class WorkItem
{
public:
    LIST_ENTRY m_listEntry;
    :
    :
};

class WorkerThreadPool
{
    HANDLE m_heventThreadPool;
    CRITICAL_SECTION m_critsWorkItemQueue;
    LIST_ENTRY m_workItemQueue;

    void QueueWorkItem(WorkItem *pWorkItem)
    {
        //
        // Insert the work item onto the work item queue.
        //
        EnterCriticalSection(&m_critsWorkItemQueue);
        InsertTailList(&m_workItemQueue, &pWorkItem->m_listEntry);
        LeaveCriticalSection(&m_critsWorkItemQueue);
        //
        // Kick the worker thread pool.
        //
        SetEvent(m_heventThreadPool);
    }

    void WorkItemThread()
    {
        while (1)
        {
            //
            // Wait until we've got work to do.
            //
            WaitForSingleObject(m_heventThreadPool, INFINITE);
            //
            // Remove the first item from the queue, if there is one.
            //
            WorkItem *pWorkItem = NULL;
            EnterCriticalSection(&m_critsWorkItemQueue);
            if (!IsListEmpty(&m_workItemQueue))
            {
                pWorkItem = CONTAINING_RECORD(RemoveHeadList(&m_workItemQueue),
                                              WorkItem,
                                              m_listEntry);
            }
            LeaveCriticalSection(&m_critsWorkItemQueue);
            //
            // Process the work item if there is one.
            //
            if (pWorkItem != NULL)
            {
                <Process Work Item>
            }
        }
    }
};

I’m sure there are gobs of bugs here, but you get the idea. Ok, what’s wrong with this code? Well, it turns out that there’s a MASSIVE scalability problem in this logic. The problem is the m_critsWorkItemQueue critical section. It turns out that this code is vulnerable to a condition called a “lock convoy” (also known as the “boxcar” problem). Basically the problem occurs when more than one thread is waiting on the m_heventThreadPool event. What happens when QueueWorkItem calls SetEvent on the thread pool event? All the threads in the thread pool immediately wake up and block on the work queue critical section. One of the threads will “win”: it acquires the critical section, pulls the work item off the queue, and releases the critical section. All the other threads then wake up, one successfully acquires the critical section, and all the others go back to sleep. The one that woke up sees there’s no work to do and goes back to waiting on the thread pool event. This continues until all the worker threads have made it past the critical section.

Essentially this is the same situation that you get when you have a bunch of boxcars in a trainyard. The engine at the front of the cars starts to pull. The first car moves a little bit, then it stops because the slack between its rear hitch and the front hitch of the second car is removed. And then the second car moves a bit, then IT stops because the slack between its rear hitch and the front hitch of the 3rd car is removed. And so forth – each boxcar moves a little bit and then stops. And that’s just what happens to your threads. You spend all your valuable CPU time executing context switches between the various threads and none of the CPU time is spent actually processing work items.

Now there are lots of band-aids that can be applied to this mechanism to make it smoother. For example, the m_heventThreadPool event could be an auto-reset event, which means that only one thread would wake up for each work item.  But that’s only a temporary solution - if you get a flurry of requests queued to the work pool, you can still get multiple worker threads waking up simultaneously.
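Just to make that band-aid concrete, here’s roughly what it looks like. The original example never shows where its event gets created, so the constructor below is purely an illustrative sketch; the only interesting part is the FALSE passed for bManualReset, which makes the event auto-reset:

WorkerThreadPool::WorkerThreadPool()
{
    //
    // Hypothetical constructor, shown only to illustrate the auto-reset
    // band-aid. Passing FALSE for bManualReset means that each SetEvent
    // releases at most one waiting worker thread.
    //
    m_heventThreadPool = CreateEvent(NULL,   // default security
                                     FALSE,  // bManualReset: auto-reset
                                     FALSE,  // bInitialState: not signaled
                                     NULL);  // unnamed
    InitializeCriticalSection(&m_critsWorkItemQueue);
    InitializeListHead(&m_workItemQueue);
}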

But the good news is that there’s an easier way altogether. You can use NT’s built-in completion port logic to manage your work queues. It turns out that NT exposes a really nifty API called PostQueuedCompletionStatus that essentially lets NT manage your worker thread queue for you!

To use NT’s completion ports, you create the port with CreateIoCompletionPort, remove items from the completion port with GetQueuedCompletionStatus and add items (as mentioned above) with PostQueuedCompletionStatus.
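Since the example below doesn’t show the port being created, here’s a minimal sketch of that step (the variable name is just a placeholder). Passing INVALID_HANDLE_VALUE means the port isn’t associated with any file handle; it exists purely as a work queue:

//
// Create a standalone completion port. The last parameter caps how many
// threads the port will allow to run concurrently; 0 means one per processor.
//
HANDLE hCompletionPort = CreateIoCompletionPort(INVALID_HANDLE_VALUE,
                                                NULL,  // no existing port
                                                0,     // no completion key
                                                0);    // concurrency = # of CPUs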

PostQueuedCompletionStatus takes three user-specified values, one of which can be used to hold a 32 bit integer (dwNumberOfBytesTransferred), and two of which can be used to hold pointers (dwCompletionKey and lpOverlapped). The contents of these parameters can be ANY value; the API blindly passes them through to GetQueuedCompletionStatus.

So, using NT’s completion ports, the worker thread class above becomes:

class WorkItem
{
    :
    :
};

class WorkerThreadPool
{
    HANDLE m_hCompletionPort;

    void QueueWorkItem(WorkItem *pWorkItem)
    {
        //
        // Hand the work item to the completion port; the pointer rides
        // along as the completion key.
        //
        PostQueuedCompletionStatus(m_hCompletionPort, 0, (ULONG_PTR)pWorkItem, NULL);
    }

    void WorkItemThread()
    {
        while (1)
        {
            DWORD numberOfBytes;
            ULONG_PTR completionKey;
            LPOVERLAPPED lpOverlapped;
            //
            // Wait until we've got work to do; the completion key is the
            // work item pointer we posted above.
            //
            GetQueuedCompletionStatus(m_hCompletionPort, &numberOfBytes, &completionKey, &lpOverlapped, INFINITE);
            WorkItem *pWorkItem = (WorkItem *)completionKey;
            //
            // Process the work item if there is one.
            //
            if (pWorkItem != NULL)
            {
                <Process Work Item>
            }
        }
    }
};

Much simpler. And as an added bonus, since NT manages the actual work queue inside the kernel, the lock convoy from the first example is eliminated entirely.
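One other nicety, which isn’t in the example above but falls out of the same mechanism: because GetQueuedCompletionStatus hands back whatever completion key you posted, you can post a NULL key as a “please go away” signal. A hypothetical shutdown routine (the worker loop above would need to break out when pWorkItem is NULL rather than just skipping it) might look like:

void WorkerThreadPool::Shutdown(DWORD numberOfWorkerThreads)
{
    //
    // Post one NULL work item per worker thread; each worker sees
    // pWorkItem == NULL and can exit its loop.
    //
    for (DWORD i = 0; i < numberOfWorkerThreads; i += 1)
    {
        PostQueuedCompletionStatus(m_hCompletionPort, 0, 0, NULL);
    }
}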

 

[Insert std disclaimer: This posting is provided "AS IS" with no warranties, and confers no rights]