The importance of passing the WT_EXECUTELONGFUNCTION flag to QueueUserWorkItem


One of the flags to the QueueUserWorkItem function is WT_EXECUTELONGFUNCTION. The documentation for that flag reads:

The callback function can perform a long wait. This flag helps the system to decide if it should create a new thread.

As noted in the documentation, the thread pool uses this flag to decide whether it should create a new thread or wait for an existing work item to finish. If all the current thread pool threads are busy running work items and there is another work item to dispatch, it will tend to wait for one of the existing work items to complete if they are "short", because the expectation is that some work item will finish quickly and its thread will become available to run a new work item. On the other hand, if the work items are marked WT_EXECUTELONGFUNCTION, then the thread pool knows that waiting for the running work item to complete is not going to be very productive, so it is more likely to create a new thread.

If you fail to mark a long work item with the WT_EXECUTELONGFUNCTION flag, then the thread pool ends up waiting for that work item to complete, when it really should be kicking off a new thread. Eventually, the thread pool gets impatient and figures out that you lied to it, and it creates a new thread anyway. But it often takes a while before the thread pool realizes that it's been waiting in vain.
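The decision described above can be written down as a toy model. This is only a sketch of the behavior as this article describes it, not the real thread pool algorithm: the `PoolState` structure, the `decide` function, and the `patienceMs` threshold are all hypothetical names invented for illustration.

```cpp
#include <cassert>

// Toy model only. The real thread pool's heuristics are internal and
// more involved; every name here is an assumption for illustration.
struct PoolState {
    int  idleThreads;          // threads with nothing to do
    bool runningItemsAreLong;  // were the running items flagged as long?
    int  waitedMs;             // how long the pending item has waited
};

enum class Decision { UseIdleThread, WaitForRunningItem, CreateThread };

Decision decide(const PoolState& s, int patienceMs)
{
    assert(s.idleThreads >= 0 && s.waitedMs >= 0 && patienceMs >= 0);

    // An idle thread can take the pending item right away.
    if (s.idleThreads > 0) return Decision::UseIdleThread;

    // All threads are busy. Waiting is pointless if the running items
    // declared themselves long, or if the pool has run out of patience.
    if (s.runningItemsAreLong || s.waitedMs >= patienceMs)
        return Decision::CreateThread;

    // Otherwise, expect a "short" item to finish soon and reuse its thread.
    return Decision::WaitForRunningItem;
}
```

With an assumed patience of 1000 ms, `decide({0, false, 250}, 1000)` waits, while `decide({0, true, 0}, 1000)` creates a thread immediately. The only difference between the two calls is the hint.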

Let's illustrate this with a simple console program.

#include <windows.h>
#include <stdio.h>

DWORD g_dwLastTick;

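void CALLBACK Tick(void *, BOOLEAN)
{
 // Print the number of milliseconds elapsed since the program started.
 DWORD dwTick = GetTickCount();
 printf("%5lu\n", dwTick - g_dwLastTick);
}

DWORD CALLBACK Clog(void *)
{
 // Simulate a long-running work item.
 Sleep(4000);
 return 0;
}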

int __cdecl
main(int argc, char* argv[])
{
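 g_dwLastTick = GetTickCount();
 // One argument queues the clog as a "short" item;
 // two arguments queue it with the long-function hint.
 switch (argc) {
 case 2: QueueUserWorkItem(Clog, NULL, 0); break;
 case 3: QueueUserWorkItem(Clog, NULL, WT_EXECUTELONGFUNCTION); break;
 }
 // Fire Tick after 250ms and every 250ms thereafter.
 HANDLE hTimer;
 CreateTimerQueueTimer(&hTimer, NULL, Tick, NULL, 250, 250, 0);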
 Sleep(INFINITE);
 return 0;
}

This program creates a periodic thread pool timer whose callback fires every 250ms and merely prints how much time has elapsed since the program started. As a baseline, run the program with no parameters, and observe that the callbacks occur at roughly 250ms intervals, as expected.

  251
  501
  751
 1012
^C

Next, run the program with a single command line parameter. This causes the "case 2" to be taken, where the "Clog" work item is queued. The "Clog" does what its name says: It clogs up the work item queue by taking a long time (four seconds) to complete. Notice that the first callback doesn't occur for a whole second.

 1001
 1011
 1021
 1021
 1252
 1502
 1752
^C

That's because we queued the "Clog" work item without the WT_EXECUTELONGFUNCTION flag. In other words, we told the thread pool, "Oh, don't worry about this guy, he'll be finished soon." The thread pool wanted to run the Tick event, and since the Clog work item was marked as "fast", the thread pool decided to wait for it and recycle its thread rather than create a new one. After about a second, the thread pool got impatient and spun up a new thread to service the now-long-overdue Tick events.

Notice that as soon as the first Tick event was processed, three more were fired in rapid succession. That's because the thread pool realized that it had fallen four events behind (thanks to the clog) and had to fire the next three immediately just to clear its backlog. The fifth and subsequent events fire roughly on time because the thread pool has figured out that the Clog really is a clog and should be treated as a long-running event.
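The size of the backlog follows from simple arithmetic: timer events were due at 250, 500, 750 and 1000 ms, and the first one was not serviced until 1001 ms. Here is a minimal sketch of that calculation. The function name `eventsDue` is made up for this illustration and is not part of any Windows API.

```cpp
#include <cassert>

// Number of periodic timer events already due at time nowMs, for a
// timer whose first event is due at dueMs and which then repeats every
// periodMs. All times are in milliseconds since the timer was created.
int eventsDue(int nowMs, int dueMs, int periodMs)
{
    assert(periodMs > 0 && dueMs >= 0 && nowMs >= 0);
    if (nowMs < dueMs) return 0;             // nothing is due yet
    return (nowMs - dueMs) / periodMs + 1;   // first event plus its repeats
}
```

For the run shown above, `eventsDue(1001, 250, 250)` is 4, which matches the first serviced event plus the three that fired right behind it.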

Finally, run the program with two command line parameters. This causes the "case 3" to be taken, where we queue up the Clog but also pass the WT_EXECUTELONGFUNCTION flag.

  251
  511
  761
 1012
^C

Notice that with this hint, the thread pool no longer gets fooled by the Clog and knows to spin up a new thread to handle the Tick events.

Moral of the story: If you're going to go wading into the thread pool, make sure you play friendly with other kids and let the thread pool know ahead of time whether you're going to take a long time. This allows the thread pool to keep the number of worker threads low (thus reaping the benefits of thread pooling) while still creating enough threads to keep the events flowing smoothly.

Exercise: What are the consequences to the thread pool if you create a thread pool timer whose callback takes longer to complete than its timer period?

Comments (17)
  1. John says:

    When I’ve looked at any of these thread pool API’s, I’ve wondered how is the thread pool created, who creates it and how is it maintained?

    Are there any API’s that allow you to look at or interact with the thread pool?

  2. Ben says:

    John: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/thread_pooling.asp describes when the queue is created. I don’t know how you can query or cancel work items in the queue.

    However, one redneck way to count the number of work items in the queue is to simply increment a variable when you call QueueUserWorkItem(), and decrement it when your job runs.

    Similarly, you could implement cancelling jobs by reserving a bit in the data structure you use to initialize each work item that indicates if the work item should simply return. Anyone wanting to cancel that work item simply sets that bit, and when the work item notices it goes away.

    Of course all this requires that you program your stuff well. ;-)
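Ben's two suggestions can be sketched in portable C++ roughly as follows. This is one possible shape under stated assumptions, not a definitive implementation: `WorkItem`, `noteQueued`, `runWorkItem` and `cancel` are hypothetical names, and the sketch decrements the counter when the job finishes rather than when it starts. In a real program, a pointer to the `WorkItem` would travel through the context parameter of QueueUserWorkItem.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical per-item data passed to the work item callback.
struct WorkItem {
    std::atomic<bool> cancelled{false}; // the reserved "just return" bit
    bool didWork = false;               // set when the job body really runs
};

std::atomic<int> g_outstanding{0};      // items queued but not yet finished

// Call immediately before queuing an item.
void noteQueued() { ++g_outstanding; }

// Anyone who wants to cancel a queued item sets its bit.
void cancel(WorkItem& item) { item.cancelled.store(true); }

// Body of the work item callback.
void runWorkItem(WorkItem& item)
{
    if (!item.cancelled.load()) {
        item.didWork = true;            // stand-in for the real job
    }
    int remaining = --g_outstanding;    // this item is no longer outstanding
    assert(remaining >= 0);
    (void)remaining;                    // unused when asserts are disabled
}
```

A cancelled item still occupies a pool thread briefly, because the callback has to run in order to notice the bit. It just returns without doing the work.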

  3. Luther says:

    I’m just curious – in this sample code – why TCHARs or _tmain are not used – being that seems to be all the rage in windows literature …

    Maybe Raymond addressed this in an earlier post? Doesn’t really complicate the snippet does it? Especially when I notice __cdecl and the windows specific types?

    Just curious. I’m a huge fan of the blog. I’m probably reading too much into the snippet. Admittedly, I’m a weekend windows programmer – so I tend to read from gurus like a bit too closely. I guess I’m slightly wondering if Raymond knows something I’ve not realized or not been introduced to yet. Like, the OS isn’t natively UNICODE something that’d astound me … maybe he’s just writing to be pre win2k compatible?

    At any rate, reminds me of advanced programming books using

    for(int i = 0; i < x; i++)

    when they could just as easily have written it as

    for(int i = 0; i < x; ++i)

    Many thanks … not at all trying to be offensive in this post. Just a newbie to Raymond.

  4. Since the code doesn’t manipulate file names, receive input, and is hard-coded English, adding Unicode support would just distract from the point of the article. (I don’t see why the difference between i++ and ++i is important in the example above; they both compile to the same object code.)

  5. Luther says:

    Thanks. That answers my question.

    My (++i) vs (i++) example was more about semantics and intention than actual object code.

    Just getting a feel for your style. Again, thanks and please keep up the great insights.

    -Luther

  6. hahahahhaha says:

    nobody can solve the exercise??

  7. Tim Smith says:

    ++i vs i++ is all about the final object code. If you don’t understand the difference between the two, you will not understand when ++i is better than i++.

  8. Norman Diamond says:

    > Exercise: What are the consequences to the

    > thread pool if you create a thread pool

    > timer whose callback takes longer to

    > complete than its timer period?

    I’ve only seen this happen in VB so I don’t know whether the following is a Windows answer or a VB answer. What happened was that the callback was reentered and wasn’t designed for reentrancy and trashed its own variables.

    When I write code with a timer in either VB or VC++, I kill the timer at the start of the callback and start the timer again before the callback returns. So I never had to observe a real answer to this exercise ^_^

    Saturday, July 23, 2005 11:48 AM by Luther

    > At any rate, reminds me of advanced

    > programming books using

    > for(int i = 0; i < x; i++)

    > when they could just as easily have written

    > it as

    > for(int i = 0; i < x; ++i)

    Yeah? Well what about advanced programming books using

    for(int i = 0; i < x; ++i)

    when they could just as easily have written it as

    for(int i = 0; i < x; i++)

    It mattered in object code 30 years ago on the particular machine where C was first implemented. But even there, if the value of the incrementing expression wasn’t being used as an operand in a larger expression, it still didn’t matter because the compiler generated a simple increment instruction. Post-increment and pre-decrement took advantage of PDP-11 hardware accelerators only when they were used as array subscripts.

  9. James Risto says:

    I am trying to compile this with VS2003, and I get "error C3861: ‘QueueUserWorkItem’: identifier not found, even with argument-dependent lookup". Perhaps I am not config right for multi-thread? Do I need ATL? If I hover over the call, I get a parm list, so something is finding it.

  10. Luther says:

    :)

    OK – OK, Norman … turns out to be a bad example.

    My apologies as it sounds like I’ve offended you. My question boiled down to why Raymond was using chars in a clearly win32 program … WHICH!! … Raymond answered for me.

    I wasn’t sure if it was intentionally or just for didactic purposes.

    There is no need to expand on his answer.

    Thanks

  11. Eric says:

    "What are the consequences to the thread pool if you create a thread pool timer whose callback takes longer to complete than its timer period?"

    In .Net, the timer function will be called successively. If the timer function uses only local variables, or if it’s properly synchronized (which it ought to be), then at least you won’t trash anything.

    However, what you will do (if the timer handler is always longer than the interval) is exhaust the thread pool, at which point your timer function will essentially run repeatedly, forever, as new calls are getting queued up faster than existing calls get completed.

    This can have really unfortunate effects on the rest of the application, since there are then no pool threads for anything else, either. (The program will appear to hang, only to process those events a random period of time later.)
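The pile-up Eric describes can be estimated with a little arithmetic. The sketch below is a simplified model, not a description of how any real thread pool schedules timers: it assumes every callback takes exactly `durationMs`, that events fire at `periodMs`, `2 * periodMs`, and so on, and that `maxThreads` is a hypothetical cap on pool threads. Both function names are invented for illustration.

```cpp
#include <cassert>

// Callbacks running at time nowMs if every due event starts at once on
// its own thread (no cap on threads).
int inFlight(int nowMs, int periodMs, int durationMs)
{
    assert(periodMs > 0 && durationMs >= 0 && nowMs >= 0);
    int started  = nowMs / periodMs;
    int finished = nowMs >= durationMs ? (nowMs - durationMs) / periodMs : 0;
    return started - finished;
}

// With at most maxThreads callbacks running at a time, the queue of
// pending events grows without bound when events arrive faster than
// the threads can retire them.
bool backlogGrows(int periodMs, int durationMs, int maxThreads)
{
    assert(periodMs > 0 && durationMs >= 0 && maxThreads > 0);
    return durationMs > periodMs * maxThreads;
}
```

For a 250 ms period and a 1000 ms callback, `inFlight` settles at 4. In this model, a pool limited to fewer than four threads falls further behind with every tick.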

  12. Norman Diamond says:

    Monday, July 25, 2005 2:46 PM by Luther

    > My apologies as it sounds like I’ve offended

    > you.

    It looks like there was some misunderstanding involved. In the past there really have been religious arguments over choices of prefix vs. postfix operators in cases where it didn’t matter. Sometimes agnostics can figure out why religious arguments arise but in this case I couldn’t even figure out why. Anyway it looked like you were bringing it here, so I balanced it. Rest assured that if you had posted for the opposite side then I would have posted for the opposite’s opposite in exactly the same way ^_^

  13. Luther says:

    No problem …

    My thought was that by definition, the postfix operator returned the old value and that in general, if I didn’t need the old value – I ought to use the prefix operator. Period.

    If nothing else, I thought this was sort of a "self-commenting" idiom. I do realize that the compiler is "smart" and may do the optimum thing in either case, but I considered it better form, explicitly clearer … I consider that the more hints I can include in my code, all the better …. for both the person reading my code and the compiler compiling my code. "Yes, I choose prefix notation here bcs I absolutely do not need the old value."

    For me, prefix notation here was about keeping the code TEXT more closely consistent with the intended result in the object code.

    But it is clear that practically – it makes no difference. It is also clear that many developers would have no problem understanding intention with either notation (given the context). What really helped nail the coffin was this comment from Kernighan and Ritchie:

    "Section 2.8 … In a context where no value is wanted, just the incrementing effect, as in

    if (c == '\n')

    nl++;

    prefix and postfix are the same."

    So for my benefit, even from a strictly "language semantic" standpoint, there is no difference. "for(…i++)" is not only optimized by the compiler, but by language definition (or at least, K&Rs suggestion), it is the same operation as "for(…++i)".

    Many thanks to all who made this revelation possible ;-) I may continue to use prefix notation in for loops but I realize that it isn’t always necessary – so I will no longer tease my coworkers ;-)

  14. Owen Cunningham says:

    Raymond, do you know why the .NET equivalent of QueueUserWorkItem does not support a WT_EXECUTELONGFUNCTION flag?

    Also, do you know why the .NET FCL bothers to expose UnsafeQueueUserWorkItem? The documentation describes the security risks of doing this, but not the benefit (of which there presumably is some).

  15. Um, read the subtitle of the blog again? Try asking somebody who works on .NET.

  16. Bryan says:

    Owen — the documentation for UnsafeQueueUserWorkItem says that the difference is that QueueUserWorkItem’s worker thread "inherits" the stack of the caller (the caller of QueueUserWorkItem, that is) when the thread starts executing the work-item. The Unsafe version does not "inherit" the stack. ("… does not propagate the calling stack onto the worker thread…")

    This only matters when the code has security requirements set, and it’s doing full stack walks instead of just link checks. (Of course, "you can’t trust the return address" anyway, but apparently the .net people didn’t figure that out, or maybe there’s a reason that the issues mentioned in Raymond’s "you can’t trust the return address" blog entry don’t apply to .net.)

    As for why it doesn’t support the long-function flag… I don’t know, I’m not on the design team. I just use it. ;-)

  17. earhart says:

    Note that WT_EXECUTELONGFUNCTION won’t really do much of anything in Longhorn.

    It turned out that a lot of people were just using it to spin up threads faster, since the threadpool was throttling pretty aggressively. So now it just spins up threads very quickly all the time; it tries hard to fully utilize the available processor bandwidth.
