Sure, I’m supposed to pass WT_EXECUTELONGFUNCTION if my function takes a long time, but how long is long?


A customer contacted me to tell a story and ask a question. The customer discovered that in their code base, all calls to QueueUserWorkItem passed the WT_EXECUTELONGFUNCTION flag, regardless of whether the function actually took a long time. Their program creates a large number of work items at startup, and the result of passing WT_EXECUTELONGFUNCTION for all of them was that the thread pool created a new thread for each queued work item, resulting in a bloated thread pool that thrashed the CPU.

When the customer asked the other people on the team why they were passing the WT_EXECUTELONGFUNCTION flag unconditionally, they pointed to this article from 2005 on the importance of passing the WT_EXECUTELONGFUNCTION flag to the QueueUserWorkItem function.

As I've mentioned before, good advice comes with a rationale so you can tell when it becomes bad advice, but the people who applied my advice didn't understand the rationale and merely concluded, "It is important always to pass the WT_EXECUTELONGFUNCTION flag!"

The WT_EXECUTELONGFUNCTION flag is two-edged. If you pass the flag when queueing a task, then the thread pool will more aggressively create a new thread when that task is running. The upside is that other tasks don't get stuck waiting for your long-running task. The downside is that this creates more threads. And if you set the flag for all of your tasks, then you don't really have a thread pool at all, since you basically told the thread pool, "Run every task on its own thread, stat!"

But this raises the question of "How long is long?" How long does a task need to run before you declare it a long-running task?

There is no magic number here.

The definition of a long-running task depends on the nature of your application. Let's consider, for concreteness, a task that takes one second. If this task is not marked as a long-running task, then the thread pool will wait for it to complete rather than creating a new thread. What are the consequences for your application of the thread pool choosing to wait for one second rather than creating a new thread? If your application doesn't generate tasks at such a high rate that a one-second pause would be a significant problem, then it's not a long-running task.

On the other hand, if your application is a service that is handling thousands of requests per second, then waiting for a one-second task means that a thousand tasks pile up in the meantime, and that may be enough to push your service to the brink of death because it has started falling behind on its processing and may never catch up.

Which category does your application fall in? That's for you to determine.
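
To make the "how long is long?" reasoning concrete, here is a back-of-the-envelope sketch of the arithmetic. Both function names and the `tolerable_backlog` parameter are hypothetical, invented for illustration; none of this is part of the Windows thread pool API, and it assumes the simplest case, where new work items simply wait while one task occupies its thread.

```cpp
#include <cassert>

// Hypothetical helper, not part of the Windows API: estimates how many work
// items pile up behind a task that occupies a pool thread for task_seconds
// while the application queues arrivals_per_second new items.
inline double backlog_while_waiting(double arrivals_per_second, double task_seconds) {
    assert(arrivals_per_second >= 0.0 && task_seconds >= 0.0);
    return arrivals_per_second * task_seconds;
}

// Hypothetical policy: call the task "long" only when the backlog it would
// cause exceeds what the application can tolerate. tolerable_backlog is a
// number you pick for your own application; there is no magic value.
inline bool is_long_for_this_app(double arrivals_per_second,
                                 double task_seconds,
                                 double tolerable_backlog) {
    return backlog_while_waiting(arrivals_per_second, task_seconds) > tolerable_backlog;
}
```

For the one-second task discussed above, a service receiving 1000 requests per second sees a backlog of about 1000 items, whereas an application that queues one item every ten seconds sees a backlog of about 0.1 items. Same task, different answer.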

Comments (14)
  1. Pedrow says:

    Does it also depend a bit on how much of the task's clock time is spent actually running stuff – say if it runs for 10 minutes mostly blocked waiting for stuff coming down a slow network connection it makes sense to create more threads. If it is running at 100% CPU there's less point.

  2. Joshua says:

    I've been considering making a specialized thread pool that creates a new thread whenever it tries to allocate one and there isn't one to allocate.

  3. 640k says:

    And that's why naming (of variables, constants, functions, …) is more important than compiler errors. Compiler errors are easy to fix compared to erroneously named things.

  4. Adrian says:

    I was about to write something very similar to what Pedrow said.  A task that's mostly waiting (and thus not scheduled CPU time) is a good candidate for WT_EXECUTELONGFUNCTION.  A task that's CPU bound probably isn't.

  5. Mike says:

    @Pedrow, Adrian

    Sure, if you know that the machine only has a single CPU…

  6. Ben Voigt says:

    @Mike: Doesn't matter how many cores there are.  Thread pools start with 1-2 threads per core.  If your task is CPU bound, then there's no point in running it in parallel with the other tasks, you end up slowing everything down.  This is an elementary result of queueing theory.

  7. ErikF says:

    So should WT_EXECUTELONGFUNCTION be thought of as "WT_EXECUTESLOWFUNCTION" (as in "resource-bound" slow, not "takes a long time" slow) then?

  8. I read the last part of the title first, and my first thought was "a long is 32 bits. Why are you storing a time in 32 bits?"

  9. Ben Cooke says:

    I can't help but think that it would've made things simpler to be explicit about what is going on here and call the constant something like WT_FORCESEPARATETHREAD; while hiding the implementation behind an abstraction is good in some cases, this is clearly a case where the only way to make a decision on whether to set this flag is to understand the details of what it implies and how it is implemented.

  10. Guest says:

    it should be WT_ALLOWOVERSUBSCRIPTION

  11. chrismcb says:

    But it doesn't FORCE a SEPARATE thread. It is more likely to create a new thread than if you don't use the flag.

    I'm not sure why someone thinks it is misnamed. If your job is going to take a long time to execute, then set this flag. You as the programmer decide what a "long time" is.

  12. Falcon says:

    @GWO: I think we know by now how the story ends – it turns out that some software relies on the one second timeout, therefore it's bound to that value in all future Windows versions for backwards compatibility!

  13. GWO says:

    "After about a second, the thread pool got impatient and spun up a new thread to service the now-long-overdue Tick events."

    Which suggests that, internally, there *is* a magic number, but that it's none of your business.

  14. Bob says:

    In the 2005 article mentioned here, the example with 2 command line parameters confuses me a little. So there will be only ONE thread in the 'pool' if the WT_EXECUTELONGFUNCTION flag is never used? (Or maybe it was because the example was run on a single-core CPU?)

    For example, say the thread pool has 2 threads, and one task is already running (without the WT_EXECUTELONGFUNCTION flag). If a new task is queued, will it wait for the first task to complete, or will it run on the second thread?

    And what if the running task enters the wait state? Doesn't the pool manager activate another task from the queue to keep the number of running threads in the pool at the same level? (I recall something like that is used in IOCP. I don't know if it applies here.)
