Tasks and Continuations: Available for Download Today!

The most difficult problem that a scheduler or a thread pool has to solve is figuring out how many running threads to have at any given moment. Create too many threads and the CPUs are oversubscribed. Create too few and the CPUs are underutilized. The straightforward setup – one thread per core – doesn’t work when threads block. During an uncooperative blocking operation, such as a call to Sleep or WaitForSingleObject, a thread is not using the CPU but is still holding on to other operating system resources, such as the stack. So the scheduler has to detect a blocking operation and quickly spin up a new thread – but not too quickly or else thread explosion ensues. This is a very hard problem to solve.

Blocking is undesirable in other situations as well. A blocked GUI thread cannot service the message loop – meaning that the application becomes unresponsive and the user sees a “spinning doughnut”. 

So what’s a programmer to do?

Typically, we use a blocking synchronization when the results of the pending operation are needed before the next operation can proceed. In effect, we’re saying to the thread: “I’ll be here waiting for you to finish, and then I’ll tell you what to do next”.

What if instead of waiting, we could say “when you’re done with this task, continue with that one, and in the meantime I’ll go away to do some other work”. Doing other work can mean processing the message loop, or scheduling other tasks – which naturally can run in parallel with each other.

This is exactly what PPL tasks and continuations, available for download today from the ConcRT Sample Pack, aim to accomplish.
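Before diving into the PPL API, here is a rough sketch of the idea using nothing but standard C++ futures. The helper `then` below is hypothetical (it is not the sample pack's API), and unlike a real PPL continuation it burns a thread while it waits; it only illustrates the shape of "when you're done, continue with this":

```cpp
#include <future>
#include <utility>

// Hypothetical "run this next" helper over std::future: wait for the
// antecedent's result on a background thread, then invoke the
// continuation with it. A real PPL continuation avoids the waiting
// thread; this sketch only illustrates the shape of the idea.
template <typename T, typename F>
auto then(std::future<T> antecedent, F next)
    -> std::future<decltype(next(std::declval<T>()))>
{
    return std::async(std::launch::async,
        [](std::future<T> f, F g) { return g(f.get()); },
        std::move(antecedent), std::move(next));
}
```

With a helper like this, "compute 6, then multiply by 7" becomes a chain rather than a blocking wait followed by more work on the same thread.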

Imagine that you’re implementing a build tool such as make or msbuild, which builds the projects in the dependency order. Your configuration contains projects A, B, C and D, where project C depends on project A, and project D depends on both projects A and B:


This is how you do it (you will find the full example in the sample pack):

Project A = ...
Project B = ...
Project C = ...
Project D = ...

task<void> a([=]() {
    Build(A);
});

task<void> b([=]() {
    Build(B);
});

// Build C after A
auto c = a.then([=]() {
    Build(C);
});

// Build D after both A and B
auto d = (a && b).then([=]() {
    Build(D);
});

c.wait();
d.wait();

First, you create an instance of a task a that takes a C++ lambda as a parameter in the constructor. The task starts running as soon as it is created – or, more precisely, as soon as the underlying scheduler finds an available hardware thread that can execute it.

The Build function returns void, and hence the type of the task is void. A more advanced implementation would probably return an error code, which would be the type of the task. This is how it could be done:

task<int> a([=]() {
    return Build(A);
});


Next, you create a task b, which is also scheduled to start immediately, and begins building the project B.

Notice that the task b didn’t need to wait for the task a to finish, nor is it guaranteed to run in parallel with a. You did not specify any dependencies between the two tasks, leaving it up to the underlying Concurrency Runtime to schedule them as it sees fit – and it will schedule the tasks in parallel if enough CPU resources are available.

Next, you create a new task, c, which is a continuation of task a. then is a method that creates a new task. The parameter of then is a lambda, and the parameter of the lambda is the return value of the antecedent task.

In our case, this type is void, but if we were to define the task a with type int, the call to then could look like this:

auto c = a.then([=](int errorcode) -> int {
    if (errorcode == 0)
    {
        // Build C only if A succeeded
        return Build(C);
    }
    else
    {
        return errorcode;
    }
});


The next step is to schedule the build of the project D. The operator && creates a task that joins together multiple tasks of the same type. The task returned by the join is somewhat unusual, in that it has no action. It does one thing and one thing only – join multiple tasks. Like all tasks, it can be waited on, or continued – which is how you’d typically use it.
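As a rough illustration of what such a join does, here is a sketch over standard C++ futures. `when_both` is a hypothetical helper, not the sample pack's API, and unlike the PPL join it burns a thread while waiting:

```cpp
#include <future>
#include <utility>

// Rough stand-in for (a && b).then(...): a small task with no action
// of its own that simply joins the two antecedents before running the
// continuation. (Unlike the PPL join, this sketch burns a waiting
// thread.)
template <typename F>
std::future<void> when_both(std::future<void> a, std::future<void> b, F next)
{
    return std::async(std::launch::async,
        [](std::future<void> fa, std::future<void> fb, F g) {
            fa.get();   // join: both antecedents must finish...
            fb.get();
            g();        // ...before the continuation runs
        },
        std::move(a), std::move(b), std::move(next));
}
```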

Finally, we’re waiting for the last two tasks – c and d – to finish by explicitly waiting on them. This is a usual blocking wait, which would be undesirable in a GUI application but is perfectly fine in a command-line program.

How would you do this in a GUI application? You already know how:

(c && d).then([] () {
    ReportResults();
});

// now if you'll excuse me, I have messages to pump
return;

Those of you keeping up with the news from our friends on the PFX team will be well familiar with the concept of continuations. We have borrowed many ideas from the managed world, but tried hard to make the programming model feel natural for the C++ developer, and to preserve the overall “look and feel” of the PPL API. Thus, you can seamlessly use tasks and continuations together with existing and new parallel algorithms, such as parallel_for and parallel_sort.

The functionality in the sample pack has a few limitations compared to the version that we eventually plan to integrate into the product, mainly because we wanted to enable a header-only release without having to ask early adopters to install a whole new CRT, which involves copying several lib and dll files. We tried to keep these limitations to a minimum, so you can still play with the code and give us feedback.

You’ll find several samples that demonstrate uses of PPL tasks, including a few concepts that were not covered in this post. In my next installment, I’ll explain these concepts and show a few more advanced samples.

Please download the sample pack here and send us your feedback. Specifically, we’re interested in the following:

Do you find the concept of tasks and continuations easy to understand? Do you find it useful? Would you use tasks and continuations in your C++ projects? What did we get right, and what did we get wrong? What’s missing and what is superfluous?

We’re eagerly awaiting your feedback.

To be continued

Artur Laksberg
Concurrency Runtime Team

Comments (20)
  1. petke says:

    Very nice library, I think it will become very popular, and I can't wait to get to use some of this stuff. I've got one stylistic bone to pick though:

    Instead of "(a && b).continue_with(…)"  I would like to see a named function like "join_task(a, b).continue_with(…)". This is descriptive and does not break with user expectations of what && means. The guideline for operator overloading is to simulate how built in types behave, and not to reuse these operators for different behaviors. In this case && does something very different, and therefore surprising.

    I guess you did it so that you can chain many objects together in an easy way:

    (a && b && c && d).continue_with(…)

    instead of

    join_task(a, join_task(b, join_task(c, d)))

    But with variadic templates you could have it look like this

    join_task(a, b, c, d)

    of the above alternatives this looks the cleanest and most natural
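    A sketch of what that variadic shape could look like over standard C++ futures (`wait_all` here is hypothetical, not part of the sample pack):

```cpp
#include <future>

// Hypothetical variadic join over standard futures -- not part of the
// sample pack, just the shape join_task(a, b, c, d) could take.
inline void wait_all() {}

template <typename First, typename... Rest>
void wait_all(First& first, Rest&... rest)
{
    first.wait();       // block until this antecedent finishes
    wait_all(rest...);  // then recurse over the remaining ones
}
```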

    Is there any other reason you chose to overload the operator?

  2. Petke — thanks for the feedback! I agree the && and || operators are somewhat controversial — succinct but cryptic at the same time.

    We do have the when_all construct, which takes two iterators. So while you cannot do this:

       when_all(a, b);

    you can do

       task<T> tasks[] = {a, b};

       when_all(&tasks[0], &tasks[2]);

    Artur Laksberg

  3. Petke says:

    Another question that just came to me. How does this relate to the C++0x futures, promises, packaged_task etc.? Are these basically rival libraries?


  4. Petke,

    Another great question!

    PPL Tasks enable wait-free composition of asynchronous operations. This is their main differentiator.

    We worked hard to integrate PPL tasks seamlessly with the existing PPL constructs. Cancellation is one example – you can have a parallel_for inside a task, and cancelling the task will cancel the parallel_for.

    We are also working on integrating C++0x futures with ConcRT. We want both tasks and C++0x futures to use the same ConcRT scheduler under the covers, so they do not compete for hardware resources in the same application. We’re also considering whether we should integrate them even further – for example, we could provide a task constructor that takes a future as a parameter. Another idea is to do away with the task_completion_event and use std::promise instead – thus reducing the concept count from two new classes to just one (plus a handful of new global functions).
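    To make the task_completion_event / std::promise analogy concrete, here is a small sketch in standard C++ only (not the sample pack API), where a std::promise acts as the "start gate" that a task_completion_event provides for PPL tasks:

```cpp
#include <future>

// A std::promise used as a "start gate", the way task_completion_event
// is used with PPL tasks: the async work cannot produce its result
// until someone fulfills the promise.
inline int gated_computation()
{
    std::promise<int> start;                  // analogue of task_completion_event
    std::future<int> gate = start.get_future();

    // The "continuation": blocked on the gate until the promise is set.
    std::future<int> result = std::async(std::launch::async,
        [&gate] { return gate.get() * 2; });

    start.set_value(21);                      // analogue of tce.set()
    return result.get();
}
```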

    What we do depends on the product schedule and your feedback — so keep it coming.

  5. Zach says:

    I don't like that tasks schedule immediately upon their creation.  I would like an explicit run() function.  This way I could set up an entire hierarchy of tasks before the first one even runs.  

    Also, it would be nice if tasks were usable with the task_group class in the core framework.  This way I could cancel an entire group of tasks easily.

    I'm finding it very difficult to use these continuations in a situation where the task chain is built up over time, iteratively, as opposed to "all at once".

    Imagine, for example, a situation where you have a person at a console typing commands like:

    id1 : r1 = multiply(3, 4)

    id2 : r2 = add(r1, 7) [depends_on: id1]

    id3 : r3 = divide(2, 7)

    id4 : r4 = multiply(r2, r3) [depends_on: id2, id3]


    Ignore, of course, how the parsing code works and how storage of intermediate results works. But from a high level, something like this is difficult to achieve with the current task / continuation sample code. In this case the functions are pure and do not have side effects, so it won't matter if id1 is computed immediately or when the run statement is encountered, but now imagine that these functions have side effects.

    We can get around this by having the roots of the hierarchy wait on an event handle, but this still doesn't solve the problem of iteratively building up the tree over time.  Furthermore, since tasks are always returned by value from all the functions, it becomes impossible to save off anything but the roots.

    So suppose I allocate id1 on the heap, then when I get the command for id2 I call id1->continue_with(…).  How do I store this result, so that I can later use it when the command for id4 comes in?

  6. Zach says:

    Also, the code does not compile if _HAS_EXCEPTIONS=0 is defined.  

    Also, in response to my previous question, I realized that tasks are copyable with no problem, so it appears to be safe to store the return values of continue_with(), etc by value in containers, for example.

  7. Zach,

    We resisted adding the run method because we think most people will want their tasks to start right away, without having to remember to call the run method. We also wanted to keep the API set as small as possible. One less method for you to learn and for us to explain is a good thing.

    Other than that, if we were to introduce the run method, the things we would have to solve are:

    1) What happens if run on the same task is called twice on different threads? Is the second call a no-op or should it throw an exception?

    2) Should you call run on tasks created with continue_with? When_all/when_any?

    3) Should we also introduce the is_running method? Clearly, this method would be inherently unsafe, because another thread can call run at the same time (or a nanosecond later)

    4) What happens if you wait on a task that hasn’t started? Throw an exception, hang, or start the task automatically?

    What you describe as a solution (roots of the hierarchy waiting on an event handle) is OK, but you can do it more efficiently – without burning a thread – using the task_completion_event. Here is how:

       task_completion_event<void> tce_start;

       task<void> t_start(tce_start);

       auto id1 = t_start.continue_with([](){
           // ...
       });

       // build up your hierarchy here…

       // now start the whole graph:
       tce_start.set();
    That said, nothing is set in stone yet. If my sample above is not convincing, let me know why.

    What do other people think – do we need the run method on tasks? Speak up if you have an opinion!

    PS. Thanks for the bug report with _HAS_EXCEPTIONS, we’ll look into it.

  8. Zach says:

    The proposed solution is nice, I'll give that a try.  

    Another problem I'm faced with is that of cancelling tasks.  Suppose I'm writing a general purposes task scheduler where clients can schedule high level tasks that have a pre-defined task tree.  In other words, clients can enter commands from a prompt like:


    this really high level task is composed of numerous subtasks, with its own arbitrarily complex dependency tree.  But of course the client need not worry about that.  

    Next, client wants to schedule another really high level task.  High level tasks I want to execute in sequence, one after the other.  So the second high level task should not begin until the first one is completely finished.

    No problem, as I'm constructing the HLT I just keep track of which tasks in the graph are leaves, that way I can construct a when_all() on all the leaves at the end, and have the start node of the second high level task be a continue_with() of the aforementioned join.

    The problem now is that if the first really high level task encounters an error at any step, I would like to cancel every other task in that group.  But I *don't* want to cancel the tasks from the second group, I want it to just start.

    My initial inspection tells me that this isn't possible, that when you cancel a node, all descendants are cancelled as well.  Is there a solution to this?

  9. Vinod Subramanian (MSFT) says:


      Zach, thank you for your feedback and questions. I have tried to address them below.

    a) Defining _HAS_EXCEPTIONS=0 is unsupported in PPL & STL in VS2010.

    However, you may be able to work around this issue by re-defining _HAS_EXCEPTIONS after you have finished including all the headers (where this condition would only be effective in user-code).

    b) Regarding your scenario, where you want HighLevelTask1 to continue irrespective of whether HighLevelTask0 runs to completion or is canceled (error): there are two approaches I would like to discuss here.

    Approach 1:

    You can schedule a task that cooperatively waits for highLevelTask0 to reach a terminal state and, irrespective of that status (completed or canceled), schedules highLevelTask1. This effectively addresses your problem; the downside is that highLevelTask1 burns a thread while it waits (though this side effect is reduced to some degree by the blocking being cooperative):

    task<void> highLevelTask0(HighLevelTaskFunc);

    task<void> highLevelTask1([&]()
    {
           // Doesn't matter if highLevelTask0 completes or cancels, just wait for terminal state
           highLevelTask0.wait();

           // Schedule highLevelTask1 and wait
           // ...
    });

    Approach 2:

    Alternatively, you could schedule highLevelTaskCanc1 to run iff highLevelTask0 cancels, and highLevelTask1 to run iff highLevelTask0 runs to completion (note: you would have to explicitly cancel the task highLevelTaskCanc1 in this case). The plus side is that this does not burn a thread, in contrast to Approach 1; however, please be advised that _Continue_on_cancel() is an uglified API used internally (in when_any and when_all), has not been tested, and may not work in all cases:

    task<void> highLevelTask0(HighLevelTaskFunc);

    // Schedule a continuation on highLevelTask0 if it is canceled
    auto highLevelTaskCanc1 = highLevelTask0._Continue_on_cancel(HighLevelTaskFunc);

    // Schedule a continuation on highLevelTask0 if it is not canceled
    auto highLevelTask1 = highLevelTask0.continue_with([&]()
    {
        // Cancel the continue_on_cancel task since highLevelTask0 completed successfully
        // ...
    });

    Do either of the approaches address your problem? If yes, which approach would you adopt?



  10. Zach says:

    The second method certainly seems preferable, both in terms of elegance and efficiency.  Do you happen to know what cases the second approach is likely to fail in?  If I make some specific assumptions about usage, can that provide a stronger guarantee of robustness?

    For example, suppose that after constructing a hierarchy of operations I look through all the leaves of the resulting tree, do a when_all() on them, and then use a _Continue_on_cancel on that task.  That is the main use case I'm interested in, will that work all the time?

    I actually think continue_on_cancel is a very important feature, at least for us it's a hard requirement.  Imagine, for example, that all these tasks are really database operations and this high level task I've composed from smaller operations represents a single transaction.  Then if something goes wrong, I *must* be able to guarantee that the transaction is rolled back.

    What are the chances of providing a more robust, supported implementation of continue_on_cancel() in a subsequent release?

    Your first suggested approach appears to work for all situations, so if I find that _Continue_on_cancel is not working in certain situations, maybe I will change to that approach.

    Another idea might be to provide a way to associate a task_completion_event with the actual completion of a task – i.e. allow the user to specify the instance of the TCE to use, but have the framework set the event when the task enters a terminal state. This way I could construct my second task by passing the same task_completion_event instance to the constructor, and I would not need to burn a thread. I'm not sure if this approach lends itself well to implementation though.

  11. Hi Zach,

    We left continue_on_cancel() out from this sample pack mainly because we didn't have any specific scenarios in mind where it was terribly useful, so these comments of yours are very important to us! Right now, there's still a possibility we will add support for this, especially if we hear from more customers that it's something they need.  We can't make any commitments at this point about what will and won't ship, but your comments and scenario are definitely helpful for us making that decision — thanks! If you think of other scenarios, please let us know.

    To answer your questions above, I don't see any reasons the second approach would fail. Your when_all scenario should work properly (although the API hasn't gone through the rigors of our QA testing).

    I actually have a question for you (and any other readers of this blog post, please feel free to chime in!). One of the issues we had was that we weren't sure what users would actually want the semantics around continue_on_cancel() to be. For example, suppose we have:

    task<int> A([]() { /* some work */ });

    auto B = A.continue_with( /* task B's work */ );

    auto C = A.continue_on_cancel( /* task C's work */ );

    Now, suppose task A completes normally, you would expect task B to be scheduled, but what about task C? Obviously, you expect it will not execute (since A was not canceled)… but do you expect task C itself to be *canceled* or just not executed? This is important, because I could have another task:

    auto D = C.continue_on_cancel( /* task D's work */ );

    or, in other words…

    auto D = A.continue_on_cancel().continue_on_cancel( /* task D's work */ );

    Where I'm not sure you would expect task D to execute or not. I can see arguments for both canceling and not-canceling the task. What would your expectation be?



  12. Zach says:

    I would expect task C to be neither scheduled nor cancelled.  For me, conceptually, scheduling seems like a pre-requisite of cancellation.  So in my mind, task X should be cancelled only if it has already been scheduled.  Otherwise they should just disappear.

    Still though, isn't it possible to mimic the functionality of the system where you cancel the cancellation task if the parent succeeds, by simply specifying the function to execute as a normal continuation?  Suppose you wrote:

    task<void> A([]() {});

    auto B = A.continue_on_cancel( … );

    auto C = B.continue_on_cancel( … );

    Since A does no work, it obviously will not cancel. If you were expecting C to execute in this case (since the system would call B.cancel() due to A succeeding), why not just specify C as a continuation of A?

    task<void> A([]() {});

    auto B = A.continue_on_cancel( … );

    auto C = A.continue_with( … );

    Out of curiosity, which method is it currently?  I may go test this later although I don't have access to my development machine currently.

  13. Hi Zach, yes, you're correct in that the scenario you laid out you could simply have C continue from A. One of the reasons we released this sample pack is to hear from users like you to see what kinds of scenarios you would use this in, in order to better influence our design.

    Another issue is whether a continue_on_cancel() should return a task to the user, which you can use to create a brand new tree of work, where anything else could be canceled. It was also pointed out to me yesterday that if so, a user cannot safely wait on any task in this "cancellation" side path without the possibility of wait() blocking forever, which would be undesirable.

    The code currently should simply discard the continue_on_cancel() task if the parent task is successful (i.e. exactly what you were expecting).

    Anyway, thanks for your input, we'll definitely keep it in mind!

  14. Zach says:

    On a different tangent, I'm curious as to your thoughts about my use case at a higher level, because I think I'm using it for a purpose it probably wasn't intended for.

    It turns out all my tasks actually do NO WORK.  Everything I'm doing is, in fact, an asynchronous I/O operation of some sort that literally returns instantly.  The types of I/O operations I'm doing range anything from issuing HTTP requests and waiting for responses, to issuing database queries or running stored procs and waiting for response, or writing data to / reading data from disk, etc.

    The main feature I actually want out of a system is just the ability to organize my work, at a high level, in such a way that I can build up complicated operations as a sequence of many very small operations with varying parameters.  For example, I might have some domain specific operation called Foo which requires me to coordinate a series of requests / responses from various servers, maybe doing some fiddling with the outputs in order to feed into various stages of the pipeline.  But it's not really a straight pipeline, it's more of a graph.

    The way I have it set up now, it is mostly working although I've yet to define some of the high level operations, but what I'm doing is, when I get a request to perform a Foo operation, I just have my "work" functions issue a request over a socket (or whatever) and then block until I receive notification (from another socket) that the work is complete.  

    Each of these work functions also contain a shared state object that I can use to store results of previous operations to be used by later operations in the "graph / pipeline".  

    The interesting thing about this scenario is that I almost don't want many threads running in the background, because really no work is being done and all my work functions *could* in theory return immediately.  But by returning from the function the underlying system would think the task is complete.  So my solution is to make a Win32 event and use WaitForSingleObject() on it in the work function, and I just set the event when the response comes back.  

    Does a situation like this ring any alarm bells for you in terms of scalability?

  15. petke says:

    @Artur Laksberg MSFT

    Just my two cents here.. It sounds good to me if you could integrate C++0x constructs as much as possible. That way your framework and libraries could be seen as an extension of C++0x rather than an alternative. Maybe even down the line somewhere your work could be used as a working proof of concept for a "wait-free composition of asynchronous operations" API and it could be standardized back into the next C++ standard. Not sure if that is realistic or not.. but it sure would be nice.

  16. Ivan says:

    Looks great!

    1. Does this mean that once this is part of release, task_group style of starting tasks will be deprecated?

    2. Continuations are very elegant, but from the point of view of CPU utilization, can we get the same effect as continuations by just using the existing task_group and calling get, since get is a cooperative block? (BTW, is it also cooperative when called from a non-worker thread, i.e. an application thread? If yes, how is that possible, given that the runtime is also managing non-worker threads?) If yes, what does this all have to do with the uncooperative blocking mentioned at the beginning of the post?

  17. Ivan says:

    I meant task_group.wait instead of .get

  18. Genevieve says:


    Ivan, thanks for your comment. To answer your questions:

    1. No, the task_group style of starting tasks will not be deprecated. task_groups and structured_task_groups are helpful for building fork-join parallelism and will continue to be.

    2. PPL tasks and continuations are intended to be fire-and-forget. That is, you can schedule a long chain of continuations and use your thread to do other useful work while the tasks and continuations are scheduled in the background. At the end of your useful work, if you desire, you may wait on the leaf tasks of the task tree you generated. Scheduling a task on an existing task_group and then waiting on it could lead to your thread moving into a blocked state until the task is finished. This means that particular thread can do no other useful work until that task completes. When your thread wakes up you can schedule your 'continuation', but you will potentially block every time you want to schedule a continuation.

    task_group.wait is a cooperative block, but that just means that if you happen to be calling it on a scheduler worker thread (one that the runtime provides to execute your tasks), the runtime will detect that your thread has 'blocked' and allow a different thread to run and pick up new tasks, or allow other tasks that were blocked and subsequently resumed to run. This does not happen when it is called from a non-worker thread. When a non-worker thread calls task_group.wait(), it will first try to inline the tasks on the task group, but if the tasks are being executed by worker threads, it will simply wait for them to finish.

    In both cases, however, the thread itself does block, and does not execute new work until the tasks it is waiting for have finished.

    I hope I have answered your questions. Feel free to ask for clarification, if not.

  19. Cobe says:

    Could you please explain the difference between uncooperative blocking v.s. cooperative blocking, in terms of CPU resource allocation?


  20. Hong Hong says:


    As we know, every context to be executed has three states: running, ready, and blocked. Running means the context is currently executing; ready means the context has acquired all the resources it needs except CPU time, and is ready to run (but not running yet); blocked means the context cannot execute because it needs some resource other than CPU time.

    Cooperative blocking means that when a running context becomes blocked waiting for a resource, it gives its CPU time back to the scheduler, so that the scheduler can reallocate that CPU time to another context in the ready state. Once the blocked context acquires the resources it needs, it returns to the ready state and waits for the scheduler to allocate CPU time to it again.

    As we can see, the term “cooperative” describes how these contexts share CPU time: when the context that currently owns the CPU temporarily cannot use it, it hands its CPU time to someone else. As long as there are contexts in the ready state, the CPU remains fully utilized even while a context is blocked. Uncooperative blocking, by contrast, usually means that when a running context becomes blocked waiting for a resource, it still holds on to its CPU time (or the scheduler “thinks” it is still holding it), so the scheduler cannot reuse that time for anything else.

    Uncooperative blocking usually wastes CPU time, but it can sometimes be more efficient. A typical example is a spin lock: a context waiting on a spin lock does not release its CPU time, but it also does not need to reacquire it when it is ready to continue, so it saves a context switch if the blocking time is very short.

    I hope this helps.
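    A minimal spin lock in standard C++ illustrates the point (a sketch for illustration only, not code from the sample pack):

```cpp
#include <atomic>
#include <thread>

// Minimal spin lock illustrating "uncooperative" blocking: a waiter
// keeps the CPU busy in a loop instead of yielding to the scheduler,
// but resuming costs no context switch.
class spin_lock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock()   { while (flag.test_and_set(std::memory_order_acquire)) { /* burn CPU */ } }
    void unlock() { flag.clear(std::memory_order_release); }
};

// Two threads incrementing a shared counter under the lock.
inline int demo_spinlock()
{
    spin_lock l;
    int counter = 0;
    auto work = [&] {
        for (int i = 0; i < 100000; ++i) { l.lock(); ++counter; l.unlock(); }
    };
    std::thread t1(work), t2(work);
    t1.join(); t2.join();
    return counter;   // 200000 when the lock provides mutual exclusion
}
```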


