A few weeks ago, I presented on Parallel Extensions to the .NET Framework at the 6th annual Microsoft Financial Services Developer Conference (the decks from the conference are now available online). I had a great time and a great audience, and during the presentation on Thursday I received some good questions. Here are some of them along with answers.
Question: This stuff looks really cool, but why do we need to modify our code to use Parallel Extensions; why can’t you just automatically parallelize it for me?
Igor Ostrovsky, a developer on our team, has a nice set of explanations in his responses to some questions on the Parallel Extensions MSDN Forums:
“The way we traditionally express programs makes it very difficult for the compiler to understand the code at a high level, and parallelize the work. (A part of the problem is that understanding code at a high level is in some sense impossible – see the halting problem.) However, the user can help the parallel engine in various ways:
- The user expresses the goal of the computation instead of the explicit steps to perform it (e.g. Parallel LINQ).
- The user tells the parallel engine how to parallelize the computation (e.g. Task Parallel Library)
- The user specifies certain invariants of the code (e.g. which parts are atomic – transactional memory)”
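The first bullet, expressing the goal rather than the steps, is what PLINQ enables. As a minimal sketch (using the `AsParallel` extension method as it later shipped in `System.Linq`; the CTP's surface differed slightly), the query below says *what* to compute and lets the engine decide how to partition the work across cores:

```csharp
using System;
using System.Linq;

class DeclarativeExample
{
    static void Main()
    {
        int[] numbers = Enumerable.Range(0, 1000).ToArray();

        // Declarative: state the goal of the computation; PLINQ
        // decides how to partition and schedule the work.
        long sumOfSquares = numbers.AsParallel()
                                   .Select(n => (long)n * n)
                                   .Sum();

        Console.WriteLine(sumOfSquares); // prints 332833500
    }
}
```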
Question: Parallel Extensions looks like a great way to express concurrency in my applications, but how will I then debug my apps? Debugging parallel applications today is really hard… what are you doing about that?
It most certainly is hard, and we fully realize that developing software doesn’t begin and end with writing code. Tools are incredibly important in the life cycle of applications, especially when it comes to debugging and improving performance. This is never more true than when writing parallel applications, as concurrency introduces a whole host of issues that can be very difficult to find and fix. Alongside Parallel Extensions, the Parallel Computing Platform team is investing in the development of debugging and profiling tools that will increase developer productivity and improve the correctness and maintainability of parallel code. We haven’t released previews of these tools yet, but you can expect to see them in the future.
Question: Do you need multiple cores or multiple processors to do multithreading? Is it possible to use Parallel Extensions on a single CPU machine, and if so, is there overhead in doing so?
Even with a single CPU, there are many places where multithreading can provide significant value. For example, in GUI programming on Windows, doing any significant work on the main GUI thread will cause user interaction with the application to suffer. Instead, it’s better to offload work to a background thread, which can do processing while allowing the application to stay responsive. But when it comes to computationally-intensive tasks, multithreading on a single CPU can lead to decreased performance. Parallel Extensions seeks to minimize that decrease in performance. It can be used on a machine with a single CPU; the idea is that you write your code using Parallel Extensions correctly, and it will then help your application to scale from one to many cores. There is overhead to using Parallel Extensions (as there will be for pretty much any parallel framework), but we’re working hard to minimize that overhead, and for many problems on multi-core machines, the overhead introduced will be greatly dwarfed by the parallel benefits received. On a single CPU, we can typically reduce the overhead to the point where for any problem that would have benefited from parallelism on a multi-core machine, the overhead will be negligible on a single core.
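The offloading pattern described above can be sketched as follows (a console stand-in for a GUI app, using the `Task` API shape that later shipped; in a real GUI the main thread would be pumping messages rather than printing):

```csharp
using System;
using System.Threading.Tasks;

class OffloadExample
{
    static void Main()
    {
        // Offload the expensive computation to a background task so the
        // "main" thread stays free to respond to the user.
        Task<long> work = Task.Run(() =>
        {
            long total = 0;
            for (int i = 0; i < 1_000_000; i++) total += i;
            return total;
        });

        Console.WriteLine("Main thread is still responsive...");

        // Block only at the point where the answer is actually needed.
        Console.WriteLine(work.Result); // prints 499999500000
    }
}
```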
Question: Can one mix and match Parallel Extensions with existing concurrency mechanisms? For example, can I use the .NET ThreadPool in the same application as Parallel Extensions? Can I use monitors and mutexes and semaphores and reader-writer locks with Parallel Extensions just as I do with threads today?
Yes, to all of your questions. Parallel Extensions relies on threads, so you can continue to use existing synchronization primitives with Parallel Extensions just as you would with threads you spin up manually or with threads from the ThreadPool; we’re also introducing some new synchronization and coordination primitives in upcoming releases of Parallel Extensions, so stay tuned for those. (Note, however, that there are some corner-case rules to be aware of. As an example, don’t evaluate futures or wait on tasks while holding a reentrant lock. Doing so isn’t inherently a problem, but it could lead to some issues that you’ll need to be aware of; I’ll dive into this further in a future blog post.)
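For instance, an ordinary monitor-based lock protects shared state inside a parallel loop exactly as it would inside manually created threads (a sketch using `Parallel.For` as it later shipped in the Task Parallel Library):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class MixingExample
{
    static void Main()
    {
        var results = new List<int>();
        var gate = new object();

        Parallel.For(0, 100, i =>
        {
            int square = i * i;       // do the real work outside the lock

            // A plain monitor (the lock statement) works with parallel
            // loops just as it does with raw threads.
            lock (gate)
            {
                results.Add(square);  // touch shared state only under the lock
            }
        });

        Console.WriteLine(results.Count); // prints 100
    }
}
```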
Question: What is the relationship between HPC Server and Parallel Extensions? Does one replace the other?
Neither replaces the other, and in fact they’re useful in combination. HPC Server is all about scaling out to nodes in a cluster of machines, and Parallel Extensions is all about scaling up on an individual machine. Just as you can use Parallel Extensions to parallelize your desktop applications, you can use Parallel Extensions to parallelize the .NET applications running on nodes in your cluster. This can be quite powerful when used in conjunction with HPC Server functionality like the new WCF-based SOA Service Broker.
Question: Is there any way to hook into the scheduler used by Parallel Extensions to automatically fan out the work to a cluster?
The scheduler in Parallel Extensions is not currently extensible in this way.
Question: Will the thread pool used by Parallel Extensions allow additional threads to be introduced when the number of cores increases?
Absolutely. When the threads in the pool are all being used to do useful work, the scheduler tries to maintain a ratio of one thread per core, so more cores typically translates to more threads. But in a variety of situations that ratio will not always be 1:1. In some cases, more threads may be used, such as if threads block (e.g. synchronization, I/O, etc.) or the user explicitly requests more threads through a TaskManagerPolicy. Similarly, a TaskManagerPolicy can be used to dial down the amount of available concurrency.
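To illustrate dialing concurrency down: the CTP's TaskManagerPolicy played this role, and in the released Task Parallel Library the equivalent knob is `ParallelOptions.MaxDegreeOfParallelism`; a sketch with that shipped API:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class ConcurrencyDialExample
{
    static void Main()
    {
        // Cap the loop at two concurrent workers, regardless of how many
        // cores the machine has.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };

        Parallel.For(0, 8, options, i =>
        {
            Console.WriteLine(
                $"iteration {i} on thread {Thread.CurrentThread.ManagedThreadId}");
        });
    }
}
```

All eight iterations run, but at most two of them execute at any instant.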
Question: Could you run out of memory if you have a large number of tasks?
Tasks are objects just like any other in .NET, and are subject to the same memory constraints as any other managed code. In the same vein, they’re also managed by the CLR’s garbage collector, such that completed tasks with no references left to them may be collected.
Question: Why will this be faster than using the ThreadPool?
There are a variety of reasons a work-stealing scheduler can perform better than a typical thread pool. For one, the scheduler in Parallel Extensions enables more scalable queue management in that it’s not bottlenecked on a single queue storing all work items. Work items are distributed amongst multiple queues, ideally limiting contention on a single queue for all work; this becomes more and more important as the size of the work items decreases, as the number of threads contending for work increases, and as the number of cores running that work increases. These processor-local queues also enable lock-freedom, which contributes to the efficiency of retrieving work items and executing them. There are other significant benefits related to work-stealing for many types of problems, such as very efficient queueing mechanisms when creating tasks recursively. Moreover, the design can provide significant improvements to things like memory locality, which can greatly improve an algorithm’s performance.
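The recursive-task case mentioned above is where work-stealing particularly shines: each child task lands on the spawning thread's local queue, so idle workers steal from busy ones instead of all threads contending on one global queue. A divide-and-conquer sketch (using the `Task<TResult>` API shape that later shipped):

```csharp
using System;
using System.Threading.Tasks;

class RecursiveTasksExample
{
    // Recursive parallel sum: spawn a task for one half, keep working on
    // the other half on the current thread. Child tasks go onto the
    // spawning worker's local queue, which work-stealing drains cheaply.
    static long Sum(int[] data, int lo, int hi)
    {
        if (hi - lo <= 1000)
        {
            long total = 0;
            for (int i = lo; i < hi; i++) total += data[i];
            return total;
        }
        int mid = (lo + hi) / 2;
        Task<long> left = Task.Run(() => Sum(data, lo, mid));
        long right = Sum(data, mid, hi);   // don't idle: recurse locally
        return left.Result + right;
    }

    static void Main()
    {
        int[] data = new int[100_000];
        for (int i = 0; i < data.Length; i++) data[i] = 1;
        Console.WriteLine(Sum(data, 0, data.Length)); // prints 100000
    }
}
```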
Question: If I’m testing my code on a single-core machine, can I simulate more cores?
To some extent. With the PLINQ programming model exposed in the CTP, you can provide a degree of parallelism value to the AsParallel extension method, which controls the number of threads that can be used to execute the query; by default on a single-core machine, this value will be 1, but you can explicitly make it larger. And for the Task Parallel Library, you can create new TaskManagers with specific TaskManagerPolicy instances that increase the number of threads that can be used beyond the number of cores on the machine. However, while it is possible to simulate additional cores this way, your application won’t be subject to some of the complexities of actually having multiple cores, and as a result some bugs in the parallel implementation may remain hidden or get hit much less frequently.
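As a sketch of forcing a higher degree of parallelism: the CTP took the degree as an argument to AsParallel, while the API as it later shipped expresses it via `WithDegreeOfParallelism`; either way, four workers can be requested even on a single-core box:

```csharp
using System;
using System.Linq;

class SimulateCoresExample
{
    static void Main()
    {
        // Explicitly request four threads for the query, overriding the
        // default (which is based on the number of cores).
        int count = Enumerable.Range(0, 10_000)
                              .AsParallel()
                              .WithDegreeOfParallelism(4)
                              .Count(n => n % 7 == 0);

        Console.WriteLine(count); // prints 1429
    }
}
```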
Question: Does Parallel Extensions use threads or fibers?
Threads. Avoid using fibers in managed code: see http://www.bluebytesoftware.com/blog/PermaLink,guid,2d0038b5-7ba5-421f-860b-d9282a1211d3.aspx for more details.
Question: If I spin up a task, and it throws an exception, where will that exception bubble up?
Just as with the Asynchronous Programming Model pattern, where exceptions thrown during asynchronous execution are reraised when the EndXxx method is called, exceptions thrown during the execution of a task will be reraised when the Task is waited on (e.g. task.Wait()).
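A minimal sketch of that marshaling behavior (using the API shape that later shipped, where the task's exception arrives wrapped in an AggregateException at the wait point):

```csharp
using System;
using System.Threading.Tasks;

class TaskExceptionExample
{
    static void Main()
    {
        Task failing = Task.Run(() =>
        {
            throw new InvalidOperationException("boom");
        });

        try
        {
            failing.Wait(); // the task's exception bubbles up here
        }
        catch (AggregateException ae)
        {
            // The original exception is preserved as an inner exception.
            Console.WriteLine(ae.InnerException.Message); // prints "boom"
        }
    }
}
```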