Known Issues in the June 2008 CTP of Parallel Extensions

The Jun08 CTP is still an early pre-release version that is not ready for production use. In addition to ongoing feature additions and performance work, there are some known issues that we plan to address in future releases. Of course, there are always things that we missed, and we would love your feedback on them, either as comments on this post or in the forum.



Task Parallel Library

1.       New scheduler threads are not injected if a task blocks, which can leave the scheduler unable to run additional tasks. When there are dependencies between running tasks and to-be-scheduled tasks, this can lead to deadlock. To make this less likely to happen in practice, TaskManager automatically doubles the number of threads used by default, just for this CTP. This is inconsistent with the documentation, which says we will default to a minimum of 1 processor, an ideal of Environment.ProcessorCount processors, and an ideal of 1 thread per processor. If you do not like this behavior, or if you still experience deadlocks, you may override the setting by providing your own TaskManagerPolicy with the ideal threads-per-processor count you want.

2.       Instances of TaskManager, including the default TaskManager instance, do not shut down cleanly when run in VsTestHost (the host application used to run unit tests from Visual Studio). This problem may occur whenever TPL threads are aborted, as the CLR does during AppDomain unloads. If you’re using PLINQ or TPL in unit tests in Visual Studio, you can work around this problem by avoiding use of the default TaskManager instance and by explicitly disposing of any TaskManager instances you use before exiting your unit tests.  See for an example.

3.       Parallel.For may overflow and execute extra loop iterations if the toExclusive argument is very close to Int32.MaxValue. This problem is more likely to occur on systems with 8 or more processors. 

4.       Some stress scenarios with multiple TaskManagers may cause crashes or data loss due to an implementation bug.  This will be fixed in a future release.

5.       The CLR uses the concurrent workstation GC by default on multi-processor machines. While the concurrent GC does perform some scanning in parallel, collections themselves are run sequentially. If your parallel program allocates a fair bit of garbage, these collections may substantially limit the speedups you see. As an alternative, you may turn on the server GC by using an application config file: <configuration><runtime><gcServer enabled="true"/></runtime></configuration>. This ensures parallel collections are performed, utilizing all available processors. Note that this can lead to a poor experience on machines running lots of active processes, such as Terminal Server scenarios, because the GC requires that all processors are used for the collection to complete.

6.       In some applications, workloads that create large numbers of Tasks may perform worse than they did on the Dec07 CTP.
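
Issue 5 gives the server GC setting on a single line. Laid out as a complete application configuration file, it would look roughly like the sketch below. The file name (YourApp.exe.config, placed next to the executable) is the usual .NET convention and an assumption here, not something stated above.

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <runtime>
    <!-- Use the server GC instead of the default concurrent workstation GC. -->
    <gcServer enabled="true"/>
  </runtime>
</configuration>
```

Note the straight quotes around "true"; curly quotes pasted from a web page will make the file invalid XML.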

PLINQ

7.       In this CTP, PLINQ is implemented on top of the Task Parallel Library, which does not yet have thread injection (see issue #1 above). Some PLINQ queries require more concurrently running tasks than the number of threads that the Task Parallel Library creates, so you may observe deadlocks. Specifically, binary operators like SelectMany and Join that use the output of another PLINQ query as the second data source are likely to hit this. We have provided a workaround for this CTP: the previous implementation, which runs on top of the .NET ThreadPool, is still available. Just set the PLINQ_USE_THREADPOOL environment variable to a non-empty value and PLINQ will revert to the ThreadPool. This setting will go away in subsequent releases.

8.       PLINQ with order preservation on is noticeably slower than PLINQ without order preservation. A natural tension exists between ordering and performance. For the sake of performance, PLINQ does not maintain query result order by default (you can change this by adding the AsOrdered method directly onto the AsParallel method). This tension is, in some sense, a built-in part of PLINQ’s design, but we are hopeful that sizable performance improvements can still be made.

9.      Some operators always produce output in an undefined order, even if the user explicitly requested ordering.  Specifically, this affects the following operators: Distinct, GroupBy, Join, Except, Intersect, and Union.

10.  Documentation addendum: Some operators may not do what you expect when order preservation is not turned on. Because the ordering among elements will be non-deterministic, SequenceEqual, Take, TakeWhile, Skip, and SkipWhile, for example, will effectively be performed on a randomly ordered input. As a specific example, Take(N) will take N elements, but not the “first” N elements as you may be expecting. Reverse is simply a no-op since it inherently relies on ordering. We are still evaluating how best to surface this possibly confusing aspect of the programming model more clearly.
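
To make the Take(N) behavior in issue 10 concrete, here is a minimal sketch. It is written against the PLINQ API as it later shipped in System.Linq (AsParallel, AsOrdered); whether the CTP surface matches in every detail is an assumption. The OrderingDemo class and its method names are hypothetical and exist only for illustration.

```csharp
using System;
using System.Linq;

public static class OrderingDemo
{
    // Without AsOrdered, Take returns *some* `count` elements of the source;
    // which ones, and in what order, is not guaranteed.
    public static int[] TakeUnordered(int count)
    {
        return Enumerable.Range(0, 1000).AsParallel().Take(count).ToArray();
    }

    // With AsOrdered applied directly after AsParallel, Take returns the
    // first `count` elements of the source, in source order.
    public static int[] TakeOrdered(int count)
    {
        return Enumerable.Range(0, 1000).AsParallel().AsOrdered().Take(count).ToArray();
    }

    public static void Main()
    {
        // Any five distinct values from 0..999; may differ from run to run.
        Console.WriteLine(string.Join(", ", TakeUnordered(5)));

        // Always prints: 0, 1, 2, 3, 4
        Console.WriteLine(string.Join(", ", TakeOrdered(5)));
    }
}
```

The unordered call may well return 0 through 4 on a given run; the point is only that nothing guarantees it, whereas the ordered call pays the cost described in issue 8 to make that guarantee.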


Coordination Data Structures

11.   ConcurrentQueue<T> has known scalability limitations that we are still working on. Specifically, the implementation uses a linked list internally, which increases the number of memory allocations and the amount of garbage when compared to a simple Queue<T> data structure. It also leads to more interlocked operations being executed to maintain lock-freedom when compared to a simpler fine-grained locking approach. We are actively experimenting with alternative approaches.

Samples

12.   The samples install into %PROGRAMFILES%, which by default is a privileged location that requires administrative privileges to write to; this includes the writes that must occur during compilation.  In order to compile the samples, you’ll either need to copy the samples to another location or start Visual Studio with administrator privileges.

13.   The ImageColorizer sample relies on a DLL that’s included with Windows Vista and other Tablet-enabled operating systems, as well as in the relevant SDKs. Certain functionality in ImageColorizer relies on this DLL. While you won’t need the DLL to run the application (the relevant functionality will be disabled if the DLL isn’t present), you will need to acquire the DLL in order to compile the sample. On a Windows Vista install, the DLL is likely available in a location like %PROGRAMFILES%\Reference Assemblies\Microsoft\Tablet PC.

14.   The project file for the F# Raytracer sample contains a hardcoded path to System.Core.dll, which presumes that it resides at “C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.5\System.Core.dll”. If this is not the case, you will need to modify the path in the property pages.