We’ve been considering adding a Values property to ThreadLocal&lt;T&gt; that
would return a collection of all current values from all threads (i.e. what
you’d get if you evaluated Value from each thread). This would allow for easy aggregations, and
in fact in our Parallel Extensions Extras we have a wrapper around
ThreadLocal&lt;T&gt; called ReductionVariable&lt;T&gt;
that exists purely to provide this functionality. For example:
var localResult = new ThreadLocal&lt;int&gt;(() => 0);
Parallel.For(0, N, i =>
{
    localResult.Value += i; // placeholder body: accumulate per-thread work
});
int result = localResult.Values.Sum(); // the proposed Values property
If you’re familiar with the Parallel Patterns Library (PPL)
in Visual C++ 2010, this feature would make ThreadLocal<T> very similar in
capability to the combinable class.
In .NET 4, it’s already possible to do aggregations using
the thread-local data support that’s built in to the parallel loops:
int result = 0;
Parallel.For(0, N,
    () => 0, // initialize each task’s local state
    (i, loopState, localResult) => localResult + i, // placeholder body: accumulate locally
    localResult => Interlocked.Add(ref result, localResult)); // final reduction
This approach of using Parallel.For has less overhead than
accessing the ThreadLocal<T> instance on each iteration, which is one of
the reasons Parallel.For has the support built-in. However, there are some advantages to also
having the ThreadLocal<T> approach available:
- Fewer delegates to understand. Wrapping
your head around three different delegates (and how data is passed between
them) in a single method call can be tough.
It may also be unintuitive that an interlocked operation is required for
the final step (though this approach has performance benefits, as each thread
gets to perform its final reduction in parallel).
- Some scenarios may enjoy less overhead. There is potentially a subtle performance
issue with the Parallel.For approach, depending on why the local support is
being used. In an effort to be fair to
other users of the ThreadPool, the Tasks that service a parallel loop will
periodically (every few hundred milliseconds) retire and reschedule replicas of
themselves. In this way, the threads
that were processing the loop’s tasks get a breather to optionally process work
in other AppDomains if the ThreadPool deems it necessary. The ThreadPool may also choose to remove the
thread from active duty if it believes the active thread count is too high. Consequently,
the number of Tasks created to service the loop may be greater than the number
of threads. In turn, the delegates that
initialize and finalize/aggregate the local states may be executed more times
than there are threads, because they are run once for each new Task rather than once for each new Thread. Of course, this is only an issue if the
initializer and finalizer delegates are very expensive, but it’s worth noting
that the ThreadLocal&lt;T&gt; approach does not suffer from it.
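To make the Task-versus-Thread distinction concrete, here is a small sketch that counts initializer invocations; on a sufficiently long-running loop, the count may exceed the number of worker threads (the loop body and iteration count are placeholders, and the exact counts depend on how the ThreadPool schedules replicas):

```csharp
int result = 0;
int initCount = 0; // counts how many times the local-state initializer runs
Parallel.For(0, 100000000,
    () => { Interlocked.Increment(ref initCount); return 0; }, // runs once per Task
    (i, loopState, localResult) => localResult + 1,             // placeholder body
    localResult => Interlocked.Add(ref result, localResult));   // final reduction
// initCount can exceed the number of threads servicing the loop, because
// the Tasks periodically retire and reschedule replicas of themselves.
Console.WriteLine(initCount);
```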
- Usable in
places where built-in local support isn’t available. ThreadLocal&lt;T&gt;.Values would enable aggregations
even where we do not provide built-in thread-local data support. For
example, Parallel.Invoke does not offer local support.
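As a sketch of what such an aggregation might look like with the proposed Values property (DoWorkA and DoWorkB are hypothetical placeholders for real computations):

```csharp
var partial = new ThreadLocal<int>(() => 0);
Parallel.Invoke(
    () => partial.Value += DoWorkA(),
    () => partial.Value += DoWorkB());
// Each action accumulated into the value of whichever thread ran it;
// the proposed Values property would let us reduce across all of them.
int total = partial.Values.Sum();
```

Note that two actions might happen to run on the same thread; ThreadLocal&lt;T&gt; handles that naturally, since each thread simply keeps accumulating into its own value.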
Your input could help!
If you’ve got a minute, feel free to answer the following questions
and/or provide other thoughts:
- Would you find writing aggregations easier with
ThreadLocal<T>.Values compared to using the support built in to the
parallel loops? If so, does the
convenience make the feature worthwhile?
- Do you need/want support for writing
aggregations outside of parallel loops?
- When you do aggregations, are the routines for
initializing and finalizing local state expensive? Examples would be great.