New Feature? :: ThreadLocal<T>.Values

We’ve been considering adding a Values property to
System.Threading.ThreadLocal<T>.  Values
would return a collection of all current values from all threads (i.e. what
you’d get if you evaluated Value from each thread).  This would allow for easy aggregations, and
in fact in our Parallel Extensions Extras we have a wrapper around
ThreadLocal<T> called ReductionVariable<T>
that exists purely to provide this functionality.  For example:


var localResult = new ThreadLocal<int>(() => 0);
Parallel.For(0, N, i =>
{
    localResult.Value += Compute(i);
});
int result = localResult.Values.Sum();
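
As a rough idea of the shape such a wrapper could take, here’s a minimal sketch; the StrongBox-based registration and the ConcurrentQueue used here are assumptions for illustration, not necessarily how the Parallel Extensions Extras type is actually written:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Threading;

public sealed class ReductionVariable<T>
{
    private readonly Func<T> _seedFactory;
    private readonly ThreadLocal<StrongBox<T>> _threadLocal;
    private readonly ConcurrentQueue<StrongBox<T>> _values = new ConcurrentQueue<StrongBox<T>>();

    public ReductionVariable(Func<T> seedFactory)
    {
        _seedFactory = seedFactory;
        _threadLocal = new ThreadLocal<StrongBox<T>>(CreateValue);
    }

    // Each thread lazily creates its own box on first access and registers it
    // in a shared collection so that all per-thread values can be enumerated later.
    private StrongBox<T> CreateValue()
    {
        var box = new StrongBox<T>(_seedFactory());
        _values.Enqueue(box);
        return box;
    }

    // The calling thread's current value.
    public T Value
    {
        get { return _threadLocal.Value.Value; }
        set { _threadLocal.Value.Value = value; }
    }

    // A snapshot of the values from every thread that has touched this variable.
    public IEnumerable<T> Values
    {
        get { return _values.Select(b => b.Value); }
    }
}

A built-in ThreadLocal<T>.Values property would make this kind of wrapper unnecessary for the common case.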

If you’re familiar with the Parallel Patterns Library (PPL)
in Visual C++ 2010, this feature would make ThreadLocal<T> very similar in
capability to the combinable class.


In .NET 4, it’s already possible to do aggregations using
the thread-local data support that’s built in to the parallel loops.


int result = 0;
Parallel.For(0, N,
    () => 0,
    (i, loopState, localResult) =>
    {
        return localResult + Compute(i);
    },
    localResult => Interlocked.Add(ref result, localResult));

This approach of using Parallel.For has less overhead than
accessing the ThreadLocal<T> instance on each iteration, which is one of
the reasons the support is built into Parallel.For.  However, there are some advantages to also
having the ThreadLocal<T> approach available:

  1. Fewer delegates to understand.  Wrapping your head around three different delegates
     (and how data is passed between them) in a single method call can be tough.
     It may also be unintuitive that an interlocked operation is required for
     the final step (though this approach has performance benefits, as each thread
     gets to perform its final reduction in parallel).
  2. Certain scenarios may enjoy less overhead.  There is potentially a subtle performance
     issue with the Parallel.For approach, depending on why the local support is
     being used.  In an effort to be fair to
     other users of the ThreadPool, the Tasks that service a parallel loop will
     periodically (every few hundred milliseconds) retire and reschedule replicas of
     themselves.  In this way, the threads
     that were processing the loop’s tasks get a breather to optionally process work
     in other AppDomains if the ThreadPool deems it necessary.  The ThreadPool may also choose to remove the
     thread from active duty if it believes the active thread count is too high.  Consequently,
     the number of Tasks created to service the loop may be greater than the number
     of threads, and in turn the delegates that initialize and finalize/aggregate the
     local state will be executed more often, because they run once per new Task rather
     than once per new Thread.  Of course, this would only be an issue if the
     initializer and finalizer delegates are very expensive, but it’s worth noting
     that the ThreadLocal<T> approach does not suffer from this.
  3. Usable in places where built-in local support isn’t available.  ThreadLocal<T>.Values
     would let you do aggregations where we do not have built-in thread-local data
     support.  For example, Parallel.Invoke does not provide local support; a sketch
     of what that might look like follows this list.
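
To make the third point concrete, here is a rough sketch of an aggregation over Parallel.Invoke, assuming the proposed Values property exists; Compute and the split into three fixed ranges are placeholders for this illustration:

using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class InvokeAggregationExample
{
    static int Compute(int i) { return i * i; } // placeholder workload

    static void Main()
    {
        const int N = 900;
        var localSum = new ThreadLocal<int>(() => 0);

        // Parallel.Invoke has no localInit/localFinally overloads, so each action
        // accumulates into its thread's local value directly.
        Parallel.Invoke(
            () => { for (int i = 0; i < N / 3; i++) localSum.Value += Compute(i); },
            () => { for (int i = N / 3; i < 2 * N / 3; i++) localSum.Value += Compute(i); },
            () => { for (int i = 2 * N / 3; i < N; i++) localSum.Value += Compute(i); });

        // With the proposed Values property, the final reduction is a single pass
        // over the per-thread partial sums.
        int result = localSum.Values.Sum();
        Console.WriteLine(result);
    }
}

Even if the scheduler runs two of the actions on the same thread, the final sum is still correct, since that thread’s single entry in Values simply holds the combined partial result.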

Your input could help! 
If you’ve got a minute, feel free to answer the following questions
and/or provide other thoughts:

  1. Would you find writing aggregations easier with
    ThreadLocal<T>.Values compared to using the support built in to the
    parallel loops?  If so, does the
    convenience make the feature worthwhile?
  2. Do you need/want support for writing
    aggregations outside of parallel loops?
  3. When you do aggregations, are the routines for
    initializing and finalizing local state expensive?  Examples would be great.