Feedback requested: Enumerating Concurrent Collections

The June 2008 CTP of Parallel Extensions contained a first look at some of the work we’re doing to augment the .NET Framework with a set of additional coordination data structures that aid in the development of highly concurrent applications.  This included two thread-safe collections, ConcurrentQueue<T> and ConcurrentStack<T>, and it’s probably not giving away too much to say that we’re working diligently on additional collections that aid in key scenarios for parallel programming.  For the most part, these collections are similar to the exsiting non-thread-safe counterparts in the .NET Framework, except with APIs designed with concurrency patterns in mind, and of course implementations that tolerate multithreaded access.  Some of these designs come naturally and, from an implementation standpoint, are easily enabled.  However, there are times when we have to make significant trade-offs in functionality or performance in order to achieve certain designs, and for those situations, it’s very important that we have a good understanding of the key scenarios and use cases in which the collections will be used.

One important design point we keep in mind for all of the collections we’re working on is how to deal with concurrent modifications while a collection is being enumerated.  With the standard collections in the .NET Framework today, modifications to a collection invalidate any outstanding enumerators.  For example, consider the following code:

var q = new Queue<int>(new[] { 1, 2, 3, 4, 5 });
foreach (var item in q)
{
    if (item <= 5) q.Enqueue(item * 6);
}

This will compile fine, but upon execution, the call to q.Enqueue inside of the foreach loop will throw an InvalidOperationException: “Collection was modified after the enumerator was instantiated.”  In contrast, if you change the Queue<int> to a ConcurrentQueue<int>, e.g.:

var q = new ConcurrentQueue<int>(new[] { 1, 2, 3, 4, 5 });
foreach (var item in q)
{
    if (item <= 5) q.Enqueue(item * 6);
}

no exception will be thrown, and upon examination of the queue after the loop, you’ll find it to contain 10 elements.  This is an explicit design decision we’ve made to break the mold from the existing collections, the idea behind such a change in direction being that there are key scenarios where one thread needs to be enumerating while other threads are reading and writing to the collection (of course, as shown in this example, it could be the same thread reading and writing as the one enumerating).

This does raise some questions, of course, especially around the guarantees that can be provided for such an enumeration.  If data is added to the list while another thread is enumerating, must that enumerating thread see the new data?  If data is removed while another thread is enumerating, must that enumerating thread not see the new data?  Is it acceptable for data to be observed twice in the same enumeration or not at all due to the organization of the collection being changed concurrently?  And so forth.  For some collections, like ConcurrentStack<T> and ConcurrentQueue<T>, we’re able to provide moment-in-time snapshot semantics, but for other collection types, doing so isn’t necessarily possible without incurring serious performance penalties.

Given this, we’re very curious to hear your feedback on a few things:

  1. Given the standard array of collections that exist in .NET, if you had thread-safe versions of them, are there scenarios where you would benefit from being able to enumerate concurrently with other threads modifying the same collection?  We already know about some, but we’d love to hear about more.
  2. Assuming you do have scenarios for #1, what are the minimum guarantees you’d need about the data returned in the enumerator for it to be useful?  For example, you could say for a thread-safe dictionary that if there are no concurrent modifications (adds/updates/removals), you’ll get back in the enumeration exactly what’s in the dictionary, and if there are concurrent accesses, you’ll never get back something that wasn’t in the dictionary, you may not see concurrent adds or updates, and you may still see items that are concurrently removed.
  3. If we did support enumeration on collections that were thread-safe in the face of concurrent modifications, would you ever want the ability to revert to the “throw on modifications” behavior?
  4. Finally, what are the most important collections that you’d like to see thread-safe and scalable counterparts for?

Thanks in advance for any input you may have!