Preview of Immutable Collections Released on NuGet

Over the last years .NET added many features to make writing multithreaded applications easier. This includes the Task Parallel Library (TPL) as well as the new async/await keywords features to reduce the friction when writing asynchronous code. However, it’s still challenging to keep mutable state under control when multiple threads are involved. A common approach is to make use of immutable state that can be passed freely between different threads. Unfortunately, implementing immutable data structures can be quite hard, especially when collections are involved. In this post, Andrew Arnott, a developer from the Visual Studio team , will talk about a set of collection types we developed to make that easier. –Immo

Today we released a preview of the immutable collections for the .NET Framework 4.5 and Windows Store apps on NuGet, as planned.

If you’re not already familiar with immutability concepts and where it’s valuable in your application you may want to read Read-only, frozen, and immutable collections.

Background and reasons for immutable collections

The .NET Framework already includes many highly-tuned generic collections. And in several cases read-only interfaces or wrapper classes exist as well. So how are immutable collections different and why do we need them? In short, because read-only collection types are merely facades over collections that can change.

If someone hands you a ReadOnlyCollection<T> , an IReadOnlyList<T> , or an IEnumerable<T> the only guarantee is that you can’t change the data – there is no guarantee that the person that handed you the collection won’t change it. Yet, you often need some confidence that it won’t change. These types don’t offer events to notify you when their contents change, and if they do change, might that occur on a different thread, possibly while you’re enumerating its contents? Such behavior would lead to data corruption and/or random exceptions in your application.

In contrast, if someone provides you with data via an immutable type, the guarantee is that the data will never change. You can hold a reference to it as long as you want, enumerate it, search it, etc., from any thread at any time, and be completely confident that you’ll get the expected behavior.

There are several kinds or levels of immutability, in fact. These immutable collections never allow mutation, yet offer mutating methods that return new collections, while sharing data across versions very efficiently. You can read up more on various types of immutability from Eric Lippert blog post Immutability in C# Part One: Kinds of Immutability.

Why should I use immutability?

There are many good reasons to use immutable collections. Here are a few:

  1. Snapshot semantics, allowing you to share your collections in a way that the receiver can count on never changing.
  2. Implicit thread-safety in multi-threaded applications (no locks required to access collections).
  3. Any time you have a class member that accepts or returns a collection type and you want to include read-only semantics in the contract.
  4. Functional programming friendly.
  5. Allow modification of a collection during enumeration, while ensuring original collection does not change.
  6. They implement the same IReadOnly* interfaces that your code already deals with so migration is easy.

Introducing the immutable collections

The NuGet package preview includes these types:

  • ImmutableStack<T>
  • ImmutableQueue<T>
  • ImmutableList<T>
  • ImmutableHashSet<T>
  • ImmutableSortedSet<T>
  • ImmutableDictionary<K, V>
  • ImmutableSortedDictionary<K, V>

Interfaces for each of these types are also defined to facilitate exchange of immutable collection types that may be implemented differently to optimize for very specific performance or memory requirements.

I’ll look at how you might use ImmutableList<T>. First, I’ll start by acquiring an empty immutable list of fruit:

         var emptyFruitBasket = ImmutableList<string>.Empty;

Notice that rather than a public constructor we provide a  static Empty property. Using a static Empty property is a departure from a well-entrenched pattern, and there is a reason for it. A constructor must allocate an object, yet given the immutability of these collections, a new object could only ever represent an empty list. Since mutating the collection creates a new collection, using a constructor would cause the creation of multiple objects to represent an empty list, which would be wasteful. A static property allows us to return an empty list singleton which all code in your app can share.

An empty list isn’t that interesting. Next I will add some elements to it. As just stated, this is an immutable list, so when I request a change, a new list is allocated and the results are assigned to some variable:

         var fruitBasketWithApple = emptyFruitBasket.Add("Apple");

The Add method returned a new list with one element, and I assigned this list to a new variable. The original emptyFruitBasket list is still empty. This is similar to System.String, which has an Empty static property, and mutating methods that always return new strings that include the requested change.

You may not want to come up with unique names for new local variables on each assignment, so you can simply reassign to the same local variable:

        var fruitBasket = ImmutableList<string>.Empty;
       fruitBasket = fruitBasket.Add("Apple");
       fruitBasket = fruitBasket.Add(“Banana”);
       fruitBasket = fruitBasket.Add(“Celery”);  

Just as before, the original empty list was not modified. But I’ve replaced the reference to the empty list with a reference to the new populated list.

Watch out for this common pitfall though: it’s easy to forget to reassign the result of a mutation to a local variable:

        var list = ImmutableList<int>.Empty;
       list.Add(3); // forgot to reassign result to a local variable
       // contents of the “list” local variable is still empty!    

To avoid creating so many intermediate list objects, you can add several elements at once:

         fruitBasket = fruitBasket.AddRange(new [] { “Kiwi”, “Lemons”, “Grapes” });    

Since every mutating operation returns a new object that you can then mutate, you can also use a fluent-like syntax:

         var fruitBasket = ImmutableList<string>.Empty
            .Add(“Apple”)
            .Add(“Banana”)
            .Add(“Celery”)
            .AddRange(new [] { “Kiwi”, “Lemons”, “Grapes” });

We do not simply copy the contents of the immutable collections with each change. This would result in poor performance with collections of any interesting size, occupy too much memory and stress the GC. Instead, we built these immutable collections to share as much memory between instances as possible. Following is a contrived extreme case of a list with 10,000 elements:

         var hugeList = ImmutableList<int>.Empty.AddRange(Enumerable.Range(1, 10000));  

Now I’ll add one more item to the list:

         var hugeListPlus1 = hugeList.Add(10001);    

How much memory have we used between both hugeList and hugeListPlus1? Only slightly more than for hugeList alone. We did this by storing the immutable collections in immutable binary tree structures instead of flat arrays. When we create a list based on an existing list, we start with the existing binary tree and change just a few nodes in it, allocating new objects for each changed node, and return a list that points to the new root node of the binary tree. So hugeList and hugeListPlus1 actually share almost all their memory. If the app releases its reference to either list, the data unique to that particular list is immediately a candidate for garbage collection, so we don’t create memory leaks.

Introducing the Builders

Sometimes you need to “mutate” an immutable collection many times as you’re initializing it. Each top-level mutation call into an immutable collection results in allocating a few new nodes in that binary tree. All those extra allocations can add up in terms of GC pressure and performance. The same problem exists when you mutate strings repeatedly, which you can avoid by using the StringBuilder in those cases. Immutable collections use the same approach as strings. All the immutable collections that create GC pressure from many mutations provide ToBuilder() methods. The builder classes implement the same mutable interfaces that you’re used to, allowing you to pass these builders around as needed to initialize or update your collections. Then you can restore immutability by calling ToImmutable() . Here’s an example:

         /// <summary>Maintains the set of customers.</summary>
        private ImmutableHashSet<Customer> customers;

        public ImmutableHashSet<Customer> GetCustomersInDebt()
        {        
            // Since most of our customers are debtors, start with
            // the customers collection itself.

            var debtorsBuilder = this.customers.ToBuilder();      

            // Now remove those customers that actually have positive balances.

            foreach (var customer in this.customers)
            {
                if (customer.Balance >= 0.0)
                {
                    // We don’t have to reassign the result because
                    // the Builder modifies the collection in-place.
                    debtorsBuilder.Remove(customer);
                }
            }
         
            return debtorsBuilder.ToImmutable();
        }  

These builders are mutable collections that use exactly the same data structure as the immutable collections, but without the immutability requirement, allowing them to be very efficient about their memory use as you add or remove elements. In the above sample, the ToBuilder() did not copy the customers collection, but rather keeps using the existing immutable collection you already had. When the builder collection is modified, it allocates a new reference to the collection that points to a slightly modified collection, which preserves the original this.customers collection unmodified. Finally, when ToImmutable() is called, the builder “freezes” its modified internal structure and returns the new reference to the immutable collection. Important note: No immutable collections are modified in the execution of the above example, nor was the entire collection copied at any time.

You don’t always need to use builders though. If you can mutate your collections with just one method call then a builder isn’t necessary. For example, the above sample could be rewritten without builders using a lambda:

        private ImmutableHashSet<Customer> customers;

       public ImmutableHashSet<Customer> GetCustomersInDebt()
       {
           return this.customers.Except(c => c.Balance >= 0.0);
       }    

The above example is just as efficient as the Builder approach. However, not all bulk modifications to immutable collections are expressible as a single method call with a lambda, and you can use the Builders in those cases.

Performance characteristics

How do immutable collections perform relative to their well-known mutable BCL counterparts? These immutable collections have been heavily tuned for maximum performance and minimum GC pressure. They’re very fast, and in most cases they will be appropriate to use in your applications.

When they are typically faster

Immutable collections’ performance shines best when your application already requires snapshot semantics, whether for thread-safety or just for the simplicity it offers your codebase. For example, you might have code like this in your application, in order to provide some “snapshot” semantics or provide thread-safety:

      private List<T> collection;
     
     public IReadOnlyList<int> SomeProperty
     {
         get
         {
             lock (this)
             {
                 return this.collection.ToList();
             }
         }
     }  

Every time this property is queried, it makes an entire copy of your internal collection. Replacing this mutable collection with an immutable collection can dramatically improve this scenario because returning a reference of the current immutable collection implicitly retains the snapshot semantic and no copying (or locking) is required. It becomes:

     private ImmutableList<T> collection;

    public IReadOnlyList<int> SomeProperty
    {
        get { return this.collection; }
    }

Switching to immutable collections in some components of Visual Studio has helped us dramatically improve the performance of some features.

When they might be slower

There are some noteworthy differences in algorithmic complexity between mutable and immutable collection types:

 

Mutable (amortized)

Mutable (worst case)

Immutable

Stack.Push

O(1)

O(n)

O(1)

Queue.Enqueue

O(1)

O(n)

O(1)

List.Add

O(1)

O(n)

O(log n)

HashSet.Add

O(1)

O(n)

O(log n)

SortedSet.Add

O(log n)

O(n)

O(log n)

Dictionary.Add

O(1)

O(n)

O(log n)

SortedDictionary.Add

O(log n)

O(n log n)

O(log n)

Why are the immutable types more prone to O(log n) time where mutable types are O(1)? It’s a consequence of the internal data structures that allow sharing across instances of the collections. For instance a mutable HashSet<T> allocates a single array, and stores elements in that array at the index determined by their hash code. This providers O(1) lookup and storage times. If an ImmutableHashSet<T> used this same kind of data storage, then every mutation would require reallocating and copying the entire array just to change one entry, which would quickly tank performance and GC pressure as the collections grew large. Instead, these immutable collections use immutable balanced binary trees to store their data, allowing very efficient data sharing across mutated instances of the collections. As a result however, every lookup or change requires a tree traversal that has (at worst) an algorithmic complexity of O(log n).

Another point to consider when considering algorithm complexity however is worst case. Mutable collections can outgrow their initially allocated capacity and have to reallocate and copy their contents into a larger array. These immutable collections grow organically so they don’t experience the O(n) worst case on growth that mutable collections do. The hash-based immutable collections can still degrade to O(n) time on hash collisions just like mutable collections however.

Memory characteristics

The immutable collections spend a bit more memory per element for storage than their mutable counterparts, but without the slack space that mutable collections tend to have in their arrays. This means you win some, but you lose some too. Given a single instance of a collection, your actual mileage may vary as far as how its memory consumption compares between the two types.

When removing elements from your collection, the mutable collections generally don’t shrink their arrays so you don’t reclaim that memory. Immutable collections immediately shrink their binary trees for each removed element so the memory is a candidate for garbage collection.

How can I get started?

Install the Microsoft.Bcl.Immutable NuGet package into one of your Visual Studio projects. Before consuming the package, make sure you've got the NuGet 2.1 or newer installed. (What is Nuget?)

Call for feedback

The NuGet package will be in a prerelease form for a brief time, giving you an opportunity to tell us what you think so we can make necessary changes. Add comments to this post, or contact the authors of our NuGet package.

Frequently asked questions

Q: Can I only store immutable data in these immutable collections?
A: You can store all types of data in these collections. The only immutable aspect is the collections themselves, not the items they contain.

Q: Do Builders let you modify an immutable collection?
A: No. Immutable collections can never be modified. Builders give you the ability to start and finish a series of mutations with immutable instances, but in between use mutability for greater efficiency. Both the collections you start and finish with are in fact immutable.

Q: Is it safe to use immutable collections in a multi-threaded environment?
A: Yes. All truly immutable object graphs in .NET are safe in a multi-threaded environment.

Q: Mutable collections are fine for me, aren’t they? I’m not into F# or a math buff. How can immutable collections help me?
A: Immutable collections don’t entirely supersede the mutable collections found in the .NET Framework. You can and often should continue to use them as you have before. To learn more about immutability and what its benefits are, you may be interested in reading Read only, frozen, and immutable collections.

Q: I’ve heard that immutable collections are slow. Are these any different? Can I use them when performance or memory is important?
A: These immutable collections have been highly tuned to have competitive performance characteristics to the mutable collections while balancing memory sharing. In some cases they are very nearly as fast as the mutable collections both algorithmically and in actual time, sometimes even faster, while in other cases they are algorithmically more complex. In many cases however the difference will be negligible. Generally you should use the simplest code to get the job done and then tune for performance as necessary. The immutable collections help you write simple code, especially when thread-safety has to be considered.

Q: I want to try these immutable collections out but I need a convenient escape hatch in case their performance doesn’t meet my requirements. What can I do?
A: Use the new .NET 4.5 readonly interfaces on all your public APIs, such as IReadOnlyCollection<T> and IReadOnlyList<T>. In your implementation you may use immutable collections, which implement these interfaces. If in the end you need to switch to mutable collections, you’ll be able to do so without any breaking changes.

Q: I’m totally jazzed about these immutable collections. Where can I find out more?
A: There are lots of fun details about optimizations these collections have, best practices and anti-practices, etc. that don’t fit in this post. But these details may appear in future posts if you follow the immutability tag on my blog here on MSDN.