Immutable collections are now RC

Over the past several months, we’ve been working on a new set of collection types that offer an immutable design. Today we are happy to announce that we are one step closer to a stable version of this work: we’ve just shipped the release candidate (1.0.23 RC) of the Microsoft.Bcl.Immutable NuGet package.

Because we could: an immutable representation of our release timeline

What has changed in RC?

The changes in this update can be summarized into the following areas:

  • Removal of ImmutableArray<T>. With a heavy heart we’ve decided to remove ImmutableArray<T> for now because we aren’t happy with the current design.
  • Comparers. We’ve simplified the exposed concepts by not exposing the comparers through the interfaces.
  • Consistent construction. In the last update, we split the factory methods into methods that deal with scalars and ones that deal with other collections. We’ve improved this by renaming some APIs in order to apply the split consistently.
  • Consistency with ToString. Previously, we didn’t override ToString consistently. We’ve fixed this by not overriding ToString at all.
  • Minor improvements. We’ve added a few convenience APIs.

I’ll discuss these areas in more detail.

ImmutableArray<T> removed for now

After careful consideration we decided to remove ImmutableArray<T> from our 1.0 release plans. We aren’t happy with the design as it is today. This doesn’t mean we have given up on having an ImmutableArray<T> – it just means we need more time to do it properly, and we don’t want to block releasing a stable version of immutable collections because we aren’t happy with a single type. After we ship the stable 1.0, we’ll ship a 1.1 beta that will include ImmutableArray<T>.

Here’s why we weren’t happy with ImmutableArray<T>:

The primary goal for ImmutableArray<T> is to be a zero-overhead wrapper around an ordinary array. The current implementation, although it had little overhead, didn’t satisfy the performance needs of one of our most important partners, Project Roslyn, which served as an excellent benchmark for building a large-scale immutable system.

Because of performance requirements, ImmutableArray<T> needs to be a value type, i.e., a struct. In the CLR, all value types have an implied default state. This state is defined as having an underlying memory of all zeros. For ImmutableArray<T> that means that the underlying array it wraps is null. In order to provide meaningful semantics, we believe it’s best for an immutable array in its default state to behave exactly like an empty array. That’s consistent with the way the .NET Framework deals with value types in general.

Unfortunately, we discovered that this design causes a general performance issue. The JIT compiler is normally able to eliminate the bounds checks for standard for loops. For ImmutableArray<T>, the indexer and Length property are inlined so one would think the bounds check would be eliminated as well. Unfortunately, that’s not the case, because these properties don’t just forward to the underlying array but also contain an additional null check to provide the behavior I explained. This causes the JIT to no longer be able to eliminate the bounds checks. Our partner Roslyn uses ImmutableArray<T> throughout its system, so this caused an unacceptable performance loss.

Today, the only way to get the performance we need is to remove the additional null checks from the indexer and Length property. However, we believe this has significant usability issues.

For example, consider the following code:

int[] ints = new int[] { 1, 2, 3 };

ImmutableArray<int> empty = new ImmutableArray<int>();
ImmutableArray<int> immutableInts = empty.AddRange(ints);

Since all value types have an automatic default constructor that initializes the value type to its default state, empty will be an ImmutableArray<T> whose underlying array is null. Thus, the implementation of AddRange will crash with a NullReferenceException.

There are other places where this problem manifests itself. Since ImmutableArray<T> implements certain collection interfaces (such as IEnumerable and IReadOnlyList), you can pass it to methods that work against the interface. Since the interface reference will be non-null, consumers won’t expect any methods or property invocations to cause a NullReferenceException.

Not exactly a happy place. However, we do have some ideas on how to address this issue and decided to give it more baking time by removing ImmutableArray<T> for now.

Reduced complexity around comparers

After reviewing some of your feedback, we decided that certain APIs that exposed comparers were overkill within the current design. Thus, we removed IHashKeyCollection<TKey> and ISortKeyCollection<TKey> as well as the ValueComparer property from IImmutableDictionary<TKey,TValue>.

Consistent construction

When we introduced factory methods for constructing immutable collections, we also removed all Empty properties. So instead of:

return ImmutableList<int>.Empty;

you have to write:

return ImmutableList.Create<int>();

We received feedback that this looks quite bad in code reviews, because most reviewers assume that Create would return a new instance. For added expressiveness, we decided to bring back the Empty properties, so now you have both options.

In the past update, we introduced the From factory method so you could differentiate between factory methods that work on scalars (Create) and and factory methods that operate on collections (From). We realized that the name From is an unfortunate choice because it doesn’t show up together in IntelliSense with Create. In the RC update, we renamed From to CreateRange.

Also, our naming wasn’t fully consistent. There are some Create overloads on ImmutableDictionary that accepted an IEnumerable. We renamed those to CreateRange as well.

Consistency with ToString

ImmutableDictionary<TKey, TValue> now no longer overrides ToString (it was the only collection that did that anyway). This means that calling ToString on any immutable collection will now return the fully qualified type (which is the behavior of Object.ToString).

For debugging purposes, all collections already make use of DebuggerDisplayAttribute and DebuggerTypeProxyAttribute.

Minor improvements

The current signature of the ToImmutableDictionary method requires two selectors: one that determines the key and one that determines the value. This is inconsistent with Enumerable.ToDictionary, which provides an overload that allows you to specify only the key. The RC update includes the equivalent overload for ToImmutableDictionary. Thus, code like this will now compile:

ImmutableList<Customer> customers = GetCustomers();
ImmtuableDictionary<int, Customer> customerById = customers.ToImmutableDictionary(c => c.Id);

Performing a binary search is a fairly common operation for collections. Since ImmutableList<T> is using a tree behind the scenes, indexing is O(log n). Thus, providing a BinarySearch method on ImmutableList<T> can do a better job because it can traverse the internal tree and avoid having to repeatedly re-locate the node in the loop. For symmetry, we also added the BinarySearch method to the ImmutableList<T>.Builder.

What has not changed?

We’ve received a lot of feedback on immutable collections. There were two major design points many of you commented on: the lack of constructors and the implementation of mutable interfaces. I’d like to explain these design decisions.

The lack of constructors

Some of you asked us to bring back the constructors. The concern was that this code:

ImmutableList<int> list = new ImmutableList<int>(new int[] { 1, 2, 3});

is, by far, more intuitive than this code:

ImmutableList<int> list = ImmutableList.Create<int>(new[] { 1, 2, 3});

That’s absolutely true. When designing APIs, we generally avoid factory methods because they are often an indication of overengineering and generally hurt usability. For example, most developers would agree that XDocument is easier to use than XmlDocument. The fact that XDocument avoids factories is one of the reasons why it’s more usable.

The challenge in designing immutable APIs is that it’s very important to avoid unnecessary allocations. For example, most immutable collections start off with an empty instance. Offering a constructor will inevitably cause significant GC pressure due to the fact that this is the most intuitive way to construct them.

Using factory methods also provide other interesting opportunities for optimization. For example, consider this code:

class NumberProcessor
{
    private readonly ImmutableList<int> _numbers;

    NumberProcessor(IEnumerable<int> numbers)
    {
        _numbers = ImmutableList.CreateRange<int>(numbers);
    }

    // other methods elided for brevity
}

The goal of this code is to keep the numbers around. Since the input is typed as IEnumerable<T>, you can pass virtually any collection, including collections that are already immutable. In case numbers is already an instance of ImmutableList<int>, the CreateRange factory method can simply cast it and thus avoid the construction of a new object. A constructor would have forced us to create a brand new immutable list.

There is also the problem of collection initializers. Suppose we provide a default constructor for ImmutableList<int>. In that case, the following code would compile just fine:

ImmutableList<int> items = new ImmutableList<int> {1, 2, 3};
Console.WriteLine(items.Count);

You would probably expect the output to be 3 but in fact it’s 0. The reason is that collection initializers follow a pattern. As long as the type has a constructor, implements IEnumerable, and provides a public Add method, the compiler lets you use the syntax above. The code that the compiler generates looks like this:

ImmutableList<int> items = new ImmutableList<int>();
items.Add(1);
items.Add(2);
items.Add(3);
Console.WriteLine(items.Count);

Now it’s clear why the output is 0: It’s because collection initializers don’t expect the Add method to return a new instance and instead rely on side effects to mutate the existing instance. Since the result is never used, the items variable still refers to the original, empty collection.

Thus, we decided to exclusively offer factory methods to ensure that you get the best behavior and cannot accidentally do the wrong thing.

Implementing the mutable interfaces

Several people raised the issue that our immutable collection types implement the mutable interfaces. For example, this is what the declaration of ImmutableList<T> looks like:

public sealed class ImmutableList<T> :
        IImmutableList<T>,
        IEnumerable,
        IEnumerable<T>,
        IReadOnlyCollection<T>,
        IReadOnlyList<T>,
        ICollection,
        ICollection<T>,
        IList,
        IList<T>
{
   …
}

It’s worth pointing out that this declaration does not have a correctness issue – you cannot mutate an instance of ImmutableList<T> by casting it to a mutable interface. For example, calling IList<T>.Add would throw NotSupportedException.

The mutable interfaces always supported a read-only mode, which means that consumers can’t change the contents. This mode is exposed via ICollection.IsReadOnly, which all mutable interfaces inherit from. Thus, the concrete immutable collection types all implement the existing, mutable interfaces in a read-only fashion, which means that IsReadOnly returns true and all methods that would mutate the collection throw NotSupportedException.

That’s not fundamentally incompatible with immutable collections; for all intents and purposes, immutable collections provide an even stronger guarantee that nobody, not just the consumer, can change their contents. Some of you suggested that we shouldn’t implement the mutable interfaces at all, so you shouldn’t be able to accidentally pass an immutable collection to a method that takes an IList<T>. This would compile just fine, but might cause a NotSupportedException at runtime if the method in question tries to modify the collection through the mutable interface.

Based on the data we have, we found that in many cases where methods take a mutable interface (such as IList<T>), the consumer only needs to read the contents. We believe that enabling interoperability with mutable collections is very important, because otherwise it would be difficult to adopt immutable collections into existing code.

Also, this interoperability is consistent with how the rest of the .NET Framework is designed. For example, Streams can support reading, writing, and seeking. Instead of providing separate types, we’ve decided to expose the capabilities via the optional features pattern because it is typically much simpler and doesn’t result in an explosion of combinatorial types. The IsReadOnly property on ICollection<T> is another instance of this pattern.

Thus, we decided that it’s best to implement the mutable interfaces on the concrete types.

Summary

Today we shipped the release candidate of our Microsoft.Bcl.Immutable NuGet package. We plan to release a stable 1.0 package by the end of September. We’ve removed ImmutableArray<T> from 1.0 because we weren’t happy with its design. It will re-appear in 1.1, which will be available as a beta release shortly after we ship a stable 1.0 package.

Please play around with the RC and let us know what you think!