Please welcome ImmutableArray

Immo Landwerth

We’ve just released an update to our immutable collection package which adds a new member to the family of immutable collection types: ImmutableArray<T>.

In this post, I’ll talk about why we added another collection and how it relates to the existing types. I’ll also cover some minor updates we did to our package.

Introduction ImmutableArray<T>

Sometimes a little bit of code says more than a thousand pictures, so let’s look at the declaration of immutable array:

        public struct ImmutableArray<T> : IList,
                                          IList<T>,
                                          IReadOnlyList<T>,
                                          IImmutableList<T>,
                                          IEquatable<ImmutableArray<t>>,
                                          IStructuralComparable,
                                          IStructuralEquatable
        {
            // ...
        }

As you can see ImmutableArray<T> implements IImmutableList<T> which begs the question how it is different from the existing implementation ImmutableList<T>.

The answer: performance.

Arrays are hard to beat in several ways: they provide an O(1) element access, they are very cache friendly as all data is co-located, and they provide low overhead for small collections (< 16 elements).

ImmutableArray<T> is a very thin wrapper around a regular array and thus shares all the benefits with them. We even made it a value type (struct) as it only has a single field which holds the array it wraps. This makes the size of the value type identical to the reference of the array. In other words: passing around an immutable array is as cheap as passing around the underlying array. Since it’s a value type, there is also no additional object allocation necessary to create the immutable wrapper, which can reduce GC pressure.

The key operations on ImmutableArray<T> just forward to the underlying array, including the indexer. In contrast to List<T> or ArraySegment<T> the underlying array always has the same length as the outer type, i.e. the immutable array. This means we can avoid having additional bounds checking and just rely on the underlying array. This allows the CLR code generation to inline the indexer access.

We’ve also implemented custom LINQ operators for ImmutableArray<T>. This avoids boxing immutable arrays into an IEnumerable<T>.

Using ImmutableArray<T>

Creating an immutable array is similar to creating an immutable list, i.e. it follows the factory pattern via static Create methods:

        ImmutableArray<int> array = ImmutableArray.Create(1, 2, 3);

You can also create an immutable array via the ToImmutableArray() extension method:

        IEnumerable<int> someInts = Enumerable.Range(1, 100);
        ImmutableArray<int> array = someInts.ToImmutableArray();

It also supports the builder pattern which allows constructing immutable arrays via mutation:

        ImmutableArray<int>.Builder builder = ImmutableArray.CreateBuilder<int>();
        builder.Add(1);
        builder.Add(2);
        builder.Add(3);
        ImmutableArray<int> oneTwoThree = builder.ToImmutable();

You may wonder how using the builder differs from using a List<T> with the ToImmutableArray() extension method. Generally, all immutable collection builders are functionally equivalent to their ordinary mutable collection types (List<T>, Dictionary<TKey, TValue> etc). They are purely there to provide better performance. Using the ImmutableArray<T> builder can improve performance as the implementation uses a trick to avoid type checks. Normally the CLR has to perform additional type checks at runtime to ensure that storing an element in an array is type safe, because arrays are covariant which means the correctness can’t be easily checked statically. ImmutableArray<T> avoids this by wrapping references in a pointer-sized value type. For more details on this subject I recommend this blog post from Eric Lippert.

Reading data from an ImmutableArray<T> is similar to reading regular arrays:

        ImmutableArray<int> array = //...
        for (var i = 0; i < array.Length; i++)
        {
            Console.WriteLine(array[i]);
        }

We’ve decided to go with having a Length property instead of Count because we see ImmutableArray<T> as a replacement for regular arrays. This makes it easier to port existing code. It also makes the design a bit more self-contained.

Of course we also support enumerating the elements via foreach. ImmutableArray<T> follows what other collection types do as well: it implements IEnumerable<T> but also provides a custom enumerator which allows the compiler to generate more efficient code which avoids boxing the enumerator.

Since ImmutableArray<T> is a value type, it overloads the equals (==) and not-equals (!=) operators. They are defined as using reference equality on the underlying array.

The default value of ImmutableArray<T> has the underlying array initialized with a null reference. In this case it behaves the same way as an ImmutableArray<T> that has been initialized with an empty array, i.e. the Length property returns 0 and iterating over it simply doesn’t yield any values. In most cases this is the behavior you would expect. However, in some cases you may want to know that the underlying array hasn’t been initialized yet. For that reason ImmutableArray<T> provides the property IsDefault which returns true if the underlying array is a null reference. For example you can use that information to implement lazy initialization:

        private ImmutableArray<byte> _rawData;
        public ImmutableArray<byte> RawData
        {
            get
            {
                if (_rawData.IsDefault)
                    _rawData = LoadData();
                return _rawData;
            }
        }

ImmutableArray<T> vs. ImmutableList<T>

In contrast to what I said earlier, we decided to make immutable array a persistent data structure. In other words: similar to ImmutableList<T> it has methods that allow creating new instances of immutable arrays with different contents. Mind you, the performance characteristic of those operations is fairly different from ImmutableList<T>. ImmutabeList<T> uses a tree data structure that allows sharing. It also allows for making sure updates can be done in O(log n).

The following table summarizes the performance characteristics of ImmutableArray<T>. You can find a similar table for the existing types here.

Operation ImmutableArray<T> Complexity ImmutabeList<T> Complexity Comments
Create() O(n) O(n) Requires to copy the input array to ensure the underlying array can’t be modified
this[] O(1) O(log n) Directly index into the underlying array
Add() O(n) O(log n) Requires creating a new array

Reasons to use immutable array:

  • Updating the data is rare or the number of elements is quite small (<16)
  • you need to be able to iterate over the data in performance critical sections
  • you have many instances of immutable collections and you can’t afford keeping the data in trees

Reasons to stick with immutable list:

  • Updating the data is common or the number of elements isn’t expected to be small
  • Updating the collection is more performance critical than iterating the contents

In general, when all you need is an immutable array and you don’t plan on changing the data ever, use ImmutableArray<T>. If you need to update the data, use ImmutableList<T>.

If you do update the data but think ImmutableArray<T> could perform better overall, you should try both and measure. Remember that designing for performance means to consider different trade-offs. It’s key to measure those in actual scenarios, under real-world workloads.

What does this mean for code that operates with the interface IImmutableList<T>? The interface could be backed by either an immutable list or an immutable array (or a custom type). Due to the different complexities the code cannot rely on the exact complexity. So in general the code should use bulk operations whenever possible in order to make sure the cost is minimized. For example, instead of calling Add() in a loop you should prefer a single call to AddRange().

Other Changes in this Update

Create() and From() factory methods

In the last release, we added the Create() factory methods for constructing immutable collections. We created overloads for both scalars as well as collections:

        public static ImmutableList<T> Create<T>();
        public static ImmutableList<T> Create<T>(params T[] items);
        public static ImmutableList<T> Create<T>(IEnumerable<T> items);

We discovered that the overload that takes the IEnumerable<T> can have surprising results. You’d think you can use the overload that takes IEnumerable<T> by creating collections from other collections:

        var list = new List<string>();
        list.Add("One");
        list.Add("Two");
        // Doh! Actually I wanted to get ImmutableList<string>
        ImmutableList<list<string>> il = ImmutableList.Create(list);

Instead of creating an ImmutableList<string> you end up creating an ImmutableList<List<string>> because overload resolution prefers the params overload over an implicit conversion from List<string> to IEnumerable<string>.

For that reason we’ve decided to remove the ambiguity by renaming all factory methods that operate over IEnumerable<T> to From:

        public static ImmutableList<T> Create<T>();
        public static ImmutableList<T> Create<T>(params T[] items);
        public static ImmutableList<T> From<T>(IEnumerable<T> items);

We decided not to rename the params overload as it is usually called with one or more scalars and not with an array. We believe this design will appear more consistent from typical call sites.

CreateBuilder() factory methods

Previously, you couldn’t create builders directly. You had to create them from an immutable collection like this:

        ImmutableList<int>.Builder builder = ImmutableList.Create<int>().ToBuilder(); 

Now you can simply call a factory method, similar to creating an immutable collection itself:

        ImmutableList<int>.Builder builder = ImmutableList.CreateBuilder<int>();

Value Comparers

Our original design of IImmutableList<T> was based on the idea of keeping it aligned with IImmutableDictionary<TKey, TValue> and IImmutableSet<T>. Both of them have a built-in notion of comparers. When implementing IImmutableList<T> on ImmutableArray<T>, we were faced with the problem of supporting a custom comparer on the type. We could have added one more field to store it, but in this case ImmutableArray<T> would no longer be a cheap wrapper around an array. So we decided to drop the idea of storing a customer comparer on the list itself and instead require consumers to pass in the equality comparer they want to use.

As a result we removed the ValueComparer property and WithComparer methods from IImmutableList<T>.

For the members that implicitly used the comparer we’ve changed their signature to take a comparer (for example, the IndexOf method). This is a breaking change for implementers. For consumers, most of the signature changes shouldn’t be source breaking as we also added extension methods that pass in the default comparer.

TryGetValue() for sets

We’ve received the feedback that IImmutableSet<T> doesn’t have a way to get the actual value that is stored in the set – it only allows for checking whether it contains a value that is considered equal. In most cases, that’s not a problem at all. But consider a set of strings that uses a case-insensitive comparer, for example a set of filenames. If you want to know for a given filename what the canonical casing is, you are out of luck. Therefore we added a TryGetValue method that allows retrieving the original value that was added to the set:

        var set = ImmutableHashSet.Create<string>(StringComparer.OrdinalIgnoreCase);
        set = set.Add("D:\Src\Test.cs");
        string original;
        if (set.TryGetValue("d:\src\test.cs", out original))
        {
            // original contains "D:\Src\Test.cs"
        }

GetValueOrDefault() for dictionaries

Lastly, we’ve fixed an issue with the GetValueOrDefault() extension method we’ve added for dictionaries.

        ImmutableDictionary<string, string> dictionary = ImmutableDictionary
                                                        .Create<string, string>()
                                                        .Add("key1", "value1")
                                                        .Add("key2", "value2");
        string value = dictionary.GetValueOrDefault("key1");

Unfortunately, this results in a compilation error:

The call is ambiguous between the following methods or properties: ‘ImmutableDictionary.GetValueOrDefault<string,string>(IReadOnlyDictionary<string,string>, string)’ and ‘ImmutableDictionary.GetValueOrDefault<string,string>(IDictionary<string,string>, string)’

The reason is that we offered this extension method for IDictionary<TKey, TValue> as well as IReadOnlyDictionary<TKey, TValue>. Since an instance of the concrete ImmutableDictionary<TKey, TValue> class is both, the compiler cannot decide which one to use. Typing the local variable to either of the types will work. You can also type the local variable as the IImmutableDictionary<TKey, TValue> interface as the interface only extends IReadOnlyDictionary<TKey, TValue> (in fact that’s why we didn’t catch it earlier as this was the code we used in the unit tests).

We’ve solved this issue by removing the overloads for IDictionary<TKey, TValue> and IReadOnlyDictionary<TKey, TValue>. Instead we added one for IImmutableDictionary<TKey, TValue>. This solves ambiguity and keeps the immutable package focused on types relating to immutable data structures.

Summary

The latest iteration of the immutable collections preview adds an immutable array type. It’s a zero overhead wrapper around a regular array that ensures it never changes.

Go play around with it and let us know what you think!

0 comments

Discussion is closed.

Feedback usabilla icon