BigArray, getting around the 2GB array size limit

I’ve received a number of queries as to why the 64-bit version of the 2.0 .Net runtime still limits the maximum size of an array to 2GB. Given that it seems to be a hot topic of late, I figured a little background and a discussion of the options for getting around this limitation were in order.

First, some background: in the 2.0 version of the .Net runtime (CLR) we made a conscious design decision to keep the maximum object size allowed in the GC Heap at 2GB, even on the 64-bit version of the runtime. This is the same as the current 1.1 implementation of the 32-bit CLR; however, you would be hard pressed to actually manage to allocate a 2GB object on the 32-bit CLR because the virtual address space is simply too fragmented to realistically find a 2GB hole. Generally people aren’t particularly concerned with creating types that would be >2GB when instantiated (or anywhere close), but since arrays are just a special kind of managed type created within the managed heap, they suffer from this limitation as well.

<Sidenote> managed arrays: arrays are a first class type in the CLR world, and they are laid out in one contiguous block of memory in the managed garbage collected heap. In the CLR 1.1 they can be thought of as the only generic type (in 2.0 we’re introducing a much more universal concept of generics) in that you can have an array of any managed type that you like (primitive types, value types, reference types). It is interesting to think about what that means in the context of the 2GB instance size limit imposed on objects in the managed heap. With value types (bool, char, int, long, struct X {}, etc…) the actual data for each element in the array is laid out contiguously with the next element in memory. Since the 2GB limit discussed earlier applies to the total array size, and the total array size is a function of the element type’s size, the maximum number of elements you can store in an array of type X shrinks as the size of X grows.

Arrays of reference types (e.g. objects, strings, class Y {}, etc…) are different: for these, the array itself is just a block of references, initially null. To initialize the array your code has to go through one element at a time and create or assign an appropriate instance of the type to that element. The 2GB size limit for arrays applies to this array of references, not to the instances of the objects themselves. On a 32-bit machine, if you create an array of type object (object[]) and one instance of type object per element in the array, your available virtual address space will end up limiting the size of your array, as you will never be able to fit enough objects in memory to fill up a 2GB object array with unique object references.</Sidenote>
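To make the value-type arithmetic concrete, here is a quick back-of-the-envelope sketch (the real limits are slightly lower because of the object header and other CLR bookkeeping, so treat these as rough upper bounds):

static void Main()
{
    // ~2GB cap on a single object in the GC heap (the real cap is a few bytes
    // lower due to CLR bookkeeping, so these counts are rough upper bounds)
    const long MaxObjectBytes = 2L * 1024 * 1024 * 1024;

    // The element count bound is the cap divided by the element size, so it
    // shrinks as the element type gets bigger.
    Console.WriteLine("byte[] : ~{0:N0} elements", MaxObjectBytes / sizeof(byte)); // ~2.1 billion
    Console.WriteLine("int[]  : ~{0:N0} elements", MaxObjectBytes / sizeof(int));  // ~536 million
    Console.WriteLine("long[] : ~{0:N0} elements", MaxObjectBytes / sizeof(long)); // ~268 million
}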

The developer visible side of this is that array indexes are signed integers (with a byte[] you can use the full positive range of a signed integer as an index, assuming the array is 0-based; with other types you use some subset of that range, up to the 2GB total array size). While some of the BCL APIs that deal with arrays have alternate signatures that take longs, this isn’t yet ubiquitous in the framework (e.g. the IList interface, which the BCL’s Array class implements, uses int indexes).
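To illustrate the int-based vs. long-based surface area, here is a small sketch (the long-based members shown exist on System.Array, but they traffic in object and aren’t what the C# [] syntax compiles down to):

static void Main()
{
    byte[] data = new byte[100];

    // The C# indexer and IList both go through int indexes.
    data[42] = 1;
    System.Collections.IList list = data;
    object viaIList = list[42];

    // System.Array does expose a few long-based members, but they return/accept
    // object (boxing value-type elements) and most of the framework doesn't use them.
    long length = data.LongLength;
    object viaLong = data.GetValue(42L);

    Console.WriteLine("{0} {1} {2}", viaIList, length, viaLong);
}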

It is debatable whether or not we should have included a “Big Array” implementation in the 2.0 64-bit runtime, and I’m sure that debate will rage for some years to come. However, as 2.0 is getting ready to ship and there currently isn’t any support for this we are going to have to live without it until at least the next version.

So, what is there to do in .Net 2.0 if you have an application which requires arrays that are very large?

Well, first switch to 64-bit! As mentioned, it is next to impossible to allocate a full 2GB array on 32-bit because of the way the virtual address space is broken up by modules and other various allocations. Simply switching to 64-bit will buy you the ability to allocate those full 2GB blocks (well, close anyway; the total object size is limited to 2GB, but there is some CLR book-keeping goo in there that takes a few bytes).
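For instance, something like the following should succeed in a 64-bit process but will almost certainly fail on 32-bit (the exact maximum length isn’t documented, so the headroom below is a guess, not the precise limit):

static void Main()
{
    // A byte[] just shy of 2GB. On 64-bit this should work (given enough memory);
    // on 32-bit it will almost certainly throw OutOfMemoryException because there
    // is no contiguous 2GB hole in the fragmented address space.
    // The "- 64" is just rough headroom for the CLR's per-object bookkeeping.
    byte[] big = new byte[int.MaxValue - 64];
    Console.WriteLine("Allocated {0:N0} bytes", big.Length);
}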

What if that still isn’t enough? You have a couple of choices:

A) Rethink your application’s design? Do you really need a single gigantor array to store your data? Keep in mind that if you’re allocating 8GB of data in a single array and then accessing it in a sparse and random manner, you’re going to be in for a world of paging pain unless you have a ton of physical memory. It is very possible that there is another data organization scheme you can use, where you group data into frequently used groups of some sort or another and manage to keep each individual group under the 2GB limit. If you choose correctly you can vastly improve your application’s performance due to lower paging and the better cache access characteristics that come from keeping things that are used together close to one another.

B) Use native allocations. You can always P/Invoke to NT’s native heap and allocate memory which you then use unsafe code to access. This isn’t going to work if you want an array full of object references, but if you just need a huge byte[] to store an image this might work out fine, even great. The added cost of the P/Invoke is low because the NT APIs have simple signatures that don’t require marshaling, and the code executed when allocating an 8GB block is probably mostly zeroing the memory anyway. If you choose this option you will have to write a small memory management class of some kind and be comfortable using unsafe code. I know that Paint.Net (https://blogs.msdn.com/joshwil/archive/2005/04/07/406218.aspx) uses this very method for allocating the memory in which it stores the image (and its various layers) that you’re editing. This is a good solution for the case where you really need a single unbroken allocation of some large size. While it isn’t a very general purpose solution, it works out great for the Paint.Net guys.
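As a rough sketch of what option B could look like (this is my own illustration, not Paint.Net’s code; it uses Marshal.AllocHGlobal for brevity rather than P/Invoking the NT heap APIs directly, and it requires compiling with /unsafe):

using System;
using System.Runtime.InteropServices;

// One large native allocation wrapped in a tiny unsafe accessor class.
unsafe class NativeBuffer : IDisposable
{
    byte* _ptr;
    long _length;

    public NativeBuffer(long length)
    {
        _length = length;
        // AllocHGlobal takes an IntPtr-sized count, so on 64-bit it can exceed 2GB.
        _ptr = (byte*)Marshal.AllocHGlobal(new IntPtr(length)).ToPointer();
    }

    public long Length { get { return _length; } }

    public byte this[long index]
    {
        // No range checking here to keep the accessors tiny -- add your own if you need it.
        get { return _ptr[index]; }
        set { _ptr[index] = value; }
    }

    public void Dispose()
    {
        if (_ptr != null)
        {
            Marshal.FreeHGlobal(new IntPtr(_ptr));
            _ptr = null;
        }
    }
}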

C) Write your own BigArray class.

I’d stress that option C is my least favorite of the above three, but I will acknowledge that there are probably cases where it is the right thing to do. Given that, I have gone and written one myself. This is a very bare bones implementation; just the array allocation and accessors are implemented. I will leave implementing any extended functionality (like the functionality provided by the static members of the Array class, Sort, Copy, etc…, or writing big collections on top of it) as an exercise for the reader.

// Goal: create an array that allows for a number of elements > Int32.MaxValue
class BigArray<T>
{
    // These need to be const so that the getter/setter get inlined by the JIT into
    // calling methods just like with a real array to have any chance of meeting our
    // performance goals.
    //
    // BLOCK_SIZE must be a power of 2, and we want it to be big enough that we allocate
    // blocks in the large object heap so that they don't move.
    internal const int BLOCK_SIZE = 524288;
    internal const int BLOCK_SIZE_LOG2 = 19;

    // Don't use a multi-dimensional array here because then we can't right size the last
    // block and we have to do range checking on our own and since there will then be
    // exception throwing in our code there is a good chance that the JIT won't inline.
    T[][] _elements;
    ulong _length;

    // maximum BigArray size = BLOCK_SIZE * Int32.MaxValue elements
    public BigArray(ulong size)
    {
        int numBlocks = (int)(size / BLOCK_SIZE);
        if ((ulong)numBlocks * BLOCK_SIZE < size)
        {
            numBlocks += 1;
        }

        _length = size;
        _elements = new T[numBlocks][];
        for (int i = 0; i < (numBlocks - 1); i++)
        {
            _elements[i] = new T[BLOCK_SIZE];
        }
        // by making sure to make the last block right sized we get the range checks
        // for free with the normal array range checks and don't have to add our own
        int numElementsInLastBlock = (int)(size - ((ulong)(numBlocks - 1) * BLOCK_SIZE));
        _elements[numBlocks - 1] = new T[numElementsInLastBlock];
    }

    public ulong Length
    {
        get
        {
            return _length;
        }
    }

    public T this[ulong elementNumber]
    {
        // these must be _very_ simple in order to ensure that they get inlined into
        // their caller
        get
        {
            int blockNum = (int)(elementNumber >> BLOCK_SIZE_LOG2);
            int elementNumberInBlock = (int)(elementNumber & (BLOCK_SIZE - 1));
            return _elements[blockNum][elementNumberInBlock];
        }
        set
        {
            int blockNum = (int)(elementNumber >> BLOCK_SIZE_LOG2);
            int elementNumberInBlock = (int)(elementNumber & (BLOCK_SIZE - 1));
            _elements[blockNum][elementNumberInBlock] = value;
        }
    }
}

The beauty of this implementation is that the JIT already understands single dimensional array accesses intrinsically, including the range checking code. In practice this class ends up being almost as fast as real array access for small arrays (< BLOCK_SIZE) and not too much slower once you get to reasonably big arrays. It doesn’t waste much space compared to a normal array because the last block is right sized, and the performance is good because the getter and setter for array elements are simple enough that they should get inlined into the calling method; this becomes very important for getting anywhere close to normal array access speeds.

Here is an example of big array usage:

public static void Main()
{
    ulong size = 0x1FFFFFFFF;
    BigArray<int> baInt = new BigArray<int>(size);
    ulong len = baInt.Length;
    for (ulong i = 0; i < len; i++)
    {
        baInt[i] = (int)i;
    }
    Console.WriteLine("baInt[len/2]=" + baInt[len / 2]);
}

You could also imagine exposing the fact that this BigArray<T> implementation has blocks through a couple of properties and an indexer of the form this[int block, int element], which would allow people to intelligently write code that does block based access on the array (e.g. merge sorts that are block intelligent). This can be important for performance, since we know that elements within a single block are contiguous in memory, but we cannot make that guarantee about elements in neighboring blocks.
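A rough sketch of what those members might look like (these would go inside BigArray<T>; the names are mine, not part of the implementation above):

    // how many blocks this BigArray is made of
    public int BlockCount
    {
        get { return _elements.Length; }
    }

    // number of elements in a given block -- BLOCK_SIZE for all but the last one
    public int BlockLength(int block)
    {
        return _elements[block].Length;
    }

    // direct block based access; elements within one block are contiguous in memory
    public T this[int block, int element]
    {
        get { return _elements[block][element]; }
        set { _elements[block][element] = value; }
    }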

It is worth noting that, given the allocation scheme of the BigArray<T> constructor, we may very well have multiple garbage collections while it runs; because of this you don’t really want to be using large instances of this class in a throw-away manner. My advice would be to use this carefully and sparingly, instead favoring architectures which don’t require such large single arrays.