ArrayList’s vs. generic List for primitive types and 64-bits

<edited post, 3:51pm, something in the formatting was messing up the whole page...>

The dynamically growing ArrayList structure is an interesting one, the other day I was looking at memory usage of some performance scenarios on 64-bit machines vs. 32-bit machines and noticed that ArrayLists are double the size on 64-bit machines. After looking at the screen for a second wondering why that was it hit me… Under the covers an ArrayList is just a big Object[], and what’s an Object[]? It’s an array of references to Objects, references which are bigger on 64-bit platforms because they have to be able to reference more memory. Unfortunately that’s just a fact of life with bigger pointers.

But that realization did make me wonder what’s the difference in memory usage characteristics between using an ArrayList and using our new (v2.0) generic List<T> when you’re filling the list with primitive (non-reference) data types. It turns out to be interesting (at least to me) though predictable, so I thought I’d show it here…

As a spoiler, if you think that using the generic List<T> is better, you’re right! But you might not have guessed how right you are!!

Here’s my test code (I use a compile time define to compile with the generic v. non-generic list), I decided to just play with differences between lists containing ints here because they have an interesting property in that the size of a primitive int is the same as an Object reference on a 32bit machine. More about that at the end.


 // SIZE is big enough that the list size dominates my memory usage
const int SIZE = 1000000; 

 public static void Main()
{
#if ARRAYLIST
ArrayList list = new ArrayList();
#else
List<int> list = new List<int>();
#endif
for (int i=0; i<SIZE; i++)
{
list.Add(i);
}
}


First off let’s think about what happens in these two scenarios:

1) ArrayList, under the covers this is an Object[], so to put an int value into the ArrayList we’ll have to box it (in the process creating a heap allocated Int32) and then put the reference to the boxed value into the array.
2) List<int>, with the generic implementation under the covers we’re using an int[], no boxing needed and instead we just shove the actual value being added into the array (thereby avoiding creating the heap allocated Int32).

Given that, we would expect the ArrayList implementation to be more of a memory hog, and it is

Results from a 32bit machine:

Here we can see that 19MB is allocated by the 32-bit ArrayList test, 11MB of which is simply heap allocated Int32 objects which result from the boxing of the int values. We also have 8MB worth of Object[] that have been allocated, notice that the amount of allocated Object[] is larger than the actual amount of space needed because I didn’t set the ArrayList capacity to something near what I knew we would actually. Instead I relied on the ArrayList auto-resizing which uses a 2x growth factor thus possibly ending up with up to ½ of the underlying array being unused, in this example that doesn’t effect the point that I’m trying to make so I ignored it.

On the other hand, using the List<int> version of the test we under the hood allocate a large Int32[] instead Object[], and we don’t need to heap allocate any Int32 objects. These differences result in memory usage of only 8MB for the Int32[]. Conveniently for my example on a 32-bit machine the size of an Object reference is the same as a 32-bit int so these arrays are the same size.

What do we see on a 64bit machine?

In this case the ArrayList test uses approximately two times more memory than the 32-bit version, this is unfortunately a fact of life in the 64-bit world… Given this data we can see that the reason is two fold:

1) References are bigger, and therefore our Object[] which underlies the ArrayList is double the size of it’s 32-bit version (16MB instead of 8MB)
2) Objects take more space in memory in a 64bit process. Every managed object has a couple pieces of CLR goo attached to it, a MethodTable* and a sync block, and these each end up being 8 bytes instead of 4 bytes like they are on 32-bit. In a small object like Int32 this combined with the fact that we keep the total size pointer size aligned results in an object that is two times as big on 64-bit. This results in 23MB of Int32 objects in a 64-bit process vs. 11MB in a 32-bit process. (see appendix below)

However, we can see that if we had instead used the generic List<int> we would have only allocated the same 8MB that we did in a 32-bit process. It is of note that since object references are 8 bytes on a 64-bit machine but an int is still 4 bytes we end up with the size of the underlying array being 50% smaller in the List<int> test case than the ArrayList case whereas on 32-bit they were the same. With smaller primitive data types (short, char, bool) the difference is of course more significant and is visible on both platforms since at that point the types are smaller than a object reference on 32-bit platforms as well.

Given that, what’s the difference in memory usage on the two platforms for the two solutions?

              ArrayList List<int> Difference (%)
32-bit 19MB 8MB 237%
64-bit 39MB 8.1MB 481%

We can see that though it is a definitely a better idea to use the generic List<int> on both platforms it becomes especially important on a 64-bit machine…

Then again, how many people are using ArrayList’s for big lists of ints and other primitive types? That’s a question that I don’t know the answer to…

The analysis (from a memory perspective) would be a lot different if we were talking reference types instead of ints. In that case we don’t have the option to avoid creating object instances; and the size of the reference to the reference type is the same size as that of a reference to an object. There are still plenty of reasons to use the generic List<T> in that case… But memory usage would no longer factor in.

 

 

Appendix: using SOS to look at Int32 objects in memory

64bit, Int32:
0:003> !do 000007ffe724d768
Name: System.Int32
MethodTable: 000000005cbf4408
EEClass: 000000005cc66380
Size: 24(0x18) bytes
(C:\WINDOWS\Microsoft.NET\Framework64\v2.0.amd64chk\assembly\GAC_64\mscorlib\2.0.3600.0__b77a5c561934e089\mscorlib.dll)
Fields:
MT Field Offset Type Attr Value Name
000000005cbf4408 40003cf 8 System.Int32 instance 3 m_value

32bit, Int32:
0:003> !do 00a56adc
Name: System.Int32
MethodTable: 78c12528
EEClass: 78c4ae44
Size: 12(0xc) bytes
(C:\WINDOWS\Microsoft.NET\Framework\v2.0.x86ret\assembly\GAC_32\mscorlib\2.0.3600.0__b77a5c561934e089\mscorlib.dll)
Fields:
MT Field Offset Type Attr Value Name
78c12528 40003bc 4 System.Int32 instance 437 m_value