Slightly lighter than HashSet – HashSet (an experiment)

Article
12/15/2010

In playing around with the possible solutions or improvements to [previous post:] the heaviness of finalizable WeakReference objects, an obvious thing to do was replace WeakReference objects with using GCHandle structs.

Here’s a test program which adds many WeakReferences to a HashSet. (Note, HashSet uses arrays as the underlying storage.) The WeakReferences will point at some temporary objects which soon become collectable.

The results are that in a minute or two we can create up to about 12 million weak references, and our program will consume an impressive 1.2 GB of RAM, then it crashes:

Now here are the results for the same program with a HashSet of GCHandle instead of WeakReference. The number of objects we can weakly refer to is twice as large.

Even though we had twice as many weak object references (in the form of GCHandles) when we crashed, we were only using about half as much RAM(!). So when used in HashSet, a GCHandle appears to cost only 25% as much memory as a WeakReference.

I’m moderately surprised we ran out of memory allocating an array probably sized < 1 GB. Which is less memory than the other version could successfully allocate. Memory Fragmentation is probably involved.

One last point of difference is that the version using less memory ran a little longer. But this should be expected; overall it got more actual work done. For full and fair comparison it’s important to realize no WeakReference finalizers were executed during the experiment (since they were stuck being held in the hash set), nor were WeakReference objects garbage collected, so the first program is artificially spared some costs that would be incurred in a happier non-crashing scenario with actual recycling of WeakReference objects.

Conclusion

WeakReference seems like an overly costly implementation choice for creating a weak collection. [Actually I’m a little late to the party with that conclusion. If you search you can find a few other articles saying the same thing. But hey, it’s fully independent verification, and I hope you like the experimental results.]

There is also the nice secondary benefit of the GCHandle approach – doing batch Free() of dead GCHandles pulled from an array is going to be more efficient than queuing and running all those finalizers. And potentially less multithreaded. (The finalizers wouldn’t even run unless we did scanning of the collection to evict the dead references, which is what we would do in the batch Free()).

Slightly lighter than HashSet – HashSet (an experiment)

Additional resources