Performance Quiz #4 -- Of Sets and Sundry -- Solution

Well thank you all very much for those great responses, I was very impressed!

What I did to get my answers was build a little harness with the test cases and then used QueryPerformanceCounter and friends to get the most accurate times I could. Then seperately ran CLRProfiler to get allocation results.  I added two additional competitors to the field for interest sake -- in addition to the original two which were ListDictionary and Hashtable I also tried HybridDictionary and my own custom collection class affectionately called "CheatyBitSet".  Some words on that last one -- the cheaty implementation is supposed to illustrate the kind of result you might be able to achieve if you knew something more about your input -- in this case the total number of strings and a perfect hash function for them.  That may very exotic but it happens fairly often -- if you were looking at state names for instance or playing cards or some such.  In those cases you could do some extra work to take advantage of your special knowledge and get more performance.  Now this part is really important: almost all of the time these kinds of optimizations really are the "root of all evil" -- so while it's good to know what kind of horsepower you have under the hood do be careful out there!  Enough said.

OK First let me give you the top level answers to the questions -- I measured these on my very own machine here using today's build of the CLR so your times are definately going to vary -- but still they should be interesting (and the benchmark is included at the end, just remember csc /o+ test.cs).

The times are microseconds per full test iteration (not per string test) as shown in the benchmark code.

cStrings 1 2 4 8 9 10 12 16 32 64
ListDictionary 0.3 0.7 2.1 5.3 6.3 8.4 10.6 20.6 92.3 285.8
HybridDictionary 0.4 0.9 2.3 5.7 8.7 10.0 11.5 16.7 34.8 67.7
Hashtable 1.0 1.7 3.1 6.9 7.3 8.1 10.1 13.6 31.8 63.3
CheatyBitSet 0.1 0.2 0.3 0.6 0.7 0.8 1.0 1.8 3.6 5.9

The first thing to observe is that when the list gets to be size 10 the hash table starts winning -- it was still losing at size = 9.  That's part of the answer to question 3.  And of course you can see that the list dictionary is following a quadratic progression because of its linked list implemenation whereas the hash table based solution has time that is growing linearly with the size of the problem.  The HybridDictionary never wins outright but it does all right time-wise in all the cases.  And lastly CheatyBitSit wins as expected -- the domain specific knowlege (fancy word for cheating) makes it a full 10x faster.

OK so why am I not rushing out and telling everyone to go build their own cheaty collections?  You must remember that this is just a microbenchmark that does no real work -- normally it's the actual work that needs to be done during the processing that dominates the runtime and not the iteration itself so what you would actually see in terms of wall clock improvement might not even be measurable. 

Now looking at space, it turns out that the Hashtable is always bigger than the ListDictionary -- the linked list is fairly frugal after all (and that's the other part of question 3).  Here's a sample of what the sizes look like for cStrings=5 (hash table losing on speed) and cStrings=15 the hash table is winning on speed)

Allocation Summary Strings 5 Objects Bytes
ListDictionary 12 228
HybridDictionary 13 248
Hashtable 8 312
CheatyBitSet 1 16
Allocation Summary Strings 15 Objects Bytes
ListDictionary 32 588
HybridDictionary 29 844
Hashtable 19 760
CheatyBitSet 1 16

By cStrings=15 HybridDictionary is already using the most space but it will tend to stay neck and neck with HashTable.  CheatyBitSet hasn't even gone to its second dword yet but it too will grow linearly.  All these solutions are growing linearly.

And that about wraps it up for the questions posed, what follows are some more details on space and calls and the source code for my benchmark.  In the interest of keeping this already long posting from getting even longer I'm only including the space analysis for cStrings=15 but of course with the source you can do all the analysis you like.  The space detail shows first how many of each type was allocated and then it shows which functions were called and how many bytes the function and its children allocated (so you can't add up the allocated bytes of the functions as they overlap)

ListDictionary Summary cStrings = 15

Times Called

Bytes Allocated

System.Collections.DictionaryEntry 15 240
System.Collections.Specialized.ListDictionary.NodeEnumerator 1 24
System.Collections.Specialized.ListDictionary.DictionaryNode 15 300
System.Collections.Specialized.ListDictionary 1 24
Total Allocations 32 588
System.Collections.Specialized.ListDictionary::.ctor 1 0
System.Collections.Specialized.ListDictionary::Contains 75 0
System.Collections.Specialized.ListDictionary::Add 15 300
System.String::Equals 672 0
System.String::EqualsHelper 672 0
System.Collections.Specialized.ListDictionary::GetEnumerator 1 24
System.Collections.Specialized.ListDictionary.NodeEnumerator::.ctor 1 0
System.Collections.Specialized.ListDictionary.NodeEnumerator::MoveNext 16 0
System.Collections.Specialized.ListDictionary.NodeEnumerator::get_Current 15 240
System.Collections.Specialized.ListDictionary.NodeEnumerator::get_Entry 15 0
Total Calls 1483
HybridDictionary Summary cStrings = 15

Times Called

Bytes Allocated

System.Collections.Hashtable.HashtableEnumerator 1 36
System.Collections.Specialized.HybridDictionary 1 20
System.Collections.DictionaryEntry 15 240
System.Collections.Specialized.ListDictionary.NodeEnumerator 1 24
System.Collections.Specialized.ListDictionary.DictionaryNode 8 160
System.Collections.Specialized.ListDictionary 1 24
System.Collections.Hashtable.bucket [] 1 288
System.Collections.Hashtable 1 52
Total Allocations 29 844
System.Collections.Hashtable::.ctor 1 288
System.Collections.Hashtable::GetHash 76 0
System.Collections.Hashtable::Insert 15 0
System.Collections.Hashtable::KeyEquals 55 0
System.String::GetHashCode 76 0
System.Collections.Hashtable::Add 15 0
System.Collections.Specialized.ListDictionary::Contains 13 0
System.Collections.Specialized.ListDictionary::Add 8 160
System.String::Equals 133 0
System.String::EqualsHelper 133 0
System.Collections.Specialized.ListDictionary.NodeEnumerator::.ctor 1 0
System.Collections.Specialized.ListDictionary.NodeEnumerator::MoveNext 9 0
System.Collections.Specialized.HybridDictionary::.ctor 1 0
System.Collections.Specialized.HybridDictionary::Contains 75 0
System.Collections.Specialized.HybridDictionary::Add 15 548
System.Collections.Specialized.HybridDictionary::ChangeOver 1 364
System.Collections.HashHelpers::GetPrime 1 0
System.Collections.Specialized.ListDictionary.NodeEnumerator::get_Key 8 0
System.Collections.Specialized.ListDictionary.NodeEnumerator::get_Value 8 0
System.Collections.Hashtable::Contains 61 0
System.Collections.Hashtable::ContainsKey 61 0
System.Collections.Specialized.HybridDictionary::GetEnumerator 1 36
System.Collections.Hashtable::GetEnumerator 1 36
System.Collections.Hashtable.HashtableEnumerator::.ctor 1 0
System.Collections.Hashtable.HashtableEnumerator::MoveNext 16 0
System.Collections.Hashtable.HashtableEnumerator::get_Current 15 240
Total Calls 800
Hashtable Summary cStrings = 15

Times Called

Bytes Allocated

System.Collections.Hashtable.HashtableEnumerator 1 36
System.Collections.DictionaryEntry 15 240
System.Collections.Hashtable.bucket [] 2 432
System.Collections.Hashtable 1 52
Total Allocations 19 760
System.Collections.Hashtable::.ctor 1 144
System.Collections.Hashtable::GetHash 90 0
System.Collections.Hashtable::Insert 15 288
System.Collections.Hashtable::KeyEquals 60 0
System.String::GetHashCode 90 0
System.Collections.Hashtable::Add 15 288
System.String::Equals 60 0
System.String::EqualsHelper 60 0
System.Collections.HashHelpers::GetPrime 1 0
System.Collections.Hashtable::ContainsKey 75 0
System.Collections.Hashtable::GetEnumerator 1 36
System.Collections.Hashtable.HashtableEnumerator::.ctor 1 0
System.Collections.Hashtable.HashtableEnumerator::MoveNext 16 0
System.Collections.Hashtable.HashtableEnumerator::get_Current 15 240
System.Collections.Hashtable::rehash 1 288
System.Collections.Hashtable::putEntry 7 0
Total Calls 508
CheatyBitSet Summary cStrings = 15

Times Called

Bytes Allocated

System.UInt32 [] 1 16
Total Allocations 1 16
Test.BitSetCheatForComparisonOnly 1 12
Test.BitSetCheatForComparisonOnly::PerfectHash 75 0
Total Calls 76

--- snip snip --- test.cs ---

using System;
using System.Collections;
using System.Collections.Specialized;

class Test
{
[System.Runtime.InteropServices.DllImport("Kernel32.dll")]
private static extern bool QueryPerformanceCounter(out long count);

    [System.Runtime.InteropServices.DllImport("Kernel32.dll")]
private static extern bool QueryPerformanceFrequency(out long frequency);

    static String[] testString;
static int[] testOrder;

    static int cStrings;
static int cTests;
static int cIterations;

static void Main(String[] args)
{
long t0, t1, freq;
int i;

        if (args.Length != 1)
{
Console.WriteLine("Usage: test <string-count>");
return;
}

        cStrings = Int32.Parse(args[0]);
cTests = cStrings * 5;

        // try to do a constant amount of net work to keep the times measurable
// so with smaller number of strings we'll do more iterations to compensate

cIterations = 10000000/cTests;

InitializeTest();

        QueryPerformanceFrequency(out freq);

        GC.Collect(2);
GC.WaitForPendingFinalizers();
GC.Collect(2);

        QueryPerformanceCounter(out t0);

for (i = 0; i < cIterations; i++)
Test1();

        QueryPerformanceCounter(out t1);

        Console.WriteLine("Test 1: {0,16:s} -- {1,8:n1} microseconds per iteration",
"ListDictionary", (t1-t0)/(double)freq/cIterations*1000000);

        GC.Collect(2);
GC.WaitForPendingFinalizers();
GC.Collect(2);

        QueryPerformanceCounter(out t0);

for (i = 0; i < cIterations; i++)
Test2();

        QueryPerformanceCounter(out t1);

        Console.WriteLine("Test 2: {0,16:s} -- {1,8:n1} microseconds per iteration",
"HybridDictionary", (t1-t0)/(double)freq/cIterations*1000000);

        GC.Collect(2);
GC.WaitForPendingFinalizers();
GC.Collect(2);

        QueryPerformanceCounter(out t0);

for (i = 0; i < cIterations; i++)
Test3();

        QueryPerformanceCounter(out t1);

        Console.WriteLine("Test 3: {0,16:s} -- {1,8:n1} microseconds per iteration",
"HashTable", (t1-t0)/(double)freq/cIterations*1000000);

        GC.Collect(2);
GC.WaitForPendingFinalizers();
GC.Collect(2);

        QueryPerformanceCounter(out t0);

for (i = 0; i < cIterations; i++)
Test4();

        QueryPerformanceCounter(out t1);

        Console.WriteLine("Test 4: {0,16:s} -- {1,8:n1} microseconds per iteration",
"CheatyBitSet", (t1-t0)/(double)freq/cIterations*1000000);

    }

    [System.Runtime.CompilerServices.MethodImplAttribute(System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]
static void InitializeTest()
{
int i;

        Console.WriteLine("Preparing performance tests {0} strings, {1} lookups, {2} iterations",
cStrings, cTests, cIterations);

        testString = new String[cStrings];

        for (i = 0; i <cStrings; i++)
testString[i] = i.ToString();

        testOrder= new int[cTests];

        Random r = new Random();

        for (i= 0; i < cTests; i++)
testOrder[i] = r.Next() % cStrings;

        // this makes sure there isn't a gross bug in the tests (they should print the same answer)
// and also makes sure that they are Jitted before the benchmark runs

        Console.WriteLine("Test 1 sanity check for same result: {0}", Test1());
Console.WriteLine("Test 2 sanity check for same result: {0}", Test2());
Console.WriteLine("Test 3 sanity check for same result: {0}", Test3());
Console.WriteLine("Test 4 sanity check for same result: {0}", Test4());

}

    [System.Runtime.CompilerServices.MethodImplAttribute(System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]
static int Test1()
{
ListDictionary t = new ListDictionary();

        int i;

        for (i = 0; i < cTests; i++)
{
String key = testString[testOrder[i]];

if (!t.Contains(key))
t.Add(key, key);
}

        int count = 0;
foreach (Object k in t) count++;

return count;
}

    [System.Runtime.CompilerServices.MethodImplAttribute(System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]
static int Test2()
{
HybridDictionary t = new HybridDictionary();

        int i;

        for (i = 0; i < cTests; i++)
{
String key = testString[testOrder[i]];

if (!t.Contains(key))
t.Add(key, key);
}

        int count = 0;
foreach (Object k in t) count++;

return count;

    [System.Runtime.CompilerServices.MethodImplAttribute(System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]
static int Test3()
{
Hashtable t = new Hashtable();

        int i;

        for (i = 0; i < cTests; i++)
{
String key = testString[testOrder[i]];

if (!t.ContainsKey(key))
t.Add(key, key);
}

        int count = 0;
foreach (Object k in t) count++;

return count;
}

    // here we have a class that isn't a fair competitor but it's interesting as a baseline
// it shows what we could accomplish if we were willing to do a lot of work
// and we knew a little more about the input -- such as a perfect hashing function

class BitSetCheatForComparisonOnly
{
private System.UInt32[] bits;

        public BitSetCheatForComparisonOnly(int size)
{
bits = new System.UInt32[size/32+1];
}

        // we're assuming we know the magic perfect hash for our input -- which in this case is nothing more than Int32.Parse
int PerfectHash(String key)
{
int hash = 0;

for (int k = 0; k < key.Length; k++)
hash = hash*10+key[k] - '0';

            return hash;
}

        public void Add(String key)
{
SetBit(PerfectHash(key));
}

        public void SetBit(int i)
{
bits[i/32] |= (UInt32)(1<<(i%32));
}

        public bool GetBit(int i)
{
return 0 != (bits[i/32] & (1<<(i%32)));
}
}

    [System.Runtime.CompilerServices.MethodImplAttribute(System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]
static int Test4()
{
BitSetCheatForComparisonOnly t = new BitSetCheatForComparisonOnly(cStrings);

        int i;

        for (i = 0; i < cTests; i++)
{
String key = testString[testOrder[i]];
t.Add(key);
}

        // again we're cheating by doing manual enumeration
int count = 0;
for (i = 0; i < cStrings; i++)
if (t.GetBit(i)) count++;

        return count;
}
}