Performance Quiz #4 — Of Sets and Sundry


By popular demand, here is the next exciting Performance Quiz — your chance for fame, glory and recognition or at least a few minutes of amusement anyway 🙂
 
Below are two methods for taking an array of strings in some random order, removing the duplicates from the array and then counting up how many distinct strings there were.
 
For the test setup the number of strings is the input “cStrings” — we make the strings “0”, “1”, and so forth up to “cStrings − 1” as our test strings, and then set “cTests” to be 5 times the number of strings.  “testOrder” is an array of cTests random numbers in the range [0, cStrings-1].
 
The questions are:
 
Q1: What is the time complexity of each algorithm?
Q2: The space usage?
Q3: Is the ListDictionary ever faster or smaller than the Hashtable?  If so, for what values of “cStrings” is this true?
 
 
    static int TestViaListDictionary()
    {
        ListDictionary t = new ListDictionary();

        int i;

        for (i = 0; i < cTests; i++)
        {
            String key = testString[testOrder[i]];

            // add each distinct string only once
            if (!t.Contains(key))
                t.Add(key, key);
        }

        // count the distinct strings by enumerating the collection
        int count = 0;
        foreach (Object k in t) count++;

        return count;
    }
 
    static int TestViaHashtable()
    {
        Hashtable t = new Hashtable();

        int i;

        for (i = 0; i < cTests; i++)
        {
            String key = testString[testOrder[i]];

            // add each distinct string only once
            if (!t.ContainsKey(key))
                t.Add(key, key);
        }

        // count the distinct strings by enumerating the collection
        int count = 0;
        foreach (Object k in t) count++;

        return count;
    }
 
The specific initialization code looks like this:
 
    static String[] testString;
    static int[] testOrder;
 
    static int cStrings;
    static int cTests;
    static void InitializeTest()
    {
        int i;

        // the test strings are simply "0", "1", ..., "(cStrings-1)"
        testString = new String[cStrings];

        for (i = 0; i < cStrings; i++)
            testString[i] = i.ToString();

        // testOrder holds cTests random indexes into testString
        testOrder = new int[cTests];

        Random r = new Random();

        for (i = 0; i < cTests; i++)
            testOrder[i] = r.Next() % cStrings;
    }
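
If you want to try this yourself, a minimal driver along these lines will run both tests.  This is just a sketch to make the setup concrete (it needs using directives for System, System.Collections, and System.Collections.Specialized, and Environment.TickCount is only a coarse clock, so don't read too much into small numbers):

    // Sketch of a Main you could drop into the same class as the methods above.
    // Environment.TickCount keeps the example self-contained, but it is a coarse
    // (~15 ms) clock -- a real benchmark wants a higher-resolution timer.
    static void Main()
    {
        int[] sizes = { 10, 100, 1000, 10000 };

        foreach (int size in sizes)
        {
            cStrings = size;
            cTests = 5 * cStrings;
            InitializeTest();

            int start = Environment.TickCount;
            TestViaListDictionary();
            int listMs = Environment.TickCount - start;

            start = Environment.TickCount;
            TestViaHashtable();
            int hashMs = Environment.TickCount - start;

            Console.WriteLine("{0,6} strings: ListDictionary {1} ms, Hashtable {2} ms",
                cStrings, listMs, hashMs);
        }
    }
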
Comments (20)

  1. IlkkaP says:

    I take my chances:

    Q1. What is the time complexity of each algorithm?

    ListDictionary is inherently O(N) complexity, and HashTable should be near O(1).

    Q2: The space usage?

    I guess ListDictionary uses less space; its keys should be smaller than the Hashtable’s, and it uses fewer resources since it is less complex.

    Q3: Is the ListDictionary ever faster than the hashtable or smaller? If so, for what values of "cStrings" is this true?

    Hashtable has much more overhead (calc/recalc hash, for example) so ListDictionary should be faster until some break-even with hashcalc. Maybe cStrings<20?

  2. Eric Wilson says:

    Question 1:

    ListDictionary is O(N^2), because the call to "Contains" is a linear search, and Contains is called cTests times.

    Hashtable is O(N), because ContainsKey for a hashtable is an O(1) operation.

    Question 2: Both are pretty close for small values. For larger values, ListDictionary would use less space: when the number of elements in a list reaches its allocated size, the list doubles in size.

    Compare this to Hashtable, which must increase in size to the next largest PRIME bucket size. Also, Hashtables by their nature never use up all of the buckets they have allocated, so they are almost always going to use more space for the same amount of data.

    Question 3: According to the documentation, ListDictionary is faster for values of cTests < ~10. This is probably due to the fact that it is faster to simply walk the list instead of doing all the bucket lookups that a hashtable would require.
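
    Roughly speaking, a ListDictionary lookup amounts to walking a singly linked list of nodes, something like the simplified sketch below (this is not the actual BCL code, just an illustration of why each Contains call is O(N)):

        // Simplified sketch of a linear lookup over a singly linked node list;
        // each lookup walks up to N nodes, so cTests lookups give O(N^2) overall.
        class Node { public string Key; public object Value; public Node Next; }

        static bool SketchContains(Node head, string key)
        {
            for (Node n = head; n != null; n = n.Next)
                if (n.Key == key)
                    return true;
            return false;
        }
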

  3. Ryan Heath says:

    Didn’t know ListDictionary, but if it behaves like a linked list then:

    A1:

    ListDictionary: checking is on average O(N); counting is O(N).

    Hashtable: checking is on average O(1); counting is O(N).

    A2:

    Since we always test 5 times cStrings, the probability of having all available strings in both containers is near 100%:

    (cStrings^(5*cStrings) – cStrings) / cStrings^(5*cStrings) => ~1

    Thus ListDictionary needs cStrings times the space required for a single entry, and Hashtable needs cStrings times the space required for a single entry plus more…

    Guess that Hashtable is using more space since it has to deal with buckets and allocating space upfront…

    A3:

    After a certain cStrings (very high, I think) the complexity of checking in the Hashtable will be O(N), and thus more or less equal to ListDictionary.

  4. David Levine says:

    Here’s some observed behavior…

    I ran several tests. The first thing I noticed was that the ListDictionary must do some one-time initializations, because the results I got were different if I ran some warmup tests first than if I simply started running tests for measurements.

    Using a simple TimeSpan to measure test runs (Perf counters would be much better for small numbers of strings…)

    For 10 strings, both ran in less than 1 millisecond if ListDictionary was warmed up. If it was not, then it took 46 mSec and the hashtable took 15 mSec.

    At 100 strings, both ran in less than 1 mSec.

    At 1000 strings, ListDictionary ran in 93 mSec and the hashtable in less than 1 mSec.

    At 10000 strings, ListDictionary ran in 25781.25 mSec (25 seconds) and the hashtable in 46.875 mSec, which is 550 times faster than the ListDictionary.

    I didn’t have time to check memory allocations.

    I’d guess that hashtable uses more memory due to extra allocations for buckets, but that might be swamped by ListDictionary reallocations. I’d want to run it under a memory profiler to be sure but I’m out of time.

  5. Rico Mariani says:

    Impressive comments so far, nice work guys! 🙂

  6. David Levine says:

    BTW…why use:

    foreach (Object k in t) count++;

    when t.Count was available?

    I ran the test both ways, and while the measured time was longer when enumerating, the overall difference between the two collections remained roughly the same, so it did not seem worth making a distinction.

  7. Rico Mariani says:

    Oh, I guess I wasn’t very clear about the point there: I wanted to benchmark the enumeration. The fact that it’s a count is incidental; walking the elements is the point.

  8. Matthew W. Jackson says:

    I think he’s trying to compare the performance of the enumerators. The count++ is just there so the loop can do something and won’t get optimized away.

    Which brings up a good point. Everybody is discussing the performance of lookups, but nobody is looking at the enumeration.

    The ListDictionary can enumerate items very fast compared to a Hashtable.

    When enumerating a ListDictionary, the enumerator only has to visit N Nodes, so enumeration is O(N).

    When enumerating a Hashtable, the enumerator has to skip over empty buckets, so if the capacity is very large compared to the count, the enumerator may be doing extra work. Enumeration of a hashtable is O(C), where C is always some prime number bigger than N by an amount dependent on the load factor.

    However, when enumerating a large dictionary that has had many items added and removed during its lifetime, the efficiency of the ListDictionary may degrade due to locality. If the nodes end up being stored in different areas of memory, you could possibly wind up with a cache miss for every item in the dictionary.

    However, the Hashtable is always stored as a contiguous array of buckets, and will always maintain its locality. This is why, on modern machines with fast processors and slow memory, arrays generally perform better than linked lists. Even a Red-Black Tree can suffer from this problem, and I assume that this is why Microsoft didn’t use a tree implementation for their SortedDictionary. While a tree might have better guaranteed performance, there are no guarantees in garbage-collected languages running on machines with lots of data and limited cache.

    If you have a good guess ahead of time on the upper bound size of your dictionary, you can create a Hashtable with a big enough capacity so that it never has to be resized and rehashed, and it should beat most implementations hands down (assuming you don’t have stupid hash functions). If you specify a capacity too small (or rely on the default capacity all of the time), you’re completely destroying any performance gains of a Hashtable.

    You can always use the HybridDictionary, but you should measure its performance first and weigh the slight savings in storage that you might gain from using it.

    Besides, there’s no generic version of it…which is fine because I don’t think I ever used it. However, it appears that the framework uses it in several places, including XML, ASP.NET, Configuration, Data, Component Model, and Regular Expressions. I guess that this means that the HybridDictionary works fairly well, assuming that somebody actually tested the performance and memory usage in at least a few instances of it.
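
    To make the pre-sizing idea concrete, here is a minimal sketch (the 10000 capacity and the helper name are made up for illustration; Hashtable is in System.Collections and HybridDictionary in System.Collections.Specialized):

        // Pre-size the Hashtable when you can guess an upper bound; otherwise let
        // HybridDictionary start as a ListDictionary and switch to a Hashtable
        // internally once it grows past a handful of entries.
        static IDictionary MakeLookup(int expectedSize)
        {
            if (expectedSize > 0)
                return new Hashtable(expectedSize);  // buckets allocated up front,
                                                     // no resize/rehash while filling
            return new HybridDictionary();
        }
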

  9. Matthew W. Jackson says:

    Darn it you posted that response about enumeration while I was typing in my lengthy post.

  10. Rico Mariani says:

    Yes but yours is so good 🙂

    I may not have to publish an answer at all if this keeps up 🙂

  11. IlkkaP says:

    Some empirical observations:

    Running cStrings at 10000, the ListDictionary is generally faster. I ran the loops 50 times for both (from a cold start), with a 100 ms pause between each run, and the ListDictionary is 66000 ticks (0.0184 s) faster. At 100000 the Hashtable is 504367 ticks (0.141 s) faster. I think the break-even point is somewhere around 15000-18000 cStrings. The initialization time for both is almost the same. This was something I was expecting (typo in my first feedback, should have written <20k, not 20).

    From the allocation point of view, there are no big differences as far as I can see. Using the performance monitor, the biggest difference is that Hashtable uses a lot more Gen0 (760250 bytes vs. 538012 for ListDictionary, on average).

    IMHO, the allocation profiler (from Gotdotnet) indicated apparently good locality of memory allocation for the lists/tables in both cases (if I understand the graphs correctly).

    I noticed a curious thing because of an error in my code, where I left cTests at 5. The loop times for Hashtable and ListDictionary are almost identical. The curious thing is that the first 16 iterations (always the first 16; I tested 8 times to be sure) took 300-500 ticks, while subsequent iterations fell under 100, mostly around 80 ticks. I tried 100 iterations and they remained under 100 after the first 16. GC gave up, or warmed up..?

  12. StefanG says:

    I wrote a small test program that called your test functions with a varying number of strings.

    I also added a HybridDictionary test.

    I ran the test in three different variations for HybridDictionary and HashTable:

    1) Using the default constructor

    2) Always setting the capacity to 1000

    3) Setting the capacity to cStrings

    The results can be seen below.

    From the results we can form the following general guidelines:

    1) Never use ListDictionary if there is a chance that the number of elements might grow above 10-20

    2) HybridDictionary is always safe to use. It will always give good performance

    3) Avoid setting the capacity unnecessarily high for HashTables and HybridDictionaries

    Regarding the space usage, the Hashtable object is larger than a ListDictionary object, but a ListDictionaryNode occupies more space than a Hashtable.bucket (since the node is an object, while the bucket is a struct).

    This means that for small collections (say below 1000 items) the ListDictionary will be smaller, but for larger collections the Hashtable will be slightly smaller (assuming that the capacity has been set properly).

    RESULTS

    Times in microseconds.

    Using default constructor

        size  ListDictionary  HashTable  HybridDictionary
           1             0.9        2.9               0.9
           2             1.5        3.9               1.8
           5             5.8        6.9               6.0
          10            17.0       13.9              17.6
          20            54.4       33.2              36.8
          50           279.2       76.1              83.2
         100           994.5      158.9             168.0
         200          4701.6      379.0             403.5
         500         26353.7      959.2            1014.6
        1000         94681.6     1861.4            1971.3

    Always setting capacity to 1000

        size  ListDictionary  HashTable  HybridDictionary
           1             0.9       35.5              33.3
           2             1.6       36.6              35.6
           5             5.5       39.4              38.3
          10            14.1       45.1              44.9
          20            49.6       58.1              57.1
          50           299.5       95.6              99.0
         100          1007.5      151.9             155.3
         200          5109.4      302.5             314.8
         500         26353.7      723.1             746.2
        1000         96502.4     1528.9            1587.1

    Setting capacity to cStrings

        size  ListDictionary  HashTable  HybridDictionary
           1             0.9        2.7               0.9
           2             1.6        4.0               1.7
           5             5.3        7.5               5.7
          10            14.7       13.1              14.1
          20            56.9       28.1              31.1
          50           275.8       65.8              70.9
         100           952.8      155.6             144.7
         200          5135.6      313.2             329.9
         500         27336.6      781.2             813.5
        1000         96502.4     1543.1            1660.8

  13. David Levine says:

    I decided to get more accurate measurements, so I wrote a wrapper around the QueryPerformanceCounter Win32 API to get counts at 100-nanosecond intervals. I also refactored the test to measure the time separately for enumerating the collection and for invoking the Contains and Add methods for each collection type.

    The results were interesting. The enumeration time is roughly the same for both collections. For the ListDictionary class the method Contains is definitely the time hog, but the method Add takes 2nd place. The same methods for the Hashtable are far more efficient for large sets of data. I still haven’t checked memory allocations so I don’t know if the difference is due to trading data space for speed or just differences in algorithms. I’d suspect the former.

    Cache warmups (10 strings)

    Dictionary warmup: Enumeration 969 uSecs; Total Test 64.93 mSecs.

    Hashtable warmup: Enumeration 79 uSecs; Total Test 2.15 mSecs.

    Testing 1 strings

    Dictionary: Enumeration 10 uSecs; Contains: 23 uSecs; Add: 5 uSecs; Total: 98 uSecs.

    Hashtable: Enumeration 9 uSecs; Contains: 24 uSecs; Add: 5 uSecs; Total: 100 uSecs.

    Testing 1 strings

    Dictionary: Enumeration 8 uSecs; Contains: 22 uSecs; Add: 5 uSecs; Total: 95 uSecs.

    Hashtable: Enumeration 6 uSecs; Contains: 23 uSecs; Add: 5 uSecs; Total: 120 uSecs.

    Testing 1 strings

    Dictionary: Enumeration 7 uSecs; Contains: 21 uSecs; Add: 5 uSecs; Total: 92 uSecs.

    Hashtable: Enumeration 6 uSecs; Contains: 22 uSecs; Add: 5 uSecs; Total: 91 uSecs.

    Testing 5 strings

    Dictionary: Enumeration 8 uSecs; Contains: 105 uSecs; Add: 22 uSecs; Total: 360 uSecs.

    Hashtable: Enumeration 8 uSecs; Contains: 110 uSecs; Add: 22 uSecs; Total: 368 uSecs.

    Testing 10 strings

    Dictionary: Enumeration 13 uSecs; Contains: 505 uSecs; Add: 45 uSecs; Total: 993 uSecs.

    Hashtable: Enumeration 9 uSecs; Contains: 214 uSecs; Add: 57 uSecs; Total: 710 uSecs.

    Testing 100 strings

    Dictionary: Enumeration 36 uSecs; Contains: 2.947 mSecs; Add: 609 uSecs; Total: 7.740 mSecs.

    Hashtable: Enumeration 24 uSecs; Contains: 2.170 mSecs; Add: 504 uSecs; Total: 7.167 mSecs.

    Testing 1000 strings

    Dictionary: Enumeration 193 uSecs; Contains: 107.119 mSecs; Add: 22.6 mSecs; Total: 171.765 mSecs.

    Hashtable: Enumeration 221 uSecs; Contains: 22.488 mSecs; Add: 4.805 mSecs; Total: 69.792 mSecs.

    Testing 10000 strings

    Dictionary: Enumeration 1.932 mSecs; Contains: 22.315.314 seconds; Add: 5.85.854 seconds; Total: 27.955.303 seconds.

    Hashtable: Enumeration 2.747 mSecs; Contains: 245.769 mSecs; Add: 51.936 mSecs;Total: 756.750 mSecs.

    test done
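
    For anyone who wants to reproduce this, the QueryPerformanceCounter wrapper boils down to a couple of P/Invoke declarations; the sketch below is just an illustration of the idea, not the actual code used for the numbers above:

        using System;
        using System.Runtime.InteropServices;

        // Minimal high-resolution timer built on QueryPerformanceCounter.
        class HiResTimer
        {
            [DllImport("kernel32.dll")]
            static extern bool QueryPerformanceCounter(out long count);

            [DllImport("kernel32.dll")]
            static extern bool QueryPerformanceFrequency(out long frequency);

            long start;

            public void Start()
            {
                QueryPerformanceCounter(out start);
            }

            // Elapsed time in microseconds since Start() was called.
            public double ElapsedMicroseconds()
            {
                long now, frequency;
                QueryPerformanceCounter(out now);
                QueryPerformanceFrequency(out frequency);
                return (now - start) * 1000000.0 / frequency;
            }
        }
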

  14. Matthew W. Jackson says:

    I feel like pointing out that the memory footprint of a ListDictionary will increase somewhat drastically running on a 64-bit machine when compared to a Hashtable. Of course, both will increase quite a bit due to all of the boxed references for every key and value. Then again, boxed references to ValueTypes are a bit of a waste to begin with.

    Luckily, we don’t have that problem in Whidbey.

  15. Rico Mariani says:

    There is no inherent boxing requirement in this problem — the collection class usages all involve strings.

  16. Rico Mariani says:

    btw: I’ll be posting "the solution" later today including source code for a benchmark program I wrote so you can try it yourself. Like David I used QueryPerformanceCounter to get the best measurements. It’s pretty easy to set it up.

  17. Matthew W. Jackson says:

    Oh, right….strings. It seems I was getting off topic. On the other hand, the size of all string references will double on a 64-bit machine as well. There’s no way around that. <g>

    But seeing as most dictionaries are in some form of string->object, the issue of boxing overhead isn’t usually that big a deal.
