Why are NameValueCollection lookups slower than Hashtable? [Kim Hamilton]

An internal discussion came up recently on the performance difference of lookups in Hashtable versus NameValueCollection. Benchmarks revealed that NVC lookups were ~2-8 times slower than Hashtable.

For example, when doing 40,000 lookups on a collection size of 100,000, NameValueCollection is about 2.6x worse:

NameValueCollection time: 0.050 sec
Hashtable time: 0.019 sec

This leads to the question of whether NameValueCollection and Hashtable have different asymptotic behavior or if it simply has additional overhead. By repeating the benchmark over increasing collection sizes, it’s clear that the asymptotic behavior is the same for lookups.

n = 100,000
NameValueCollection time: 0.050 sec
Hashtable time: 0.019 sec
n = 200,000
NameValueCollection time: 0.051 sec
Hashtable time: 0.019 sec
n = 400,000
NameValueCollection time: 0.050 sec
Hashtable time: 0.019 sec

This confirms what we expected – that lookup cost is constant for both. But what causes the additional overhead for NameValueCollection? NameValueCollection allows a key to be associated with one or more values. So the additional cost is caused by the way lookups are performed internally: NameValueCollection actually delegates the hash key lookups to an internal Hashtable, which may contain multiple entries associated with that key.

If there are multiple entries associated with the key, it will return them appended together. So in fact, if you’ve assigned multiple entries to the same key (the trials above did not), then the lookup cost will be linear in the number of items you’ve assigned to the key, because the accessor will append them together. This is demonstrated in the following NameValueCollection.

NameValueCollection c = new NameValueCollection();
c.Add("dogs", "Bubba");
c.Add("dogs", "Mojo");
Console.WriteLine(c["dogs"]);
// returns Bubba,Mojo

But back to the original question – even if multiple values aren’t assigned to the same key, the additional bookkeeping in place causes the additional constant overhead seen above.

Let’s move on to some other interesting performance aspects of NameValueCollection. NameValueCollection allows items to be looked up by index as well. Expanding the above example, you can see how “Bob” can be looked up by key or by index.

NameValueCollection c = new NameValueCollection();
c.Add("dogs", "Bubba");
c.Add("dogs", "Mojo");
Console.WriteLine(c["dogs"]);
// returns Bubba,Mojo

c.Add("cats", "Bob");
Console.WriteLine(c["cats"]);
// returns Bob
Console.WriteLine(c[1]);
// returns Bob

The internal bookkeeping allowing index-based lookups causes even more noticeable performance differences in adds and removes. Adding is again constant, but with a slightly larger overhead than lookups, caused by the fact that the item is also being added to an array that allows lookup by index. The following shows 6,000 iterations of adds performed on collections of varying starting size.

n = 10,000
NameValueCollection time: 0.017 sec
Hashtable time: 0.002 sec
n = 20,000
NameValueCollection time: 0.018 sec
Hashtable time: 0.002 sec
n = 40,000
NameValueCollection time: 0.017 sec
Hashtable time: 0.002 sec

In this test, NameValueCollection is performing about 8.5x worse for adds.

The difference is most apparent on removes. For NameValueCollection, the cost of removes is linear in the size of the collection, caused by the need to shift the index-based lookup array. Note that this means NameValueCollection and Hashtable have different asymptotic behavior for removes.

n = 10,000
NameValueCollection time: 8.699 sec
Hashtable time: 0.002 sec
n = 20,000
NameValueCollection time: 24.657 sec
Hashtable time: 0.002 sec
n = 40,000
NameValueCollection time: 39.541 sec
Hashtable time: 0.002 sec

So you may be wondering when you’d want to use NameValueCollection. NameValueCollection only accepts keys and values that are Strings, so this is a very specialized collection. It’s useful in a situation in which you either need to associate multiple values with a key, or to do hash-based lookups as well as lookup by index (and hopefully not perform too many removes).

However, if you need to store string key/value pairs and you don’t need to perform index-based lookups or associate multiple values with a key, you may prefer to use the generic Dictionary class. This has the same asymptotic behavior as Hashtable in all cases and furthermore avoids any costs due to boxing.

Credits go to Anthony Moore and Joe Huffman for their input into this internal thread.