The change from Hashtable to Dictionary


As some of you with the Whidbey preview bits have no doubt noticed, we introduced set of generic collection classes in System.Collections.Generic.  Far from just making generic versions of the current collections, we took the time to revisit how we really expect people to use these types and, in some cases, learned from our mistakes in V1 of the framework.  


 


One of the new collections is Dictionary<K, V>.  The big change that we made from Hashtable has caused a fair amount of internal debate so I thought it would be interesting to see what actual, real-live, customers think of the issue.   In brief, with Hashtable’s indexer, if the key is not found null is returned, with Dictionary<K,V>’s indexer an exception is thrown.


 


For example:


            Hashtable hashtable = new Hashtable();


            hashtable.Add(“key1”, “data1”);


            hashtable.Add(“key2”, “data2”);


            Console.WriteLine(hashtable[“key1”]); //displays “data1”


            Console.WriteLine(hashtable[“key42”]==null); //displays “true”


 


 


            Dictionary<string, string> dic = new Dictionary<string, string>();


            dic.Add(“key1”, “data1”);


            dic.Add(“key2”, “data2”);


            Console.WriteLine(dic[“key1”]); //displays “data1”


            Console.WriteLine(dic[“key42”]); //throws a KeyNotFoundException


 


 


The change has already caused me pain in porting some code that uses Hashtable over to Dictionary<K, V> and some have argued that this use of exceptions is overkill.  The primary reason Dictionary<K, V> throws is that there is no “error” value that works over any V.  Hashtable is able to return null because the key is always a reference type (even if you try to make the key be an int (for example) it will be boxed before added to the hashtable).  If V happens to be a value type then “null” is not a valid return value.   We do have Nullable<T:ValueType> but that is a topic for another day.  


 


All types in the CLR do have a notion of their “default value”.  In C# 2.0 you access this value by using the default operator.  The rules for default() are pretty easy: null for reference types and zero for value types ( 0, 0.0, false, etc). 


default(object) ==null


default(int)    ==0


default(bool)   ==false


default(string) ==null


 


So we could have Dictionary<K, V> return default(V) if the key does not exists.  We have proven with Hashtable that this is a pretty workable plan when V is a reference type.  However, it is arguably a little misleading with V is a valuetype.  After all zero is often in the valid range for int values.  I should also mention that you can always use Dictionary<K,Nullable<V>> if default(V) isn’t an appropriate marker value. For example, with Dictionary<string,Nullable<int>> you will indeed get a null if an entry isn’t found.


 


So this really comes down the question of how commonly do we expect V to be a value type times how bad using the default(V) as a flag verses how bad it is to throw the KeyNotFoundException as described above.


 


Questions for you:



  1. How commonly do you use a value type for V? 

  2. In those cases will default(V) or the Nullable designs address the issue?

  3. Do you like the idea of the indexer throwing a KeyNotFoundException?

We would love to have your feedback!


 

Comments (72)

  1. Aaron Lewis says:

    1. Often.

    2. Nullable, please.

    3. Just personal preference, but I’d really like it if it would return null instead of an exception. Design would lead us away from accessing something that doesn’t exist anyway, but at least it’d be one less "except when using the X collection type, in which case you’ll need to wrap exception handling around it" thing to remember.

  2. >How commonly do you use a value type for V?

    It will happen. But in the most case, I use reference types.

    I like the idea of the indexer throwing a KeyNotFoundException, the reason why, is that if anyone passes a key that doesn’t exists, I think it’s a miss by the programmer. The programmer should in almost every case know what key they have added to the Dictionary, and if they try to get a value of a key that doesn’t exists, they have missed something in their programming. Every kind of programming misses should return in an exception.

  3. Aaron Lewis says:

    But imagine a situation where you have a host of settings that are applicable to something in general, but individual instances of that something may or may not have values for all of the settings. An example would be a programmatic representation of a stylesheet. If you’re iterating through all of the elements, and you have specific actions that your code has to take based upon the contents of the collection, but the collection itself is dynamic, then you’ve got a choice of reacting to a null value (meaning this instance simply doesn’t have a setting for the key you’re ready to provide custom behavior for), or catching an exception.

  4. Marius Cristian CONSTANTIN says:

    1. I haven’t use a Hashtable extensively, but I never used a value type for V.

    2 & 3. The idea of throwing exceptions doesn’t sound very good to me. I started programming in C, with no exceptions and traditional error code handling was fine. Now everybody like to use exceptions because they are fancy. Now when I write code I try not to use exceptions, mainly due to performance implications and multiple point of exit from the function. In conclusion, in my opinion using default(V) or Nullable would be a better approach.

  5. shawn hinsey says:

    I have to admit that the switch to exceptions doesn’t really bother me, as long as the access pattern of collection.ContainsKey() isn’t broken. For example:

    if(collection.ContainsKey(key)

    {

    // access collection[key];

    }

  6. David Levine says:

    Hi Brad,

    1. Other then strings I cannot think of a single case where I actually use a value type for V (e.g. int, float). However, I can think of cases where storing structs in a table would be useful.

    2/3. The issue of using a sentinel return value to indicate KeyNotPresent versus throwing an exception is complicated. As you point out, this effectively removes that value from the range of possible valid values, or it introduces an ambiguity that may be difficult for program logic to distinguish between. However, in many cases I would prefer that then having to constantly deal with the potential of an unexpected exception during a table access.

    In performance sensitive code the cost of an exception is so high that I would prefer that exceptions not be used to indicate KeyNotFound. I’d prefer a collection that did not throw.

    You could offer two types of classes, those that throw and those that do not, or make its behavior configurable. In the case where it does not throw you could allow the user of the class to programmatically set the value to be returned when the key is not present. This way they could use the sentinel value that best suits their needs. You could use the normal defaults when they are not specified.

    The downside is that using two types of collection classes creates its own set of problems. It introduces ambiguity into the design of an implementation.

    Dave

  7. I think a good code design to first use the ContainsKey method to check if a value exists, ok it could affect the performance but it is one solution to use to check if the key exists before accessing it. But I’m not against that the indexer should not throw an exception, but I think it’s a better design if it does so, for example, you will get an index out of range if you try to access an index that is bigger than the size of an array. I think it is the same for "containers" that uses use keys. If you try to get a value from an index of an array that doesn’t exists, you have done something wrong when you implement your code (a miss), and I think it’s the same even if keys are used instead of index. But as I said, this is my opinion.

  8. Daniel O'Connell says:

    1. How commonly do you use a value type for V?

    Fairly rarely actually. Not only do I find it rather rare that the situation comes up where a key is associated with just a value such as an int, when it does I generally find that they come up when the key isn’t a lookup value so much as an identifier that will probably be located via enumeration anyway, in which case I generally just use an ArrayList.

    2. In those cases will default(V) or the Nullable designs address the issue?

    When it comes to value types, I think assuming default(V) is safe is almost always a bad idea. Just like using DateTime.MinDate as a effective null sometimes bites back, using 0 could as well. And when you consider the situations where a "bad" value of -1 may be preferable becuase 0 – int.MaxValue is the acceptable range, the complexity becomes too great. Nullable would probably help, to some degree, but the code to ensure a key exists before accessing it should be pretty close to the same as the code to ensure that the Nullable reference isn’t null. I personally think

    Dictionary<string,int> dict = //get dictionary;

    if (dict.ContainsKey("Key"))

    {

    //retrieve value

    //do set

    }

    is clearer than

    Dictionary<string,Nullable<int>> dict = //get dictionary;

    Nullable<int> tempNullableInt = dict["Key"];

    if (tempNullableInt.HasValue)

    {

    //do set

    }

    If you consider that cases occur when the value may be genuinely null(be it a reference type or a Nullable<T> type), you still have to check that the key is contained unless the dictionary throws an exception. Its an edge case, granted, but its one the current Hashtable implementation forces you to consider.

    3. Do you like the idea of the indexer throwing a KeyNotFoundException?

    I do personally. However, as with all solutions, it isn’t going to suit every possible problem. I think that a Dictionary without a key should be an exceptional state in most situations.

  9. Stefan Rusek says:

    Exceptions are thrown when a method cannot achieve its specified goal. In this case the goal is to retrieve a member of a collection. If the member is not there then the method cannot acheive its goal and an exception must be called. if the programmer wants to check to see if the entry exists there is a method for that. The second way has two advantages, 1) it is more intuitive, and 2) it allows you to discriminate between from whan a key doesn’t exist and when you’ve stored null in a key.

    Hashtable h;



    myval = h[key];

    if (myval != null)

    // do stuff with myval

    else

    // key doesn’t exist or null is stored

    this is better written as

    Hashtable h;



    if (h.ContainsKey(key))

    // do stuff with myval

    else

    // key doesn’t exist

  10. Ricky Datta says:

    It should throw an exception.

  11. Paul Wilson says:

    Throw an exception as long as I can check containment separately first.

  12. Paul Tyng says:

    I don’t mind an exception as long as its only thrown from the indexer, not thrown into contains key and handled internally. I didn’t check the current implementation of ContainsKey, but obviously it would be better if its not:

    try

    …hit indexer…

    catch

    return false

    return true

    I guess my main point is that there needs to be a way to avoid the exception (as others have stated, just adding another voice)

  13. I prefer to check with .Contains(K) before calling so I vote to throw an exception when missing.

  14. In our code we use Hashtable with Int32 values quite heavily. We’re running on .NET 1.1 and want to avoid boxing, so we have developed our own hash table implementation for that. It has a property "MissingKeyValue", allowing us to specify which value is returned in case of missing key access. (Usually it’s either 0 or Int32.MaxValue.)

    I don’t have a good idea how Nullable would work, but I guess that would suit us just fine. The exception throwing is not desirable – if only for the reason that two hash table lookups (one for ContainsKey() and another to retrieve the value) would often be needed to retrieve the value. And we use hash tables in performance-critical code a lot.

  15. Johan Normén says:

    Throw one exception. It’s so obvious. Then it’s up to the programmer to do what ever he/she wants to, if the programmer build a class with a method that access this hash table the programmer can then return a null object if he/she wants to, or return the thrown exception.

    Eg.



    catch(KeyValueNullException exception)

    {

    return ( new NullKey() );

    }

    or.



    catch(KeyValueNullException exception)

    {

    return ( exception );

    }

    JN

  16. Dan says:

    Why not allow us to specify the default value in the constructor. If the default value is not specified in the constructor, then throw the exception

  17. I don’t like the fact that accessing an element in the Dictionary will require two hashtable lookups. The first for checking to make sure to key exists and the second for actually getting the value. It seems that using Nullable<T> for value types would satisfy everyone and have the big added benefit of not breaking existing code when you migrate from Hashtable to Dictionary.

  18. 1. Quite often. Since there’s no replacement for the string/object model of the hashtable, we usually decide to take the unboxing performance hit.

    2. Nullable. The default value is too bug prone.

    Question: Could those default values be changed for a particular AppDomain or something of the kind? This would mean that if I use a mock-null value for my integers for database interaction (for instance -999), I would not have to use a constant, but rather just use default(int). No component would be harmed by this, since they all treat it as default(int).

    3. Yes. It might be a bit of a problem to get used to this, but I can tell you that I, for one, never assume an indexer does not throw an exception if the key does not exist and always check using the Contains method, if such a method exists. Getting a KeyNotFoundException in a long, complicated code statement, helps understand where the problem comes from more easily than a NullRerefenceException for using the value anyway…

  19. From my use, I would prefer the exception NOT be thrown, and ContainsKey be used for cases where a default(V) would be inconclusive.

  20. LateNightCoder says:

    1. Typically use strings

    2. Nullable

    3. Definitely like exceptions, as long as ContainsKey can still be used

  21. Erik Sargent says:

    After reading the question, I had my opinions, but read everyone else’s first. Now I am more convienced than ever that throwing the exception is the best idea. All the objections against it are laziness and "it’s always been this way."

    I commend the MS team for making the right choice so that ambiguity between a value that doesn’t actually exist and a value that exists but is null can not be confused. If I were writing software for a nuclear power plant or a spaceship I would sure like to be guaranteed that when code was executing I knew the difference between those two situations.

    There are a lot of things that used to happen because memory was expensive and CPUs were slow. Software engineering was also very immature as a discipline. But computers are cheap and fast now, so lets not continue to make the same mistakes we made in the past at fundamental levels such as this. There were a lot of people who used to argue against VMs and managed code in the past too.

  22. Luc Cluitmans says:

    I use a template based solution (using CodeSmith) for deriving classes from NamedObjectCollectionBase that expose an indexer. I have set up an boolean parameter to the template to generate an indexer that, in the case where an item is not found, either returns null, or throws an exception.

    I have noticed that I use both options a lot. Sometimes is one solution better or more natural, sometimes the other. There is no single good solution for all applications. The throwing option gets used somewhat more than the null option, btw.

    What I end up with most however is a mixed model: the indexer-getter throws an exception when the item is not found, but the generated collection also has a method (called ‘Find’) that behaves just like the getter, except that it returns null when the item is not found.

    There is another related issue. What to do when setting an item to null? I have found it very useful to handle that case as equivalent to removing the key from the set, though I see that will not always be a valid approach.

    And now for your questions:

    1) Very rarely. In fact, I can’t remember ever having used a value type as value. Using them as keys is much more common.

    2) dunno – as I said I didn’t encounter the situation. However, my gut feeling says that returning ‘0’ from an integer storing table when the key is not found is a Very Bad Idea.

    3) As I said above, that is what that template (that I use a lot) generates for me in about 80% of the cases I use it (well, ok, it throws an IndexOutOfRangeException). However, in 75% of those 80%, I also have a method that acts similar as the getter, but it returns null instead of throwing an excption.

    One more last thought: what are the implications of providing multiple hashtable-like classes, different in their semantics of handling missing values and inserting nulls? I can imagine that having different classes that have similar, but different, behaviours is not the best solution, but it just might be better than the alternative: only having a ‘standard’ class that does just not exactly do what you want from it.

  23. Nicholas Allen says:

    I scanned a project I’m working on to get some data. This is a Java project as I don’t have C# generics yet.

    1. There were 163 maps for which K and V types were known. 8 maps had a value type for V (or boxed a value type). I wouldn’t expect this to be significantly different for a C# version of the same task- most of the Vs were complex objects and unlikely to be replaced with value types.

    2. Of the 8 maps:

    5 were using V as a count, default would be correct

    1 was using V as a memory address, null or default are both correct although I’d prefer default

    1 was using V as flags, neither null nor default is correct but I’d use null if I had to

    1 was using V as an index, neither null nor default is correct but I’d use null if I had to

    3. I’d really hate to get KeyNotFoundExceptions. In almost all cases the code is designed so that I don’t need to distinguish an empty entry from no entry. That’s intentional because it means I only have to reason about two possible outcomes rather than three at each access. Actually, it looks like for about 90% of the accesses, the default value is sufficient so that no reasoning needs to be done right away (in the same method as the access) about the outcome. Most of the remaining cases are to automatically create an entry if one isn’t present.

    The second reason why I don’t like the exception is because checking before retrieval has different semantics than retrieval with default in a concurrent setting. In some situations I can interleave readers and writers as long as the collection is internally synchronized. If I had to check before retrieving, I would always have to externally synchronize the collection when reading.

    The third reason why I don’t like the exception is because I don’t see a way to access the collection with equivalent performance to a default or null value. I either take a hit in the successful case by checking before retrieval, or I take a hit in the not successful case by not checking and getting an exception. Checking a potentially null value afterwards is much lighter than ContainsKey. I have hot code where both cases are used. I would prefer not to switch between the two access patterns. I would very prefer not to have to profile each use and determine what access pattern is preferred.

    And the final reason why I don’t like the exception is because it would significantly increase the risk of introducing bugs during porting. Generification of code was a relatively safe process, it probably found more existing bugs than caused new bugs. Changing the error handling semantics of a common operation would be a compelling reason to not port existing, well tested code.

  24. S N says:

    Hi,

    The code becomes ugly when we wrap it up with try catch. If the code wants to throw any exceptions for not finding a key in the hashtable, then they can change the access place with

    if (!hashtable.ConstainsKey(key))

    throw new Exception();

    object value = hashtable[key];

    This above piece of code is more readable, maintainable and performanc friendly then a collection throwing an exception.

    In general, if the collection is created and consumed by the same module then there is no problem at all.

    But when the collection is created by one component and consumed by another component, then the code becomes messy.

    That is, if the consumer is already handling this case by throwing an exception or executting fault-case, it is going to add a catch block which is going to make things worse. Creation of multiple exceptions. Code is messy and unmanageable.

    Note: Since Collections provide ContainsKey function, the collections implementor should treat the user with some respect. In other words, a developed who doesn’t know existence of this function or feel lazy to use this function when appropriate can’t develop a good product and he/she would always do round-about way ending with more bugs and error. Rather than catering a lazy developer to add more bugs in their system, it would be wise to encourage them write more appropriately.

  25. Addy Santo says:

    At first glance throwing exceptions is the more elegant solution in my mind, however it does go against the cardinal rule of exception handling, which is "never use exception handling for application flow control". And that is exactly how 99% of the "mort" type developers will use it. And then they will complain that it is slow 😉

  26. Gia says:

    1. Often use value types

    2. Nullable workaround is fine when dictionary is for internal use in my component. However if I return IDictionary generic to callers I would not like to impose my workaround semantics on them. And default(V) just doesn’t work for ints.

    3. Looking from performance nut perspective, both solutions suck A. throwing exception, and B. The if(lookup()) get() pattern that java people are accustomed to. Maybe add a lookup method that returns struct containing value and found flag? This would not be a most pleasant API, but will satisfy most demanding scenarios, mentioned nuts, while more casual scenarios would use if(lookup()) get() pattern.

  27. Nicole Calinoiu says:

    1. Often enough that any design that strongly favours reference types would be very inconvenient.

    2. Default(V) would render the class essentially useless for value types. Nullable types would be a workable approach, but could get a bit kludgy for the consumer.

    3. Wonderful as long as there’s some less costly mechanism (e.g.: Contains method) for verifying whether any given K value is present.

    BTW, are you planning to seal the class? If not, it shouldn’t exactly be a terrible burden for folks to modify the behaviour of the indexer to better suit their particular needs.

  28. ToddM says:

    1. 20%

    2. Yes

    3. No — exceptions can be too slow. Or, why not do something like Type.GetType() and allow me to specify if I want null or an exception. But make null/default(V) be the default.

  29. Ed Ball says:

    It should throw an exception, but isn’t calling ContainsKey before using the indexer a bit inefficient? Perhaps a non-throwing indexer-like method would be useful, e.g., bool TryKey(K key, out V value)? Or perhaps a new method V GetValue(K key) returns the "default" rather than throwing an exception?

  30. I like the idea of having a non-exception-throwing method, if the indexer does indeed need to throw exceptions. I wouldn’t like to see exceptions being thrown at all.

    If someone wants a nullable type, then someone can pass that as the type parameter (so just return default()).

  31. I can agree, that a non-exception throwing method will be useful, if the item is not found, a "nullable" object of some kind should be returned. But the indexer should throw an exception.

  32. Jon Dowell says:

    Given that both patterns (returns null & throws exception) are reasonable and neither is really clearly better, could we have two dictionary types?

    Dictionary< K, V > (throws exception if not found, since V can be value type)

    DictionaryRef< K, V: ref-type > (returns null, since that’s always valid for ref types)

    or maybe pass the constructor a parameter:

    ValueNotFoundBehavior { ReturnDefault, ThrowsException }? Of course, then you have to decide which should be the default…

  33. Jesse Ezell says:

    I agree with Jon Dowell, except, no need for two different template classes, just use the template constraints to change the behavior when applied to a value type.

  34. Daniel O'Connell says:

    I think allowing both by a constructor option or multiple types is going to be troublesome. If you add the option, you end up with too much complexity. It sounds good, but you are ignoring that client code is going to start having to check that option. In many cases the dictionary is not going to be intialized by the code using it. I for one don’t want to be checking every dictionary to figure out what is going to happen. Nor do I want to write trycatch blocks around every dictionary access I didn’t generate because I don’t know ahead of time how that dictionary was generated. Adding it as an option solves nothing except appeasing lazy developers. It really should be one or the other.

    There is also a matter of expectation of implementation. The question to me that matters most is what we should expect in the usual IDictionary<K,V> implementation(Dictionary or otherwise). The standard set forth by Dictionary will probably become the de facto expectation of all implementations of IDictionary<K,V>. Allowing the option in one class not only suggets that you should allow it in all, but that it should be expressed in the interface itself. A clean expectatoin of behavior should help here, although I do think that IDictionary<K,V> should address the issue.

  35. Using ContainsKey before the indexer is indeed inefficient, since it effectively requires two lookups. In fact, last time I measured, this was more expensive than using the old Hashtable and relying on the indexer returning null.

    Fortunately it seems they heard the feedback, bacause in the March tech preview there *is* a non-throwing lookup method called TryGetValue().

  36. I agree with most others else here: avoid the exception. Exceptions are extremely slow in .NET, and if you force people to use ContainsKey, they will have two lookups, defeating part of the purpose of the dictionary (forcing people to write their own). So this one is a no-brainer — you have no other option but to return a unique not-found value. Since reference types will be used more often, it will not cause a problem.

  37. Ken Cowan says:

    In the past week, I’ve been working with the WMI CIM repository. System.Management has a bunch of collections, like QualifierDataCollection, that throw ManagementException if the qualifier is not present.

    It’s really painful.

    This is a slightly different situation since the collection is over an external data source rather than one the programmer has created explicitly. OTOH, the model you set for Dictionary<K,V> will set the tone for custom collections like in System.Management.

    Part of the pain is some inconsistency on which qualifiers are applied to which types. I started with lots of very local try blocks, which quickly got messy. I’m not using helper functions that eat the exception and return null.

    1) Rarely.

    2) I’m not familiar with Nullable

    3) Please don’t throw exceptions

    KC

  38. AT says:

    3. AFAIK, compiler does some tricky optimizations for value types to reduce memory footprint and boxing while storing. But there is nothing wrong with returning boxed (read NullAble) types ? Cost of exceptions and backward compatibility are too high.

    I wish IDictionary was able return null for classes and some tricky things for value types.

    It must be given to developer to chose to throw or get null.

    Attempts to avoid exceptions by using Contains in addition to doing double lookup for same value also may cause synchronization errors. (BTW, there is multiple readers sync for methods, while usual developers will use single-reader "lock" for double checking = big performance hit)

    To avoid double lookup while preserving older versions compatibility single method with correct locking must be given for this – GetItemValueOrDefault(key, defaultValue)

    Calling GetItemValueOrDefault("myKey",null) give previous result (as HashTable), while new cool features like a GetItemValueOrDefault("config.maxMemory",Int32.MaxValue) will be given ;o)

    Currently this can be done using TryGetValue – but will be nice to not use "out" params.

    There is one-time burden for developers to understand how to use, but there is unlimited runtime benefits.

  39. I personally would like to see the indexer throw an exception if the key was not in the dictionary. I would also like to have a method that would look up a value in the dictionary and return a specified default if it was not found. e.g.:

    public V FindValue(K key, V default);

  40. dhananjay singh says:

    I agree with the idea of throwing Exception, but it is going to make coad ugly.

    Nullable is prity good but default value for value type is bad idea.

  41. Yuriy says:

    1. Exceptions are slow, so not suitable for ordinary flow. Or how do you suggest writting "if not found load and add else use existing". I do not like to check for existance and ask for value separately

    2. No matter about indexer, but in this case it is necessary to have another method returning default valueor nullable

  42. Ian Horwill says:

    Lot of feedback – hot issue!

    1) Not often (not at all on current project); however, as some folk have pointed out you might actually want to store null in the dictionary so the distinction between value and reference types is maybe not so great – using a default value in either case ain’t great.

    2) They would, but feels clunky.

    3) I think in general it’s the correct design decision but in cases where you don’t necessarily know what’s in the dictionary (e.g. when using it as a cache) I’d like to avoid having to always call ContainsKey or wrap the indexer in a try block. How about another method that return the result in an out param and a true/false indicator as the method result (or vice-versa). I think I’ve read elsewhere on your blog about the evils of "give it to ’em both ways" but in cases where this doesn’t cause confusion I think it could be considered.

  43. Johan Normén says:

    "Exceptions are slow, so not suitable for ordinary flow. "

    Slow? for what? It’s not that slow everybody think. And Eceptions are thrown when something is not ok. And I think it’s not OK to try a get for a key that doesn’t exist, without telling the code wooo you did something wrong here. As I told earlier if you want a null value then handle the exception and then return a nullobject of somekind within your code. And stop talking performance issues regarding Exceptions, they are not that slow.

    JN

  44. Johan Normén says:

    I have changed my mind… 🙂

    default(object) == null

    default(int) == Exceptiion

    default(bool) == Exception

    default(string) == null

    When you use Value types I think Exception is the best choice. And when you use

    Ref types you return null.

    Because it’s not a good idea to change how Hashtables work now. (in .Net or other languages)

    The programmer must learn how to code, and that means they must know what they are doing. If you use an valuetype, do a ContainsKey() first, then get the value and no Exception are thrown. But if you make bad code and try to access a key that does not exist throw one exception. As for reftypes return null because it can be handle as null. I know you can use the null generic implementation as well for valuetypes. Bur it doesn’t feel right.

    JN

  45. Geert Baeyaert says:

    I don’t like the use of a sentinel value to indicate that the item does not exist. I much prefer throwing an exception.

    As other people have already suggested though, this does require an extra hashtable lookup. However, I feel there is a reasonable solution for this.

    First let me state that I am no performance wonk, and that I have not measured any of this, so this may not quite work out as planned.

    But I’ll go ahead anyway.

    I think a large percentage of the uses of ContainsKey are as follows.

    if (ht.ContainsKey(k))

    {

    object value = ht[k];

    // …

    }

    This means, a key is checked for existence, and immediately after, the dictionary item for that same key is retrieved.

    This could easily be optimised in the hashtable, by storing the dictionary item that was last checked with ContainsKey in a private field in the hashtable. This would only require one extra field in the hashtable. The impact of the extra field on the total size of the hashtable would only be minimal.

    In the indexer, this private field could be checked first. Only if it does not contain the requested dictionary item, does the hashtable perform a lookup.

    This would, I feel, cover a large percentage of the uses of the ContainsKey method, so it should be worth the associated overhead.

    I did check the implementation of HashTable with ILDASM, because I suspected this optimisation might already be in place, but I didn’t see it.

    So, is there any reason ContainsKey was not implemented this way? Are there problems with what I’ve suggested that I don’t (yet) see?

    Geert

  46. Frans Bouma says:

    As the new dictionary class is new, I don’t mind a change of behaviour. I just wonder if the new dictionary class uses hashing to fast-find a value using the key, or that it uses a linear search (like SortedList). Often hashtables are used to fast-find objects back based on a given value. If Dictionary uses a linear search, it’s a useless object.

  47. Personally I’d have thought it was "better" to throw an exception by default, given that ContainsKey exists. However, judging by some of the comments above plenty of people are using this in either "hot" code (where they wouldn’t want the performance hit of the double lookup) or multi-threaded scenarios where you’d have to externally synchronise the ContainsKey and indexer operation. What about adding a TryLookup (c.f. TryParse) method that doesn’t throw (either uses Nullable or takes a default value) and implementing the indexer in terms of this, but having it throw if the sentinel value is returned. Then maybe, just maybe, everyone would be happy (other than anyone complaining about yet another method on the interface, that is 😉

  48. ToddM says:

    I’d just like to suggest to everyone that is saying exceptions are acceptable because "they are not too slow" to go ahead and write a small application to time some thrown exceptions. For example, write some code using a hashtable with 1.1, and, when a lookup returns null, throw an exception. Populate the hashtable and start looking for random kets — some of which you know not to be in the hashtable. You’ll see that the version which throws exceptions is substantially slower, and, in my opinion, less readable. Some of us need to write extremely high-performing C# code, and, if you do this exercise, you’ll see that exceptions are EXTREMELY slow.

    Also, many of us use Hashtables/Dictionaries as caches — so a key not being in the cache is not ‘exceptional’ — it is expected. (That’s why I feel the exception code is less readable.) While I would never expect to be able to reference past the end of an array (a valid use of exceptions), I certainly do expect to have lookup in caches fail. That failure should not be an exception in these cases.

    I can always write a wrapper that makes failed lookup throw exceptions — and, perhaps, that’s the way to go — a generic wrapper. And while technically, you can do the opposite — write a wrapper class around the version which throws an exception and returns nll/default(V) — you cannot make such a class AND MAINTAIN PERFORMANCE. The DictionaryWithExceptions<Dictionary<K,V>> wrapper makes more sense, because anyone who would care to use it already made the decision that performance doesn’t matter for this block of code.

    I agree with Daniel O’Connell that having the exception be optional via the Dictionary constructor is a bad idea, for the excellent reasons stated.

  49. My takeouts on this, including an examination of how Dictionary might implement the Nullable<T> scenario and how the code for null return semantics when available versus throwing when not available is *dirty*

    http://weblogs.asp.net/justin_rogers/archive/2004/04/27/120976.aspx

  50. CLP says:

    I’ll vote "aye" for Dan’s earlier suggestion:

    Why not allow us to specify the default value in the constructor. If the default value is not specified in the constructor, then throw the exception (4/26/2004 12:54 PM | Dan)

  51. Sigh says:

    Brrrrr…

    Will there be a version of .NET Reflector that supports Whidbey?

    Looking at this stuff is absolutely no fun right now.

  52. Daniel O'Connell says:

    Alot of these issues are interesting. The usages of Dictionary are going to make this difficult. As some have suggested that they used Dictionary for caches, others for other purposes. I generally use them as dictionarys, if a key isn’t there, then that key is invalid. TryGetValue helps with that, however it makes me wonder about the actual intended target of a Dictionary. While people argue that in one situation a missing key is expected, they are not bothering to argue that in others it is exceptional. It makes it pretty clear that one collection isn’t good enough to suit all the possible needs. The definition "expected" and "exceptional" can’t be pinned with a generic dictionary. And niether one is right. You’ll end up with a class that is "wrong" by the definition of some situations. Perhaps the multiple class system would work(Just please, not DictionaryWithExceptions or some other lame name, ;)).

    One thing, however, is whether TryGetValue should be part of IDictionary<K,V> or not. As I’ve said, if Dictionary throws an exception, that behaviour will be expected in IDictionary, if it should be or noti s another matter, but I suspect it will.

  53. Jon Dowell says:

    I think I’ve been convinced that having an option for NotFound == null v. KeyNotFound exception is a bad thing.

    One thing that might help this discussion is performance data. In particular:

    How much of a penalty is incurred by using Nullable<V> versus V directly?

    How does this penalty compare to the time Dictionary takes to do hash lookups?

    If the penalty for using Nullable<valuetype> is smaller than the has lookup time, then having one Dictionary< K, V : any nullable type > (with NotFound == null) is probably the right answer:

    It would have the same KeyNotFound behavior as the existing hashtable, making it more of a drop in replacement

    It would avoid exception handling performance penalties for NotFound case

    It would likely be faster than Hashtable even for value types that have to use Nullable< T >

    If it’s truly desirable to have a ValueDictionary< K, V: value> for performance reasons (and I’d think Dictionary< K, Nullable<V> > would be faster than Hashtable boxing in any case) then one be put into Collections.Specialized.

    Also, a wrapper class for KeyNotFound Exception style would be better than a wrapper for NotFound == null; in the first case, only Exception-using coders pay the perf penalty, and in the second everyone does.

    So, can anyone with VS2005 comment on Nullable<T> performance?

  54. oasis says:

    Are there plans to redo the entire Collections Framework in the Generic version and bring it on par with the corresponding Java classes? There are some pretty basic collections missing… most notably LinkedList.

  55. Royston Shufflebotham says:

    I use Hashtables and Dictionaries a lot, and performance and atomicity/thread-safety are extremely important to me as these classes are very frequently used in caching.

    Whilst feeling ‘cleaner’, throwing an exception on a hash/cache lookup miss would be an absolute performance disaster. Exception throwing and catching *is* expensive, and so should be kept, where possible, to ‘exception’al situations. A cache miss is undesirable but not in the same league as most other exception types.

    And the option of having to do a ContainsKey first to avoid the exception is bad for two reasons:

    1) It’s unnecessarily slow – I’d be doing all lookups twice to check my key was there and then retrieve it.

    2) It’s not an atomic operation and so we’d need to introduce much more explicit locking in order to do safe reads without danger of hitting the exceptions; it’s *so* nice being able to do an atomic "If there’s a value for this key please give it to me" operation without exceptions and having to lock.

    So, I’d advocate the core class *not* throwing exceptions for misses.

    BUT I think I’d also provide an extra class (which wrappers the core class) which threw exceptions instead of returning null (or equivalent) for those developers who (perhaps rightly in their own applications) value tidiness over speed. You can always put exception handling on top of fast code, but you can’t necessarily make exception-loaded code go faster.

  56. After the messy code I’ve ended up writing while dealing with collections of objects in VB due to it throwing errors, personally I’d vote for returning null. I’ve often used collections as cache type objects so I prefer my code to look like:

    X=dic[MyKey];

    if (X==null)

    { X=new MyThing;

    dic.Add(MyKey,MyThing);

    }

    X.DoStuff();

    Rather than writing silly wrapper functions whose sole purpose in life is to catch exceptions. Using ContainsKey stinks because potentially it means two key lookups.

    I quite like the idea of passing the value to be returned for no match to the constructor otherwise throwing an exception.

  57. Vikas Apte says:

    Its really a difficult problem to solve. When you are designing API for thousand of users(programmers). Having said that here is what I think.

    3. No exception

    How bad is it for Dictionary<K, V> to not find a key?. I believe for Dictionary it does not matter, What matters is that how it reports the caller that it did not find a key.

    Now is it really bad or serious and it warrants an exception, I don’t think so.

    Sometimes its perfectly valid for an implementation to not find a key and do something about it.

    If you get an exception for this and then do something about it in your try..catch then this really spoils the readability of the code.

    In some cases one would throw an application exception indicating a key value not found, likely with better error message.

    And exceptions incurs performance hits, programmer not handling them correctly, wrapping them incorrectly.

    e.g

    Cleaner more readable…

    value = d[key];

    if (value == null) {

    // Ok, not found

    d[key, <some value>];

    }

    and

    if (value == null) {

    throw(new OrderNotFoundException("Order id: "+key+" not found"));

    }

    Not very readable:

    try {

    value = d[key];

    } catch(KeyNotFoundException kntfe) {

    // Does not help reader as this is expected or or unsual case. So this exception unnecessary. Here user is forced to handle perfectly valid condition via exception handling

    // Ok, not found

    d[key, <some value>];

    }

    and

    try {

    value = d[key];

    } catch(KeyNotFoundException kntfe) {

    throw(new OrderNotFoundException("Order id: "+key+" not found", kntfe));

    }

  58. Daniel O'Connell says:

    Upon quick analysis…

    I don’t know how much I’m allowed to speculate, but in simple tests using the existing dictionary and Nullable<int> for the V parameter have been disappointing. Adding and retrievial of a large number of nodes, right now, takes noticably longer for Nullable<int> than it does for a plain int.

    Here are the issues, as I see it, with Nullable<T>.

    1) It is a structure, so when you pass the int in you result in two copies, copying the int into the Nullable structure and then copying the Nullable structure into the Add method. The same issue exists when you access the table. To get at the stored value you have to perform a copy of Nullable and then a conversion or a copy of the value in nullable into the variable which you wish to place the data. End result of using Nullable is atleast 4 copy operations(probably not worse than boxing, but still an issue) while the non-nullable int should only be two. Using Nullable also causes a guarenteed check against the null(either you do it yourself or the conversion does) and I would imagine if that check fails you still receive an exception(just this time NullReferenceException).

    2) It doesn’t appear to implicitly convert back to T, as a result code using Nullable<T> either has to use an explicit conversion or do a null check themselves.

    3) Same problem as all default options, people may decide to try to use the data without bothering to check the HasValue member. Now, if get_Value checks HasValue and throws an exception when it is false, then we have another exception to deal with.

  59. McR says:

    I’m not sure about default(V) or Nullable.

    I actually do not really know about the usage and overhead of Nullable. How would I use it? Does Nullable implement implicit cast?

    Another (maybe strange) idea is to allow ref and out params for indexers. Then provide an overloaded indexer.

    V this[K key] { … }

    V this[K key, out keyValid] { … }

    Then you could use that overloaded which would best fit your needs.

    Moreover there is only one hashtable lookup done. => performance

    bool valid;

    Dictionary<string, string> dic = new Dictionary<string, string>();

    dic.Add("key1", "data1");

    dic.Add("key2", "data2");

    // we know that key exists:

    string s1 = dic["key1"];

    string s2 = dic["invalidKey", out valid];

    if (valid)

    // use s2, otherwise (but also possible if valid) it has default(V).

    What about this idea? Why don’t indexers allow ref and out params?

    How does Nullable exactly work?

    Can I write

    Nullable<int> val = myIntDict["key"];

    int i = val; // with implicit cast

    or do I have to write something like

    Nullable<int> val = myIntDict["key"];

    int i = val.Value; // and use HasValue to check for valid value

    I also like the idea of caching the last ContainsKey lookup.

  60. Frank Hileman says:

    Johan Normén, and anyone else who thinks .NET exceptions are not slow: I have seen a single .NET exception throw and catch take as long as 15 milliseconds. This is not slow? I don’t know how they can possibly be so slow, but they are, and we must avoid them as much as possible in all APIs. Anyone who thinks they are not slow simply needs to work with an API that throws them frequently (for example, System.Drawing), in some performance-critical code. You will quickly change your opinion.

  61. Daniel O'Connell says:

    Exceptions aren’t *THAT* slow most of the time. The aren’t as fast as most constructs but considering that, unless you are writing a very bad implementation, in most cases you shouldn’t be using unknown keys in performance critical situations. A fundamental rule in performance critcial systmes is that if something is taking up alot of time with no value, you are doing something wrong. I would suggest to all of you concerned with exceptions to try designing your algorithms so that they are a bit more than guesswork. I would also suggest that if you are missing the dictionary most of the time, then it really is your own fault. Also, the TryGetValue solution certainly gets around performance issues and, IMHO, creates clearer code(IIOW, something that matters all of the time, unlike bare metal performance, which everyone seems so worried about, which is irrelevent almost every time).

  62. Om says:

    if you don’t want the overcost of Exception, you can precede your call with a Contains

    1/ Often

    2/ fails soon is a good option in that sense Exception is better,

    3/ if u use Nullable use it on the return value only. Nullable SHOULDN’T be used as a value adapter in any collections unless the overcost of one extra bool per entry is shot definitively

  63. I’ve covered some performance based on the tech preview. Turns out Nullable types are faster than the integral types in the tests I ran. Must be something I’m doing wrong, since it shouldn’t be faster to use Nullable types. I also cover the usage of the TryGetValue method for those interested in not incurring the exception or the dual look-up of ContainsKey + an indexer.

    http://weblogs.asp.net/justin_rogers/archive/2004/04/30/124355.aspx

  64. cody says:

    1. How commonly do you use a value type for V?

    It happens.

    2. In those cases will default(V) or the Nullable designs address the issue?

    No! Let the user pass a value to the ctor of Hashtable and return that value in case the key was not found. Make this value a public property so one can do:

    int val = table[key];

    if (val==table.InvalidValue)

    ..

    3 .Do you like the idea of the indexer throwing a KeyNotFoundException

    We have the choice: We can use Hashtable if we do not know wheather the value is contained in the list or not and we can use Dictionary if we assume that the value is in the table. if a key is not found this is an errorcondition in which it is best to throw an exception, otherwise it could be dangerous if the programmer forgets to test if value!=InvalidValue.

    Nullable types are an interesting idea but please do not use them in hashtables since this would affect the performance and it is not needed.

    In a Datatable maybe Nullables are a great idea.

  65. Ross says:

    Simple solution: Overload the indexer please!

    table[key] … throws exception if not found

    table[key, default] … returns the given default if not found

    I know we can do this in VB, so i assume also c# can support an overloaded indexer?

    That, combined with the TryGetValue() will be perfect for all possible scenarios 🙂

  66. I would prefer default(V) to be returned as opposed to an exception.