HashSet, equality and identity

I'm taking a short intermezzo from the mutability / circular reference series, to post a short public announcement-type.

You may have noticed in the sample from my last post that I used a HashSet to keep track of which of the two instances I had visited before. This is of course overkill when you have two instances, but if you have a graph, or expect a tree and don't want to overflow your stack while recursinv if there's an error, then this isĀ  ahandy mechanism: you just keep track of the elements you've already processed, and stop processing or fail if you find you're processing the same element multiple times.

Here is an important thing to note, however: do you even know how HashSet determines equality? For integer it will obviously compare the values; for strings, presumably the contents - but will it change if the user changes the culture, and is it case sensitive? What about other objects? Does it use the object identity, does it use the Equals method, does it look for an IEquatable implementation, or what?

For production code, my rule of thumb is: be clear in your codeĀ about how you want your objects compared for equality, and use the constructor that takes in an explicit IEqualityComparer<T>. Then you can pick your comparison and stick with it, and not have it change if in the future one of the elements you're tracking changes to support custom equality comparison.

Oh, and for tracking visited objects in a graph, you really want an implementation that uses Object.ReferenceEquals, unless the members you're traversing are allocated on-demand (in which case you'll always get different instances and will need something a bit more elaborate).

Enjoy!

PS: the same consideration applies to uses of Dictionary, by the way, and that type has a much better explanation of how equality is compared on keys.