Performance Quiz #12 -- The Cost of a Good Hash -- Solution

Article
01/26/2007

Well once again there have been many thoughtful replies to both the original question as well as the followup with hints.

Recall the original question asked a bit of trivia: could you name 5 implementations of GetHashCode in the framework that do things you might not expect to see in a hash function? It's a little bit of a vague question and also requires a good deal of arcane knowledge of the CLR. The most interesting thing about the discussion I think is just what people considered "unusual." But as always there was a method to my madness.

So without further ado here's my list of "unusual" hash functions, there are 53 of them.

**GetHashCode Methods that Allocate**
java.lang.reflect.Method.GetHashCode	4.2
com.ms.lang.MulticastDelegate.GetHashCode	4.2
com.ms.lang.Delegate.GetHashCode	4.2
System.Uri.GetHashCode	3.8
System.Security.Policy.HashMembershipCondition.GetHashCode	3.8
java.lang.reflect.Field.GetHashCode	3.5
com.ms.wfc.ui.Region.GetHashCode	3.5
com.ms.wfc.ui.Pen.GetHashCode	3.5
com.ms.wfc.ui.Font.GetHashCode	3.5
com.ms.wfc.ui.Color.GetHashCode	3.5
com.ms.wfc.ui.Brush.GetHashCode	3.5
System.Security.Policy.NetCodeGroup.GetHashCode	3.5
System.Security.Policy.FileCodeGroup.GetHashCode	3.5
System.Security.Policy.CodeGroup.GetHashCode	3.5
System.Security.PermissionSet.GetHashCode	3.3
System.Security.NamedPermissionSet.GetHashCode	3.3
System.Configuration.SettingElement.GetHashCode	3.1
System.Diagnostics.ListenerElement.GetHashCode	3.0
System.Configuration.ConfigurationElement.GetHashCode	3.0
java.math.BigDecimal.GetHashCode	2.8
System.Web.Configuration.TagPrefixInfo.GetHashCode	2.8
System.Web.Configuration.TransformerInfo.GetHashCode	2.4
System.Web.Configuration.TagMapInfo.GetHashCode	2.4
System.Web.Configuration.ProfileGroupSettings.GetHashCode	2.4
System.Web.Configuration.CustomError.GetHashCode	2.4
System.Web.Configuration.BuildProvider.GetHashCode	2.4
System.Web.Configuration.NamespaceInfo.GetHashCode	2.1
System.Security.Policy.StrongNameMembershipCondition.GetHashCode	2.0
System.Xml.Serialization.CaseInsensitiveKeyComparer.System.Collections.IEqualityComparer.GetHashCode(System.Object)	1.9
System.Security.Policy.PublisherMembershipCondition.GetHashCode	1.7
System.Xml.Schema.KeySequence.GetHashCode	1.6
System.Data.SqlTypes.SqlString.GetHashCode	1.6
System.Data.SqlTypes.SqlDecimal.GetHashCode	1.6
java.text.CollationKey.GetHashCode	1.5
System.Security.Policy.UrlMembershipCondition.GetHashCode	1.5
System.Security.Policy.SiteMembershipCondition.GetHashCode	1.5
System.Globalization.SortKey.GetHashCode	1.5
System.Net.Cookie.GetHashCode	1.3
System.Management.ManagementBaseObject.GetHashCode	1.3
java.math.BigInteger.GetHashCode	1.0
System.Windows.Forms.TableLayoutPanelCellPosition.GetHashCode	1.0
System.Windows.Forms.DataGridViewCellStyle.GetHashCode	1.0
System.Windows.Forms.DataGridViewAdvancedBorderStyle.GetHashCode	1.0
System.Windows.Forms.DataGridView+HitTestInfo.GetHashCode	1.0
System.Web.UI.ControlCachedVary.GetHashCode	1.0
System.Web.UI.AttributeCollection.GetHashCode	1.0
System.Web.FileSecurity+DaclComparer.System.Collections.IEqualityComparer.GetHashCode(System.Object)	1.0
System.Web.Configuration.RootProfilePropertySettingsCollection.GetHashCode	1.0
System.Web.Caching.CachedVary.GetHashCode	1.0
System.Security.Cryptography.X509Certificates.X509CertificateCollection.GetHashCode	1.0
System.Security.AccessControl.GenericAce.GetHashCode	1.0
System.Drawing.FontFamily.GetHashCode	1.0
System.Configuration.ConfigurationElementCollection.GetHashCode	1.0

And you can already see why I think they are unusual: They allocate. In my opinion any "normal" hash function should be able to do its job without creating any temporary objects. Given the frequency with which things can be hashed I think that's important. That doesn't mean that I'm going to go to Red Alert about any of the above but it does mean that we should think more carefully about creating big hash tables of the above.

OK, so, great, an interesting observation. What's the point?

Well, I while I believe that the particulars of this question are not profound, the notion that you can have a cost metric applied universally to the runtime -- even one that is as simple minded as the one I just proposed -- *is* profound.

Even though my "allocation complexity" metric may be feeble it is already useful and actually it is my hope that many people find it feeble and write about how their metric is much better and by the way here it is the much better metric in text format etc. etc. Imagine if we had the methods scored in a variety of ways all readily available and pluggable into intellisense. You could have scores relating to synchronization, i/o, memory, algorithmic complexity, even measured data where appropriate. Other engineering disciplines have abundant information about their raw materials but for those of us in the software business its often either guesswork or our own hard-won data.

Let's look at what you can do with even my feeble metric. In my very first quiz I asked (in part) whether it would be better to call Write three times on a stream or create a format string and call it once. Can we answer that question with the published metric?

Here are the relevant two lines from my costs file:

System.IO.TextWriter.Write(System.String) 0
System.IO.TextWriter.Write(System.String,System.Object[]) 1.5

That's not as good as the actual measurement but, wow, that's something now isn't it? Rough guidance to give you a clue! A great "tip" that the var-args formatting option has some underlying cost.

Even these rough costs are very powerful in terms of helping you write code. Suppose you're writing a new "Foo" and you want it to have cost that is roughly the same as existing "Foo". You could go and look at how the existing "Foo" methods score which gives you some idea what costs you can afford in your new "Foo".

Importantly you must observe Rico's Rule: if you are writing a new "Foo" method and you want that method to have a cost of X (on any metric) then you must not use any methods whose cost is known to be greater than X. Just that simple rule can save you from many costly mistakes.

Finally you have some way to answer the question: Is it even remotely reasonable to use a given method in a given context. Certainly this particular metric is imperfect, maybe even feeble, but it's something. And it will only get better if we all work on it.

Performance Quiz #12 -- The Cost of a Good Hash -- Solution

Additional resources