Caching Dispatch Identifiers Is A Bad Idea

About two years ago I wrote a bit about when it was legal to cache a dispatch identifier so that you didn’t have to look it up a second time.

I was reminded of this today because the scripting sustaining engineering team was working on a bug that involved a customer incorrectly caching a dispatch identifier. I’d like to take this opportunity to expand a bit on the rules I mentioned two years ago.

Suppose you've got two script engines that are totally identical (same script code added in the same order, same named items) running on two different threads. I’m sure you can see why that would be useful; think of an ASP server serving up two pages on two threads, for example. The customer needed to call a particular script function by invoking each script engine’s global dispatch. They were getting the dispatch identifier once, caching it, and using it to call both engines. Depending on which build of the script engine they were using, that either worked fine or died horribly.

Unfortunately it turns out that dying horribly is both legal and expected. A given dispatch identifier can only be cached in the context of a particular dispatch object. Two identical dispatch objects with identical functions are allowed to give different dispatch identifiers for a given method!

The script engines have changed their algorithm for generating dispatch identifiers several times over the years. In some builds of the script engines, dispatch identifiers are generated sequentially, so that the first function added is one, the second is two, and so on. In those builds, two identical engines that have had the same functions added in the same order will have identical dispatch identifiers for a given name. In other builds of the script engines, the dispatch identifiers are globally unique, so that two engines will always give different dispatch identifiers for the same function, even if everything else about them is identical.
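
To make the difference concrete, here is a toy C++ sketch of the two schemes. It is not the script engines' code, just a model of the behaviour described above; ToyEngine, IdScheme and enginesAgree are names invented for the illustration, and a plain long stands in for the real dispatch identifier type.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical model of how a script engine might hand out dispatch ids.
// Nothing here is taken from the real engines.
enum class IdScheme { Sequential, GloballyUnique };

class ToyEngine {
public:
    explicit ToyEngine(IdScheme scheme) : scheme_(scheme) {}

    // Registers a function name and assigns it an id under the chosen scheme.
    void addFunction(const std::string& name) {
        long id = (scheme_ == IdScheme::Sequential) ? ++localCounter_
                                                    : ++sharedCounter_;
        assert(id > 0 && "dispatch ids must not be negative");
        ids_[name] = id;
    }

    // Returns the id for a name, or -1 if the name is unknown.
    long idOfName(const std::string& name) const {
        auto it = ids_.find(name);
        return it == ids_.end() ? -1 : it->second;
    }

private:
    static long sharedCounter_;  // shared by every engine in the process
    IdScheme scheme_;
    long localCounter_ = 0;      // private to this engine
    std::unordered_map<std::string, long> ids_;
};

long ToyEngine::sharedCounter_ = 0;

// Builds two identical engines and reports whether they agree on the id of "Bar".
bool enginesAgree(IdScheme scheme) {
    ToyEngine first(scheme), second(scheme);
    first.addFunction("Foo");
    first.addFunction("Bar");
    second.addFunction("Foo");
    second.addFunction("Bar");
    return first.idOfName("Bar") == second.idOfName("Bar");
}
```

Under the sequential scheme enginesAgree returns true, which is why sharing a cached identifier appears to work; under the globally unique scheme it returns false, and an identifier cached from the first engine means nothing to the second.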

Unfortunately the customer built and tested their code against the former kind of engine and then upgraded to a more recent version using the latter algorithm, which broke them. I regret the problem (I'm the guy who screwed around with the algorithm), but still, you've got to follow the rules. Calling GetIDsOfNames is almost always very fast compared to Invoke, so please don’t cache dispatch identifiers for performance reasons. We don’t do so in the script engines, as I mentioned two years ago, because the trouble you get into when you do it wrong is not worth the time you save.
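
The safe calling pattern looks roughly like the following sketch: resolve the name on the very object you are about to invoke, every time. This is portable toy C++, not real COM code. ToyDispatch, idOfName, invoke and callByName are hypothetical stand-ins (idOfName and invoke play the roles of GetIDsOfNames and Invoke), and the firstId constructor argument exists only so that two otherwise identical objects can hand out different identifiers.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical, heavily simplified stand-in for a dispatch object.
class ToyDispatch {
public:
    explicit ToyDispatch(long firstId) : nextId_(firstId) {}

    void addFunction(const std::string& name, std::function<int(int)> body) {
        assert(body && "function body must be callable");
        long id = nextId_++;
        ids_[name] = id;
        bodies_[id] = std::move(body);
    }

    // Plays the role of GetIDsOfNames: name -> id, valid for this object only.
    long idOfName(const std::string& name) const {
        auto it = ids_.find(name);
        if (it == ids_.end()) throw std::invalid_argument("unknown name: " + name);
        return it->second;
    }

    // Plays the role of Invoke: rejects ids this object never issued.
    int invoke(long id, int arg) const {
        auto it = bodies_.find(id);
        if (it == bodies_.end()) throw std::out_of_range("unknown dispatch id");
        return it->second(arg);
    }

private:
    long nextId_;
    std::unordered_map<std::string, long> ids_;
    std::unordered_map<long, std::function<int(int)>> bodies_;
};

// Look the id up on the target itself, immediately before invoking it.
int callByName(const ToyDispatch& target, const std::string& name, int arg) {
    return target.invoke(target.idOfName(name), arg);
}
```

With two objects constructed as ToyDispatch(1) and ToyDispatch(100) and the same function added to each, callByName works on both, while an identifier obtained from the first and passed to the second's invoke throws. The toy fails cleanly; a real dispatch object given someone else's identifier is under no obligation to.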

While I’m on the subject, a quick word of advice for implementers too: If you are implementing IDispatch on an expando object it is very tempting to say "I’ll use a pointer to my function object as the dispatch identifier for that function". Just cast the pointer to int and you’ve guaranteed uniqueness and have an easy way to map back to the function when invoked, right? Please do not do this. First off, it’s not portable to 64-bit machines, where a pointer does not fit in a 32-bit dispatch identifier. Second, it’s dangerous: a hostile or broken caller can make you invoke pointers to garbage. Third, in processes with 3 GB of user-accessible memory, pointers can have the high bit set, so they cast to negative integers, and negative integers are illegal dispatch ids. Come up with some safer way to guarantee uniqueness of your dispatch ids, such as maintaining a global counter and lookup table.
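
A minimal sketch of the counter-and-table approach, in portable C++ rather than real COM code, might look like this. DispId, nextDispId and ExpandoTable are hypothetical names, functions are modelled as std::function<int(int)> purely to keep the example small, and the per-object table is not thread safe (only the counter is).

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

using DispId = long;  // illustrative stand-in for the real dispatch id type

// Process-wide counter: every call yields a fresh, positive id.
inline DispId nextDispId() {
    static std::atomic<DispId> counter{0};
    DispId id = ++counter;
    assert(id > 0 && "dispatch ids must stay positive");
    return id;
}

class ExpandoTable {
public:
    using Function = std::function<int(int)>;

    // Adds or replaces a function; a name keeps its id once one is assigned.
    DispId add(const std::string& name, Function fn) {
        auto it = idsByName_.find(name);
        DispId id = (it != idsByName_.end()) ? it->second : nextDispId();
        idsByName_[name] = id;
        functionsById_[id] = std::move(fn);
        return id;
    }

    // Returns nullptr for any id this object never issued, so a bogus id
    // from a caller is rejected instead of being treated as a pointer.
    const Function* find(DispId id) const {
        auto it = functionsById_.find(id);
        return it == functionsById_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::string, DispId> idsByName_;
    std::unordered_map<DispId, Function> functionsById_;
};
```

Because every id passes through the table on its way back in, the three problems above go away: the id is a small integer regardless of pointer width, an unknown id yields nullptr rather than a jump into garbage, and the counter only ever produces positive values.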