Caching Dispatch Identifiers Is A Bad Idea

About two years ago I wrote a bit about when it was legal to cache a dispatch identifier so that you didn’t have to look it up a second time.

I was reminded of this today because the scripting sustaining engineering team was working on a bug that involved a customer incorrectly caching a dispatch identifier. I’d like to take this opportunity to expand a bit on the rules I mentioned two years ago.

Suppose you’ve got two script engines that are totally identical – same script code added in the same order, same named items – running on two different threads. I’m sure you can see why that would be useful; think of an ASP server serving up two pages on two threads, for example. The customer needed to call a particular script function by invoking each script engine’s global dispatch. They were getting the dispatch identifier once, caching it, and using it to call into both engines. Depending on which build of the script engine they were using, that either worked fine or died horribly.

Unfortunately it turns out that dying horribly is both legal and expected. A given dispatch identifier can only be cached in the context of a particular dispatch object. Two identical dispatch objects with identical functions are allowed to give different dispatch identifiers for a given method!

The script engines have changed their algorithm for generating dispatch identifiers several times over the years. In some builds of the script engines, dispatch identifiers are generated sequentially within each engine, so that the first function added gets identifier one, the second gets two, and so on. In those builds, two identical engines that have had the same functions added in the same order will have identical dispatch identifiers for a given name. In other builds of the script engines, the dispatch identifiers are globally unique, so that two engines will always hand out different dispatch identifiers for a given function, even if everything else about them is identical.
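To make the difference concrete, here is a minimal sketch of the two allocation strategies. It is a toy model, not the script engines' actual code: `ToyEngine`, `IdPolicy`, `Lookup`, `Resolve`, and the choice to start counting at one are all hypothetical names and assumptions made for illustration. `Lookup` stands in for GetIDsOfNames and `Resolve` stands in for the id-to-function step of Invoke.

```cpp
#include <atomic>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for DISPID, which is a 32-bit signed integer in COM.
using DispId = std::int32_t;
constexpr DispId kUnknownId = -1;  // same value as COM's DISPID_UNKNOWN

enum class IdPolicy { SequentialPerEngine, GloballyUnique };

// Toy model of one engine's global dispatch object (illustration only).
class ToyEngine {
public:
    explicit ToyEngine(IdPolicy policy) : policy_(policy) {}

    // Registers a function name and returns the identifier assigned to it.
    // Assumes each name is added at most once per engine.
    DispId AddFunction(const std::string& name) {
        const DispId id = (policy_ == IdPolicy::SequentialPerEngine)
                              ? next_local_id_++
                              : NextGlobalId();
        id_by_name_[name] = id;
        name_by_id_[id] = name;
        return id;
    }

    // Plays the role of GetIDsOfNames, for this object only.
    DispId Lookup(const std::string& name) const {
        const auto found = id_by_name_.find(name);
        return found == id_by_name_.end() ? kUnknownId : found->second;
    }

    // Plays the role of Invoke's id lookup: returns the name of the function
    // that `id` selects on this object, or an empty string if the id is not
    // known here.
    std::string Resolve(DispId id) const {
        const auto found = name_by_id_.find(id);
        return found == name_by_id_.end() ? std::string() : found->second;
    }

private:
    // One counter shared by every engine in the process.
    static DispId NextGlobalId() {
        static std::atomic<DispId> next{1};
        return next.fetch_add(1);
    }

    IdPolicy policy_;
    DispId next_local_id_ = 1;  // per-engine counter
    std::map<std::string, DispId> id_by_name_;
    std::map<DispId, std::string> name_by_id_;
};
```

Under `SequentialPerEngine`, two engines that each add "Init" and then "Render" report the same id for "Render", so an id cached from one engine happens to resolve correctly on the other. Under `GloballyUnique`, the same two engines report different ids, and resolving the first engine's id on the second finds nothing. The calling code is identical in both cases; only the first policy hides the bug.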

Unfortunately the customer built and tested their code against the former kind of engine and then upgraded to a more recent version using the latter algorithm, which broke them. I regret the problem (I’m the guy who screwed around with the algorithm), but still, you’ve got to follow the rules. Calling GetIDsOfNames is almost always very fast compared to Invoke, so please don’t cache dispatch identifiers for performance reasons. We don’t do so in the script engines, as I mentioned two years ago, because the trouble you get into when you do it wrong is not worth the time you save.

While I’m on the subject, a quick word of advice for implementers too: If you are implementing IDispatch on an expando object it is very tempting to say “I’ll use a pointer to my function object as the dispatch identifier for that function”. Just cast the pointer to int and you’ve guaranteed uniqueness and have an easy way to map back to the function when invoked, right? Please do not do this. First off, it’s not portable to 64-bit machines, where a pointer does not fit in a 32-bit dispatch identifier. Second, it’s dangerous – a hostile or broken caller can make you invoke pointers to garbage. Third, in processes with 3GB of user-accessible memory, pointers can be cast to negative integers, and negative integers are illegal dispatch ids. Come up with some safer way to guarantee uniqueness of your dispatch ids, such as maintaining a global counter and lookup table.
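The counter-and-lookup-table idea can be sketched as follows. This is one possible shape under stated assumptions, not how the script engines do it: `ExpandoTable`, its method names, and the `int(int)` function signature are hypothetical, and a real implementation would sit behind IDispatch and would also have to decide what to do if the counter is ever exhausted.

```cpp
#include <atomic>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for DISPID, which is a 32-bit signed integer in COM.
using DispId = std::int32_t;
constexpr DispId kUnknownId = -1;  // same value as COM's DISPID_UNKNOWN

// Toy expando object: functions can be added by name at runtime.
class ExpandoTable {
public:
    using Function = std::function<int(int)>;  // simplified signature

    // Adds a function, or replaces the body of an existing one.
    // A name keeps the id it was first given.
    DispId Add(const std::string& name, Function fn) {
        const auto found = id_by_name_.find(name);
        if (found != id_by_name_.end()) {
            fn_by_id_[found->second] = std::move(fn);
            return found->second;
        }
        const DispId id = NextId();
        id_by_name_.emplace(name, id);
        fn_by_id_.emplace(id, std::move(fn));
        return id;
    }

    // Name-to-id step (the GetIDsOfNames role).
    DispId IdOf(const std::string& name) const {
        const auto found = id_by_name_.find(name);
        return found == id_by_name_.end() ? kUnknownId : found->second;
    }

    // Id-to-function step (the Invoke role). The id is only ever used as a
    // table key, so an unknown or garbage id is rejected, never dereferenced.
    bool Invoke(DispId id, int arg, int* result) const {
        const auto found = fn_by_id_.find(id);
        if (found == fn_by_id_.end()) return false;
        *result = found->second(arg);
        return true;
    }

private:
    // Process-wide counter: ids are small positive integers regardless of
    // pointer width or address-space layout. Overflow is not handled here.
    static DispId NextId() {
        static std::atomic<DispId> next{1};
        return next.fetch_add(1);
    }

    std::unordered_map<std::string, DispId> id_by_name_;
    std::unordered_map<DispId, Function> fn_by_id_;
};
```

The three objections to pointer-valued ids all go away in this arrangement: the id's size is independent of pointer size, an id supplied by the caller is validated against the table before anything is called, and the counter only produces positive values.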

Comments (3)

  1. Stewart Tootill says:

    On that last point, you can’t even guarantee function pointers are unique even on good old 2GB 32bit. Depending on the functionality, you might end up with two functions which look the same to the linker, and with COMDAT folding enabled their pointers, and therefore IDs, would be the same.

  2. EricLippert says:

    Well, sure, that’s a good point. But that scenario is highly unlikely in an expando object situation. If you have functions that are known at link time then you can specify their dispatch identifiers at link time.

    In the expando situation though, like the script engines have running in IE, you can end up with new functions added to a dispatch object at runtime, and need some way to determine what the dispatch identifier is. In the script engines we had a heap-allocated pointer to a block of information representing the function object, and at one point we were using the pointer as a dispatch identifier, with bad results on 3GB systems.

  3. Joydeep Yadav says:


    Quoting you on the subject:

    "A given dispatch identifier can only be cached in the context of a particular dispatch object. Two identical dispatch objects with identical functions are allowed to give different dispatch identifiers for a given method!"

    To clarify the second sentence: when you say "Two identical dispatch objects with identical functions…", do you mean two different instances of dispatch objects with identical functions? Two dispatch objects with identical functions may return different dispids, is that correct? I’m looking at reliable ways of caching dispids in my C++ code without making major changes to it. As a first attempt I tried globally caching dispids keyed by dispid name, and got very good results for one of the test cases I’m considering right now (making ~25000 GetIDsOfNames and Invoke calls), but I’m not sure if this is correct and reliable.