Reading a contract from the other side: SHSetInstanceExplorer and SHGetInstanceExplorer


Shell extensions that create worker threads need to call the SHGetInstanceExplorer function so that Explorer will not exit while the worker thread is still running. When your worker thread finishes, you release the IUnknown that you obtained to tell the host program, “Okay, I’m done now, thanks for waiting.”

You can read this contract from the other side. Instead of thinking of yourself as the shell extension running inside a host program, think of yourself as the host program that has a shell extension running inside of it. Consider a simple program that displays the properties of a file, or at least tries to:

#include <windows.h>
#include <shellapi.h>
#include <tchar.h>

int __cdecl _tmain(int argc, TCHAR **argv)
{
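  SHELLEXECUTEINFO sei = { sizeof(sei) };
  // SEE_MASK_INVOKEIDLIST invokes the verb through the item's
  // context menu handler, which is where "properties" comes from.
  sei.fMask = SEE_MASK_FLAG_DDEWAIT | SEE_MASK_INVOKEIDLIST;
  sei.nShow = SW_SHOWNORMAL;
  sei.lpVerb = TEXT("properties");
  sei.lpFile = TEXT("C:\\Windows\\Explorer.exe");
  ShellExecuteEx(&sei); // returns without waiting for the property sheet to close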
  return 0;
}

Oh dear! When you run this program, nothing happens. Well, actually, something did happen, but the program exited too fast for you to see it. To slow things down, add the line

  MessageBox(NULL, TEXT("Waiting"), TEXT("Title"), MB_OK);

right before the return 0. Run the program again, and this time the properties dialog appears, as well as the message box. Aha, the problem is that our program is exiting while the property sheet is still active. (Now delete that call to MessageBox before something stupid happens.)

The question now is how to know when the property sheet is finished so we can exit. That’s where SHSetInstanceExplorer comes in. The name “Explorer” in the function name is really a placeholder for “the host program”; it just happens to be called “Explorer” because the function was written from the point of view of the shell extension, and the host program is nearly always Explorer.exe.

In this case, however, we are the host program, not Explorer. The SHSetInstanceExplorer function lets you register a free-threaded IUnknown that shell extensions can obtain by calling SHGetInstanceExplorer. Following COM reference counting conventions, SHGetInstanceExplorer performs an AddRef() on the IUnknown that it returns; the shell extension’s worker thread performs the corresponding Release() when it is finished.
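To make the contract concrete, here is a toy model of it in portable C++. This is only a sketch: RefCounted, SetInstanceRef, GetInstanceRef and CountingHost are names invented for illustration, and the shell's real bookkeeping is private and may well differ (for example, this model assumes the setter does not take a reference of its own). The part that matters is that the getter hands out the registered object with an extra reference, and the caller gives that reference back when it is done.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for IUnknown, reduced to reference counting.
struct RefCounted {
  virtual long AddRef() = 0;
  virtual long Release() = 0;
  virtual ~RefCounted() = default;
};

// Toy model of the process-wide slot. These names are invented for
// illustration; the shell keeps its own private state.
namespace model {
  std::mutex g_lock;
  RefCounted *g_host = nullptr;

  // Models SHSetInstanceExplorer: remember (or clear) the host's object.
  // Assumption: the slot does not take a reference of its own.
  void SetInstanceRef(RefCounted *host) {
    std::lock_guard<std::mutex> guard(g_lock);
    g_host = host;
  }

  // Models SHGetInstanceExplorer: hand out the object with an extra
  // reference, or fail if no host has registered one.
  bool GetInstanceRef(RefCounted **out) {
    std::lock_guard<std::mutex> guard(g_lock);
    *out = g_host;
    if (!g_host) return false;
    g_host->AddRef();
    return true;
  }
}

// Minimal host object that does nothing but count.
struct CountingHost : RefCounted {
  std::atomic<long> refs{1};
  long AddRef() override { return ++refs; }
  long Release() override { return --refs; }
};

// Walks through the contract and returns the final count.
long DemoContract() {
  CountingHost host;
  model::SetInstanceRef(&host);

  RefCounted *ref = nullptr;
  bool ok = model::GetInstanceRef(&ref);  // count goes from 1 to 2
  assert(ok && host.refs == 2);
  ref->Release();                         // worker is done: back to 1

  model::SetInstanceRef(nullptr);         // host stops handing out references
  assert(!model::GetInstanceRef(&ref));
  return host.refs;
}
```

Calling DemoContract() runs one register, obtain, release, unregister cycle and returns the count the host is left with.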

All that is required of the IUnknown that you pass to SHSetInstanceExplorer is that it be free-threaded; in other words, that it support being called from multiple threads. This means managing the “process reference count” with interlocked functions rather than boring ++ and -- operators. Of course, in practice, you also need to tell your main program “Okay, all the shell extensions are finished; you can exit now” when the reference count drops to zero.
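If you want to see the free-threaded requirement in isolation, the following sketch hammers a single count from several threads. It uses std::atomic as a portable stand-in for InterlockedIncrement and InterlockedDecrement; SharedCount and HammerCount are hypothetical names, not part of any Windows API.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Portable stand-in for a count maintained with InterlockedIncrement
// and InterlockedDecrement.
class SharedCount {
public:
  long AddRef()  { return m_cRef.fetch_add(1) + 1; }
  long Release() { return m_cRef.fetch_sub(1) - 1; }
  long Value() const { return m_cRef.load(); }
private:
  std::atomic<long> m_cRef{1};
};

// Hammers one count from several threads. Every AddRef is paired with
// a Release, so the count must end where it started.
long HammerCount(int threads, int iterations) {
  assert(threads > 0 && iterations > 0);
  SharedCount count;
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t) {
    pool.emplace_back([&count, iterations] {
      for (int i = 0; i < iterations; ++i) {
        count.AddRef();
        count.Release();
      }
    });
  }
  for (auto &th : pool) th.join();
  return count.Value();
}
```

Because each increment and decrement is atomic, HammerCount always comes back with the starting value. With a plain long and the ++ and -- operators, concurrent updates could be lost and the count could drift.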

There are many ways to accomplish this task. Here’s one that I threw together just now. I didn’t think too hard about this class; I’m not positioning this as the best way of implementing it, or even as a particularly good one. The purpose of this article is to show the principle behind process references. Once you understand that, you are free to go ahead and solve the problem your own way. But here’s a way.

#include <shlobj.h>

class ProcessReference : public IUnknown {
public:
  STDMETHODIMP QueryInterface(REFIID riid, void **ppv)
  {
    if (riid == IID_IUnknown) {
      *ppv = static_cast<IUnknown*>(this);
      AddRef();
      return S_OK;
    }
    *ppv = NULL;
    return E_NOINTERFACE;
  }

  STDMETHODIMP_(ULONG) AddRef()
    { return InterlockedIncrement(&m_cRef); }

  STDMETHODIMP_(ULONG) Release()
  {
    LONG lRef = InterlockedDecrement(&m_cRef);
    if (lRef == 0) PostThreadMessage(m_dwThread, WM_NULL, 0, 0);
    return lRef;
  }

  ProcessReference()
    : m_cRef(1), m_dwThread(GetCurrentThreadId())
    { SHSetInstanceExplorer(this); }

  ~ProcessReference()
  {
    SHSetInstanceExplorer(NULL);
    Release();

    MSG msg;
    while (m_cRef && GetMessage(&msg, NULL, 0, 0)) {
      TranslateMessage(&msg);
      DispatchMessage(&msg);
    }
  }

private:
  LONG m_cRef;
  DWORD m_dwThread;
};

The idea behind this class is that the main thread (and only the main thread) creates it on the stack. When constructed, the object registers itself as the “process IUnknown”; any shell extensions that call SHGetInstanceExplorer will get a pointer to this object. When the object is destructed, it unregisters itself as the process reference (to avoid dangling references) and waits for the reference count to drop to zero. Notice that the Release method posts a dummy thread message so that the “waiting for the reference count to go to zero” message loop will wake up.

In a sense, this is backwards from the way normal COM objects work, which operate under the principle of “When the reference count drops to zero, the object is destructed.” We turn it around and code it up as “when the object is destructed, it waits for the reference count to drop to zero.” If you wanted to do it the more traditional COM way, you could have the main thread go into a wait loop and have the object’s destructor signal the main thread. I did it this way because it makes using the class very convenient.
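For comparison, here is a rough sketch of the more traditional arrangement, written in portable C++ so that it can be tried without the shell. DoneSignal, SelfDeletingRef and DemoTraditional are hypothetical names; a condition variable stands in for whatever event or message the main thread would really wait on, and a plain thread stands in for the shell extension.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical signal the main thread waits on; it stands in for an
// event or a posted message.
struct DoneSignal {
  std::mutex lock;
  std::condition_variable cv;
  bool done = false;
};

// Conventional lifetime: the object deletes itself when the count hits
// zero, and its destructor tells the main thread that it may exit.
class SelfDeletingRef {
public:
  explicit SelfDeletingRef(DoneSignal &signal) : m_signal(signal) {}
  long AddRef() { return ++m_cRef; }
  long Release() {
    long remaining = --m_cRef;
    if (remaining == 0) delete this;
    return remaining;
  }
private:
  // Private so the object can only be destroyed through Release.
  ~SelfDeletingRef() {
    std::lock_guard<std::mutex> guard(m_signal.lock);
    m_signal.done = true;
    m_signal.cv.notify_one();
  }
  std::atomic<long> m_cRef{1};
  DoneSignal &m_signal;
};

// The main thread drops its own reference and then waits to be told
// that the last reference is gone.
bool DemoTraditional() {
  DoneSignal signal;
  SelfDeletingRef *ref = new SelfDeletingRef(signal);  // main thread's reference

  ref->AddRef();                                  // reference for the worker
  std::thread worker([ref] { ref->Release(); });  // simulated extension

  ref->Release();                                 // main thread lets go
  {
    std::unique_lock<std::mutex> guard(signal.lock);
    signal.cv.wait(guard, [&signal] { return signal.done; });
  }
  worker.join();
  return signal.done;
}
```

Whichever thread calls Release last is the one that deletes the object and raises the signal. The cost is that the main thread has to write the wait by hand instead of getting it for free from a stack object going out of scope.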

Now that we have a process reference object, it’s a simple matter of adding it to our main thread:

int __cdecl _tmain(int argc, TCHAR **argv)
{
  ProcessReference ref;
  ...

With this modification, the program displays the property sheet and patiently waits for the property sheet to be dismissed before it exits.
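The wait-in-the-destructor behavior can also be modeled without the shell. The following portable sketch is an approximation, not the class above: WaitingRef and DemoWaitingRef are hypothetical names, a condition variable replaces the message loop, and a sleeping thread plays the part of the property sheet. In this model the count is changed while holding the mutex, so the destructor cannot finish while a Release is still touching the object.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Portable model of a process reference: the destructor blocks until
// every outstanding reference has been released.
class WaitingRef {
public:
  WaitingRef() = default;
  ~WaitingRef() {
    Release();  // drop the reference held since construction
    std::unique_lock<std::mutex> guard(m_lock);
    m_cv.wait(guard, [this] { return m_cRef == 0; });
  }
  long AddRef() {
    std::lock_guard<std::mutex> guard(m_lock);
    return ++m_cRef;
  }
  long Release() {
    std::lock_guard<std::mutex> guard(m_lock);
    long remaining = --m_cRef;
    if (remaining == 0) m_cv.notify_one();  // wake the waiting destructor
    return remaining;
  }
private:
  long m_cRef = 1;  // guarded by m_lock
  std::mutex m_lock;
  std::condition_variable m_cv;
};

// Returns what the main thread observed right after the destructor
// returned: 1 if the simulated property sheet had already finished.
int DemoWaitingRef() {
  std::atomic<int> finished{0};
  std::thread worker;
  {
    WaitingRef ref;
    ref.AddRef();  // reference handed to the worker
    worker = std::thread([&ref, &finished] {
      std::this_thread::sleep_for(std::chrono::milliseconds(50));
      finished = 1;   // simulated "property sheet dismissed"
      ref.Release();
    });
  }  // ~WaitingRef blocks here until the worker calls Release
  int seen = finished;
  worker.join();
  return seen;
}
```

The closing brace of the inner scope plays the role of the end of _tmain: control does not get past it until the worker has released its reference.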

Exercise: Explain how the object behaves if we initialized the reference count to zero and deleted the call to Release in the destructor.

Bonus reading: Do you know when your destructors run? Part 2.

Comments (27)
  1. meh says:

    Wow. This will come in handy. Many thanks!

  2. Neil says:

    As a bonus, if you do this then all the property dialogs opened from common dialogs will hang around when the user "exits" your program.

  3. Mathieu Garstecki says:

    Shouldn’t the m_cRef member be declared as volatile? Without it, the variable might be put in a register when the loop starts and never be reloaded, right? Am I missing something here?

    [That would be a violation of the “as-if” rule, because GetMessage might change m_cRef. -Raymond]
  4. Alexandre Grigoriev says:

    BryanK:

    All platforms that run Windows guarantee cache coherency between processors. Also, any function that can cause thread switch (such as GetMessage) acts implicitly as memory barrier.

  5. The question was what would happen if we set the refcount to 0 and omitted the Release() call:

    (While I haven’t actually tried to run the code) it seems to me the ProcessReference object would behave perfectly fine as long as the client/shell extension remembers to call the SHGetInstanceExplorer function. If not it would hang in the message loop in the destructor (because Release() will never be called).

  6. Yuhong Bao says:

    "(Now delete that call to MessageBox before something stupid happens.)"

    What do you mean?

  7. Yuhong Bao says:

    From MSDN:

    "Note  This function is available through Windows XP Service Pack 2 (SP2) and Windows Server 2003. It might be altered or unavailable in subsequent versions of Windows."

    Should this be removed?

    Clue if you are wondering why that was added: It was added for the same reason another Shell function was misspelled, which is out of scope of this blog entry.

  8. BryanK says:

    "volatile" doesn’t fix concurrency issues.  It only informs the compiler; it does *not* inform the CPU.

    If you’re running on SMP, and you put "volatile" there, then whichever CPU runs this code will load whichever line into its cache when the loop starts.  (Well, actually, before that — speculative execution is fun!)  Then, it may not update that line when another thread (running on a different CPU) updates the variable.  (It depends on the CPU’s memory model.)  This is why e.g. the Linux kernel doesn’t use "volatile" as a memory barrier: it’s not one.

    To make it completely safe, you have to do some kind of interlocked operation when you’re writing *and* when you’re reading m_cRef.  (Maybe not on x86, of course.  But this is C (well, C++), not x86 assembly.)  The proper way to do this is something like:

    while(InterlockedCompareExchange(&m_cRef, 0L, 0L) && GetMessage(…)) { … }

    (Suitably modified for pointer-sized integers, if needed, of course.)

    The "exchange" doesn’t actually modify the variable, but the "read" part of the exchange is still atomic with respect to other interlocked operations.

  9. Todd Greer says:

    There is some documentation on MSDN that says that "As of VS 2005, volatile now implements acquire/release semantics". However, I have heard reports that this is not true, at least for x64 builds. I haven’t verified any of this myself. Whether it is appropriate to rely on compiler-specific behavior is of course very context-dependent. It doesn’t make sense in my context.

  10. CornedBee says:

    it seems to me the ProcessReference object would behave perfectly fine as long as the client/shell extension remembers to call the SHGetInstanceExplorer function. If not it would hang in the message loop in the destructor (because Release() will never be called).

    The example program was too simple. What if the program only shows the properties dialog under certain circumstances? When it doesn’t, no one will ever call Release() even without a buggy extension.

  11. Michael B. says:

    @CornedBee

    I’m not getting your answer to Raymond’s "exercise question". Even if release() was never called because the object hasn’t been used in the code – why would the destructor block? If m_cRef was initialized to 0, the first condition of the while-statement would prevent GetMessage() from being called in the first place – wouldn’t it?

  12. Gwyn says:

    Hi Michael B,

    after the call to Release() in the destructor, what will the value of m_cRef be?

  13. Richard says:

    Raymond wrote:

    [That would be a violation of the "as-if" rule, because GetMessage might change m_cRef. -Raymond]

    For all I know, GetMessage might start by performing some kind of atomic read with a fast bail-out, which might get inlined by whole-program optimization when statically linking to whatever provides GetMessage. The compiler might decide to not reload m_cRef in that case if it can prove it’s not changed.

    Or perhaps GetMessage doesn’t necessarily provide a memory barrier for m_cRef, meaning that an incorrect value might be read from the current CPU’s cache despite having been modified by another CPU.

    Not that I’m saying that I understand Win32 development enough to know whether the above is valid in this case (perhaps GetMessage is always dynamically-linked, and the semantics of the above loop guarantee that there’ll be a memory barrier before m_cRef is read?). But you seem to be saying that "because it’s a call to an external function, the compiler can’t tell whether it modifies m_cRef", which is not necessarily true.

  14. Medinoc says:

    Just to confirm I got it straight: ShellExecuteEx() with "properties" displays the file’s property sheet in the calling process, only in a new thread?

  15. KJK::Hyperion says:

    BryanK: never believe anything Torvalds says. It’s a GCC issue – in Visual Studio 2003 and later, accessing a "volatile" implies a barrier; in 2005 and later, barriers apply across the call stack, too. It’s all documented

    The core issue is that C and C++ don’t have a standard memory model, but Herb Sutter and others are working on it, specifically based on the assumptions Windows makes: <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2197.pdf>. C++0x will include a memory model specification, too

    Technically, multi-threaded C/C++ applications "never worked", because there was no real reason they should work, it wasn’t like anyone had given it any thought. This doesn’t mean in practice multi-threading never worked, period

  16. BryanK says:

    Gwyn: The call to Release() in the destructor would also be removed.  I don’t know what the issue would be with initializing the ref-count to zero, other than "COM objects’ ref-counts can’t be zero".  Maybe it’s as simple as that?  Not sure.

    Alexandre Grigoriev:

    > All platforms that run Windows guarantee cache coherency between processors.

    Sure, today that may be true (IA64?).  Will it always be true?  Will Windows ever run on a system whose processors are separated by a network, and therefore can’t do cache coherency protocols quickly enough?

    Besides, "volatile" in general is almost always a bad idea (because it’s basically just a superstition).  See e.g. http://lkml.org/lkml/2007/8/17/187 (note that the discussion there was *exactly* this type of setup: atomic_read() with a barrier, versus atomic_read with just a volatile keyword).

  17. mikeb says:

    Dang… Henry posted a good answer to Raymond’s exercise before I got around to it.

    Here’s what Raymond had to say about initializing with a refcount of 1 vs. a refcount of 0 before:

    http://blogs.msdn.com/oldnewthing/archive/2005/09/29/475298.aspx

  18. To Michael B:

    you’re right, after I posted my comment yesterday I also saw that indeed the destructor will not block, because of the check for m_cRef being 0.

    But about that exercise: there might still be another problem hiding in the code, which can be triggered because of the free-threaded requirement on the object.

    Assume that one thread (#1) owns the ProcessReference object, and another thread (#2) uses it.

    Consider the case when the 2nd thread is executing in Release(), specifically this line:

    if (lRef == 0) PostThreadMessage(m_dwThread, WM_NULL, 0, 0);

    If that thread is rescheduled/switched out right after (lRef == 0) but *before* calling PostThreadMessage(), and the 1st thread is the next one running, and it calls the ProcessReference’s destructor at that point, then the 1st thread would prematurely exit the destructor (because it would never run the message loop).

    So the answer to Raymond’s exercise might be, that without the destructor having its own call to Release() (and setting the refcount to 0), a free-threaded environment could raise problems. With a Release() call in the destructor I think those dangers are more or less eliminated.

  19. Denisenko Mikhail says:

    If refCount is initialized to 0, then every time refCount drops to 0 a message will be posted to the thread queue. The destructor will consume only one message. Problems: unexpected messages in the message queue (not good), or a possible queue overflow. Possible? Still not good :)

  20. BryanK says:

    mikeb: Ah, I bet that’s it.  Accessing member variables after the object has been destroyed (because the destructor has finished, and so the memory that was allocated on the stack is gone, since main has exited and the C runtime is now running) is definitely bad.  :-)

  21. BryanK says:

    > _never_ believe anything Torvalds says.

    That right there is an ad-hominem fallacy.  Who says it is irrelevant: if there’s something wrong with the argument, feel free to point it out.  ;-)

    > It’s a GCC issue

    No, it’s a C standards issue (as you point out later).  There is *no* real standard for what to do when volatile is present (on a multi-CPU machine: but on a single-CPU machine, it’s irrelevant), so various compiler writers are free to do almost *anything*.  That includes adding a barrier (as VS2005 and later have done with a memory barrier, and VS2003 has done with a compiler barrier), turning off basically all their optimizations (as GCC seems to do), or anything else.

    The fact that <insert favorite compiler here> takes "volatile" to mean "add a bunch of barriers all over" doesn’t mean you should post code that depends on that behavior; not every Windows developer uses VS.  (There are some that use mingw, and — at a guess — many more that use compilers from other companies.)  There are even fewer Windows developers that use VS2005 or later (some people still use VC++6, or perhaps even older versions).  So J. Random Developer coming to this blog in six months and using Some Other C++ Compiler reads "insert volatile to fix it!", and does, and gets completely wrong code out.

    Using "volatile" as a magic wand, to be used to make every lock-free multi-threaded algorithm correct, is still WRONG.  It may work properly with *some* compilers, but it isn’t forced to mean anything to the processor, so SMP machines will still get the lock-free algorithm wrong.

    A memory model spec will help future C compilers, yes — but it doesn’t do a whole lot of good at the moment.  ;-)

    Also, regarding GetMessage being a barrier: You need a (read) barrier *before* reading m_cRef, not after reading it.  (Just like you need a write barrier *after* writing to it.)  Of course, GetMessage is called before the *next* read (in the loop), so it might be OK for all except the first access; I haven’t thought it through very far.

    Henry: I’m not sure whether that’s an issue, since the destructor does still exit.  Yes, it exits early, but I don’t *think* that matters.

    Although, hang on, it might: When the second thread resumes, it will still post a WM_NULL message to the "main" thread, which won’t get consumed by the destructor.  Depending on what happens after the destructor finishes, this may be bad — if that thread goes into another GetMessage loop, then it will be bad, since it’ll pick up a message that nobody should have sent.

  22. Old Coder says:

    Well, if you forget and leave out InterlockedDecrement in the release method, your program falls over pretty quickly.

    File this mistake under I should have used cut and paste…

  23. mikeb says:

    >> Henry: I’m not sure whether that’s an issue, since the destructor does still exit.  Yes, it exits early, but I don’t *think* that matters.

    Although, hang on, it might: When the second thread resumes, it will still post a WM_NULL message to the "main" thread, which won’t get consumed by the destructor.  Depending on what happens after the destructor finishes, this may be bad — if that thread goes into another GetMessage loop, then it will be bad, since it’ll pick up a message that nobody should have sent. <<

    At least one race condition exists regardless of the behavior of message queues or what PostThreadMessage() might do internally.  The m_dwThread member of ProcessReference can be accessed in the Release() method after the dtor finishes if the m_cRef counter is initialized to 0 instead of 1.

  24. yme says:

    How does initializing the reference count to 1 prevent the worker thread from accessing m_dwThread in Release() after the destructor is done?  It seems to me that once the destructor executes Release(), things are the same as if the reference count had been initialized to 0.

    That is: the reference count is 1; the worker thread starts running Release(), which decrements it to 0; the main thread continues running the destructor, which completes; the worker thread continues running Release(), which tries to access m_dwThread.

  25. mikeb says:

    > It seems to me that once the destructor executes Release(), things are the same as if the reference count had been initialized to 0. <<

    The difference is that before the Release() is called in the dtor the ProcessReference has called SHSetInstanceExplorer(NULL) while the refcount is non-zero so nothing else can obtain a new reference.  Of course, this means that the implementations of SHSetInstanceExplorer() and SHGetInstanceExplorer() need to have some synchronization, but that’s an internal implementation detail that one should be able to assume is taken care of (otherwise the API is unsafe no matter what you do as a client).

  26. yme says:

    I don’t follow.

    If the reference count starts at 1 and the destructor calls Release(), then at the beginning of the destructor’s while() loop, there may be any number of outstanding references to the object, and the reference count equals that number, as it should.  And the same holds if the reference count starts at 0 and the destructor omits the call to Release().

    Why does it matter what the reference count was when the references were given out?

    For an object that deletes itself whenever its reference count changes from 1 to 0, I understand that the reference count should never be 0 when anyone holds a reference, because they may call AddRef() and then Release(), expecting things to return to the way they were, but instead the object will disappear.  But here, strictly, the reference count needs to be accurate only when the destructor starts its while() loop, because before that, the object will stay around regardless of its reference count.

    I hope I don’t sound too nitpicky.  Undoubtedly, it’s a good idea to keep the reference count accurate all the time, just for consistency: if the implementation of SHSetInstanceExplorer() and SHGetInstanceExplorer() has a reference to the object, we may as well record that fact in the reference count.  But Raymond did ask as an exercise how the object would behave if we didn’t, and so far, I don’t see how it would behave differently.  (Of course, maybe that’s the right answer.  I don’t know.)

  27. nikos says:

    is this API similar to what you can get with SHCreateThreadRef/SHSetThreadRef ?
