Using the REFIID/void** pattern for returning COM objects for future-proofing and to avoid problems with dependencies


Suppose you have a function that creates a reference to a COM object:

// pixie.h

STDAPI CreateShellItemFromPixieDust(
    const PIXIEDUST *ppd,
    IShellItem **ppsi);

There are a few issues with this design.

First of all, it requires that whoever uses your header file must have included shlobj.h first, since that's where IShell­Item is defined. You could solve that problem by putting #include <shlobj.h> at the top of pixie.h, but that creates its own problems. For example, many header files alter their behavior based on symbols that have been #defined, and including that other header file as part of pixie.h means that it's going to use the settings that were active at the time pixie.h was included (which may not be what the clients of your header file are expecting). For example:

#include <windows.h>
#include <ole2.h>
#include <pixie.h>
#define STRICT_TYPED_ITEMIDS
#include <shlobj.h>

This program wants to use strict typed item IDs, so it defines the magic symbol before including shlobj.h. Unfortunately, that request is ignored because the pixie.h header file secretly included shlobj.h prematurely.

This can get particularly messy if somebody wants to include shlobj.h with particular preprocessor tricks temporarily active.

#include <windows.h>
#include <ole2.h>
#include <pixie.h>

// The WINDOWDATA structure added in Windows Vista conflicts
// with the one we defined back in 2000, so rename it to
// WINDOWDATA_WINDOWS.
#define WINDOWDATA WINDOWDATA_WINDOWS
#include <shlobj.h>
#undef WINDOWDATA

// Here's our version of WINDOWDATA
#include <contosotypes.h>

This code works around a naming conflict that was created when Windows Vista added a structure called WINDOW­DATA to shlobj.h. The application already had a structure with the same name, so it has to rename the one in shlobj.h to some other name to avoid a redefinition error.

If you made pixie.h include shlobj.h on its own, it would do so without this fancy renaming, and the developers of that code will curse and say something like this:

#include <windows.h>
#include <ole2.h>

// The WINDOWDATA structure added in Windows Vista conflicts
// with the one we defined back in 2000, so rename it to
// WINDOWDATA_WINDOWS.
#define WINDOWDATA WINDOWDATA_WINDOWS

// pixie.h secretly includes shlobj.h so we have to put its #include
// under WINDOWDATA protection.
#include <pixie.h>

#include <shlobj.h>

#undef WINDOWDATA


// Here's our version of WINDOWDATA
#include <contosotypes.h>

Another problem with the Create­Shell­Item­From­Pixie­Dust function is that it hard-codes the output interface to IShell­Item. When everybody moves on to IShell­Item2, all the callers will have to follow the Create­Shell­Item­From­Pixie­Dust call with a Query­Interface to get the interface they really want. (Which, if your object is out-of-process, could mean another round trip to the server.)

The solution to both of these problems is to simply make the caller specify what type of object they want.

// pixie.h

STDAPI CreateShellItemFromPixieDust(
    const PIXIEDUST *ppd,
    REFIID riid,
    void **ppv);

Now that we are no longer mentioning IShell­Item explicitly, we don't need to include shlobj.h any more. And if the caller wants IShell­Item2, they can just ask for it.

Your creation function used to look like this:

STDAPI CreateShellItemFromPixieDust(
    const PIXIEDUST *ppd,
    IShellItem **ppsi)
{
    *ppsi = nullptr;
    IShellItem *psiResult;
    HRESULT hr = ... do whatever ...;
    if (SUCCEEDED(hr))
    {
       *ppsi = psiResult;
    }
    return hr;
}

You simply have to tweak the way you return the pointer:

STDAPI CreateShellItemFromPixieDust(
    const PIXIEDUST *ppd,
    REFIID riid,
    void **ppv)
{
    *ppv = nullptr;
    IShellItem *psiResult;
    HRESULT hr = ... do whatever ...;
    if (SUCCEEDED(hr))
    {
       hr = psiResult->QueryInterface(riid, ppv);
       psiResult->Release();
    }
    return hr;
}

Callers of your function would go from

IShellItem *psi;
hr = CreateShellItemFromPixieDust(ppd, &psi);

to

IShellItem *psi;
hr = CreateShellItemFromPixieDust(ppd, IID_PPV_ARGS(&psi));

If the caller decides that they really want an IShell­Item2, they merely have to change their variable declaration; the call to the creation function is unchanged.

IShellItem2 *psi;
hr = CreateShellItemFromPixieDust(ppd, IID_PPV_ARGS(&psi));
Comments (36)
  1. Joshua says:

    Ah yes namespace pollution and how to mitigate it.

  2. Xv8 says:

    @Karellen

    Even if they do reserve it, it doesn't stop people ignoring the reservation.

  3. Aaron says:

    Compile-time error:

    {code}

    STDAPI CreateShellItemFromPixieDust(

       const PIXIEDUST *ppd,

       REFIID riid,

       void **ppv)

    {

       *ppsi = nullptr;     // ERROR: Variable ppsi is not declared.

       IShellItem *psiResult;

    {code}

  4. Azarien says:

    Yes, every bit of COM is well designed and thought out, and the result is still ugly and hard to use.

    [Is there an easier/prettier solution that still solves the same problems? (Language-independence, remotability, etc.) Otherwise you're just complaining that hard things are hard, which doesn't really advance the discussion. -Raymond]
  5. Joshua says:

    [Is there an easier/prettier solution that still solves the same problems? (Language-independence, remotability, etc.)]

    Maybe a remoteable solution should not be used where remoteability is not required.

    Maybe when using a remoteable solution, call by name is not too expensive.

    Maybe if most objects didn't care what thread model was in use.

    Maybe if the file-open dialog box didn't screw up on threads using COM objects of a different thread model.

    Maybe if the file-open dialog box set the COM model back to unspecified if it set it on a thread that has yet to start COM.

    Maybe people are pissed off at the fragility of the COM registry that still plagues us even in Windows 8 where as system component goes mysteriously unregistered and there's no way to find out which DLL it's supposed to be in.

    [Um, you do recall that remotability was the reason why COM was originally invented? (Embedding a Word document in an Excel spreadsheet.) Also, call-by-name is basically IDispatch, which means you lose compile-time type safety and user-defined types. I don't know why I allow myself to get sidetracked like this. The point of today's article "Assuming you're using COM, here's how to be more forward-compatible. If you're not using COM, then ignore today's article." -Raymond]
  6. Adam Rosenfield says:

    (2/2) In the case of the iostream library, there's actually a separate, smaller header iosfwd which includes just the declarations of the various I/O classes, not their full definitions.  So if you need to, say, declare operator<<(std::ostream&, const MyClass&), you should include iosfwd in your header file but NOT iostream.  For libraries which neither provide forward declaration headers nor documented declarations, you have no choice but to include the full definition header to portably use their types in your headers.

  7. Adam Rosenfield says:

    @Zf-12: Not in the header file you don't.  Of course you need the full definition of IShellItem in the cpp file which implements CreateShellItemFromPixieDust(), but you don't need the definition in just the declaration of the function in pixie.h.

  8. Zf-12 says:

    @Adam Rosenfield: By using a forward declaration in the header you've introduced an additional burden on every user of the library: Not only do they need to include pixie.h, but also shlobj.h. A cleaner design is to have the header file include all its dependencies including shlobj.h. As for your compilation performance problem, this is generally solved via the "Pimpl idiom" instead.

  9. Evan says:

    By the way, I should say that the forward declaration may or may not help with the "IShellItem2" issue; but it would with the first.

  10. 12BitSlab says:

    @Azarien

    MS does have an "easy" version of COM – managed code.  Like every other choice in life, there are trade-offs.  With COM, execution speed is better.  With managed code, development speed is better.  With COM, you get more flexibility.  With managed code, you are encouraged to stay inside the framework.  Either way, you get to choose which things are most important to you.

  11. Goran says:

    I kinda like there advice, but I hate the motivation. Header of the function needs no includes (shlobj.h or other), if dangers of that are real, then a forward declaration of PIXIEDUST and IShellItem is a way out.

  12. Zf-12 says:

    @ Joshua Muskovitz: Well-designed header files shouldn't be a kitchen sink in first place. So the case where a users wants only some part of the interface while at the same time other parts introduce a compilation performance problem should be rare.

    Therefore the solution shouldn't be to forward-declare everything, but rather split the header into it's logical parts. That's basically the opposite of what Windows.h does.

  13. Karellen says:

    "For example, many header files alter their behavior based on symbols that have been #defined, and including that other header file as part of pixie.h means that it's going to use the settings that were active at the time pixie.h was included"

    But that might happen anyway. Any header file might need to include other header files of components "lower" in the stack to ensure that types it uses are visible. This is especially notable for system header files, e.g. if a header defined functions that use time_t or FILE *, but can also include higher components if a header defines functions that use e.g. a database-connection-type (SqlServerConnection * ?)

    Therefore, if you are using #defines to change the behaviour of a header file you use, you need to do that before you include *any* header file.

    e.g.

    #define STRICT_TYPED_ITEMIDS

    #include <windows.h>

    #include <ole2.h>

    #include <pixie.h>

    #include <shlobj.h>

    (Note – this also applies on POSIX systems with e.g. #define _POSIX_C_SOURCE, or with plain C for #define _ISOC99_SOURCE/_ISOC11_SOURCE)

    "// The WINDOWDATA structure added in Windows Vista conflicts

    // with the one we defined back in 2000, so rename it to

    // WINDOWDATA_WINDOWS."

    Does Windows not reserve a part (or parts) of the global namespace for it's own use, such as WINDOW*, much like the C standard does for str*, mem*, is*, to*, _*, etc…, to try and prevent clashes like this?

    [Standard header files are not allowed to include other standard header files. 4.1.2 "Standard headers": "Each header declares and defines only those identifiers listed in its associated section." -Raymond]
  14. Adam Rosenfield says:

    Why would you include shlobj.h when you could just use a forward declaration?  The header file certainly doesn't need the complete type definition to declare a pointer.

    // pixie.h

    class IShellItem;  // or struct, depending on __cplusplus and other macros

    STDAPI CreateShellItemFromPixieDust(const PIXIEDUST *ppd, IShellItem **ppsi);

    That doesn't solve the problem of what if the caller wants to use IShellItem2 instead of IShellItem, but it does solve all of those header file issues (which are rather uncommon, in my experience—I've seen lots of library headers include Windows headers and not ran into these types of issues.  NDEBUG before assert.h is probably the only time I've seen that).

  15. Karellen says:

    @Adam – Why would you write your own forward declarations when you could just include a header file? That's *what the header file is there for*.

  16. Marvin says:

    I usually wholeheartedly agree with everything Raymond says but this particular advice is BS.

    If you want to throw type safety out if the window you need a damn good reason to do so.

    Of the two reasons Raymond shows first is bogus (more in a bit) while the second is likely to be useless optimization in vast majority if cases.

    So the first reason is bogus. In C and C++ you are never guaranteed particular fact or order of header inclusion. The first example – setting a macro "for a header" is a bad anti-pattern because of that. You set such things in your compiler arguments or in a precompiled header before any inclusions. Anything else is inviting ODR violations and shooting yourself in the foot.

    The second example – renaming is why namespaces where invented and why macros and such should have namespace like prefixes. If you are in a crappy situation where you must resort to that then the proper way to set such rename is to first organize your includes: all system headers, followed by standard library, followed by 3rd party in order of generality, followed by project ones. Then wrap what you need in a macro. Not foolproof but close. This is basic dependency management.

    None of these issues warrant breaking type safety.

    The second issue *might* be important in some rare cases but for a vast majority of users? You are using an out of proc server, create new objects and one extra call is going to break you perf-wise? Sounds like your system is broken already :)

    (I assume you are not using cross machine DCOM here. If you do then you deserve what you get)

    [News flash: There are crappy header files out there. You may be forced to use them. -Raymond]
  17. Karellen says:

    "Standard header files are not allowed to include other standard header files."

    Yeah, sorry, I didn't make myself very clear there. By "system header files", I intended to mean those provided by the operating system, rather than those provided by the C implementation, like "windows.h" on Win32, or "unistd.h" on POSIX. Being a layer above the C standard header files, they can include e.g. "stdio.h" for FILE * and "time.h" for time_t, without violating the C standard. (Because C has very little to say about what operating system header files may or may not do.)

    And as you go higher in your include heirarchy, other header files are likely to include "windows.h" to get the definition of things like HANDLE or REFIID or DWORD or TCHAR or… (Unless they were written by Adam Rosenfeld, and forward-declare all these things themselves instead.)

    Therefore, I stand by my opinion that you should always put all your behaviour-altering #defines before you #include your first header file.

  18. > Another problem with the Create­Shell­Item­From­Pixie­Dust function is that it hard-codes the output interface to IShell­Item. When everybody moves on to IShell­Item2, all the callers will have to follow the Create­Shell­Item­From­Pixie­Dust call with a Query­Interface to get the interface they really want.

    Or you make a CreateShellItem2FromPixieDust to go along with the existing CreateShellItemFromPixieDust.

    In theory I could make a single "void DoSomething(void*)" function with a set of dynamically changing rules that prevent me from *ever* having to change my function definition; but, ew.

    [You're presupposing a time machine. pixie.h was written before IShell­Item2 was invented. Suppose IShell­Item3 gets invented tomorrow. Do you call up Contoso Corporation and say, "Hey, can you release a new version of your component that has a Create­Shell­Item3­From­Pixie­Dust function?" -Raymond]
  19. Adam Rosenfield says:

    @Karellen: Because often you only need the *declaration* of a struct/class, not its whole *definition*.  Once you start having every header file including lots of other header files for definitions it doesn't really need, your compile times will start to go through the roof for large projects (to say nothing of potential circular dependency issues).

    If a type is documented as having a particular declaration, it's perfectly acceptable (and encouraged) to redeclare that yourself with the same declaration.  Things get a little trickier when things aren't fully spec'ed out by the documentation, like certain classes in the C++ STL (you can't forward declare std::string because it a typedef for std::basic_string<char, some_more_traits_and_allocator_stuff>, for example).

    (splitting up this post into 2 because the blogging software thinks this comment is spam….sigh)

  20. Zf-12 says:

    > @Adam Rosenfield: Once you start having every header file including lots of other header files for definitions it doesn't really need,

    You *do* need the "IShellItem" in Raymonds example, so there's no point in using a forward declaration.

  21. Evan says:

    @Karellan: "Why would you write your own forward declarations when you could just include a header file? That's *what the header file is there for*."

    I thought Raymond just created a whole post explaining why you might not want to just include the header. If you buy those arguments, you could omit the header *and still keep type safety* with a forward declaration.

  22. Adam Rosenfield says:

    @Zf-12: No, that's not true at all.  Using a forward declaration in a header does *not* force any of your users to include additional headers they wouldn't otherwise need—that's the whole point of forward declarations!  If pixie.h has "class IShellItem;" in it without including shlobj.h, then if foo.cpp includes pixie.h and just wants to use a PIXIEDUST instance, it doesn't need to include shlobj.h at all.

    The Pimpl idiom is just a wrapper around forward declarations, it's not all that different.  The only reason it's needed in C++ is because you can't define a class's interface (i.e. it's member functions) without also defining the implementation details of its member variables and private methods (C# solves this problem by allowing partial classes).

  23. Zf-12 says:

    > Adam Rosenfield: Using a forward declaration in a header does *not* force any of your users to include additional headers

    Yes, a forward declaration does not force the inclusion of additional headers. But the user *using* a feature does.

    To make long matters short, the point is: forward declarations in header files only make sense for private implementation details, but not for features that the user of the header wants to use, e.g. parts of function signatures.

    So if PIXIEDUST is what is needed by the user of the header, but IShellItem is just an implementation detail that happens to be visible in the header, but is not part of the interface then a forward declaration is fine. If IShellItem is not an implementation detail, say it's part of a function signature for users of the header to consume, then the header should include "shlobj.h", because in this case the user will have to include it anyway.

  24. stickboy says:

    > If IShellItem is not an implementation detail, say it's part of a function signature for users of the header to consume, then the header should include "shlobj.h", because in this case the user will have to include it anyway.

    @Zf-12: Only if the caller wants to invoke CreateShellItemFromPixieDust.  If it wants to call some other function declared in pixie.h, it won't need shlobj.h.  Furthermore, callers of CreateShellItemFromPixieDust are going to need to include shlobj.h *anyway* if using the REFIID/void** pattern.

  25. Joshua Muskovitz says:

    @Zf-12,

    You are mistaken. If a user wants to *use* that feature of PIXIEDUST, then they would have to include shlobj.h. But if that were the case, they would have already needed to include it, since they are themselves using IShellItem in their own code. Forcing *everyone* to include a header with all its inherent risks (order of includes is often a huge nightmare) so that *some* people won't have to add one line of code (and just an include, at that) is silly. And if they don't need to manipulate the IShellItem directly, but only need it to be able to pass it as a pointer to some other function, then they *never* need the include, so it should be left inside the .cpp file to limit its exposure.

    Remember: Your use case probably doesn't match everyone else's use case.

  26. manuell says:

    Typo in the final version of CreateShellItemFromPixieDust.

    *ppsi = nullptr;

    should be

    *ppv = nullptr;

  27. Neil says:

    Even understanding that these are samples not designed to demonstrate error handling (and therefore the first sample doesn't try to release a partly-created psiResult at all), the second version appears to discourage correct code by including a release in a suboptimal place (a separate if (psiResult) psiResult->Release(); before the return would correspond best to where a smart pointer would do it).

  28. Myria says:

    IID_PPV_ARGS is undefined, according to C/C++ aliasing rules.  It is not legal to type-pun/union "void *" with "IUnknown *" like this.  (Then again, I suppose that Microsoft can just say it's defined, since they control the compiler…)

    The strict-aliasing-correct way to do this is as follows:

    IShellItem2 *psi;

    void *pv;

    hr = CreateShellItemFromPixieDust(ppd, __uuidof(*psi), &pv);

    if (SUCCEEDED(hr))

    {

       psi = static_cast<IShellItem2 *>(pv);

    }

    It's not legal to do the static_cast unless SUCCEEDED(hr) is true, because otherwise you're performing an undefined operation: casting a potentially-uninitialized or invalid pointer.  Even if you don't use it, merely forming the pointer is illegal C++.  (reinterpret_cast would be OK to do in this context, provided the pointer is not used until the type is confirmed.)

  29. foo says:

    I don't mind it. In fact I think it's a good idea. If I'm using COM then me (or some agent of mine) would have had to have called CoCreateInstance/Ex at some point, which uses this pattern also.

  30. John Doe says:

    @Myria, where do you get your information from?

    In C, you can provide any pointer to void*, and vice-versa, in the standard since C89.

    In C++, you can provide any pointer to void*, but you must cast the other way around, standard since C++98.

  31. stickboy says:

    @John Doe:

    I think what Myria meant is that while casting between void* and T* is legal, casting between void** and T** is not necessarily safe (although Microsoft gets to define the behavior).  For example, imagine a system where sizeof(void*) > sizeof(struct S*).

    [Win32 requires that sizeof(void*) == sizeof(T*) for all T. -Raymond]
  32. Myria says:

    @John Doe: Yes, and my example uses such a cast through void *.  What *isn't* legal, though, is accessing the pointer itself through a "void **" pointer rather than the correct "IShellItem2 **" pointer.  There is actually no guarantee in C/C++ that "void *" and "IShellItem2 *" are physically stored in the same format.  I've never seen a compiler that varies the physical storage of pointer by their type, though.

  33. > You're presupposing a time machine.

    How so? If CreateShellItemFromPixieDust can return an object that implements IShellItem2, that assumes that code has been updated. If code can be updated, so can a header file.

    I appreciate that the point of the blog post is "hey, if you use this technique, you can update your code (or consume code that can be updated) without having to update your header". My point I'm trying to make in response is that sometimes there is value in having the header match the code.

    > STDAPI CreateShellItemFromPixieDust( const PIXIEDUST *ppd, REFIID riid, void **ppv );

    At this point you may as well call the function CreateFromPixieDust(…). You no longer need to update the header file ever again (yay!) but you do need to figure out how to document what interfaces can be created (or refer to the documentation for the components you are internally consuming.)

    [CreateShellItemFromPixieDust studies the pixie dust and then calls a function like SHCreateShellItem with the appropriate parameters. The IShellItem2 is coming from SHCreateShellItem, not CreateShellItemFromPixieDust. CreateShellItemFromPixieDust is just a middle man. The point of CreateShellItemFromPixieDust is that it creates a shell item (via SHCreateShellItem), but you can pick which aspect of the shell item you want to receive. This is different from CreateFromPixieDust which could create an IStream from pixie dust, and then it would have to study the riid to decide whether to call SHCreateShellItem or BindToObject. -Raymond]
  34. Joshua says:

    [Um, you do recall that remotability was the reason why COM was originally invented?]

    In fact yes. I guess what gets on my nerves is all these shell APIs that aren't ever going to remote.

    [Some shell interfaces can be marshaled. Are you suggesting that an alternative interface be used for the ones that can't be marshaled? And then what if somebody decides to add marshaling support to an existing interface? Would there be two different interfaces to the same functionality, a COM-based one that can be marshaled, and a non-COM-based one that cannot? -Raymond]
  35. Joshua says:

    [Would there be two different interfaces to the same functionality, a COM-based one that can be marshaled, and a non-COM-based one that cannot? -Raymond]

    In my opinion a good thing. Too many cases come up where I want to do something that should work but the only interface is a COM interface that tries to call in several hundred MB of address space and then load some third-party DLL and several hundred MB more address space (due to failure to dedupe C runtime libraries) to do something like get an icon for a file. Address space is still a scarce resource.

    [Um, the non-COM interface (e.g. SHGet­File­Info is going to load the same third-party DLLs, seeing as the third party is the icon provider. -Raymond]
  36. John Doe says:

    @Myria, I stand corrected.

    @Joshua, but but… you can always delegate to a COM out-of-proc server that actually fetches the icons and returns them with remotable interfaces.  This way, it's not your process' address space, it's more stable and reliable, and it's cacheable for the whole user session, so it can in fact become faster in the not-so-longer term.  How easier could it be?

Comments are closed.