Why is it so hard to shim IRibbonExtensibility?

The COM Shim Wizards are a set of Visual Studio 2005 project wizards that automate the generation of COM shims for non-VSTO managed Office extensions. These shims support COM add-ins, smart tags and real-time data components. The shim for each is broadly similar, although there are obvious differences to allow for the different interfaces that each component implements, registration differences and so on.

 

There have been many questions in the forums over the last few months as to why these shims do not support the new Office 2007 extensibility interfaces, particularly the interfaces for custom ribbons, custom task panes, and custom form regions. The short answer is that the shims are written to support a specific set of interfaces (IDTExtensibility2, ISmartTagRecognizer, ISmartTagAction, IRtdServer), and would therefore need to be extended to support the new interfaces (IRibbonExtensibility, ICustomTaskPaneConsumer, FormRegionStartup, etc.).

 

OK, so why don’t we update the shims to support these new interfaces? The answer is that for all the interfaces except IRibbonExtensibility, this would be fairly simple. The problem is IRibbonExtensibility.

 

First, the good news: Office is generally moving towards an extensibility model where all new interfaces are implementable via COM add-ins (managed or unmanaged). More good news: all of these new interfaces are accommodated by the VSTO runtime if you’re building VSTO add-ins. If you’re not building VSTO add-ins, you simply implement the interface in your add-in code directly (in fact, you can also do this in VSTO add-ins if you don’t want to use the higher-level VSTO abstractions). The COM shim can easily be extended to support most of these interfaces.

 

Now the bad news: although all the interfaces are implementable by add-ins, they are not all the same. Specifically, the programming model is different. In particular, the programming model for ribbon extensibility is completely different from that of all the other interfaces. All the other interfaces follow the same model as any traditional COM interface. That is, the interface defines a number of methods, which are the communication points (the “interface”) between Office and the add-in. IRibbonExtensibility does define such an interface (it is a COM interface, after all), but it also assumes additional methods that are not defined in the interface.

 

IRibbonExtensibility only defines one method (GetCustomUI), but Office will expect to call back on your IRibbonExtensibility object for any number of additional methods in response to user interaction with the ribbon and any programmatic ribbon control state changes. For example, in your ribbon XML, you might specify that the callback method for a button is MyButtonCallbackMethod, and this method is clearly not defined in the IRibbonExtensibility interface. In other words, while IRibbonExtensibility is basically a COM interface, it is most emphatically an automation interface. That is, you’re allowed to define any number of additional methods on the end of the interface, and Office will call back on these methods by name, using IDispatch::GetIDsOfNames and IDispatch::Invoke. So why is an automation interface a problem for the shim?
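To make the late-bound callback idea concrete, here is a minimal sketch in portable C++. It is not real COM: MiniDispatch, RibbonAddIn and CallByName are names invented for this illustration, plain strings stand in for DISPIDs and VARIANTs, and the two methods only mimic the roles of IDispatch::GetIDsOfNames and IDispatch::Invoke. The point is simply that the host finds MyButtonCallbackMethod by name at run time, so nothing about it appears in any compile-time interface.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for IDispatch (hypothetical, not the real COM type):
// names are resolved to numeric ids at run time, and calls are made by id.
struct MiniDispatch {
    virtual ~MiniDispatch() = default;
    // Returns true and sets id if the object exposes a member with this name.
    virtual bool GetIdOfName(const std::string& name, int& id) = 0;
    // Calls the member with the given id; returns false for an unknown id.
    virtual bool Invoke(int id, const std::string& arg, std::string& result) = 0;
};

// Hypothetical add-in ribbon object: the one method the interface defines
// (GetCustomUI) plus a callback that is only named in the ribbon XML.
class RibbonAddIn : public MiniDispatch {
public:
    RibbonAddIn() {
        Add("GetCustomUI", [](const std::string&) {
            return std::string("<button onAction='MyButtonCallbackMethod'/>");
        });
        Add("MyButtonCallbackMethod", [](const std::string& controlId) {
            return "clicked:" + controlId;
        });
    }
    bool GetIdOfName(const std::string& name, int& id) override {
        auto it = ids_.find(name);
        if (it == ids_.end()) return false;
        id = it->second;
        return true;
    }
    bool Invoke(int id, const std::string& arg, std::string& result) override {
        if (id < 0 || id >= static_cast<int>(members_.size())) return false;
        result = members_[id](arg);
        return true;
    }
private:
    using Member = std::function<std::string(const std::string&)>;
    // Registers a member under a name; its id is its index in members_.
    void Add(const std::string& name, Member m) {
        ids_[name] = static_cast<int>(members_.size());
        members_.push_back(std::move(m));
    }
    std::map<std::string, int> ids_;
    std::vector<Member> members_;
};

// What the host does: look the name up first, then invoke by id.
bool CallByName(MiniDispatch& target, const std::string& name,
                const std::string& arg, std::string& result) {
    int id = 0;
    return target.GetIdOfName(name, id) && target.Invoke(id, arg, result);
}
```

Calling CallByName on a RibbonAddIn with the name MyButtonCallbackMethod succeeds, while a name the object never registered fails at lookup time. A shim sitting in front of this object would have to answer the same name lookups, without knowing the names in advance.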

 

Recall that there are two standard ways to build a shim: containment and aggregation.

 

With containment, the shim acts like an outer component, and contains pointers to interfaces on the inner component (the add-in, smart tag, etc). The shim re-implements the same interfaces as the add-in, and when Office calls into the interface on the shim, the shim then passes the call on to the add-in’s implementation of the same interface. With this approach, the shim can specialize the interface with prolog and epilog code wrapping the call to the add-in. This is how the shim is currently written: it uses containment for the interface on the add-in. We do this because we want to provide behavior that is additional to the add-in’s implementation of the interface – notably to create and destroy appdomains as part of the startup (IDTExtensibility2::OnConnection) and shutdown (IDTExtensibility2::OnDisconnection) sequences. The same model holds true for smart tags and RTD components (although with different interfaces, of course).
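As a rough illustration of containment with prolog and epilog code, here is a small portable C++ sketch. IAddIn, InnerAddIn and ContainmentShim are hypothetical names, the interface is cut down to two parameterless methods, and the "appdomain" steps are just log entries; the real shim works against IDTExtensibility2 and the CLR hosting APIs, none of which are shown here.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for the add-in interface.
struct IAddIn {
    virtual ~IAddIn() = default;
    virtual void OnConnection() = 0;
    virtual void OnDisconnection() = 0;
};

// Shared log so we can observe the order in which things happen.
using Log = std::vector<std::string>;

// The inner component: the add-in's own implementation of the interface.
class InnerAddIn : public IAddIn {
public:
    explicit InnerAddIn(Log& log) : log_(log) {}
    void OnConnection() override { log_.push_back("addin:OnConnection"); }
    void OnDisconnection() override { log_.push_back("addin:OnDisconnection"); }
private:
    Log& log_;
};

// The outer component: re-implements the same interface and forwards each
// call to the inner object, wrapping it with its own extra behavior.
class ContainmentShim : public IAddIn {
public:
    explicit ContainmentShim(Log& log) : log_(log) {}
    void OnConnection() override {
        log_.push_back("shim:create appdomain");   // prolog (simulated)
        inner_ = std::make_unique<InnerAddIn>(log_);
        inner_->OnConnection();                    // forward to the add-in
    }
    void OnDisconnection() override {
        if (!inner_) return;                       // never connected
        inner_->OnDisconnection();                 // forward to the add-in
        inner_.reset();
        log_.push_back("shim:unload appdomain");   // epilog (simulated)
    }
private:
    Log& log_;
    std::unique_ptr<IAddIn> inner_;
};
```

Note that the shim can only forward the methods it was compiled to know about. That is fine for a fixed interface like this one, and it is exactly what stops working once the caller starts asking for methods by name.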

 

We could extend the shim’s containment model to cover ICustomTaskPaneConsumer and FormRegionStartup. Unfortunately, we can’t extend it to cover IRibbonExtensibility, because Office relies on the additional methods in the ribbon object that are not defined in the IRibbonExtensibility interface. The shim is generated before we know what these additional methods are, so we can’t provide containment for them.

 

Aggregation is a specialization of composition. When an outer component aggregates an interface of an inner component, it doesn't reimplement the interface; it merely passes the inner component's interface pointer directly to the caller. With this model, the outer component can't specialize the behavior of the inner component. The critical problem with aggregation is that you’re handing an interface pointer on the inner object directly back to the caller (Office in this case). Because of this, you must be very careful to construct the inner object so that it is aware that it is being aggregated. Why? Because of the COM rules of identity, and specifically QueryInterface transitivity.

 

A COM component can implement any number of interfaces, and a caller must be able to switch between these interfaces, and back again. The COM rules state that “the QueryInterface method takes an interface ID and returns the requested COM interface on the same object. The set of interface IDs accessible via QueryInterface is the same from every interface on the object.” That is, whatever COM interface pointer a caller has on an object, it should be able to switch to any other interface on that object, and back again. This is a problem in aggregation, because here the caller starts off with a pointer to the outer object’s interfaces, and later gets handed a pointer to the inner object’s interfaces. Now, the outer object is aware that it is aggregating an inner object, so it can always deal with QIs for inner object interfaces correctly. However, we can’t have the caller QI on an inner object interface directly, because the inner object doesn’t have any knowledge of the outer object’s interfaces.
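Here is a small portable C++ sketch of that failure. It is a simulation, not real COM: MiniUnknown, Inner, Outer and RoundTrips are made-up names, interface IDs are strings rather than GUIDs, QueryInterface returns a pointer instead of an HRESULT, and reference counting is left out entirely. The outer object knows about the inner one, but the inner one has no idea the outer one exists.

```cpp
#include <cassert>
#include <string>

// Minimal stand-in for IUnknown::QueryInterface (hypothetical type):
// string ids instead of GUIDs, nullptr instead of E_NOINTERFACE.
struct MiniUnknown {
    virtual ~MiniUnknown() = default;
    virtual MiniUnknown* QueryInterface(const std::string& iid) = 0;
};

// Inner object (plays the add-in). It only knows its own interfaces and
// is not aware that anything is wrapping it.
class Inner : public MiniUnknown {
public:
    MiniUnknown* QueryInterface(const std::string& iid) override {
        if (iid == "IUnknown" || iid == "IRibbonExtensibility") return this;
        return nullptr;
    }
};

// Outer object (plays the shim). It answers for its own interfaces and
// hands out the inner object's pointer for everything else.
class Outer : public MiniUnknown {
public:
    explicit Outer(Inner& inner) : inner_(inner) {}
    MiniUnknown* QueryInterface(const std::string& iid) override {
        if (iid == "IUnknown" || iid == "IDTExtensibility2") return this;
        return inner_.QueryInterface(iid);  // caller now holds an inner pointer
    }
private:
    Inner& inner_;
};

// True if a caller can QI from `start` to `there`, and from that pointer
// on to `back`. COM requires this to work for every pair of interfaces.
bool RoundTrips(MiniUnknown& start, const std::string& there,
                const std::string& back) {
    MiniUnknown* p = start.QueryInterface(there);
    return p != nullptr && p->QueryInterface(back) != nullptr;
}
```

Starting from the outer object, a caller can reach IRibbonExtensibility. Starting from the IRibbonExtensibility pointer it was just given, the same caller cannot get back to IDTExtensibility2, and asking that pointer for IUnknown yields a different identity from the one the outer object reports.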

 

So, how does aggregation work? The solution is for the inner object to delegate the IUnknown calls on its own interfaces to the outer object, but also to give the outer object a way to access the inner object's own (non-delegating) IUnknown functions directly. All in all, aggregation is reasonably well-understood, and it works. The critical issue is that the inner object needs to know that it is being aggregated, so that it can take the correct action when QI calls come in. This is simple enough if you’re building your inner object with ATL, because by default ATL objects are aggregatable as inner objects. It’s not so simple if you’re building your inner object with managed code. It's even more difficult if your inner object is managed and your outer object is unmanaged.
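The delegating/non-delegating split can be sketched in portable C++ as follows. Again this is a simulation under stated assumptions, not real COM or ATL: MiniUnknown, Inner, Outer and NonDelegatingUnknown are hypothetical names, string ids replace GUIDs, and there is no reference counting. What it does show is the essential trick: the inner object is told at construction time who its controlling (outer) unknown is, and every interface it hands to callers forwards QueryInterface there.

```cpp
#include <cassert>
#include <string>

// Minimal stand-in for IUnknown::QueryInterface (hypothetical type).
struct MiniUnknown {
    virtual ~MiniUnknown() = default;
    virtual MiniUnknown* QueryInterface(const std::string& iid) = 0;
};

// Aggregatable inner object (plays the add-in).
class Inner {
public:
    // Pass the outer object's unknown, or nullptr when not aggregated.
    explicit Inner(MiniUnknown* controlling)
        : ribbon_(*this), nonDelegating_(*this),
          controlling_(controlling ? controlling : &nonDelegating_) {}

    // The inner object's "real" unknown, given only to the outer object.
    MiniUnknown* NonDelegatingUnknown() { return &nonDelegating_; }

private:
    // Interface handed to callers: delegates QI to the controlling unknown.
    struct Ribbon : MiniUnknown {
        explicit Ribbon(Inner& o) : owner(o) {}
        MiniUnknown* QueryInterface(const std::string& iid) override {
            return owner.controlling_->QueryInterface(iid);
        }
        Inner& owner;
    };
    // Non-delegating unknown: answers only for the inner object itself.
    struct NonDelegating : MiniUnknown {
        explicit NonDelegating(Inner& o) : owner(o) {}
        MiniUnknown* QueryInterface(const std::string& iid) override {
            if (iid == "IUnknown") return this;
            if (iid == "IRibbonExtensibility") return &owner.ribbon_;
            return nullptr;
        }
        Inner& owner;
    };

    Ribbon ribbon_;
    NonDelegating nonDelegating_;
    MiniUnknown* controlling_;
};

// Outer object (plays the shim): creates the inner object as an aggregate.
class Outer : public MiniUnknown {
public:
    Outer() : inner_(this) {}
    MiniUnknown* QueryInterface(const std::string& iid) override {
        if (iid == "IUnknown" || iid == "IDTExtensibility2") return this;
        // Not ours: ask the inner object's non-delegating unknown.
        return inner_.NonDelegatingUnknown()->QueryInterface(iid);
    }
private:
    Inner inner_;
};
```

With this arrangement, a caller that obtains IRibbonExtensibility from the outer object gets a pointer into the inner object, yet any further QI on that pointer lands back on the outer object, so the round trip and the IUnknown identity both hold. The catch, for our purposes, is the Inner constructor: the inner object has to be written to cooperate.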

 

There is an additional problem in the shim/add-in scenario, because typically, you build an add-in and then use the COM shim wizards to generate a shim. While we can write wizards that generate shims, the wizard can’t interfere with the code in the add-in itself; you might not even have the source code of the add-in. So, there’s no easy way for us to make the add-in behave correctly as an inner object in an aggregation.

 

So, where do we go from here?

 

One option that we considered was to extend the existing shim containment model to allow the shim to reflect over the add-in to discover what additional methods are exposed from the object that implements IRibbonExtensibility, and then shim those methods also. But that’s a lot of work at runtime, and potentially error-prone. We would have to assume that any public method on the object is a potential ribbon callback, and that’s a wild assumption. Or, we’d have to mandate that the add-in developer adds some [RibbonCallback] attribute to each of the methods they want to expose. But, the existing shim doesn’t make any such assumptions about the add-in, and we don’t want to touch the add-in code at all. So, this is not a viable option.

 

Another option that we considered early on was to override the InternalQueryInterface method on the shim (which, you may recall, is implemented with ATL). The thinking here was that for any new extensibility interface QI'd by the calling host, we would delegate to the add-in itself:

 

HRESULT WINAPI CConnectProxy::InternalQueryInterface(
    void* pThis, const _ATL_INTMAP_ENTRY* pEntries, REFIID iid, void** ppvObject)
{
    // For the new extensibility interfaces, ask the add-in first.
    if (m_pConnect &&
        (iid == IID_ICustomTaskPaneConsumer
        || iid == IID_FormRegionStartup
        || iid == IID_IRibbonExtensibility
        || iid == IID_IBlogExtensibility
        || iid == IID_IBlogPictureExtensibility
        || iid == IID_IDocumentInspector
        || iid == IID_SignatureProvider
        || iid == IID_EncryptionProvider))
    {
        HRESULT hr = m_pConnect->QueryInterface(iid, ppvObject);
        if (hr == S_OK)
            return hr;
    }
    // Otherwise, fall back to the shim's own COM map.
    return CComObjectRootBase::InternalQueryInterface(
        pThis, pEntries, iid, ppvObject);
}

 

This could be simplified further, so that we QI on the inner object for any interface apart from the ones we want the shim itself to handle:

 

HRESULT WINAPI CConnectProxy::InternalQueryInterface(
    void* pThis, const _ATL_INTMAP_ENTRY* pEntries, REFIID iid, void** ppvObject)
{
    // Ask the add-in for everything except the interfaces the shim handles.
    if (m_pConnect
        && iid != IID__IDTExtensibility2
        && iid != IID_IUnknown
        && iid != IID_IDispatch)
    {
        HRESULT hr = m_pConnect->QueryInterface(iid, ppvObject);
        if (hr == S_OK)
            return hr;
    }
    // Otherwise, fall back to the shim's own COM map.
    return CComObjectRootBase::InternalQueryInterface(
        pThis, pEntries, iid, ppvObject);
}

 

But this breaks the COM identity rules, because in this approach we’re handing out a pointer to one of the new interfaces, which may be implemented in a separate object from the add-in. So, if Office were ever to QI again on that pointer, it would not get QI closure over all the interfaces supported by the shim and the add-in. In other words, this code would implement a kind of partial or fake aggregation, without the most crucial feature of real aggregation.

 

It’s seductive to think that the following would work (recall that the COM map in an ATL project expands out to the QueryInterface implementation):

 

BEGIN_COM_MAP(CConnectProxy)
    COM_INTERFACE_ENTRY(IDTExtensibility2)
    COM_INTERFACE_ENTRY2(IDispatch, IDTExtensibility2)
    COM_INTERFACE_ENTRY_AGGREGATE(IID_ICustomTaskPaneConsumer, m_pConnect)
    COM_INTERFACE_ENTRY_AGGREGATE(IID_FormRegionStartup, m_pConnect)
    COM_INTERFACE_ENTRY_AGGREGATE(IID_IRibbonExtensibility, m_pConnect)
    COM_INTERFACE_ENTRY_AGGREGATE(IID_IBlogExtensibility, m_pConnect)
    COM_INTERFACE_ENTRY_AGGREGATE(IID_IBlogPictureExtensibility, m_pConnect)
    COM_INTERFACE_ENTRY_AGGREGATE(IID_IDocumentInspector, m_pConnect)
    COM_INTERFACE_ENTRY_AGGREGATE(IID_SignatureProvider, m_pConnect)
    COM_INTERFACE_ENTRY_AGGREGATE(IID_EncryptionProvider, m_pConnect)
END_COM_MAP()

 

…which could be simplified to:

 

BEGIN_COM_MAP(CConnectProxy)
    COM_INTERFACE_ENTRY(IDTExtensibility2)
    COM_INTERFACE_ENTRY2(IDispatch, IDTExtensibility2)
    COM_INTERFACE_ENTRY_AGGREGATE_BLIND(m_pConnect)
END_COM_MAP()

 

In other words, fix up the COM map of the shim so that all QIs for non-IDTExtensibility2 interfaces get passed on to the add-in. In fact, this probably would work, right now at least. However, if you think about it, this is really no different from the InternalQI override above: we’re still breaking the QI transitivity rule. We’re asserting that we aggregate the add-in as the inner object, when in fact we don’t aggregate it at all. The add-in object (or, realistically, each of the multiple add-in objects that implement the extensibility interfaces) is crucially unaware that we’re presenting it as being aggregated.

 

So, why do I say it would work for now? It seems that right now, Office gets the first interface pointer on the object (the IUnknown of the object that implements IDTExtensibility2), and then always uses that pointer to QI for any subsequent interfaces. Right now, it doesn't QI on any of the subsequent interfaces. In other words, the potential QI transitivity breach of these two approaches is not in fact being hit – but there’s no guarantee that this would always be the behavior, and both approaches are technically still wrong, so it’s obviously an unacceptable risk.

 

The correct solution is either (a) to truly aggregate the add-in and all its inner objects, or (b) to extend the existing shim containment model to cover all of the new interfaces as well. Which brings us back to IRibbonExtensibility. We could cover all the other new interfaces with containment, but we can’t cover ribbon customization with containment. And how do we achieve aggregation without interfering with the add-in code?

 

For the skinny on this, see next week’s exciting adventure of The COM Shim Wizard!!!