How COM works & How to build a COM visible DLL in C#.Net, call it from VBA and select the proper ClassInterface (AutoDispatch, AutoDual) [part1/2]

 

This article is split across two blog posts, and this is part #1. Use this link to go to part #2 (please be patient :), I am still working on it).

When a beginner programmer needs to create a COM visible DLL which will be called from VBA, PowerShell, VBScript or .NET applications, there are lots of tutorials on how to get started.

Some of these articles present only the simplest possible steps to build the COM DLL, while others go into more detail. For someone who has not had the opportunity to work extensively with multiple versions of the same COM DLL, it is easy to overlook a couple of very important library design details. If this happens and the project gets released to end-users, fixing the problem becomes a lot harder.

What's the most important choice when creating COM visible DLLs?   

ANSWER: Selecting AutoDual, AutoDispatch or None for its interface. 

Each COM visible library exposes a set of functions to the outside world. The programmer can choose how these functions are discovered by applications referencing the DLL. As this MSDN article explains (ClassInterfaceType Enumeration: https://msdn.microsoft.com/en-us/library/system.runtime.interopservices.classinterfacetype.aspx), we basically have 3 options:

https://msdn.microsoft.com/en-us/library/system.runtime.interopservices.classinterfacetype.aspx
ClassInterfaceType Enumeration
---------------------------------------------------------

[...] Members

Member name      

Description

None

Indicates that no class interface is generated for the class. If no interfaces are implemented explicitly, the class can only provide late-bound access through the IDispatch interface. This is the recommended setting for ClassInterfaceAttribute. Using ClassInterfaceType.None is the only way to expose functionality through interfaces implemented explicitly by the class.

Tlbexp.exe (Type Library Exporter) exposes the first public, COM-visible interface implemented by the class as the default interface of the coclass. Beginning with the .NET Framework version 2.0, you can specify the default interface exposed to COM by using the ComDefaultInterfaceAttribute attribute. If the class implements no interfaces, the first public, COM-visible interface implemented by a base class becomes the default interface (starting with the most recently derived base class and working backward). Tlbexp.exe exposes _Object as the default interface if neither the class nor its base classes implement interfaces.

 

AutoDispatch  

Indicates that the class only supports late binding for COM clients. A dispinterface for the class is automatically exposed to COM clients on request. The type library produced by Tlbexp.exe (Type Library Exporter) does not contain type information for the dispinterface in order to prevent clients from caching the DISPIDs of the interface. The dispinterface does not exhibit the versioning problems described in ClassInterfaceAttribute because clients can only late-bind to the interface.

This is the default setting for ClassInterfaceAttribute.

 

AutoDual

Indicates that a dual class interface is automatically generated for the class and exposed to COM. Type information is produced for the class interface and published in the type library. Using AutoDual is strongly discouraged because of the versioning limitations described in ClassInterfaceAttribute.

[...]  

 

By default, if you don't do anything to change the class interface settings, Visual Studio will automatically generate:
  > an AutoDispatch interface for your COM library if you create it using C#;
  > an AutoDual interface for your COM library if you create it using VB.Net;

 

Why is this choice so important?   

It matters because client programs cache (bind to) your DLL's interface information. This is an internal optimization designed to improve code execution speed (I'll give you more details later in the article).

If you plan to release new versions which will run side-by-side with the old ones or will replace them, you don't want everyone else to have to recompile the code which calls into your library.

 

Some background information about COM (how it works behind the scenes)

Microsoft has released a very nice article which explains how COM works from the Office point of view. However, its design principles apply to all COM libraries:

https://msdn.microsoft.com/en-us/library/aa155776(v=office.10).aspx
Automating Microsoft Office 97 and Microsoft Office 2000
-------------------------------------------------------------------------

[...]

Introduction

Automation, formerly called "OLE Automation", is a technology that allows you to take advantage of an existing program's content and functionality, and to incorporate it into your own applications. Automation is based on the Component Object Model (COM). COM is a standard software architecture based on interfaces that is designed to separate code into self-contained objects, or components. Each component exposes a set of interfaces through which all communication to the component is handled.

With Automation, you can use the Microsoft® Word mail merge feature to generate form letters from data in a database without the user being aware that Word is involved. You could even use Automation to incorporate all of the charting and data analysis functionality that Microsoft Excel provides. You don't need to write your own calculation engine to provide the multitude of mathematical, financial, and engineering functions that Excel provides; instead, you can automate Microsoft Excel to "borrow" this functionality and incorporate it into your own application.

Automation consists of a client and a server. The Automation client attaches to the Automation server so that it can use the content and functionality that the Automation server provides. The terms client and server are mentioned frequently throughout this article, so it is important that you understand their relationship.

This article is intended to provide you with a foundation for developing your own Automation clients for Microsoft Office applications. In this article, we provide a hands-on approach to help you to do the following:

  • Understand how Office applications expose their content and functionality to Automation clients;
  • Identify the specific functions for the task you choose to Automate;
  • Locate the resources and documentation you need;
  • Understand how Automation works behind the scenes;
  • Create Automation clients with Visual Basic®, Visual C++®, and Microsoft Foundation Classes (MFC);
  • Develop a controller that uses the server as efficiently as possible;

All the Microsoft Office applications have their own scripting language, which can be used to perform tasks within applications. This scripting language is Microsoft Visual Basic for Applications (VBA). The set of functions that a VBA routine, or macro, can use to control its host application is the same set of functions that the Automation client can use to control the application externally, regardless of the programming language for the controller.

[...]

 

Type Libraries

 

A type library specifies all the information an Automation client needs to call a method or property for an object. For properties, the type library describes the value it accepts or returns. For methods, the type library provides a list of all the arguments the method can accept, tells you the data type of each argument, and indicates whether an argument is required.

Type libraries can appear in any of the following forms:

  • A resource in a .dll file;
  • A resource in an .exe file;
  • A stand-alone type library file (.tlb);

[...]

OLE/COM Object Viewer

To view type library information, you can also use the OLE/COM Object Viewer utility that is included with Microsoft Visual Studio®. Like the Object Browser that Visual Basic and Microsoft VBA use, the OLE/COM Object Viewer lists all the classes exposed in a type library, along with the methods and properties those classes support. For the C++ programmer, the OLE/COM Object Viewer provides important information, such as function return types, argument types, and DISPIDs (which will be discussed shortly). You can use the information that the viewer provides in conjunction with the Office object model documentation.

To view a type library with the OLE/COM Object Viewer:

  1. In Visual C++, on the Tools menu, click OLE/COM Object Viewer.
  2. On the File menu, click View TypeLib, and browse to locate the type library you want to view. 

The OLE/COM Object Viewer illustrated in Figure 3 displays information from the Microsoft Word 97 type library. More specifically, it shows the details for the Add member function of the Documents class.

   Figure 3. Use the OLE/COM Object Viewer to view class information in a type library.

[...]

PROGIDs and CLSIDs

Office applications register all of their classes in the Windows® system registry. Each class is associated with a globally unique identifier (or GUID) called a class identifier. In the registry, each class identifier (or CLSID) is mapped to a programmatic identifier (or PROGID) and to its application, as shown in Figure 4.

   Figure 4. The relationship between PROGID, CLSID, and Server as described in the Windows registry

When automating an Office application, you can use either the CLSID or the PROGID to create an object from one of the classes that the Office application exposes.

[...] 

 

How an Object Exposes Its Methods and Properties

All COM objects that can be automated implement a special interface: the IDispatch interface. It is this IDispatch interface that provides Automation clients with access to an object's content and functionality. An object can expose its methods and properties in two ways: by means of a dispatch interface and in its vtable.

The Dispatch Interface

An object's methods and properties collectively make up its dispatch interface (or dispinterface). Within the dispinterface, each method and property is identified by a unique member. This member is the function's dispatch identifier (or DispID).

   Figure 5. An object can provide clients access to its functions using a dispinterface, an array of function names, and an array of function pointers that are indexed by DispIDs.

To execute a method or a property of the object portrayed in Figure 5, an Automation client can:

  1. Call GetIDsOfNames to "look up" the DispID for the method or property.
  2. Call Invoke to execute the method or property by using the DispID to index the array of function pointers.

Note: IDispatch and its functions are described in greater detail in this article under "Creating an Automation client with C++."

Virtual Function Table (vtable)

The pointer to an interface references a table of pointers to functions that the interface supports. This table is called the virtual function table (or vtable). A COM object can expose its functions in its vtable to provide clients more direct access to the functions. Subsequently, this eliminates the need to call Invoke and GetIDsOfNames.

An object that exposes its methods through both a dispinterface and its vtable supports a dual interface. Figure 6 represents an object that has a dual interface. Clients can either:

  • Access its functions using GetIDsOfNames and Invoke

    -or-

  • Access its functions through the vtable. It is up to the Automation client to decide which method it uses to gain access to the object's functions.

   Figure 6. An object that has a dual interface can provide clients access to its functions with a dispinterface and in its vtable.

You can examine type libraries in the OLE/COM Object Viewer to determine if an object provides a dual interface. An object that provides a dual interface specifies the dual attribute in its interface declaration.

[...]

Sorry for using so many different colors to highlight the article from above, but it is vital that you don't miss one of those key concepts.

 

So .. to briefly summarize what's written above:
  > when COM libraries get registered on the client machine, they create a set of entries (PROGID, CLSID ..etc) in the Registry; these entries help client programs discover COM automation servers;
  > client programs can instantiate COM server objects using their PROGID (example: CreateObject("<some COM_DLL PROGID>")) or their CLSID (example: CoCreateInstance(<some CLSID>, ...) from C++);
  > every COM object that can be automated exposes its functions through the IDispatch interface, which (like all COM interfaces) inherits from IUnknown;
  > a specific function of the IDispatch interface lets client programs "look up" methods inside the COM library by name (GetIDsOfNames);
  > if IDispatch::GetIDsOfNames returns a valid dispatch identifier (DISPID), the client program calls IDispatch::Invoke and passes that DISPID to execute the method;
  > the functions inherited from IUnknown handle the lifetime of the COM object (we'll talk more about the IUnknown interface in the next paragraphs);
  > but if the creator of the COM DLL wishes so, client programs can be allowed to bind directly to the library's vtable of function pointers; in this way, clients improve the execution speed of COM automation code, because they no longer have to discover and then invoke each method on every call .. they use the interface layout they have already cached.
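The look-up-then-invoke pattern from the summary can be sketched in portable C++. This is only a simulation, not the real COM API: SimDispatch, SimCalculator, CallByName and the ID value 1 are hypothetical names invented for this illustration, and real IDispatch methods return HRESULT codes and take VARIANT arguments.

```cpp
// Portable simulation of the IDispatch "look up, then invoke" pattern.
// NOT the real COM API: every type and function name here is hypothetical.
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using DispId = int;
const DispId kDispIdUnknown = -1;  // plays the role of DISPID_UNKNOWN

class SimDispatch {
public:
    // Analogue of IDispatch::GetIDsOfNames: map a member name to its ID.
    DispId GetIdOfName(const std::string& name) const {
        auto it = ids_.find(name);
        return it == ids_.end() ? kDispIdUnknown : it->second;
    }

    // Analogue of IDispatch::Invoke: run the member registered under an ID.
    // Returns false when the ID is not known.
    bool Invoke(DispId id, const std::vector<int>& args, int& result) const {
        auto it = members_.find(id);
        if (it == members_.end()) return false;
        result = it->second(args);
        return true;
    }

protected:
    using Member = std::function<int(const std::vector<int>&)>;

    void Register(const std::string& name, DispId id, Member fn) {
        ids_[name] = id;
        members_[id] = std::move(fn);
    }

private:
    std::map<std::string, DispId> ids_;
    std::map<DispId, Member> members_;
};

class SimCalculator : public SimDispatch {
public:
    SimCalculator() {
        Register("Add", 1, [](const std::vector<int>& a) {
            assert(a.size() == 2);  // this member expects two arguments
            return Add(a[0], a[1]);
        });
    }

    // Direct entry point: what a vtable-bound client would call.
    static int Add(int x, int y) { return x + y; }
};

// Late-bound call: look the name up first, then invoke by ID.
bool CallByName(const SimDispatch& obj, const std::string& name,
                const std::vector<int>& args, int& result) {
    DispId id = obj.GetIdOfName(name);
    if (id == kDispIdUnknown) return false;
    return obj.Invoke(id, args, result);
}
```

Calling CallByName(calc, "Add", {2, 3}, r) goes through both steps on every call, while SimCalculator::Add(2, 3) stands in for the direct vtable route that skips the look-up.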

 

This was a high level overview of what happens behind the scenes .. the whole process of calling and sending / receiving parameters to COM servers is more complex.

For the sake of completeness, I will go ahead and cite the rest of the article which talks about the inner workings of IDispatch and IUnknown. You will need this information if you decide to write your own COM automation client program in C++, or if you build your own compiler (which will need to do this kind of dirty work, to let the programmer concentrate on writing the code, without worrying how COM calls will be triggered).  

https://msdn.microsoft.com/en-us/library/aa155776(v=office.10).aspx
Automating Microsoft Office 97 and Microsoft Office 2000
-------------------------------------------------------------------

[...]

Creating an Automation Client with C++

The IUnknown and IDispatch Interfaces

Interfaces are the cornerstone to COM. An interface is a table of pointers to related functions. Once you acquire a pointer to an interface, you have access to the functions in that interface. The IUnknown and IDispatch interfaces are at the heart of Automation.

All COM interfaces inherit from the IUnknown interface. IUnknown gives COM objects the means to manage their lifetimes and provides clients access to other interfaces that an object supports. The IUnknown interface has only three functions, all of which a COM object must support.

IUnknown::QueryInterface()    Called to identify and navigate interfaces that an object supports
IUnknown::AddRef()            Called each time a client makes a request for an interface
IUnknown::Release()           Called each time a client releases an interface

Each COM object is responsible for maintaining a count of the number of interface pointers it has handed out to clients by means of AddRef and Release. This count is called the reference count. When the object's reference count goes to zero, this is an indication to the object that there are no clients currently using its interfaces and that it can safely remove itself from memory. As you can imagine, properly maintaining reference counts is very important; if references to objects are not properly released, you risk the chance of leaving an object in memory even when it is no longer in use.

In addition to IUnknown, every COM object that can be automated implements IDispatch because it is IDispatch that gives a client access to the object's properties and methods. The IDispatch interface provides the means for automating COM objects using only four functions.

IDispatch::GetTypeInfoCount()   Called to determine if type information is available
IDispatch::GetTypeInfo()        Called to retrieve the type information
IDispatch::GetIDsOfNames()      Called to obtain the DISPID from the name of a property or method
IDispatch::Invoke()             Called to invoke a method or property for the object

To begin the Automation process, a client creates an instance of the Automation server by making a call to ::CoCreateInstance. With ::CoCreateInstance, you provide a CLSID for the Automation server and make a request for the IUnknown interface. (Note that you can determine the CLSID from the ProgID at run time with ::CLSIDFromProgID. ) Once the pointer to IUnknown is received, the client can then make the call to IUnknown::QueryInterface for a pointer to the object's IDispatch interface. The following code illustrates how an Automation client can create a new instance of Microsoft Word and obtain an IDispatch pointer to Word's Application object:

// Get the CLSID for Word's Application Object
CLSID clsid;
CLSIDFromProgID(L"Word.Application", &clsid);

// Create an instance of the Word application and obtain the pointer
// to the application's IUnknown interface
IUnknown* pUnk;
HRESULT hr = ::CoCreateInstance( clsid,
                                 NULL,
                                 CLSCTX_SERVER,
                                 IID_IUnknown,
                                 (void**) &pUnk);

// Query IUnknown to retrieve a pointer to the IDispatch interface
IDispatch* pDispApp;
hr = pUnk->QueryInterface(IID_IDispatch, (void**)&pDispApp);

Once the client has a pointer to the object's IDispatch interface, it can begin the work of calling the object's exposed methods and properties. To call methods and properties, it needs their corresponding DISPIDs. The Automation client can call IDispatch::GetIDsOfNames to retrieve a DISPID for a function of the object. Then, with the DISPID in hand, the client can use IDispatch::Invoke to invoke the method or property.

Now consider the IDispatch interface in terms of a C/C++ Automation client that automates Word to create a document similar to that of your Visual Basic Automation client. Figure 7 represents how this Automation client might use the IDispatch interface for the Automation server:

     Figure 7. A representation of the COM objects that an Automation client might access for Microsoft Word

The Automation client represented in Figure 7:

  • Calls ::CoCreateInstance to create a new instance of Microsoft Word and obtains a pointer to the Application object's IUnknown interface, pUnk.
  • Obtains a pointer to the Application's object's IDispatch interface through a call to IUnknown::QueryInterface. This pointer is pDispApp.
  • Calls IDispatch::GetIDsOfNames on pDispApp to acquire the DISPID of the Application's Documents property and receives the DISPID 0x6.
  • Calls IDispatch::Invoke with the DISPID 0x6 to get the Documents property. The call to get the Documents property returns a pointer to IDispatch for the Documents collection. This pointer is pDispDocs.
  • Calls IDispatch::GetIDsOfNames on pDispDocs to acquire the DISPID of the Documents' Add method and receives the DISPID 0xb.
  • Calls IDispatch::Invoke with the DISPID 0xb to execute the Add method so that a new document is added in Microsoft Word.
  • Continues in this same manner, making calls to pairs of IDispatch::GetIDsOfNames and IDispatch::Invoke, until the automated task is complete.

When IDispatch::Invoke is called upon to invoke a method or property, it also passes on parameters for the invoked method or property and receives its return value, if a value is returned. As you can see, IDispatch::Invoke really does most of the work in this process and rightfully deserves a little extra attention.

HRESULT Invoke(
    DISPID dispIdMember,             // DISPID for the member function
    REFIID riid,                     // Reserved, must be IID_NULL
    LCID lcid,                       // Locale
    WORD wFlags,                     // Flags describing the call's context
    DISPPARAMS FAR* pDispParams,     // Structure containing the arguments
    VARIANT FAR* pVarResult,         // Return Value of invoked call
    EXCEPINFO FAR* pExcepInfo,       // Error information
    unsigned int FAR* puArgErr       // Indicates which argument causes error
);

 

Examine the arguments for IDispatch::Invoke in more detail:

  • dispIDMember is the DISPID of the method or property you want to invoke.
  • riid is reserved and must be IID_NULL.
  • lcid is the locale context and can be used to allow the object to interpret the call specific to a locale.
  • wFlags indicates the type of member function you're invoking; are you executing a method, getting a property, or setting a property? The wFlags parameter can contain the following:
    DISPATCH_METHOD  Executes a method
    DISPATCH_PROPERTYGET Gets a property
    DISPATCH_PROPERTYPUT Sets a property
    DISPATCH_PROPERTYPUTREF Sets a property by a reference assignment rather than a value assignment
     
  • pDispParams is a pointer to a DISPPARAMS structure; a DISPPARAMS is a single "package" that represents all of the parameters to pass to the method or property you're invoking. Each parameter represented by a DISPPARAMS structure is type VARIANT. (VARIANTs are discussed in greater detail later in this section).
  • pVarResult is a pointer to another VARIANT type and represents the value returned from the invoked property or method.
  • pExcepInfo is a pointer to an EXCEPINFO structure that contains exception information.
  • puArgErr also contains error information. If an error occurs due to one of the parameters you passed to the invoked method or property, puArgErr identifies the offending parameter.

The simplest use of IDispatch::Invoke is to call a method that has no parameters and does not return a value. Now, once again consider the IDispatch interface in terms of your Automation client for Microsoft Word. As you might recall, the TypeParagraph method of the Selection object is one such method that had no return value and no arguments. Given the IDispatch pointer to the Selection object, you could invoke the TypeParagraph method with the following code:

// pDispSel represents a pointer to the IDispatch interface of the
// Selection object.
DISPID dispid;
DISPPARAMS dispparamsNoArgs = {NULL, NULL, 0, 0};
HRESULT hr;
OLECHAR FAR* szFunction;
szFunction = OLESTR("TypeParagraph");

// Look up the DISPID of the method by name
hr = pDispSel->GetIDsOfNames (IID_NULL, &szFunction, 1,
                              LOCALE_USER_DEFAULT, &dispid);

// Invoke the method through its DISPID (no arguments, no return value)
hr = pDispSel->Invoke (dispid, IID_NULL, LOCALE_USER_DEFAULT,
                       DISPATCH_METHOD, &dispparamsNoArgs,
                       NULL, NULL, NULL);

 [...]
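The reference-counting contract that the quoted section attributes to IUnknown::AddRef and IUnknown::Release can be sketched in portable C++. This is a simplified simulation and not the real COM interface: SimUnknown and the destroyedFlag parameter are hypothetical names used only to make the object's destruction observable, and the counter here is not thread-safe.

```cpp
// Portable simulation of the IUnknown reference-counting contract.
// NOT the real COM API: SimUnknown and its members are hypothetical.
#include <cassert>

class SimUnknown {
public:
    // The creator receives the first reference, so the count starts at 1.
    // destroyedFlag (may be null) is set to true when the object frees itself.
    explicit SimUnknown(bool* destroyedFlag)
        : refCount_(1), destroyed_(destroyedFlag) {}

    // Analogue of AddRef: one more client holds an interface pointer.
    unsigned long AddRef() { return ++refCount_; }

    // Analogue of Release: one client is done. When the count reaches
    // zero, nobody uses the object any more and it removes itself.
    unsigned long Release() {
        assert(refCount_ > 0);
        unsigned long remaining = --refCount_;
        if (remaining == 0) delete this;
        return remaining;
    }

private:
    // Private destructor: the object can only be destroyed through Release,
    // which also means it must be allocated with new.
    ~SimUnknown() {
        if (destroyed_) *destroyed_ = true;
    }

    unsigned long refCount_;
    bool* destroyed_;
};
```

A client that calls AddRef once must call Release once for that reference, plus once more for the reference it received at creation. Forgetting either call leaves the object in memory, which is the leak the quoted article warns about.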

 
 

How are those internal COM mechanisms affecting your COM visible DLL?

As I wrote, you most probably won't actually need to call low-level IDispatch methods from your code just to execute a simple COM call .. if you are accessing COM objects from VBA, the compiler hides those things from you.

But even if high-level programming languages hide those complex details away, you must be aware of what could happen if you allow client programs to cache your COM DLL's interface.

There are two ways in which you can use a COM DLL from Visual Basic:

Early binding:
  > go to Tools > References > locate the COM DLL > click OK to import it in your VBA project;
  > inside your code, create an object instance like this: Dim objCOM As New COM_Obj_ProgID;
  > from this point onwards, the VBIDE will discover the list of available methods and properties for your newly created COM object class instance, and you will receive IntelliSense suggestions about the name and list of accepted parameters when you try to call any method;
  > when using early binding, VBA reads your DLL's interface layout (the vtable order of its functions, or their DISPIDs) from the type library and caches it in the compiled project;
  > when you call a method, VBA doesn't ask IDispatch whether the method exists, it just calls the corresponding function pointer;
  > this is OK from the client's point of view, because the performance is optimized and you get design-time IntelliSense suggestions from the IDE;
  > but if you later decide that some functions need to be removed or replaced and you roll out the new COM DLL version to all end-users, every VBA project referencing the previous version might throw a runtime error, because the interface layout exposed by the new DLL is going to be different;

Late binding:
  > you don't need to import any reference;
  > just declare the object like this: Dim objCOM As Object;
  > then create a new class instance using: Set objCOM = CreateObject("<COM PROGID>");
  > finally, you will be able to access all COM visible methods by calling them as usual;
  > because VBA has no way of knowing (until run-time) if the method you tried to execute is correctly accessed (the proper method name, the correct number of parameters, the expected data types), you can write any combination of method name and input parameters you want;
  > if you got it wrong, you will get a run-time error: VBA first asks the IDispatch interface whether the method you are attempting to access is really available, and IDispatch returns an error code;
  > however, this late binding mechanism has a big advantage: if the creator of the COM DLL decides to release a new version with a different interface layout, any VBA code calling those methods won't break unless the methods are no longer available;
  > if they still have their old names and the same input parameters, even if the vtable is different, VBA will discover the new DISPIDs and will call the appropriate methods;
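The versioning problem described in the two lists can be made concrete with a small portable C++ simulation. It is only a sketch under stated assumptions: SimMember, MakeServerV1, MakeServerV2, EarlyBoundCall and LateBoundCall are hypothetical names, a plain vector stands in for the interface layout, and a real COM client hitting a changed layout may fail with a run-time error or a crash rather than quietly returning a wrong value.

```cpp
// Portable simulation of early vs. late binding across two DLL versions.
// NOT the real COM API: every type and function name here is hypothetical.
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct SimMember {
    std::string name;   // what a late-bound client looks up
    int (*fn)(int);     // what actually gets executed
};

static int Twice(int x)  { return x * 2; }
static int Square(int x) { return x * x; }
static int Negate(int x) { return -x; }

// Version 1 layout: slot 0 = Twice, slot 1 = Square.
std::vector<SimMember> MakeServerV1() {
    return { {"Twice", &Twice}, {"Square", &Square} };
}

// Version 2 inserts Negate at the front, shifting every existing slot.
std::vector<SimMember> MakeServerV2() {
    return { {"Negate", &Negate}, {"Twice", &Twice}, {"Square", &Square} };
}

// Early-bound client: uses a slot number that was cached when the client
// was compiled against version 1. No look-up happens at call time.
int EarlyBoundCall(const std::vector<SimMember>& server,
                   std::size_t cachedSlot, int arg) {
    assert(cachedSlot < server.size());
    return server[cachedSlot].fn(arg);
}

// Late-bound client: finds the member by name on every call, so it
// keeps working as long as the name still exists.
bool LateBoundCall(const std::vector<SimMember>& server,
                   const std::string& name, int arg, int& result) {
    for (const SimMember& m : server) {
        if (m.name == name) {
            result = m.fn(arg);
            return true;
        }
    }
    return false;  // the member is gone: report an error to the caller
}
```

With the slot cached from version 1 (slot 0 for Twice), EarlyBoundCall against the version 2 layout runs Negate instead, while LateBoundCall with the name "Twice" still reaches the right function. Looking up a name that no longer exists fails cleanly, which corresponds to the run-time error a late-bound VBA client would see.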

Here is what the cited article says about binding:

https://msdn.microsoft.com/en-us/library/aa155776(v=office.10).aspx
Automating Microsoft Office 97 and Microsoft Office 2000
-------------------------------------------------------------------------

[...] 

Creating an Automation Client with Visual Basic..

Binding

Binding describes how an Automation client internally invokes a given method or property. There are two types of binding: early binding and late binding. With Visual Basic, the type of binding you choose is determined solely by the manner in which you declare your variables.

Early binding

With early binding, your project maintains a reference to the Automation server's type library and you declare your variables as types defined by that type library. For example, if your project had a reference to the Microsoft Word type library, you could declare an object variable as shown:

 Dim oWordApp as Word.Application  

Because information from the server's type library is available at compile time, Visual Basic does not need to "look up" the property or method at run time. If the object exposes its function in its vtable, Visual Basic uses vtable binding and calls the function through the vtable. If the object does not support vtable binding, Visual Basic invokes the function using the DispId it obtained from the type library. In other words, it uses DispID binding.

Late binding

With late binding in Visual Basic, your project does not need a reference to the Automation server's type library and you can declare your variables as type Object.

 Dim oWordApp as Object  

When you declare a variable as type Object, Visual Basic cannot determine at compile time what sort of object reference the variable contains, so binding occurs at run time. When you call a property or method of an object with late binding, the process by which Visual Basic makes the call, is twofold: 

  • Using an interface pointer to the object, Visual Basic calls GetIDsOfNames to "look up" the name of that property or method to retrieve its DispID.

    -and-

  • Visual Basic calls Invoke to execute the property or method using the DispID.

 

Choosing the correct type of binding for your automation client  

There are advantages to both types of binding. You should weigh the benefits of each before deciding which type of binding to choose for your Automation client.

Early binding provides you with increased performance and compile-time syntax checking. In addition, you receive direct access to the server's type library in the Visual Basic design environment and to the Help for the object model. 

Despite the benefits of early binding, late binding is most advantageous when you are writing Automation clients that you intend to be compatible with future versions of your Automation server. With late binding, information from the server's type library is not "hard-wired" into your client, so you can have greater confidence that your Automation client can work with future versions of the Automation server without code changes.

For more information about binding in Visual Basic, please see the following article in the Microsoft Knowledge Base: 

Q245115 INFO: Using Early Binding and Late Binding in Automation

[...]

 

In part two of this article I am going to demonstrate how we can create COM DLLs which allow for Late or Early Binding. I will also show you how Automation clients fail when they encounter a DLL with an unexpected vtable interface layout, and what can be done to ensure that different versions of the same COM DLL with the same PROGID can run side by side. Thank you for reading my article! Bye :-)