So what exactly IS COM anyway?

A couple of days ago, David Candy asked (in a comment on a previous COM-related post) what exactly COM was.

Mike Dimmick gave an excellent answer to the question, and I’d like to riff on his answer a bit.

COM is just one of three associated technologies: RPC, COM and OLE (really OLE Automation).

Taken in turn:

RPC, or Remote Procedure Call, is actually the first of the “Cairo” features to debut in Windows (what, you didn’t know that there were parts of Cairo already in Windows?  Yup, actually, almost all of what was called “Cairo” is currently in Windows).

RPC provides a set of services to enable inter-process and inter-machine procedure calls.  The RPC technology is actually an implementation of the DCE RPC specification (the DCE APIs are renamed to be more Windows-like), and is on-the-wire interoperable with 3rd party DCE implementations.  RPC deals with two types of entities, clients and servers.  The client makes requests, and the server responds to those requests.  You tell RPC about the semantics of the procedures you’re calling with an IDL file (IDL stands for “Interface Definition Language” – it defines the interface between client and server).  IDL files are turned into C files by MIDL, the “Microsoft IDL compiler”.

When RPC needs to make a call from one process to another, it “marshalls” the parameters to the function call.  Marshalling is essentially the process of flattening the data structures (using the information in the IDL file), copying the data to the destination and then unpacking the flattened data into a format that the receiver can use.

RPC provides an extraordinarily rich set of services – it’s essentially trivial to write an application that says “I want to talk to someone on my local network segment who’s providing this service, but I don’t care who they are – find out who’s offering this service and let me talk to them” and RPC will do the hard work.

The next technology, COM, is built on RPC.  COM stands for “Component Object Model”.  COM is many, many things – it’s a design pattern, it’s a mechanism to hide implementation of functionality, it’s an inter-process communication mechanism, it’s the kitchen sink.

At its heart, COM’s all about a design pattern that’s based around “Interfaces”.  Just as RPC defines an interface as the contract between a client and a server, COM defines an interface as a contract between a client of a set of functionality and the implementor of that functionality.  All COM interfaces are built around a single “base” interface called IUnknown, which provides reference count semantics, and the ability to query to see if a particular object implements a specific interface.  In addition, COM provides a standardized activation pattern (CoCreateInstance) that allows the implementation of the object to be isolated from the client of the object.
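
As a rough illustration of that pattern, here is a portable sketch of an IUnknown-style base interface. It is not the Windows SDK definition: `IUnknownLike`, `IGreeter`, `Greeter` and `CreateGreeter` are hypothetical names, string IIDs stand in for GUIDs, and `bool` stands in for HRESULT.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, portable stand-in for IUnknown. Real COM uses HRESULT,
// GUID-based IIDs and ULONG; string IIDs and bool results are simplifications.
struct IUnknownLike {
    virtual bool QueryInterface(const char* iid, void** out) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
protected:
    ~IUnknownLike() = default;  // clients never delete; they call Release()
};

// An example interface (made up for this sketch) derived from the base.
struct IGreeter : IUnknownLike {
    virtual const char* Greet() = 0;
protected:
    ~IGreeter() = default;
};

int g_liveGreeters = 0;  // lets a caller observe object lifetime

class Greeter final : public IGreeter {
    unsigned long refs_ = 1;  // the creator holds the first reference
    ~Greeter() { --g_liveGreeters; }
public:
    Greeter() { ++g_liveGreeters; }

    bool QueryInterface(const char* iid, void** out) override {
        assert(iid != nullptr && out != nullptr);
        if (std::strcmp(iid, "IUnknownLike") == 0) {
            *out = static_cast<IUnknownLike*>(this);
        } else if (std::strcmp(iid, "IGreeter") == 0) {
            *out = static_cast<IGreeter*>(this);
        } else {
            *out = nullptr;
            return false;  // this object does not implement that interface
        }
        AddRef();  // every pointer handed out carries its own reference
        return true;
    }
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        unsigned long left = --refs_;
        if (left == 0) delete this;  // the object owns its own lifetime
        return left;
    }
    const char* Greet() override { return "hello"; }
};

// Stand-in for activation: the client sees only the base interface.
IUnknownLike* CreateGreeter() { return new Greeter(); }
```

A client would call `CreateGreeter()`, ask for `"IGreeter"` through `QueryInterface`, use the result, and call `Release()` once on each pointer it was given.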

Because the implementation of the COM object is hidden from the client of the object, and the implementation may exist in another process (or on another machine in the case of DCOM), COM also defines its interfaces in an IDL file.  When the MIDL compiler is compiling an IDL file for COM, it emits some additional information including C++ class definitions (and C surrogates for those definitions).  It will also optionally emit a typelib for the interfaces.

The typelib is essentially a partially compiled version of the information in the IDL – it contains enough information to allow someone to know how to marshall the data.  For instance, you can take the information in a typelib and generate enough information to allow managed code to interoperate with the COM object – the typelib file contains enough information for the CLR to know how to convert the unmanaged data into its managed equivalent (and vice versa).

The third technology is OLE Automation (Object Linking and Embedding Automation).  OLE Automation is an extension of COM that allows COM to be used by languages that aren’t C/C++.  Essentially OLE Automation is built around the IDispatch interface. IDispatch can be thought of as “varargs.h-on-steroids” – it provides an abstraction for the process of passing parameters to and from functions, thus allowing an application to accept method semantics that are radically different from the semantics provided by the language (for instance, VB allows parameters to functions to be absent, which is not allowed for C functions – IDispatch allows a VB client to call into an object implemented in C).
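
To make the “varargs-on-steroids” idea concrete, here is a minimal sketch of late-bound dispatch with an optional argument. It is only an analogy: `Value`, `DispatchLike` and `Repeater` are hypothetical names, and the real IDispatch works with VARIANTs and numeric DISPIDs (looked up via GetIDsOfNames) rather than method-name strings.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for VARIANT: a tagged value that may also be "missing".
struct Value {
    enum Kind { Missing, Int, Text };
    Kind kind = Missing;
    int number = 0;
    std::string text;
    static Value FromInt(int v) { Value x; x.kind = Int; x.number = v; return x; }
    static Value FromText(std::string v) { Value x; x.kind = Text; x.text = std::move(v); return x; }
};

// Hypothetical stand-in for IDispatch: one generic entry point for all methods.
struct DispatchLike {
    virtual bool Invoke(const std::string& method,
                        const std::vector<Value>& args,
                        Value* result) = 0;
protected:
    ~DispatchLike() = default;
};

class Repeater : public DispatchLike {
    // The strongly typed method the implementation language actually offers.
    static std::string Repeat(const std::string& text, int count) {
        std::string out;
        for (int i = 0; i < count; ++i) out += text;
        return out;
    }
public:
    bool Invoke(const std::string& method,
                const std::vector<Value>& args,
                Value* result) override {
        assert(result != nullptr);
        if (method != "Repeat" || args.empty() || args.size() > 2) return false;
        if (args[0].kind != Value::Text) return false;
        int count = 1;  // default used when the caller leaves the argument out
        if (args.size() == 2 && args[1].kind != Value::Missing) {
            if (args[1].kind != Value::Int) return false;
            count = args[1].number;
        }
        *result = Value::FromText(Repeat(args[0].text, count));
        return true;
    }
};
```

The caller can pass one argument or two; the dispatch layer, not the C++ function signature, decides what an absent parameter means.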

Anyway that’s a REALLY brief discussion, there are MANY, MANY books written about this subject.  Mike referenced Dale Rogerson’s “Inside COM”, I’ve not read that one, but he says it’s good 🙂



Comments (29)

  1. Anonymous says:

    Larry, I’ve always wondered what the story was behind an article at this site. The author says that while he worked at MS, he pointed out a "fatal" COM design flaw that was basically ignored. He writes…

    "You might think, "Oh, right, big deal! It’s easy to come up with these ideas now, after OLE has been on the market for almost a decade." What if I told you that yours truly, who worked for Microsoft back then, soon after OLE 1.0 was released, had these ideas written down and sent to the responsible people. To make the long story short, the ideas were accepted as valid, but rejected on the premise that there already had been too much code written to the OLE specification (mostly at Microsoft). No manager was willing to take the risk of redesigning OLE."

    Is that topic off limits or are you perhaps the wrong person to ask?


  2. Anonymous says:

    Bartosz did work at MS, actually I used to work with him.

    I have a lot of respect for Bartosz, and his points are valid, but my simple answer is two-fold.

    The first part is: "How often does this design pattern REALLY occur?".

    The second is that he’s saying that COM reference counting isn’t important. Actually it’s critical – lifetime management is a nightmare without it – he naively says "The truth is, there is very little need for refcounting as long as you agree not to destroy the object while you are using its interfaces." This is true, but how do you know if nobody’s using its interfaces?

    If you call a method that takes ownership of the object, how do you know that it’s taken ownership? With refcounting, it’s irrelevant – if the method wants to take ownership, it adds a reference, if it doesn’t want to take a reference, it doesn’t add the reference. Lifetime management is clear and clean. The only alternative to this is the CLR – object lifetime isn’t owned by the object, there’s another entity that owns the object’s lifetime. But that wasn’t a viable option back in the Win16 days where most Windows machines had 1M of physical RAM.
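
The ownership rule described above can be sketched in a few lines of portable C++. All names here (`RefCounted`, `Inspect`, `Holder`) are hypothetical; the point is only that a callee that keeps the object adds a reference and a callee that merely borrows it does not, so the caller never needs to know which kind it called.

```cpp
#include <cassert>

// Hypothetical reference-counted object; reports its own destruction
// through a flag so the effect can be observed from outside.
struct RefCounted {
    int refs = 1;          // the creator holds the first reference
    bool* destroyedFlag;
    explicit RefCounted(bool* flag) : destroyedFlag(flag) {}
    void AddRef() { ++refs; }
    void Release() {
        assert(refs > 0);
        if (--refs == 0) {
            *destroyedFlag = true;
            delete this;
        }
    }
private:
    ~RefCounted() = default;  // only Release() may destroy the object
};

// Borrows the object for the duration of the call: no reference taken.
int Inspect(RefCounted* obj) { return obj->refs; }

// Keeps the object beyond the call, so it takes its own reference.
class Holder {
    RefCounted* held_ = nullptr;
public:
    void Keep(RefCounted* obj) {
        obj->AddRef();
        if (held_) held_->Release();
        held_ = obj;
    }
    ~Holder() { if (held_) held_->Release(); }
};
```

The caller releases its own reference when it is done; the object survives for as long as the `Holder` still needs it and is destroyed when the last reference goes away.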

  3. Anonymous says:

    Keep these discussions of COM going please. Would love to see more MS folks blog on this. COM is one of the most complex component technologies and any insider information on this would be very much appreciated by some folks like me who are still in the COM world (via C++). Although .NET supersedes it, COM will still be there for a couple more years to come.

  4. Anonymous says:

    Wow, I haven’t heard "Cairo" in ages. I remember going to the Santa Monica MS office in Feb 1996 where they were showing off a beta of NT 4. The guy doing the demo made us all laugh by starting the presentation with "So do you all want to see Cairo? Here!" and he opened up a web browser showing a picture of the city of Cairo. 😉

  5. Anonymous says:

    The way I was always taught (well, learned) to understand COM was this:

    COM is at heart a binary interface specification. People often ignore this, but it’s vitally important to understanding that COM isn’t black magic. Strip away all the layers of helpers and what you’re left with is a specification that describes the following, regardless of what programming language you happen to be using:

    – What you (as a client) need to know about that interface pointer you’ve just been handed – i.e., where to find the VMT [Virtual Method Table]

    – Where to find the function pointers you need in the VMT

    – The first three methods in the VMT will always be the same (because they’re the methods of IUnknown, from which every other interface inherits)

    – What those three methods will do

    – The fact that, for a given instance of an object, any two calls to QueryInterface passing the same IID will result in the same interface pointer being returned.

    (Actually, not quite, but it will result in an interface pointer that has a VMT which references the same methods each time, so the two interface pointers will behave identically, even if they aren’t identical in absolute terms).
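
The layout those bullets describe can be sketched without any C++ class machinery. This is an illustration, not the SDK headers: `INamed`, `INamedVtbl`, `NamedObject` and `CreateNamed` are hypothetical, string IIDs replace GUIDs, and plain `int` replaces HRESULT. The `lpVtbl` member name follows the convention of MIDL-generated C bindings.

```cpp
#include <cassert>
#include <cstring>

struct INamed;  // an interface pointer is a pointer to this struct

// The table of function pointers. The first three slots are always the
// IUnknown methods; every method receives the interface pointer itself.
struct INamedVtbl {
    int           (*QueryInterface)(INamed* self, const char* iid, void** out);
    unsigned long (*AddRef)(INamed* self);
    unsigned long (*Release)(INamed* self);
    int           (*get_name)(INamed* self, const char** out);
};

// All a client may assume: the first thing at the pointer is the table.
struct INamed {
    const INamedVtbl* lpVtbl;
};

// Implementation detail, invisible to clients.
struct NamedObject {
    INamed iface;  // first member, so both pointers share one address
    unsigned long refs;
    const char* name;
};

static NamedObject* impl(INamed* self) {
    return reinterpret_cast<NamedObject*>(self);
}

static int Named_QueryInterface(INamed* self, const char* iid, void** out) {
    if (std::strcmp(iid, "IUnknown") == 0 || std::strcmp(iid, "INamed") == 0) {
        *out = self;
        self->lpVtbl->AddRef(self);
        return 0;
    }
    *out = nullptr;
    return -1;
}
static unsigned long Named_AddRef(INamed* self) { return ++impl(self)->refs; }
static unsigned long Named_Release(INamed* self) {
    unsigned long left = --impl(self)->refs;
    if (left == 0) delete impl(self);
    return left;
}
static int Named_get_name(INamed* self, const char** out) {
    assert(out != nullptr);
    *out = impl(self)->name;
    return 0;
}

static const INamedVtbl kNamedVtbl = {
    Named_QueryInterface, Named_AddRef, Named_Release, Named_get_name
};

INamed* CreateNamed(const char* name) {
    NamedObject* obj = new NamedObject{ { &kNamedVtbl }, 1, name };
    return &obj->iface;
}
```

A call then looks like `x->lpVtbl->get_name(x, &temp)`, which is what a C++ compiler generates behind `x->get_name(&temp)` when the class layout matches.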

    I’m one of the (apparently) few people that writes COM code in C, rather than C++. The reason? I can’t stand C++. Doing so, though, means I haven’t had the ATL to shield me from lots of things.

    C++ compilers on Win32 have nice tricks that save you from having to worry about VMTs – or passing the interface pointer as the first parameter to each method call; this is because this is pretty much how virtual method calls work for normal C++ classes. The extent of the ‘nice tricks’ is generally just to tell the compiler that what you’re dealing with is a COM interface, and so is laid out in a particular way in-memory, as opposed to whatever mechanism the compiler author prefers. This is important for two reasons: firstly because C++ doesn’t (as far as I’m aware – if it does now it didn’t use to) specify the in-memory layout of a class under normal circumstances – so C++ classes compiled with one compiler couldn’t necessarily be used by a program built with a different compiler; secondly, because COM is language-neutral, not only do you not have to worry about which compiler a component was built with, but you don’t have to worry about what language it was written in, either. There came a point where the realisation hit me that COM was actually elegantly simple. The usage, granted, is often very complicated; there are often lots of clever tricks employed through the usage of COM; but COM itself is actually (I think, at least) pretty straightforward, so long as you don’t lose sight of what it actually is.

  6. Anonymous says:

    I learned COM by reading "Inside COM" by Dale Rogerson. It is very much a ‘how to use & create COM objects’, rather than ‘why is COM like this’.

    Haven’t read Don Box’s "Essential COM", but from reading his "Essential .NET" book I assume Box’s COM book gives a more ‘inside’ look into COM.

    But I’m very happy that I started with reading "Inside COM" – Thanks Dale for writing it! And of course a big thank you to Larry for writing these great articles!! 🙂

  7. Anonymous says:

    So I take it that marshalling is the mapping of the component into (from the caller’s perspective) the caller’s process.

    Why would anyone program in C? Isn’t this simpler:

    Set x = CreateObject(something)


    The ref counting, cocreateinstance, is all invisible.

  8. Anonymous says:


    Marshalling is about taking:



    struct {
        int x;
        int y;
    } foo;

    and putting it into a block of memory as:

    | x |
    | y |

    passing that block across some boundary (thread, process, computer) to the other side, and converting it back to:

    struct {
        int x;
        int y;
    } foo;

    If you can do that for a structure, you can do it to the parameters to a routine, or whatever else you want to pass across.
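
A minimal sketch of that flatten-and-rebuild step follows. The wire format (each int as four little-endian bytes) is an assumption chosen for illustration and is not the NDR format RPC actually uses; `Foo`, `Marshal` and `Unmarshal` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Foo {
    std::int32_t x;
    std::int32_t y;
};

// Append a 32-bit integer as four little-endian bytes (assumed wire format).
static void PutInt32(std::vector<std::uint8_t>& buf, std::int32_t v) {
    std::uint32_t u = static_cast<std::uint32_t>(v);
    for (int shift = 0; shift < 32; shift += 8) {
        buf.push_back(static_cast<std::uint8_t>((u >> shift) & 0xFF));
    }
}

// Read four little-endian bytes starting at pos and advance pos past them.
static std::int32_t GetInt32(const std::vector<std::uint8_t>& buf, std::size_t& pos) {
    assert(pos + 4 <= buf.size());
    std::uint32_t u = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        u |= static_cast<std::uint32_t>(buf[pos++]) << shift;
    }
    return static_cast<std::int32_t>(u);
}

// Sender side: flatten the structure into a block of bytes.
std::vector<std::uint8_t> Marshal(const Foo& foo) {
    std::vector<std::uint8_t> buf;
    PutInt32(buf, foo.x);
    PutInt32(buf, foo.y);
    return buf;
}

// Receiver side: rebuild the structure from the block of bytes.
Foo Unmarshal(const std::vector<std::uint8_t>& buf) {
    std::size_t pos = 0;
    Foo foo;
    foo.x = GetInt32(buf, pos);
    foo.y = GetInt32(buf, pos);
    return foo;
}
```

Writing the bytes out explicitly, instead of copying the struct’s memory, is what lets the two sides differ in padding or byte order.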

    The VB example you used above is VB syntactic sugar around:

    x = CoCreateObject(something);

    string temp;

    HRESULT hr;

    hr = x->get_name(&temp);




    Just because there are language constructs that hide the implementation details doesn’t mean that they’re not there.

    And the single method call of x->get_name() is:

    x->lpVtbl->get_name(x, &temp)

    and the function at the get_name offset in the vtable for x is actually the routine NdrClientCall<n> for some value of <n>, which calls NdrClientCall (NDR is "Network Data Representation", which is sort-of the name of the wire format).

    NdrClientCall allocates a buffer and marshals the two pointer parameters (x and &temp) into that buffer.

    It then calls into RPC to transfer the request across the boundary to the receiver.

    On the receiver’s side, it unmarshals the buffer, and calls the get_name method on the server side object that corresponds to x.

    The server side get_name method allocates a buffer, and sets its &temp parameter to point to that buffer, and returns.

    The RPC runtime library marshalls the string in the &temp parameter back into the buffer, frees the string, and sends the buffer back to the client, which then unmarshalls the response into its local address space (allocating memory to hold the &temp return value).

    And now, the caller can finally call MessageBox on the parameter.

    Here’s why it’s important. If you don’t realize that all that stuff I just wrote happens when you write that one line of VB, you might be tempted to believe that it is an inexpensive operation.

    And it’s not – there’s a lot of processing that’s been hidden by your language being "nice" to you and helping you make it easier.

    For some classes of application, knowledge of this hidden complexity isn’t important. But often times these "simple" calls carry a huge amount of overhead – for example, there are at LEAST two context switches and four trips through the heap manager hidden inside the simple statement written above. Was it obvious? Nope. What happens when you put it inside the inner loop of a time critical function (say a mouse move handler)? All of a sudden, that simple procedure call has become a huge performance drag on your application.
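
One way to picture the inner-loop cost is to count boundary crossings. In the sketch below `RemoteObject` is a hypothetical stand-in whose `get_name` simply counts how often it is called; in the real case each call would pay for the marshalling and context switches described above. The cached variant assumes the name cannot change while the loop runs, which is an assumption you would have to verify for a real object.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for an out-of-process object: every property read
// is counted as one round trip across the boundary.
struct RemoteObject {
    int roundTrips = 0;
    std::string get_name() {
        ++roundTrips;
        return "widget";
    }
};

// Reads the property on every simulated mouse move.
std::size_t HandleMovesNaive(RemoteObject& obj, int moves) {
    assert(moves >= 0);
    std::size_t total = 0;
    for (int i = 0; i < moves; ++i) {
        total += obj.get_name().size();
    }
    return total;
}

// Reads the property once and reuses the local copy inside the loop.
std::size_t HandleMovesCached(RemoteObject& obj, int moves) {
    assert(moves >= 0);
    const std::string name = obj.get_name();
    std::size_t total = 0;
    for (int i = 0; i < moves; ++i) {
        total += name.size();
    }
    return total;
}
```

Both functions compute the same result, but for 100 moves the first crosses the boundary 100 times and the second crosses it once.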

    And that’s a part of the reason why understanding this stuff is important. I like the CLR. I think it’s amazingly cool. And it’s wonderful that for 99% of the things that are done, the CLR doesn’t get in the way of stuff working. But even with the CLR, it’s important to understand the performance characteristics of the code you write. It may be simpler, but…

    Oh, and to be CRYSTAL clear. I have no problems with doing all that work. It’s just fine, machines are really good at doing it, and they’re really fast.

  9. Anonymous says:

    (To follow up on Larry a little):

    Even with knowing what happens when you make that call, you might still be tempted to think ‘well, it’s not necessarily that expensive’ – after all, in-proc calls aren’t really slow; it’s just a matter of pointer indirection.

    But, of course, the call doesn’t necessarily have to be in-proc. Sure, right *now* it’s in-process, and it might well remain so for a while; what happens if you later decide to move that functionality into another process, or even make it client-server with DCOM? The beauty of COM is that you don’t have to redesign everything to make this possible – provided you’re aware of the possibilities.

    Of course, sometimes you know that certain components will only ever be in-process – that might well be a restriction you place upon them; but the important thing is that you deliberately place that restriction, rather than letting it come and bite you in the ass later on when you’re not expecting it; or finding out that the performance or usability of your application is suddenly horrible (say, for example, because your UI thread is the same thread that’s now blocking on a DCOM call over a slow network link).

  10. Anonymous says:

    Mo’s absolutely right (in both his/her comments). If you’re using COM directly, you can control this (by using the CLSCTX flags to specify that you only want an inproc server, for example), but as far as I know, you don’t have that flexibility from managed code (or from VB).

  11. Anonymous says:

    I understand the out of process bit.

    So marshalling is only the data? What about COM putting itself into the calling process? Is this included under marshalling or is that something else? I remember (and this is all I remember) a problem related to some COM object in CLSID that had marshalling in its name.

    I have no problems reading C with API calls and converting to VB (eg FindWindow, set windowpos). But I don’t know what all these > and *something (or is it something*) mean.

    I also don’t really know what a heap is apart from memory the program uses. I’ve used GlobalLock & GlobalAlloc et al in Win 16 (from basic) to manipulate the clipboard and to work in bytes (though normally I make a fixed length string). Although if I guess right all memory is pretty much the same in Win32.

    Anyway some new MS toys. (it’s 11:38AM yesterday in Seattle now)

    Don’t tell that bloke that did the powertoy calc.

  12. Anonymous says:

    COM doesn’t "put itself" in the calling process. The calling process uses COM, and calls into COM APIs. That stuff’s not marshalling, it’s loading DLLs into a process.

    The heap is a virtual memory manager. When you allocate memory (for operator new, or whatever), the heap is where it comes from.

    The CLR has a heap too, but it’s managed (in other words, memory is GC’ed if it’s not in use).

  13. Anonymous says:

    Why are IDL and ODL almost but not quite compatible? What are the criteria for choosing one or the other? What is the method for switching from one to the other if the programmer’s actions in VC++ happened to generate the less appropriate choice?

    (One of your colleagues couldn’t answer this one. I don’t know if that’s because ODL was too old or too new.)

  14. Anonymous says:

    David Candy wrote:

    "I have no problems reading C with API calls and converting to VB (eg FindWindow, set windowpos). But I don’t know what all these > and *something (or is it something*) mean."

    This is where most of the fun in C is ;). With > I assume you’re thinking of something like this: lpPoint->x or lpHubba->DoBubba(). The "->" is an operator named "Member access operator" and is quite similar to the "." operator which you are familiar with. Ok, but what’s the difference? The "." op. is used to access members of classes and structs which are on your stack, and the "->" op. is used to access members of classes and structs to which you have a pointer. What "->" really does is to dereference the pointer for you, and it is only syntactic sugar for writing (*lpPoint).x.

    And now that "*" operator appeared. What does that do? 🙂 It is used for two different things actually:

    1) To declare pointers. This is something such as:

    POINT *lpPoint;

    But before you can use lpPoint you need to allocate memory to it, for example using HeapAlloc, malloc etc.

    A pointer points to some place in memory, instead of being some value. See below how to access the value of what the pointer points to.

    2) The indirection operator. This means to take a pointer and access the value it points to. You see an example of using it above in the discussion of the "->" operator.

    Often you may see the "&" operator too. This is the opposite of the "*" operator, and is called the "address-of" operator. What it gives you is the address of some value. For example you have something like this:

    POINT myPoint; // myPoint is stored on the stack

    Now you want to send myPoint to a function which is declared like this:

    void MySuperFunction(POINT* pnt);

    What you do is to call it like this:

    MySuperFunction(&myPoint);


    It is a lot quicker to call functions by reference rather than by value (ByRef and ByVal in VB), because you only pass an address to the function instead of the contents of the struct.
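
Putting the three operators together, here is a small self-contained sketch. The `POINT` struct is a simplified stand-in for the Win32 one (which uses LONG fields), and the function bodies are made up purely to show the effect of passing by address versus by value.

```cpp
#include <cassert>

// Simplified stand-in for the Win32 POINT structure.
struct POINT {
    int x;
    int y;
};

// Receives an address, so changes are visible to the caller (like ByRef).
void MySuperFunction(POINT* pnt) {
    assert(pnt != nullptr);
    pnt->x += 10;    // "->" reaches a member through a pointer
    (*pnt).y += 20;  // the same access spelled with "*" and "."
}

// Receives a copy, so the caller's struct is untouched (like ByVal).
void ChangeCopy(POINT pnt) {
    pnt.x = 99;
}

int Demo() {
    POINT myPoint = {1, 2};     // myPoint is stored on the stack
    POINT* lpPoint = &myPoint;  // "&" takes the address of myPoint
    ChangeCopy(myPoint);        // no effect on myPoint
    MySuperFunction(&myPoint);  // modifies myPoint through its address
    return lpPoint->x + (*lpPoint).y;
}
```

`Demo()` returns 33: `x` becomes 11 and `y` becomes 22, and the by-value call changes nothing.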

  15. Anonymous says:

    Larry: His, not hers (10/10 for not making assumptions, though :))


    When you create an instance of a COM object that’s handled in-process, what you get back is a table of function pointers, and those function pointers will normally refer to addresses within the area of memory used by the DLL that implements that COM object. In this situation, you can draw (loose) parallels with LoadLibrary and GetProcAddress.

    When marshalling steps into the fray, no DLL is loaded into your process’ address space, but you still need a table of function pointers. Obviously, these function pointers have to come from somewhere. What happens is that the function pointers all point at client stubs instead of the ‘real’ functions. If you ignore the speed aspect for a moment, the net result is identical as far as you’re concerned – you have an interface pointer, complete with a VMT that contains a bunch of function pointers, that you can use to invoke methods. In the marshalling case, the client stubs take the parameters in the normal fashion, and send them to the server. Now, they might be sent over an inter-process communication mechanism (IPC/LPC), or they might be sent via a remote procedure call to another host (RPC). What’s important is that the parameters are put together into a packet that can be sent and understood by the server, which can then decode the packet and actually perform the action you requested in the first place. The same encoding/decoding-type action happens for any parameters marked ‘out’ or ‘in, out’, along with the return value.

    This might all sound a bit complicated – and it can be – but the important thing to remember is that from the client application, a marshalled method call is no different (in terms of the code you write to invoke it) to an in-process call. That’s what makes COM cool.
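
The stub arrangement described above can be sketched in a single process. Everything here is hypothetical: `ICalc`, `RealCalc`, `ClientSideStub` and `ServerSideStub` are illustration names, the "packet" is a plain text string rather than NDR, and a direct function call stands in for the IPC or RPC transport.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// The interface the client codes against (made up for this sketch).
struct ICalc {
    virtual int Add(int a, int b) = 0;
    virtual std::string GetName() = 0;
protected:
    ~ICalc() = default;
};

// The real implementation, living on the "server" side.
class RealCalc : public ICalc {
public:
    int Add(int a, int b) override { return a + b; }
    std::string GetName() override { return "calc"; }
};

// Server side: decodes a request packet, calls the real object and
// encodes the result as the reply packet.
class ServerSideStub {
    ICalc& target_;
public:
    int callsHandled = 0;
    explicit ServerSideStub(ICalc& target) : target_(target) {}
    std::string Dispatch(const std::string& request) {
        std::istringstream in(request);
        std::string method;
        in >> method;
        ++callsHandled;
        if (method == "add") {
            int a = 0, b = 0;
            in >> a >> b;
            return std::to_string(target_.Add(a, b));
        }
        if (method == "name") return target_.GetName();
        return "";
    }
};

// Client side: implements the same interface, but each method only packs
// its parameters, sends them across, and unpacks the reply.
class ClientSideStub : public ICalc {
    ServerSideStub& channel_;  // stands in for the IPC/RPC transport
public:
    explicit ClientSideStub(ServerSideStub& channel) : channel_(channel) {}
    int Add(int a, int b) override {
        std::string reply =
            channel_.Dispatch("add " + std::to_string(a) + " " + std::to_string(b));
        assert(!reply.empty());
        return std::stoi(reply);
    }
    std::string GetName() override { return channel_.Dispatch("name"); }
};

// Client code is identical for the in-process and the marshalled case.
int UseCalculator(ICalc& calc) { return calc.Add(2, 3); }
```

`UseCalculator` gives the same answer whether it is handed a `RealCalc` or a `ClientSideStub`; only the cost of the call differs.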

  16. Anonymous says:

    Maybe this (classic?) diagram will help you to better understand how marshalling works:


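    +------+              +-----------------+
    |Client|              |Remote COM object|
    +------+              +-----------------+
       |                          /|\
      \|/                          |
    +-----------+          +------------+
    |Client stub|          |Remote proxy|
    +-----------+          +------------+
       |                          /|\
       |                           |
       +---------------------------+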


    That’s the basic call-flow when invoking a COM object out of process, as explained by Mo above. So basically the stub/proxy pair handles the out-of-proc complexity so the client doesn’t have to think about it. Quite the same is done in CORBA.

    Sorry for my bad ASCII-drawing skills 😉

  17. Anonymous says:

    Jikes.. the diagram didn’t look very good in the edit-box here, but it certainly ended up appearing worse after the conversion to HTML :/.

    [1] is a much better figure, which is figure 9 in [2] which is written by Kraig Brockschmidt who designed/created a lot of COM/OLE AFAIK (correct me if I’m wrong Larry!). You’ll see a lot of other figures in that article explaining things discussed here as well.



  18. Anonymous says:

    COM, IMO, is about several things:

    – Memory allocation discipline (CoTaskMemAlloc et al)

    – Object activation protocol (CoCreateInstance et al)

    – Object lifetime control (IUnknown::AddRef and Release)

    – Object interaction protocol (IUnknown::QueryInterface and interfaces)

    The rest of the stuff kind of follows from these basics. The memory allocation protocol arguably is only there to enable marshaling but nonetheless establishing a standard for lifetime management of non-objects is pretty important.

    Interfaces, being long-lived binary API contracts are pretty darned important and useful. I’m constantly amazed at how people seem to have forgotten the problems with making non-virtual constructs part of the long-term contract for an object.

    You can debate the relative goodness of reference counting vs. garbage collection. In my book, determinism of lifetime beats faster allocations hands down. But then maybe I’m becoming a dinosaur. The stupid thing was forcing a virtual function call for every modification of the refcount…

    The activation is arguably the most important part of the definition at a systems level. The fact that the metadata to determine how and where to activate an object is separate from the calling code is probably the greatest genius of COM.

    It’s unfortunate that a lot of issues came to light during/after development but hey that’s the reality of product development.

    Object-based marshaling, which is very cool, is really a MS-only innovation over DCE RPC that as Larry mentions was done as part of Cairo. (I’m not sure that’s true; when joining MSFT in ’94 the incipient release of DCOM was heralded as the great enabler of truly distributed systems and Cairo was still incubating furiously at the time; it’s just surprising for a technology that’s incubating to spin off an important subpiece and actually release it… it would be like if Avalon or WinFS shipped before LH. But that’s also where Nile a/k/a OLE/DB came from… ah for the old days when something as simple as the next set of APIs were going to solve everyone’s problems…)

    In simple terms, COM = (OLE/2 – all the document/in-place-activation stuff).

  19. Anonymous says:

    Cairo was a set of technologies announced at the first PDC back in 1991 by Jim Allchin.

    As mentioned in this article:

    there were essentially 5 pieces to Cairo:

    1) DCE RPC

    2) x.500 Directory

    3) x.400 Messaging

    4) Content Indexing/Object filesystem.

    The NT networking team picked up the RPC component for NT 3.1, the directory was delivered in Win2000, the X.400 messaging system was delivered in Exchange, the content indexing was delivered in Index server.

    The only significant technology announced at the PDC that’s NOT been delivered was the indexable filesystem.

  20. Anonymous says:

    One thing that has always confused me with COM is the STA Model which is implemented using a hidden window to provide Single Threaded access to the COM object. Can anyone throw some light on what actually goes on underneath this particular model? The COM runtime has a lot of quirks built into it hidden from the programmer and because of this sometimes it is possible to shoot yourself in the foot if you don’t properly understand the apartment concepts and use multi-threading in your program.

  21. Anonymous says:

    As I wrote here:

    Threading models for COM exist to protect COM components that weren’t designed for multi-threading access. For example, since VB doesn’t have any concept of threads, it’s highly likely it’ll mess up royally if a COM component authored in VB is called from multiple threads.

    So the apartment model exists to ensure that those COM components don’t break royally when dropped into a multithreaded application.

    The hidden window is used for marshalling – just like RPC marshals parameters across a process boundary, COM marshals parameters across a thread boundary to ensure that only one thread calls into the COM component.
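
The idea can be modelled as a message queue that only the owning thread drains. The sketch below is single-threaded on purpose and uses hypothetical names (`Apartment`, `Counter`); a real apartment involves an actual window, a Windows message loop and cross-thread synchronization, none of which is shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical model of an apartment's hidden window: calls arrive as
// queued messages and are executed one at a time by the owning thread.
class Apartment {
    std::queue<std::function<void()>> messages_;
public:
    // What a caller on another thread would do instead of calling directly.
    // (A real implementation needs locking; this sketch has one thread.)
    void Post(std::function<void()> call) {
        assert(call != nullptr);
        messages_.push(std::move(call));
    }
    // Run by the owning thread's message loop. Returns the calls delivered.
    int Pump() {
        int delivered = 0;
        while (!messages_.empty()) {
            std::function<void()> call = std::move(messages_.front());
            messages_.pop();
            call();
            ++delivered;
        }
        return delivered;
    }
    std::size_t Pending() const { return messages_.size(); }
};

// A component that was never designed for concurrent callers.
struct Counter {
    int value = 0;
    void Increment() { ++value; }
};
```

Posted calls do nothing until the owning thread pumps its queue, which is why a thread that blocks or stops processing messages stalls every caller waiting on its objects.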

  22. Anonymous says:

    This is why it’s important to never:

    1. block execution (e.g. synchronous I/O)

    2. drop window messages

    on a STA thread.

    The whole STA/MTA thing is rather unfortunate. It works as designed but not as most people expect. STAs were designed to make object authors’ lives easier by giving them a simple concurrency model but you actually have to be a much better programmer to do things right on a STA thread.

    But maybe this is indicative of the fact that building highly responsive UI is much harder than most people budget for anyways.

  23. Anonymous says:

    I suspect this is one of the reasons why the BeOS folks designed all their stuff such that every window would have a separate dedicated UI thread; though on BeOS, threads are cheap.

  24. Anonymous says:

    I should clarify; BeOS doesn’t (as far as I know) use COM, but the problem of UI-blocking is one that’s plagued programmers on a whole host of different platforms for years 🙂