You Can't Triple Stamp a Double Stamp

In the Exchange 2007 SDK, we have an antivirus sample application which really exists to illustrate how to build a Transport Routing Agent and handle events. A few customers had difficulties with this sample, so I wanted to go over how to get it set up and working. At the end of this article you'll find a link to an updated sample which should be much easier to play with.

This sample consists of two projects. The first, AntivirusAgent, is a .NET based DLL that will be installed as the actual transport agent. The second, AntivirusService, is a C++ based EXE which functions as an out of proc COM service to do the real work. The split of work in the sample is arbitrary. It didn't have to use an out of proc COM service. In fact, things would have been much simpler if it didn't. :) Getting the .NET code to talk to the COM service, which has nothing to do with demonstrating how to build a Transport Agent, turned out to be quite tricky.

The first problem I saw with AntivirusService is that it dynamically links to the C++ runtime. This is fine for production code, where you will be deploying the runtime as appropriate with your installer, but for samples it's simpler to statically link. So that's the first thing I changed in the sample.

The next thing I noticed is that AntivirusService builds for both the Win32 platform and the x64 platform, but only the Win32 platform has a post build event to build the ComInterop.dll that is used for the interprocess communication. So if you were building for and testing with a 64 bit server, you would either not have a ComInterop.dll at all, or you'd have one but it would be 32 bit. So the second thing I changed was to add a post build step for the 64 bit build.

The third thing I changed was to clean up the install/uninstall PowerShell scripts. They assumed the sample would be built on the machine it would be tested on, a bad assumption, and they also failed to register the COM service correctly. Cleaning these scripts up a bit meant fewer manual steps as I built and rebuilt the project.

The fourth change I made was to the way the sample updated headers. The sample was using GetMimeReadStream and GetMimeWriteStream to read the MIME message and rewrite it. However, GetMimeReadStream apparently doesn't always return the MIME stream of the full message. Maybe that's a bug in GetMimeReadStream - maybe not - I didn't investigate as that wasn't really the point of the sample. So I commented out the code that wrote the stream back and instead used the HeaderList class to add a header.

With these changes, I had a functioning sample that could communicate with an out of proc service and write headers back to the mail. It worked (for me) on both 32 and 64 bit Exchange 2007 servers. So I sent the updated sample off to some customers and awaited their kudos. Unfortunately, none of them could get the updated sample to work either. Every time AntivirusAgent tried to call the BeginVirusScan function, which was implemented by AntivirusService, it would get a 0xC0000005 access violation.

What went wrong? After much back and forth, we finally isolated the problem to the .tlb they were generating from the .idl files included in the sample. ComInterop.idl contained this line:

 HRESULT _stdcall BeginVirusScan([in] IComCallback* callback);

But when we used OleView to examine the .tlb being generated by midl.exe, we saw this:

 HRESULT _stdcall BeginVirusScan([in] IComCallback** callback);

Note the double star. As Harry said to Lloyd, you can't double star a single star. At least, that's how I remember the line. Anyway, when this .tlb is used to marshal calls from one process to another, this extra level of indirection will result in an AV.

It turns out midl.exe has a problem importing an .idl that looks like this:

library MyInterfaces
{
    interface IMyOtherInterface;

    interface IMyInterface : IUnknown {
        HRESULT _stdcall DoSomething1([in] IMyOtherInterface* lpParam);
    };

    interface IMyOtherInterface : IUnknown {
        HRESULT _stdcall DoSomething2();
    };
};

With both Visual Studio 2005 and Visual Studio 2008, if you import that .idl file into another, then use midl to build a .tlb file, the resulting .tlb (as viewed with OleView) will sometimes have an extra star on the definition of DoSomething1. I've passed this bug along to the Visual Studio team and hope to hear back from them soon. Meanwhile, fixing our sample not to hit this problem is a simple matter of changing the order of the interface definitions.
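
As a rough sketch of that reordering, here is the same placeholder .idl with IMyOtherInterface defined before the interface that uses it. It keeps the made-up names from the snippet above and, like that snippet, leaves out attributes such as uuid. Dropping the forward declaration is an assumption on my part about what the reordering makes possible; the exact layout in the updated sample may differ.

```idl
library MyInterfaces
{
    // Defined first, so the forward declaration is no longer needed.
    interface IMyOtherInterface : IUnknown {
        HRESULT _stdcall DoSomething2();
    };

    // IMyOtherInterface is fully defined by the time it is used here.
    interface IMyInterface : IUnknown {
        HRESULT _stdcall DoSomething1([in] IMyOtherInterface* lpParam);
    };
};
```

After rebuilding the .tlb, it's worth opening it in OleView again to confirm that DoSomething1 takes a single-star parameter.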

Overall - getting the sample to work only required a few targeted changes. For those who would like to play with the sample but don't want to have to fix it first, I've put all my changes up for download: https://stephengriffin.members.winisp.net/AntiVirus/antivirus.zip

Enjoy!

[Edit] BTW - I want to give credit where credit is due - Stan Miasnikov at PhatWare.com is the one who tracked the MIDL problem down to the forward declaration. I had to make sure I had his OK before mentioning him here.