Building a Pub/Sub Message Bus with WCF and MSMQ

In recent years there has been a lot of talk about event-driven architecture as a technique to build more scalable and maintainable systems. I've found this to be a very interesting pattern that makes sense in a number of scenarios, but it's never been very well supported on the Microsoft platform, and many who have attempted it have found it painful. A number of years ago I worked on a system using a pub/sub message bus built on .NET Remoting, MSMQ and HTTP, and it wasn't at all pretty. Everything was difficult and required custom code, from hosting the queue listeners and encoding and decoding messages to dealing with reliability and managing subscriptions.

So it was with some apprehension that I made another attempt to adopt this pattern in my current project. However, a lot has changed in the last few years, and I'm pleased to say that my experience was many, many times better than the one I'd been through all those years ago. Before I get on to the solution, I want to make clear that I'm describing just one approach to implementing this pattern, and there are other approaches that may be more appropriate for applications with different requirements. Specifically, the application I'm working on is a largely green-field .NET application, so interoperability across platforms was not a consideration (lucky me!).

The solution we ended up with was built with .NET Framework 3.0 and makes extensive use of Windows Communication Foundation (WCF), Microsoft Message Queuing (MSMQ) 4.0 and Internet Information Services (IIS) 7.0, all hosted on Windows Server 2008. Here's what we did.

Defining the Service Contract

The first step was to define the contracts which the publisher would use to notify any subscribers that an interesting event occurred. In our case we had a number of different types of events, but in order to reuse as much code as possible we used a generic service contract:

[ServiceContract]
public interface IEventNotification<TLog>
{
    [OperationContract(IsOneWay = true)]
    void OnEventOccurred(TLog value);
}

Now for any given event type, we can simply define a data contract to carry the payload (not shown here), and provide a derived service contract type as shown below:

[ServiceContract]
public interface IAccountEventNotification : IEventNotification<AccountEventLog>
{
}
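Since the data contract isn't shown, here is a minimal sketch of what such a payload could look like. The real AccountEventLog from our project is not reproduced in this post, so every member name below is a hypothetical placeholder chosen purely for illustration.

```csharp
using System;
using System.Runtime.Serialization;

namespace MyProject.Contracts
{
    // Hypothetical payload type: the member names are illustrative assumptions,
    // not the actual AccountEventLog used in the project.
    [DataContract]
    public class AccountEventLog
    {
        // Identifies the account the event relates to (assumed field).
        [DataMember]
        public Guid AccountId { get; set; }

        // Short description of what happened, e.g. "Opened" (assumed field).
        [DataMember]
        public string EventType { get; set; }

        // When the event occurred, in UTC (assumed field).
        [DataMember]
        public DateTime OccurredAtUtc { get; set; }
    }
}
```

Because the operation is one-way and delivered over a queue, the payload should carry everything a subscriber needs; there is no reply channel for it to ask follow-up questions.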

Implementing the Publisher

One of the key aspects of a publisher/subscriber pattern is that there should be ultra-loose coupling between the publisher and the subscriber. Critically, the publisher should not know anything about the subscribers, including how many there are or where they live. Originally we tried using MSMQ's PGM multicasting feature to accomplish this - essentially this lets you define a single queue address that will stealthily route the same message to multiple destination queues. While this feature does work, it had a couple of limitations that made it inappropriate in our scenario. First, the only way to use multicast queue addressing with WCF is to use the MsmqIntegrationBinding, which is less flexible than the NetMsmqBinding. Second, multicast addressing only works with non-transactional queues, which would have had an unacceptable impact on the reliability of our system.

So we abandoned this option and decided to implement our own lightweight multicasting directly within the publisher code. While technically this breaches the golden rule of the publisher knowing nothing about the subscribers, the information about the subscribers is completely contained in a configuration file. This means we can add, change or remove subscribers before or after deployment with no impact on the application code.

We had already built a component we called the ServiceFactory (no relation to the p&p Web Service Software Factory) which is a simple abstraction for creating local or WCF instances via a configuration lookup. This component isn't publicly available, but you could easily substitute your favourite Dependency Injection framework and achieve similar results. In our case, the web.config for one of our web services may have its dependent services defined as follows:

<serviceFactory>
    <services>
        <add name="EmailUtility" contract="MyProject.IEmailUtility, MyProject" type="MyProject.EmailUtility, MyProject" mode="SameAppDomain" instanceMode="Singleton" enablePolicyInjection="false" />

        <add name="SubscriberXAccountEventNotification" contract="MyProject.Contracts.IAccountEventNotification, MyProject.Contracts" mode="Wcf" endpoint="SubscriberXAccountEventNotification" />
        <add name="SubscriberYAccountEventNotification" contract="MyProject.Contracts.IAccountEventNotification, MyProject.Contracts" mode="Wcf" endpoint="SubscriberYAccountEventNotification" />
    </services>
</serviceFactory>

Previously we had used the ServiceFactory for creating individual instances, with code like this:

IEmailUtility email = ServiceFactory.GetService<IEmailUtility>();

As you can see from the configuration above, this would result in a singleton instance of a local class called EmailUtility being returned, but different configuration could result in a WCF proxy being returned instead. It was a simple matter to reuse this same ServiceFactory component to return all configured services matching a specific contract. We used this capability to build the NotificationPublisher class as follows:

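public class NotificationPublisher<TInterface, TLog>
    where TInterface : class, IEventNotification<TLog>
{
    // Sends the same event to every subscriber configured for TInterface.
    public static void OnEventOccurred(TLog value)
    {
        // One entry per <add> element in the serviceFactory configuration
        // whose contract matches TInterface.
        List<TInterface> subscribers = ServiceFactory.GetAllServices<TInterface>();

        foreach (TInterface subscriber in subscribers)
        {
            // One-way operation: for WCF subscribers this queues the message and returns.
            subscriber.OnEventOccurred(value);
        }
    }
}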
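Since our ServiceFactory isn't publicly available, here is a rough sketch of how the WCF side of a GetAllServices-style lookup could be built. This is not our actual implementation: the WcfSubscriberLookup class, its CreateProxies method and the idea of passing in the endpoint names are all assumptions made for illustration. The only real API it relies on is ChannelFactory<T>, whose constructor accepts the name of a client endpoint from configuration.

```csharp
using System;
using System.Collections.Generic;
using System.ServiceModel;

// Hypothetical stand-in for the multicast lookup inside our ServiceFactory.
public static class WcfSubscriberLookup
{
    // Creates one WCF proxy per named <client> endpoint in the config file.
    public static List<TContract> CreateProxies<TContract>(IEnumerable<string> endpointNames)
        where TContract : class
    {
        return CreateProxies<TContract>(
            endpointNames,
            name => new ChannelFactory<TContract>(name).CreateChannel());
    }

    // Overload that takes the proxy-creation step as a delegate, so the
    // fan-out logic can be exercised without any WCF configuration.
    public static List<TContract> CreateProxies<TContract>(
        IEnumerable<string> endpointNames,
        Func<string, TContract> createProxy)
        where TContract : class
    {
        var proxies = new List<TContract>();
        foreach (string name in endpointNames)
        {
            proxies.Add(createProxy(name));
        }
        return proxies;
    }
}
```

A production version would probably cache each ChannelFactory and close the channels after use; the sketch leaves that out to keep the shape of the idea visible. Note that lambdas and Func<,> need the C# 3.0 compiler and .NET 3.5, so on plain .NET 3.0 you would declare your own delegate type instead.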

With this class in place, all that is required for the publisher to publish an event is to close the NotificationPublisher type with the appropriate generic parameters and call its static OnEventOccurred method. Assuming we are using the IAccountEventNotification interface and the above configuration, this would result in the event being fired over WCF to the services defined by the SubscriberXAccountEventNotification and SubscriberYAccountEventNotification endpoints.
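One thing to consider when publishing to several transactional queues is whether the sends should succeed or fail as a unit. A minimal sketch, assuming you want all-or-nothing behaviour: wrap the publish call in a System.Transactions.TransactionScope so that the queued sends enlist in one ambient transaction. The TransactionalPublish helper below is a hypothetical name of my own, not part of our project.

```csharp
using System;
using System.Transactions;

// Hypothetical helper: runs a publish action inside one ambient transaction.
public static class TransactionalPublish
{
    public static void Run(Action publish)
    {
        using (var scope = new TransactionScope(TransactionScopeOption.Required))
        {
            // Sends made to transactional queues here enlist in the ambient transaction.
            publish();

            // Only reached if publish() did not throw; otherwise the scope is
            // disposed without completing and the sends are rolled back.
            scope.Complete();
        }
    }
}

// Usage, with the types from this post:
//   TransactionalPublish.Run(() =>
//       NotificationPublisher<IAccountEventNotification, AccountEventLog>.OnEventOccurred(log));
```

In practice you would often go one step further and put the publisher's own database update inside the same scope, so the event is only sent if the change it describes was actually committed.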

Configuring the Publisher

The final missing piece on the publisher side is the WCF configuration. As mentioned previously, we chose to use MSMQ to provide reliable, asynchronous message delivery. Programming with MSMQ used to be quite a painful experience, but with WCF the programming model is no different than for any other transport - all you need to do is configure the right bindings. In our case we chose the NetMsmqBinding, which provides full access to WCF functionality for core MSMQ features (as opposed to the MsmqIntegrationBinding, which provides richer MSMQ support at the cost of more limited WCF functionality).

Here's an example of the client-side WCF configuration.

<system.serviceModel>

    <bindings>
        <netMsmqBinding>
            <binding name="TransactionalMsmqBinding" exactlyOnce="true" deadLetterQueue="System" />
        </netMsmqBinding>
    </bindings>

    <client>
        <endpoint name="SubscriberXAccountEventNotification"
            address="net.msmq://localhost/private/SubscriberX/accounteventnotification.svc"
            binding="netMsmqBinding" bindingConfiguration="TransactionalMsmqBinding"
            contract="MyProject.Contracts.IAccountEventNotification" />

        <endpoint name="SubscriberYAccountEventNotification"
            address="net.msmq://localhost/private/SubscriberY/accounteventnotification.svc"
            binding="netMsmqBinding" bindingConfiguration="TransactionalMsmqBinding"
            contract="MyProject.Contracts.IAccountEventNotification" />
    </client>
</system.serviceModel>

There's nothing too fancy in this - the key thing to note is the exactlyOnce="true" setting, which is required for transactional queues. The other thing that may stand out is the unusual net.msmq:// addressing syntax, which is used by the NetMsmqBinding in lieu of the more familiar FormatName addresses. The queues themselves are private queues called "SubscriberX/accounteventnotification.svc" and "SubscriberY/accounteventnotification.svc". Why did I give the queues such silly names? Read on...
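If you're used to FormatName addresses, it can help to see how the two notations line up. The sketch below converts a net.msmq:// URI into the equivalent direct format name for the simple case of a host name and the default native transfer protocol. The MsmqAddress class is a hypothetical helper written for this post; WCF does this translation internally and you never need to call anything like it yourself.

```csharp
using System;

// Hypothetical helper, for illustration only: shows how a net.msmq:// address
// corresponds to an MSMQ direct format name. It ignores IP-address hosts and
// non-native transfer protocols.
public static class MsmqAddress
{
    public static string ToFormatName(Uri address)
    {
        if (address.Scheme != "net.msmq")
        {
            throw new ArgumentException("Expected a net.msmq:// address.", "address");
        }

        // "/private/SubscriberX/accounteventnotification.svc" -> "private/SubscriberX/..."
        string path = address.AbsolutePath.TrimStart('/');

        const string privatePrefix = "private/";
        bool isPrivate = path.StartsWith(privatePrefix, StringComparison.OrdinalIgnoreCase);

        // Everything after "private/" is the queue name, forward slashes included.
        string queueName = isPrivate ? path.Substring(privatePrefix.Length) : path;

        return "DIRECT=OS:" + address.Host + (isPrivate ? @"\private$\" : @"\") + queueName;
    }
}
```

For the SubscriberX endpoint above, this yields DIRECT=OS:localhost\private$\SubscriberX/accounteventnotification.svc. Note that the "private" segment of the URI has no dollar sign, while the format name uses private$.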

Hosting and Configuring the Subscribers

In the past, if building MSMQ clients was annoying, building MSMQ services was a nightmare. You had to build your own host (typically in an NT Service) or make use of the somewhat inflexible MSMQ Triggers functionality. You then had to do a whole lot of work to ensure your service didn't lose messages, and that it wasn't killed by "poison messages", which are messages that will constantly cause your service to fail due to a malformed payload or problems with the service.

Just like on the client side, WCF takes a lot of the hard work away on the service side - but it doesn't directly help with hosting the service and listening to the queue. Luckily this problem is solved beautifully by IIS 7.0 and the Windows Process Activation Service (WAS), which is available on Windows Vista and Windows Server 2008. In a nutshell this enables IIS to listen to MSMQ, TCP and Named Pipes and activate your WCF service, just as IIS 6.0 does for HTTP. If this all sounds great, it is - but be warned that it can be a bit fiddly to set up.

First, you need to set up an "application" in IIS that points to your service, including the .svc file and the web.config file. This is no different to what you'd normally do for an IIS-hosted service exposed over HTTP.

Next, you need to create the message queue - you can do this with the Computer Management console in Vista or Server Manager in Windows Server 2008. The name of the queue must match the application name plus the .svc file name, for example "SubscriberX/accounteventnotification.svc" (this fact is unfortunately not well documented). Make sure you mark the queue as transactional when you create it, as you can't change this later. You'll also need to set permissions on the queue so that the account running the "Net.Msmq Listener Adapter" service (NETWORK SERVICE by default) can receive messages, and whatever account is running the client/publisher can send messages (NETWORK SERVICE by default, too).
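If you'd rather script the queue creation than click through the console, the System.Messaging API can do the same job. The following is a minimal sketch rather than the setup code from our project: the SubscriberQueueSetup class is a hypothetical name, and the exact rights granted to each account are my assumption of a reasonable minimum.

```csharp
using System.Messaging;

// Hypothetical setup helper: creates the transactional queue if it is missing
// and grants the listener and publisher accounts the rights they need.
public static class SubscriberQueueSetup
{
    public static void EnsureQueue(string queuePath, string listenerAccount, string publisherAccount)
    {
        if (!MessageQueue.Exists(queuePath))
        {
            // The second argument marks the queue as transactional.
            // This cannot be changed after the queue has been created.
            MessageQueue.Create(queuePath, true);
        }

        using (var queue = new MessageQueue(queuePath))
        {
            // Assumed minimum for the account that reads from the queue.
            queue.SetPermissions(
                listenerAccount,
                MessageQueueAccessRights.ReceiveMessage | MessageQueueAccessRights.PeekMessage);

            // Assumed minimum for the account that sends to the queue.
            queue.SetPermissions(publisherAccount, MessageQueueAccessRights.WriteMessage);
        }
    }
}

// Usage:
//   SubscriberQueueSetup.EnsureQueue(
//       @".\private$\SubscriberX/accounteventnotification.svc",
//       @"NT AUTHORITY\NETWORK SERVICE",
//       @"NT AUTHORITY\NETWORK SERVICE");
```

The path uses the System.Messaging notation (machine\private$\name), not the net.msmq:// form used in the WCF configuration, and it needs to run under an account that is allowed to create queues on the machine.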

Finally you'll need to configure IIS and WAS to enable the Net.Msmq listener for the web site, and for the specific application (make sure you've installed the Windows components for WAS and non-HTTP activation before you proceed!). The easiest way to do this is using appcmd.exe which lives in the \System32\InetSrv folder:

  • appcmd set site "Default Web Site" -+bindings.[protocol='net.msmq',bindingInformation='localhost']
  • appcmd set app "Default Web Site/SubscriberX" /enabledProtocols:net.msmq

With the IIS configuration in place, it's time to make sure the service's WCF configuration is correct. As you might expect, this looks pretty similar to the client configuration you saw earlier.

<system.serviceModel>
    <bindings>
        <netMsmqBinding>
            <binding name="TransactionalMsmqBinding" exactlyOnce="true" deadLetterQueue="System" receiveErrorHandling="Move"/>
        </netMsmqBinding>
    </bindings>

    <services>
        <service name="SubscriberX.NotificationService">
            <endpoint contract="MyProject.Contracts.IAccountEventNotification"
                bindingConfiguration="TransactionalMsmqBinding"
                binding="netMsmqBinding"
                address="net.msmq://localhost/private/SubscriberX/accounteventnotification.svc"/>
        </service>
    </services>
</system.serviceModel>

One thing worth calling out here is the receiveErrorHandling="Move" setting. This innocent-looking attribute probably saved us a month of work, as it tells WCF to move any messages that have repeatedly failed to be processed onto an MSMQ subqueue called "poison" and continue processing the next message, rather than faulting the service. Note that subqueues, as well as the long-awaited ability to transactionally read from a remote queue, are some more new features in MSMQ 4.0 in Vista and Windows Server 2008.
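How many times a message is retried before it is treated as poison is controlled by a few more attributes on the same binding. The fragment below is a sketch only: the attribute names are real NetMsmqBinding settings, but the values are illustrative assumptions rather than the ones from our project.

```xml
<netMsmqBinding>
    <!-- Retry values are illustrative assumptions, not our production settings. -->
    <binding name="TransactionalMsmqBinding"
             exactlyOnce="true"
             deadLetterQueue="System"
             receiveErrorHandling="Move"
             receiveRetryCount="3"
             maxRetryCycles="2"
             retryCycleDelay="00:05:00" />
</netMsmqBinding>
```

Here receiveRetryCount is the number of immediate retries within one cycle, maxRetryCycles is the number of additional cycles, and retryCycleDelay is how long a failing message is parked between cycles. Shorter delays surface poison messages sooner; longer ones give a temporarily unavailable dependency more time to recover.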

Implementing the Subscribers

The only thing remaining is to implement the subscriber. Most of the code will of course be specific to the business requirements, so I'll only spend time describing the implementation of the service interface. In our system it is very important to make sure that no messages are accidentally lost. Since MSMQ can provide guaranteed delivery, it may not be obvious how a message could just vanish. In fact, most messages are lost after MSMQ has successfully delivered the message to the service. This can happen if the service receives the message and then fails before the message is successfully processed (possibly due to a bug or configuration problem). The best way of avoiding this problem is to use a transaction that spans both receiving the message from the queue and all of the business logic that processes it. If anything fails, the transaction will be rolled back - including receiving the message from the queue! The message goes back on the queue, so if the problem was a temporary glitch, it may be processed successfully on a later attempt. If the problem is permanent or caused by a malformed message, the message will be considered to be "poison" after several retries, and as mentioned earlier will be moved to a special "poison" subqueue where it can be dealt with manually by an administrator.

Making all of this work is surprisingly simple, since all of these capabilities are supported by MSMQ (provided you're using transactional queues) and WCF. All that you need to do is decorate your service implementation methods with an OperationBehavior attribute whose two properties state that your business logic should enlist in the transaction started when the message was pulled off the queue, and that the transaction should complete automatically if the method returns without throwing.

public class NotificationService : IAccountEventNotification
{
    [OperationBehavior(TransactionScopeRequired = true, TransactionAutoComplete = true)]
    public void OnEventOccurred(AccountEventLog value)
    {
        // Business-specific logic
    }
}

Conclusion

While this has been one of the longer blog posts I've done in a while, the solution is extremely powerful and surprisingly simple to implement thanks to some great advances in WCF, MSMQ and IIS. In the past, many people (including myself) have spent months trying to implement pub/sub patterns, often with less-than-spectacular results. But using these new technologies eliminates huge amounts of custom code - in fact the few code and configuration snippets in this post are really all that it takes to make this work.