Clustering MSMQ applications - rule #1

[[Updated 6th October 2008 to add KB198893]] 

Just a quick post about one of the main gotchas.

A common misunderstanding is over which MSMQ service is actually doing the work. On a simple two-node MSMQ cluster there are three instances of MSMQ:

  • Running on the first physical node (let's call it NodeA)
  • Running on the second physical node (NodeB)
  • Running in the cluster's virtual machine (VirtualMSMQ) - this is the clustered service

All three have their own physical storage - the local disk for the nodes and the shared disk for the virtual machine - and run in their own memory space independently of each other. In fact each should be treated as a physically separate machine, even though the virtual machine is running in the same physical RAM as one of the nodes.

The problem comes when you install an application that sends or receives messages.

Let's say I write a simple console app that reads messages from a queue I created in the virtual machine. To make things easy for myself, I address the queue as ".\myqueue", where the "." indicates a local resource. I've already populated the queue by sending test messages to it from my laptop.

When I log on to NodeA and run the application while NodeA's own MSMQ service is stopped, I get an "MSMQ service not available" exception. This is because the application always talks to a local queue manager, so it communicates with the MSMQ service running on NodeA. That service has to be running for the application to do any MSMQ work at all.

Once I start the local service, my application finds no messages to read from the clustered queue; in fact, the queue is not there to open in the first place. The application is trying to open ".\myqueue", which NodeA's queue manager does not recognise because no such queue exists on the local machine.

My application needs to be "local" to the clustered MSMQ queue manager, which means running it within the virtual machine. All I need to do is add it as a clustered application to the resource group that contains the clustered MSMQ resource (along with its IP address, disk and network name). Now, when the resource comes online, the application talks to the correct queue manager and the messages can be read.

There is an easy test to see whether you have configured your application correctly. Stop the MSMQ service on the local node and run the application. If it now complains that the MSMQ service is not available, you know the application hasn't been clustered properly.
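The walkthrough above can be sketched as a toy Python model. This is not MSMQ code and calls no MSMQ API: QueueManager, open_queue, try_open and the two exception classes are hypothetical stand-ins that only mimic the behaviour described in this post, where a local path name is always resolved by the queue manager on the machine the application runs on. MSMQ path names separate machine and queue with a backslash, so the model writes the local path as ".\myqueue".

```python
class ServiceNotAvailable(Exception):
    """Stands in for the "MSMQ service not available" error."""


class QueueNotFound(Exception):
    """Stands in for the error returned when the queue does not exist."""


class QueueManager:
    """Hypothetical model of one MSMQ instance with its own storage and state."""

    def __init__(self, name, queues=()):
        self.name = name
        self.running = True
        self.queues = {queue: [] for queue in queues}


def open_queue(path, local_qm):
    """Resolve a local path such as '.\\myqueue' against local_qm only."""
    if not local_qm.running:
        raise ServiceNotAvailable(f"MSMQ service not available on {local_qm.name}")
    machine, _, queue = path.partition("\\")
    if machine != ".":
        raise ValueError("this toy model only handles local ('.') path names")
    if queue not in local_qm.queues:
        raise QueueNotFound(f"{queue} does not exist on {local_qm.name}")
    return local_qm.queues[queue]


def try_open(path, local_qm):
    """Return a short outcome string instead of raising."""
    try:
        messages = open_queue(path, local_qm)
        return f"opened on {local_qm.name}, {len(messages)} message(s)"
    except (ServiceNotAvailable, QueueNotFound) as exc:
        return f"failed: {exc}"


if __name__ == "__main__":
    node_a = QueueManager("NodeA")                      # physical node, no queues
    virtual = QueueManager("VirtualMSMQ", ["myqueue"])  # clustered instance
    virtual.queues["myqueue"].append("test message from laptop")

    node_a.running = False                  # app run on NodeA, local service stopped
    print(try_open(".\\myqueue", node_a))   # failed: MSMQ service not available on NodeA
    node_a.running = True                   # local service started
    print(try_open(".\\myqueue", node_a))   # failed: myqueue does not exist on NodeA
    node_a.running = False                  # app clustered, NodeA's service stopped
    print(try_open(".\\myqueue", virtual))  # opened on VirtualMSMQ, 1 message(s)
```

The last call mirrors the test described above: with NodeA's own service stopped, a correctly clustered application still reaches its queue, because it only ever talks to the clustered instance.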

One final source of confusion is the use of services. Fundamentally, services are much like normal applications, except that they are started automatically for you at boot. On a cluster, though, there is a BIG difference: services do not run within the virtual machine. Adding a service to a resource group only means that the cluster manager controls whether the service is running or not, effectively issuing a NET START or NET STOP as appropriate. The service itself is still installed and configured on the physical node, so it is totally unaware of the resource group it belongs to. Any service that treats the clustered MSMQ resource as its local queue manager will therefore fail in the same way as an application run on the local node.

[[New section]] 

The answer is to:

  • make sure the clustered application/service is dependent on the clustered MSMQ resource (so that the queue manager is there ready and waiting)
  • tick the "Use Network Name for Computer Name" check box so that the application/service thinks it is running on the virtual machine and not on the physical node

198893 Effects of Checking "Use Network Name for Computer Name" in MSCS
https://support.microsoft.com/default.aspx?scid=kb;EN-US;198893

"Under the parameters page, make sure that the "Use Network Name for Computer Name" check box is selected. Selecting this check box has the effect of setting the environment variable _CLUSTER_NETWORK_NAME_ for the context of the application. Calls to GetHostName() and GetComputerName() return the value of the variable as opposed to the host name of computer name of the node."