Clustering MSMQ applications – rule #1


[[Updated 6th October 2008 to add KB198893]] 


Just a quick post about one of the main gotchas.


A common misunderstanding can be over which MSMQ service is doing the work. On a simple 2-node MSMQ cluster there will be three instances of MSMQ:



  • Running on the first physical node (let’s call it NodeA)

  • Running on the second physical node (NodeB)

  • Running in the cluster’s virtual machine (VirtualMSMQ) – this is the clustered service

All three have their own physical storage – the local disk for the nodes and the shared disk for the virtual machine – and run in their own memory space independently of each other. In fact each should be treated as a physically separate machine, even though the virtual machine is running in the same physical RAM as one of the nodes.


The problem comes when you install an application that sends or receives messages.


Let’s say I make a simple console app to read messages from a queue I created in the virtual machine. To make things easy for myself, I’m addressing the queue with “.\myqueue”, where the “.” indicates the local machine. I’ve already populated the queue by sending test messages to it from my laptop.


When I log on to NodeA and run the application, I get an “MSMQ service not available” exception. The reason for this is that the application needs to talk to a local queue manager and so communicates with the MSMQ service running on NodeA. So that service has to be running for the application to do any MSMQ work.


Now when I start the local service, my application doesn’t find any messages to read from the queue on the cluster; in fact, the queue isn’t there to open in the first place. This is because the application is trying to open “.\myqueue”, which the local queue manager does not recognise because that queue does not exist on the local machine.
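The lookup failure described above can be sketched in a few lines of Python. This is only a toy model, not the MSMQ API: the helper names resolve_queue_path and queue_exists are made up for illustration, and the set of existing queues simply mirrors the scenario in this post (one queue, hosted by VirtualMSMQ).

```python
def resolve_queue_path(path: str, local_machine: str) -> str:
    """Expand the "." shortcut in a machine\\queue path name.

    Hypothetical helper: it only models the idea that "." means
    "the machine whose queue manager I am talking to".
    """
    machine, sep, queue = path.partition("\\")
    if not sep or not queue:
        raise ValueError(f"not a machine\\queue path name: {path!r}")
    if machine == ".":
        machine = local_machine
    return f"{machine}\\{queue}"


# Queues as they exist in the scenario: only the clustered queue
# manager (VirtualMSMQ) hosts "myqueue". Stored in lower case so the
# comparison below is case-insensitive.
existing_queues = {"virtualmsmq\\myqueue"}


def queue_exists(path: str, local_machine: str) -> bool:
    """Return True if the resolved path names a queue in the toy model."""
    return resolve_queue_path(path, local_machine).lower() in existing_queues


# Run on the physical node: "." resolves to NodeA, which has no such queue.
print(queue_exists(".\\myqueue", "NodeA"))
# Run inside the virtual machine: "." resolves to VirtualMSMQ.
print(queue_exists(".\\myqueue", "VirtualMSMQ"))
```

The same path string gives a different answer depending on which machine the application believes it is running on, which is the whole problem.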


My application needs to be “local” to the clustered MSMQ queue manager and that means running it within the virtual machine. All I need to do is add a clustered application to the resource group that contains the clustered MSMQ resource (with its IP address, disk and network name). Now when the resource comes on-line, the application can communicate with the proper queue manager and messages can be read.


There is an easy test to see if you have configured your application correctly. Stop the local node MSMQ services and run the application. If it now complains that the MSMQ service is not available then you know the application hasn’t been clustered properly.


One final confusion comes over the use of services. Fundamentally these are pretty much like normal applications except they get started automatically for you on bootup. On a cluster, though, there is a BIG difference as services do not run within the virtual machine. If a clustered service is added to a resource group then all that means is that the cluster manager controls whether your clustered service is running or not – effectively it is issuing a NET STOP or a NET START as appropriate. The service itself, though, is installed and configured on the physical node, which means it is totally unaware of the cluster resource group that it belongs to. So any service that tries to treat the clustered MSMQ resource as a local queue manager will fail in the same way as an application run on the local node.


[[New section]] 


The answer is to:



  • make sure the clustered application/service is dependent on MSMQ (so that the queue manager is there ready and waiting)

  • check the “Use Network Name for Computer Name” tick box so that the application/service thinks it is running on the virtual machine and not the physical node


198893 Effects of Checking “Use Network Name for Computer Name” in MSCS
http://support.microsoft.com/default.aspx?scid=kb;EN-US;198893

“Under the parameters page, make sure that the “Use Network Name for Computer Name” check box is selected. Selecting this check box has the effect of setting the environment variable _CLUSTER_NETWORK_NAME_ for the context of the application. Calls to GetHostName() and GetComputerName() return the value of the variable as opposed to the host name or computer name of the node.”
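
The effect the KB article describes can be modelled with a short Python sketch. It is an illustration of the behaviour only, not how Windows implements it: effective_computer_name is a hypothetical helper, and the only name taken from the article is the _CLUSTER_NETWORK_NAME_ environment variable.

```python
import os
import socket

# Environment variable named in KB198893.
CLUSTER_VAR = "_CLUSTER_NETWORK_NAME_"


def effective_computer_name(environ=None, node_name=None) -> str:
    """Return the name a process would see as "the computer name".

    Hypothetical model: if the cluster variable is set in the process
    environment it wins, otherwise the physical node's name is used.
    """
    env = os.environ if environ is None else environ
    if node_name is None:
        node_name = socket.gethostname()
    return env.get(CLUSTER_VAR) or node_name


# Tick box cleared: the process sees the physical node.
print(effective_computer_name({}, "NodeA"))
# Tick box checked: the process sees the clustered network name.
print(effective_computer_name({CLUSTER_VAR: "VirtualMSMQ"}, "NodeA"))
```

With the variable set, anything that resolves “.” through the computer name ends up at the clustered queue manager rather than the node’s own.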

Comments (14)

  1. Dean Harding says:

    Presumably you could also have your service reference the clustered queue, rather than just ".\myqueue"

    I would imagine that the name of the queue your service uses is a configuration parameter, rather than a hard-coded thing, and so changing that would be easier than re-coding the service as an application…

  2. MSDN Archive says:

    Hi Dean, no, it’s not the name of the queue that’s the real problem – it’s which queue manager that you are using. Maybe the example I gave was too simplistic and I should have gone into the issues more.

    If you cluster MSMQ it’s because you want high availability. This means nothing must be tied to the local node as that will be lost when the node goes off-line for whatever reason.

    Looking at a clustered service that uses MSMQ, this can only create messages in local storage. For example, if the service sends a message it will be created in an outgoing queue on NodeA (or NodeB). If the node dies before the message is sent then the message is effectively lost. On the other hand, an outgoing queue for the clustered MSMQ service is available as soon as the resource group fails over to the next node.

    You could change the queue name referenced by the service (maybe it’s hard-coded, maybe it’s in a registry value; up to the developer) so that the clustered queues can be accessed, but this poses new problems.

    Take performing "transactional receives" – before MSMQ 4.0 (Vista/2008) this could only be done locally so a clustered service would fail when reading from a clustered MSMQ queue while a clustered application would succeed.

  3. Dean Harding says:

    Ah, that’s a good point about transactional receives. I guess that’s why you’re the expert 🙂

  4. Morgan says:

    John – I have a similar query you may be able to assist with.

    I have clustered nodes A and B, and IIS is running on both of these nodes too. I’m exposing a web service from IIS, and want to post a message to the clustered queue from within this webservice.

    Do I need to have the MSMQ service running locally on these nodes in this instance?

  5. MSDN Archive says:

    Hi Morgan,

    I found the following KnowledgeBase article has the answer:

    820985 You cannot access Message Queuing from a clustered instance of IIS

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;820985

    There are some important limitations to take note of in the "Workaround" section.

  6. Perry says:

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;820985 is interesting in stating that you cannot connect to MSMQ from clustered IIS, but what it does not make clear is if it is possible to connect to a clustered msmq queue from a web service on a non clustered installation of IIS.

    We certainly can’t seem to get that to work either.

    I would appreciate any tips on how to get it to work.

    Is it ever possible to have a clustered web service that is able to process a clustered msmq?

  7. MSDN Archive says:

    Hi,

    The wording of the KB article is important here – it says you can’t access "Message Queuing". That is, the MSMQ service that an application must interact with to perform any messaging activity. The workaround in the KB is to make sure the local node’s MSMQ service is running so that a queue manager is available for IIS to use.

    This is all distinct from accessing an MSMQ *Queue*. If your IIS ASP page can access the local queue manager then there are a couple of problems you may face sending or receiving messages:

    1) Make sure you open the queue using its full name (networkname\queue or networkname\private$\queue) rather than using the local shortcut (.\queue or .\private$\queue).

    2) Check the permissions on the queue and make sure the required accounts have access. An easy test that you have a permissions problem is if it all works when you give Everyone+AnonymousLogon access to the queue.

    What errors are you seeing?

    Cheers

    John Breakwell

  8. Perry says:

    Hi

    We have NODEA, NODEB and CLUSTERAB

    MSMQ is running on NODEA, NODEB and CLUSTERAB. All services are started.

    MSMQ on NODEA has an existing private queue QUEUEA with full control given to Everyone and Anonymous

    MSMQ on CLUSTERAB has an existing private queue QUEUEB with full control given to Everyone and Anonymous

    IIS is running on NODEA. A webservice is running on that IIS.

    If the webservice is configured to use .\private$\QUEUEA it fails to process but we cannot see why.

    If the webservice is configured to use NODEA\private$\QUEUEA it fails to process but we cannot see why.

    If the webservice is configured to use CLUSTERAB\private$\QUEUEB it fails to process and returns error "Invalid Queue Path Name"

    If we let the webservice try to create a queue….

    If the webservice is configured to use .\private$\newqueue1 the queue is created and messages are queued correctly but I cannot see the queue messages in the MMC.

    If the webservice is configured to use NODEA\private$\newqueue2 the queue is created and messages are queued correctly but I cannot see the queue messages in the MMC.

    If the webservice is configured to use CLUSTERAB\private$\newqueue3 it fails to process and returns error "Invalid Queue Path Name"

  9. MSDN Archive says:

    Hi Perry,

    Just a quick sanity check. Do you have any problems accessing NODEA\private$\QUEUEA or CLUSTERAB\private$\QUEUEB from a remote machine outside the cluster?

    Also, is the webservice sending or receiving messages? The mechanisms and troubleshooting are different between the two.

    When the webservice fails to process or generates the error message "Invalid Queue Path Name", at what point does it fail in your code? Which method call are you making at the time? Knowing the exact line of code will be important here.

    When the webservice creates a queue with ".\private$\newqueue1", where does this queue appear? (NODEA, NODEB or CLUSTERAB?).

    When you say the messages are queued correctly but cannot be seen, how do you know they are queued correctly? Did you get a success return code on sending? Ensure the logged-in user actually has permissions to view the queue contents.

    Cheers

    John Breakwell (MSFT)

  10. Perry says:

    Hi John

    I do not have any remote machines that are in the same firewall zone to try that out.

    The web service is sending messages, we are using a windows service to receive them.

    When the webservice creates a queue with ".\private$\newqueue1" it ends up on NODEA

    Regarding messages being queued directly – we know that is happening as the message count is incrementing in MMC – I just can’t see the messages themselves. This is a permissions issue, but we cannot see EXACTLY what user the queue is being created by. We have tried NODEA\ASPNET, NODEA\NETWORK SERVICE, NODEA\LOCAL SERVICE, and NODEA\IUSR_NODEA but none of them work either. Even giving anonymous and everyone full control does not allow us to see the messages.

    As we have tried in vain for so long to get this working without success in a clustered environment, we are going old school and writing our own interfaces to replace MSMQ.

    We managed to get a custom clustered windows service to interact with a clustered MSMQ service, but then ran into the issue of clustered IIS not interacting with message queuing. I had no option to use NLB in the architecture I have, so as a compromise attempted to just get a local IIS web service to talk to the clustered MSMQ but could not get that to work, even with the local MSMQ service running. (Can a local MSMQ service interact with a clustered MSMQ queue anyway?)

    Whatever route we take we cannot see how we can get a clustered web service to interact with a clustered queue, so with too many resilience compromises in play, we’ve decided to write our own system.

    I think it would be really useful to readers of this blog to state definitively whether it is possible for a clustered web service running under IIS (with the compromise of using an NLB rather than MSCS) to interact directly with a clustered MSMQ. If so, a KB article on how to set it up would be most useful.

    Maybe using SQL Server Broker on a clustered SQL Server as an alternative to clustered MSMQ is the way to go – but that’s for another day!

    Thanks

    Perry

  11. MSDN Archive says:

    Hi Perry,

    There is nothing magical about how MSMQ behaves on a cluster.

    You’ve shown with ".\private$\newqueue1" that the webservice is effectively running on machine NODEA.

    The clustered MSMQ resource is running on CLUSTERAB.

    In your scenario it should behave just the same as having two standalone machines that happen to be called NODEA and CLUSTERAB.

    If giving anonymous and everyone full control to a queue does not allow you to see the messages then you do not have a permissions issue as you have effectively switched off queue-level security.

    If the web service is sending messages then the queue manager on NODEA will be trying to establish a network connection to CLUSTERAB to deliver them over.

    Computer Management can be used to check the status of the outgoing queue.

    Also, the next hop shows the IP address of the destination, which in this case should be the one associated with the CLUSTERAB network name.

    Problems I would look into are:

    For sending messages

    ——————–

    1 Check the next hop IP address is valid.

    2 If using path names, ensure CLUSTERAB appears as a domain member in "Active Directory: Users and Computers" with an MSMQ child object.

    For looking in the queues

    ————————-

    This is an RPC operation and so different from sending messages, which uses the MSMQ protocol.

    I expect you are getting an "Access Denied" situation so could be a "Secured Remote Read" issue (http://technet.microsoft.com/en-us/library/cc759412(WS.10).aspx) if all the machines are not domain members with MSMQ installed as AD-Integrated.

    Cheers

    John Breakwell (MSFT)

  12. Jonathan says:

    John,

    The company I work for has a series of custom applications which use MSMQ to do such things as gather statistics from other servers, as well as do logging.  On all servers but our Database servers, MSMQ has no problem.  It will either be Connected (having been actively sending messages recently), or Inactive (being in a period of inactivity recently).  The Database servers’ Outgoing Queues are perpetually in a state of "Waiting to connect", and always seem to have a certain number of messages in them.  All servers are running Windows Server 2003 SP2.

    Our setup has two sites, and at each site is two Database servers.  Each pair of servers is clustered together, and at any one time, only one of the servers in the pair is active.  There are a set of shared drives between the two of them, and every couple of weeks we "roll" the clusters from one server to the other.  At that point, the Database server we just switched to takes over all of the services for the site.  A virtual IP address will always point to the current Database server.

    Our concern is why our queues are always in the "Waiting to connect" state.  A sample queue configuration is as follows:

    DIRECT=TCP:10.X.X.X\private$\queuename

    This format works for all non-Database servers, but does not for the clustered servers.  The receiving server has the exact Private Queues configured and all others that connect to the same Queue have no difficulties in connecting.  This perplexes us.  Additionally, if you inspect the messages that are seemingly stuck in the Outgoing Queues for these clustered servers, the dates and times for them will always be from a point yesterday up to the moment you last refreshed the queue in the window.  The number of queued messages is not consistent between servers, either, with some having thousands of messages just sitting in the queue not doing anything in particular.

    I did not have a hand in creating these queues.  They have been in place for several years by my estimates, and the documentation on them when they were made is severely slim.  Unfortunately my own experience in handling MSMQ stretches only as far as monitoring them and recognizing when there is potentially a problem.

    Any help is appreciated.

  13. MSDN Archive says:

    Hi Jonathan,

    I am guessing that the messages are being sent out on the private network between the two cluster nodes. A network trace would show this. In the event log you should see a message when MSMQ starts up that details which IP address the (local node) service is going to use. You may want to set the MulticastBindIP registry value.

    Also, the messages are no older than yesterday probably because the developer set a TimeToLive of 24 hours on them. Your application is continually creating new messages but, as they don’t get delivered, MSMQ deletes the old ones as they expire.

    Cheers

    John Breakwell (MSFT)

  14. Jonathan says:

    John,

    We finally got around to implementing MulticastBindIP a few weeks ago, but hit another speedbump after rolling the cluster node.  We did some research and learned of ClusterBindIP, which we set to the Shared Virtual IP Address for the cluster, and performed a test roll and roll back.  Our message queues have continued to function for over a week now thanks to this change.

    Thank you for your help.  Had you not mentioned setting MulticastBindIP, we may have never found ClusterBindIP.
