Oil and water - MSMQ transactional messages and load balancing

A really common request is to load-balance MSMQ, either to "scale out" (adding more machines, as opposed to "scale up" by enhancing the existing server) or for high availability.

For load balancing to work, the traffic must be stateless, as there is no guarantee that a sending machine will reach the same back-end server each time.

For express messages there isn't a problem as these are stateless - "fire and forget".
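To make the "no guarantee of the same server" point concrete, here is a toy Python model, not MSMQ itself. It assumes a simple round-robin balancer with no sender affinity; the class names and node names are hypothetical. Because express messages carry no session state, it does not matter which node each one lands on.

```python
from itertools import cycle

class Server:
    """Hypothetical back-end queue server in the farm."""
    def __init__(self, name):
        self.name = name
        self.received = []

    def deliver_express(self, msg):
        # Express ("fire and forget") messages carry no session state,
        # so any server can accept any message.
        self.received.append(msg)

class RoundRobinBalancer:
    """Toy load balancer: picks the next back-end for every message, no affinity."""
    def __init__(self, servers):
        self._next = cycle(servers)

    def route(self, msg):
        server = next(self._next)
        server.deliver_express(msg)
        return server.name

servers = [Server("node-a"), Server("node-b")]
lb = RoundRobinBalancer(servers)
routes = [lb.route(f"msg-{i}") for i in range(4)]
print(routes)  # ['node-a', 'node-b', 'node-a', 'node-b']
```

Consecutive messages from the same sender alternate between nodes, which is harmless here only because nothing ties one express message to the next.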

Transactional messages, though, are not going to be happy with load-balancing systems. The message transfer protocol for transactional messaging needs consistent state between the sender and the receiver (state is maintained at both ends), so it becomes tricky to achieve the exactly-once-in-order delivery guarantee in a stateless load-balancing farm, or in any configuration with network address translation (NAT) in the middle. Transactional messaging also requires messages to flow in both directions, so ports need to be open in the outbound direction as well.
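A minimal sketch of why per-sender state breaks under a stateless farm, assuming a simplified receiver that only tracks "the next sequence number expected from each sender". This is an illustrative model, not MSMQ's actual wire protocol; `TransactionalReceiver` and the node names are hypothetical.

```python
from itertools import cycle

class TransactionalReceiver:
    """Toy receiver enforcing exactly-once, in-order delivery per sender.

    Hypothetical model: the only state is the next expected sequence number
    for each sender, and it lives on this machine alone."""
    def __init__(self, name):
        self.name = name
        self.expected = {}   # sender id -> next sequence number expected
        self.accepted = []

    def receive(self, sender, seq, body):
        want = self.expected.get(sender, 1)  # no state yet: expect the first message
        if seq != want:
            return False                     # duplicate or gap: reject to protect ordering
        self.expected[sender] = want + 1
        self.accepted.append((sender, seq, body))
        return True

receivers = [TransactionalReceiver("node-a"), TransactionalReceiver("node-b")]
balancer = cycle(receivers)                  # stateless round-robin, no sender affinity

results = []
for seq in (1, 2, 3):
    node = next(balancer)
    results.append((node.name, seq, node.receive("sender-1", seq, f"order {seq}")))
print(results)  # [('node-a', 1, True), ('node-b', 2, False), ('node-a', 3, False)]
```

Message 2 lands on a node that has never seen this sender, so it is rejected; message 3 then returns to the first node, which is still waiting for message 2. Each node's state is correct on its own, but the farm as a whole cannot keep the sequence intact.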

The most common symptom is that storage acknowledgements returning to the sender are sent to the wrong machine. This is because the acknowledgement message is sent to the IP address contained in the message it is acknowledging. Unfortunately, that is the IP address of the load-balancing machine. The result is that the load balancer receives the acknowledgement but has no way of working out which machine to forward it on to. If it passes the acknowledgement to the wrong machine, that machine simply discards it. Without the acknowledgement, the sending machine cannot confirm message delivery, so messages just accumulate in the outgoing queue.
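The acknowledgement problem can be sketched the same way. This toy model assumes, as described above, that each message records the balancer's address as its reply address, and that the balancer, lacking any record of who sent what, simply hands each acknowledgement to the next machine in its pool. `Sender`, `Balancer` and `"balancer-vip"` are hypothetical names for illustration, not MSMQ internals.

```python
from itertools import cycle

class Sender:
    """Hypothetical sending machine that keeps messages until they are acknowledged."""
    def __init__(self, name):
        self.name = name
        self.outgoing = {}  # message id -> body, awaiting acknowledgement

    def send(self, msg_id, body, reply_address):
        self.outgoing[msg_id] = body
        # The message records where acknowledgements should go. In this model
        # that is the load balancer's address, not the sender's own.
        return {"id": msg_id, "reply_to": reply_address}

    def on_ack(self, msg_id):
        if msg_id in self.outgoing:
            del self.outgoing[msg_id]  # our message: delivery confirmed
            return True
        return False                   # not ours: the acknowledgement is discarded

class Balancer:
    """Has no mapping from message to sending machine, so it can only pass
    each acknowledgement to *some* pool member (round-robin here)."""
    def __init__(self, pool):
        self._next = cycle(pool)

    def forward_ack(self, msg_id):
        return next(self._next).on_ack(msg_id)

a, b = Sender("sender-a"), Sender("sender-b")
lb = Balancer([a, b])

msgs = [a.send(i, f"payload {i}", reply_address="balancer-vip") for i in range(1, 5)]
for m in msgs:
    # The receiver acknowledges to the address in the message: the balancer.
    lb.forward_ack(m["id"])

print(sorted(a.outgoing))  # [2, 4]
```

Half the acknowledgements reach `sender-b`, which discards them, so messages 2 and 4 stay in `sender-a`'s outgoing queue indefinitely; with more senders in the pool, an even larger share would be lost.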

The subject is discussed in more detail here:

KB 899611: How Message Queuing can function over Network Load Balancing (NLB)

There are a couple of workarounds (including one too new for inclusion in the KB):

[[Thanks to Sanjib Saha for ideas]]