How to help outgoing queues scale in MSMQ

This post is written for systems in which a central MSMQ server sends messages to a large number of MSMQ clients. "Large" here could mean hundreds, maybe thousands.

MSMQ is not designed to be a real-time system - you can use it as such, but that wasn't the main focus. Instead, MSMQ ensures that messages get delivered, no matter how long it takes. This means that you can send messages to thousands of destinations, but not necessarily instantly or concurrently.

To understand why this is, we need to look at the underlying mechanism.

The MSMQ service has a pool of threads that it uses to make network connections to remote clients and deliver messages. To avoid exhausting system resources, this pool is not dynamic; its size is instead related to the number of processors in the server. So, if you have more outgoing queues than threads, some destinations will just have to wait until it is their turn - and the wait can be significant.

There are two types of client for this discussion - off-line and on-line. If the destination is off-line then MSMQ has to spend a while determining this before giving up and moving on to the next destination. The situation is actually worse if the destination is on-line: although the messages for that client can be delivered, the connection will need to be idle for at least two minutes before MSMQ will finish with it.

The MSMQ FAQ has an entry to discuss this:

18.2 - Any scale considerations for the messages delivery mechanism?
Yes. Scale need to be considered. The Message Queuing service has a pool of worker threads that do most of queuing and delivery work. All threads listen on a common I/O completion port. When an I/O operation is ready, the operating system wakes up a worker thread, which processes the result of the I/O, issues another I/O request, and then waits again on the completion port. Responsiveness of the Message Queuing service will reduce if all worker threads are blocked and none is available. The number of worker threads can be controlled via the QMThreadNo registry value.
For example, the connect() API blocks for 20 seconds (by default) when the remote side does not reply. If the local computer tries to send to too many remote computers that are not available on the network, most worker threads will be blocked, waiting for connect() to fail. This scenario is partially solved by Message Queuing, by always leaving one worker thread available for a non-blocking operation. Either the application or administrator can help with this scenario, by pausing an outgoing queue if it is known that remote side will not be available for a long time.
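
As a rough illustration of why the wait can be significant, the following Python sketch estimates how long the thread pool takes to try every unreachable destination once. It is a simplified model, not MSMQ's actual scheduling: it assumes every attempt blocks a worker thread for the full 20-second connect() timeout quoted above and that destinations are attempted in batches the size of the pool. The function name and the queue and thread counts are made up for the example.

```python
import math


def full_pass_seconds(offline_queues, worker_threads, connect_timeout=20.0):
    """Estimate the time for the pool to attempt every off-line destination once.

    Simplified model (an assumption, not MSMQ's real scheduler): each attempt
    blocks one thread for the whole connect timeout, and the pool works through
    the destinations in batches of `worker_threads`.
    """
    if offline_queues < 0:
        raise ValueError("offline_queues must not be negative")
    if worker_threads < 1:
        raise ValueError("worker_threads must be at least 1")
    batches = math.ceil(offline_queues / worker_threads)
    return batches * connect_timeout


if __name__ == "__main__":
    # Hypothetical figures: 1,000 unreachable clients and a pool of 16 threads.
    seconds = full_pass_seconds(1000, 16)
    print(f"{seconds:.0f} seconds (about {seconds / 60:.0f} minutes)")
```

Under these assumptions a single pass over 1,000 off-line clients takes about 21 minutes, during which an on-line client may be waiting for its turn.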

So the areas for optimising are as follows:

  • Number of threads available to make network connections and deliver messages
  • Time taken for MSMQ to give up trying to deliver to an off-line destination
  • Time taken for MSMQ to clean up after delivering to an on-line destination

Increase number of threads available
  1. Pause outgoing queues that won't be able to connect
    MSMQ skips outgoing queues that are paused, so if you know a destination is going to be off-line then make sure MSMQ doesn't try to send any pending messages to it. This isn't going to be an easy approach, especially with many clients. One option would be to keep each outgoing queue paused by default until the corresponding client asks for it to be unpaused - for example, the client machine could send an MSMQ message to a queue on the server to tell a monitoring service that it is available for any pending messages. Some business logic could then determine when to pause the outgoing queue again.
  2. Only send messages that need sending
    Review the effective lifetime of messages - if a message would be out of date after 6 hours then remove it from the outgoing queue once it becomes stale; once an outgoing queue becomes empty, MSMQ will remove it and so free up a thread. To set a lifetime for a message, use the Time to reach queue and Time to be received properties.
  3. Increase the thread pool size
    The registry value QMThreadNo controls the number of threads in the pool available for making network connections. Note that there is a documentation error - the value has no upper limit. This doesn't mean you should just set the value to a million and walk away - the thread pool is small because threads are expensive. By default, each thread has 1 MB of virtual memory reserved for it, of which 8 KB is committed. So a 32-bit system with 2 GB of addressable memory can only maintain a maximum of roughly 2,000 threads for the process, even if they are not all in use - and that is assuming this addressable memory is not already in use for other objects ... like messages and queues. You should, therefore, raise the QMThreadNo value in increments whilst monitoring the performance of your system to find the "sweet spot" where you maximise MSMQ throughput without impacting overall performance. Ideally you would have one thread per outgoing queue but that will not be practical on most systems. In summary, if you must increase the defaults, use the smallest number of threads you can to get the job done and stay well below the limits imposed by the architecture.
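
The thread-count ceiling described above is simple arithmetic, sketched here in Python. It only divides address space by the per-thread reservation, so treat the result as an upper bound: it ignores fragmentation and any other per-thread overhead. The function name and the 512 MB "already in use" figure are invented for illustration.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def max_threads_by_address_space(address_space, stack_reserve=MIB, in_use=0):
    """Upper bound on threads that fit in a process's virtual address space.

    address_space: bytes of addressable memory for the process
    stack_reserve: bytes of virtual memory reserved per thread (1 MB default)
    in_use:        bytes already taken by other objects (hypothetical figure)
    """
    if stack_reserve <= 0:
        raise ValueError("stack_reserve must be positive")
    return max(0, address_space - in_use) // stack_reserve


if __name__ == "__main__":
    # 2 GB of address space with nothing else in it.
    print(max_threads_by_address_space(2 * GIB))
    # Same process with a hypothetical 512 MB already used by messages and queues.
    print(max_threads_by_address_space(2 * GIB, in_use=512 * MIB))
```

The first figure is 2,048, which is where the "roughly 2,000 threads" limit comes from; the second shows how quickly that ceiling drops once the process holds real data.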

Reduce time spent with OFF-line destinations

The FAQ states that the connect() API call blocks for 20 seconds. This is a handy simplification as the timeout - which is in fact 21 seconds - is really at the TCP level. There are two TCP settings that are relevant - TcpInitialRtt and TcpMaxConnectRetransmissions. The first value defaults to 3 seconds and the second to 2 retries. The time between retries doubles each time so the connect() call waits 21 seconds, made up of 3 (1st attempt) + 6 (1st retry) + 12 (2nd retry), before giving up on the connection. 
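
The doubling arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation described above, not a reading of the TCP stack's real behaviour; the function name is made up, and the alternative values in the second call are purely illustrative, not recommendations.

```python
def connect_timeout_seconds(initial_rtt=3.0, max_retransmissions=2):
    """Total time before a connection attempt is abandoned.

    The first attempt waits `initial_rtt` seconds and each retry waits twice
    as long as the attempt before it.
    """
    if initial_rtt <= 0:
        raise ValueError("initial_rtt must be positive")
    if max_retransmissions < 0:
        raise ValueError("max_retransmissions must not be negative")
    waits = [initial_rtt * 2 ** attempt for attempt in range(max_retransmissions + 1)]
    return sum(waits)


if __name__ == "__main__":
    # Defaults from the text: 3 + 6 + 12 seconds.
    print(connect_timeout_seconds())
    # Illustrative only: a 1-second initial value with the same two retries.
    print(connect_timeout_seconds(initial_rtt=1.0))
```

With the defaults the result is 21 seconds; because each retry doubles the wait, lowering the initial value shrinks the total far more than removing a single retry would suggest.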

Note - It is possible to change these values and so reduce the timeout but you need to understand that EVERY network-using application on the machine will be affected in a similar fashion. These are not MSMQ-specific settings and reducing them has the potential to break other processes. Handle with care.

Reduce time spent with ON-line destinations

MSMQ regards establishing a network connection as expensive in terms of time and computer resources but, equally, maintaining an unused session is costly. Existing connections are not closed immediately after the last message has been sent, mainly because there is no way of knowing which one IS the last message. As a compromise, MSMQ waits for a few minutes of idle time before ending the session. To free up the thread for the connected queue more quickly, reduce the CleanupInterval registry value. Don't drop the timeout too low, though, as you want to avoid reconnecting too frequently.
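
To get a feel for the trade-off, the Python sketch below counts how many sessions would be opened to one destination for a given pattern of sends and a given idle timeout. It is a toy model that assumes the session closes as soon as the idle time reaches the timeout; MSMQ's real clean-up timing may differ. The function name, send times and timeout values are invented for the example.

```python
def count_sessions(send_times, idle_timeout):
    """Count the sessions needed to deliver messages sent at `send_times`.

    send_times:   ascending send times in seconds
    idle_timeout: seconds of idle time after which the session is closed
                  (toy model of the clean-up interval, an assumption)
    """
    sessions = 0
    last_send = None
    for t in send_times:
        # A new session is needed for the first message, or when the previous
        # session has already been idle long enough to be closed.
        if last_send is None or t - last_send >= idle_timeout:
            sessions += 1
        last_send = t
    return sessions


if __name__ == "__main__":
    one_per_minute = [0, 60, 120, 180, 240, 300]  # hypothetical traffic
    for timeout in (120, 30):
        print(timeout, count_sessions(one_per_minute, timeout))
```

In this model a destination that receives one message a minute keeps a single session open with a 120-second timeout but has to reconnect for every message with a 30-second timeout, so pick a value with your real traffic pattern in mind.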

References

280087  FIX: Windows 2000 Application Delay Occurs in Response to the Message Queuing Send Function
321784  Application delay occurs in response to Message Queuing send on Windows XP

[Thanks to Doug Stewart and Ray Ion for their technical input]