Insufficient Resources? Run away, run away!

What is the MSMQ problem that you are most likely to encounter? The dreaded "Insufficient Resources" error. Unless you have a working knowledge of how an operating system actually operates then this error is going to cause mass confusion. For starters, which resources does it mean? Disk space? No, that has GBs free. Memory? No, task manager shows that there is more than enough still unused. So what's up?

I've had a look at the MSMQ FAQ for the following list of possibilities (plus one or two others):

  1. The thread pool for the remote read is exhausted (MSMQ 2.0 only).

  2. The number of local callback threads is exceeded

  3. The volume of messages has exceeded what the system can handle (MSMQ 2.0 only).

  4. Paged-pool kernel memory is exhausted.

  5. Mismatched binaries.

  6. The message size is too large.

  7. The machine quota has been exceeded.

  8. Routing problems when opening a transactional foreign queue (MSMQ 3.0 only)

  9. Lack of disk space.

  10. Storage problems on mobile devices

  11. Clustering too many MSMQ resources

 

Slightly confusingly, one KnowledgeBase article covers the first two causes.

1    The thread pool for the remote read is exhausted. (MSMQ 2.0 only)

This is further documented in KnowledgeBase article 901279 "Applications may receive error messages about insufficient resources in Message Queuing".

If you are running several Message Queuing-enabled applications that receive messages or monitor for message arrival events from a queue on another computer, this problem probably occurs because you have exceeded the number of remote reads that can be made against Message Queuing on a remote computer. When the application reads from a remote queue, a thread is created by the local Message Queuing service that waits for the remote read on the remote computer to be completed. The default threshold of threads that are created to handle these requests is 24 (Windows 2000 Professional) or 96 (Windows 2000 Server). Edit MaxRRThreads in the registry to increase this value.
Note: Systems that are running Windows XP or Windows Server 2003 are not affected by the remote receive threads limit.

You can also get this problem in Computer Management when looking in the queues on another machine - the snap-in is doing remote reads to query message properties so "insufficient resources" will appear if all the threads are in use.

2    The number of local callback threads is exceeded

This is further documented in KnowledgeBase article 901279 "Applications may receive error messages about insufficient resources in Message Queuing".

If you are running Message Queuing-enabled applications to monitor queues for arriving messages, this problem probably occurs because you have exceeded the number of local callback threads that are permitted. Message Queuing has a callback thread limit of 63 for local queues. The limit on callback threads for local queues is by design and cannot be changed. Callback threads are used to monitor for arriving messages. The threads permit applications to attach to a queue and perform a specific action when a message arrives. Because of the callback threads, the developer does not have to use a timer or some other mechanism that cannot guarantee that an action will be performed when a message arrives.

3    The volume of messages has exceeded what the system can handle. (MSMQ 2.0 only)

This is further documented in KnowledgeBase article 899613 "You receive 'insufficient resources' error messages when the amount of allocated memory that is being used to store messages in Message Queuing 2.0 exceeds an amount that is between 1.6 GB and 1.8 GB". Bit of a mouthful, that title - see if you can read it all out in one breath.

The problem here is one of the big call generators and arises because many customers don't know how to monitor MSMQ. Their IT pros usually have a lot of experience in monitoring network health, server capacity (CPU, disk, memory, etc) and so on but have never looked into how to check that MSMQ itself is running smoothly - until it is too late and it falls over. As I've had to resolve this issue so many times, I'll give it a lot more depth than most of the other causes on the list.

So let's start off with the basics. On 32-bit systems, the Windows operating system gives each process 4GB of virtual address space (2^32 bytes = 4GB). Of this 4GB, 2GB is for the application to use and the remaining 2GB is for the kernel (for device drivers, etc.). MSMQ 2.0 uses the 2GB application area to not only run its code in but also store every message - that's express, recoverable and transactional. If we assume that the service uses around 300MB just to spin the plates then there is about 1.7GB left for all the messages. To put this into some context, let's assume that your messages are 10KB in size, on average, which is by no means large. 1.7GB will therefore be able to store in memory only up to 170,000 messages. If there is a network outage, for example, and messages are starting to back up as they can't get delivered then it is only a matter of time before the space allocated to MSMQ becomes full. Once that happens, the MSMQ service will become unstable and be unable to process new or existing messages.
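
The arithmetic above can be sketched in a few lines of Python. The 300MB service overhead and the 10KB average message size are the rough figures used in this post rather than measured values, and the function name is only for illustration.

```python
# Rough capacity estimate for MSMQ 2.0 on a 32-bit system.
# All figures are the approximations used in the text, in decimal units.

def max_messages(user_space_mb=2000, service_overhead_mb=300, avg_message_kb=10):
    """Return roughly how many messages fit in the user-mode address space."""
    free_kb = (user_space_mb - service_overhead_mb) * 1000
    return free_kb // avg_message_kb

if __name__ == "__main__":
    print(max_messages())                    # the 10KB case from the text
    print(max_messages(avg_message_kb=100))  # larger messages fill the space sooner
```

Plug in your own average message size to see how quickly the ceiling drops.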

To check this is the cause of your "insufficient resources" problem, try one of the following:

  • Using Performance Monitor, check the "Total bytes in all queues" counter under the MSMQ Service performance object.

  • Look in the %windir%\system32\msmq\storage directory and check the total size of all files.
        See KnowledgeBase article 174307 "Interpreting file names in the Storage directory" for an explanation of what the various files are for.

If either shows a total approaching 1.7GB then you have too many messages.
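
If you would rather script the second check, here is a minimal sketch that totals the files in the storage directory. It assumes the default location given above; the function names are mine, and if your storage folders have been moved you will need to pass in your own path.

```python
import os

def storage_bytes(directory):
    """Sum the sizes of all regular files directly inside `directory`."""
    total = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                total += entry.stat().st_size
    return total

def default_storage_dir():
    """Default storage location named in the text; yours may differ."""
    windir = os.environ.get("windir", r"C:\Windows")
    return os.path.join(windir, "system32", "msmq", "storage")

if __name__ == "__main__":
    path = default_storage_dir()
    if os.path.isdir(path):
        print(f"{storage_bytes(path) / 1e9:.2f} GB in {path}")
    else:
        print(f"{path} not found, pass your own storage directory instead")
```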

What can you do?

In the short term, you need to create breathing space for the MSMQ service. Before you start, make sure the applications that generate MSMQ messages are stopped as they will usually be able to fill space as fast as you can free it up. If you have enough resources left to open the queues with Computer Management then you may be able to purge any that contain messages you can afford to lose (messages that can easily be recreated, for example, or ones that are no longer required). Unfortunately, you may be in the situation where Computer Management returns "insufficient resources" and you will need to try a more manual technique:

  • Stop the MSMQ service and any sending applications

  • Run the MQBKUP.EXE command-line utility to back up the MSMQ configuration and message store.

  • Move all the *.MQ files from the Storage directory to a backup area.
    Note: ONLY move the *.MQ files and not the five log files that must remain there.

  • Restart the MSMQ service to ensure it can start up successfully and eliminate any other causes from the troubleshooting

  • Stop the MSMQ service and move back half of the *.MQ files.
    Note: storage files are maintained in pairs - Pxxxxxxx.MQ and Lxxxxxxx.MQ. The latter file is an index for the former and they must be kept together.

  • Restart the MSMQ service but not any sending applications yet

  • Process the backlog of messages that has now been restored

  • Repeat the steps for the other half of the *.MQ files

  • Bring the sending applications back on line

There is one gotcha to this process if you are sending multiple messages within a single transaction. MSMQ stores messages wherever there is space so it is possible that these messages - linked by the same transaction - will be stored in different storage files. Therefore, when you go through the manual steps of moving files around, some of the messages in the transaction may be left in the second half of the storage files. This means that the transaction will abort when MSMQ starts processing the restored storage files as messages are missing. It is not possible to determine which storage files contain the messages from a particular transaction so be prepared to compensate for broken transactions.
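
The "keep the pairs together" rule is easy to get wrong by hand, so here is a minimal sketch that works out the two batches for you. It only builds lists of file names and leaves the actual moving to you. It assumes the P and L files of a pair share everything after the first letter, as described in the note above; the function name is made up for this example.

```python
from collections import defaultdict

def split_in_pairs(filenames):
    """Split *.MQ file names into two batches without separating a
    Pxxxxxxx.MQ file from its Lxxxxxxx.MQ index.

    Files are grouped by everything after the first character, so the
    P and L files that share a number always land in the same batch.
    Anything that is not a *.MQ file (the log files) is left out.
    """
    groups = defaultdict(list)
    for name in filenames:
        if name.upper().endswith(".MQ"):
            groups[name[1:].upper()].append(name)

    keys = sorted(groups)
    middle = (len(keys) + 1) // 2
    first = [n for k in keys[:middle] for n in sorted(groups[k])]
    second = [n for k in keys[middle:] for n in sorted(groups[k])]
    return first, second
```

Feed it the output of a directory listing of the backup area and move back the first batch, then the second. It does nothing about the transaction gotcha described above.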

In the medium term, you want to stop this happening again - either by monitoring MSMQ closely or by preventing message overload.
Monitoring is easy - set an alert on the "Total bytes in all queues" counter that I mentioned earlier. Customers ask me what level they should alert on and I respond that it is up to them - not very helpful but they have asked a "how long is a piece of string" question. You need to determine how long you are "happy" for MSMQ to be unable to process messages for. For example, if messages arrive at the rate of 1MB per minute, the system will fill up and fail after about 28 hours if they cannot be processed or delivered (1.7GB at 1MB per minute is roughly 1,700 minutes). A 500MB alert will therefore kick off around 8 hours after the problem (outage, application crash, etc.) has occurred which will give somebody in IT a reasonable amount of time to do something before MSMQ gets full.
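
To pick an alert level for your own system, the same sum can be sketched as below. The 1,700MB ceiling is the rough MSMQ 2.0 figure from earlier in this post; the arrival rate and alert level are the example numbers above, so substitute your own.

```python
def hours_until(level_mb, arrival_mb_per_min):
    """Hours for a stalled queue to grow from empty to `level_mb`."""
    return level_mb / arrival_mb_per_min / 60

if __name__ == "__main__":
    capacity_mb = 1700   # rough MSMQ 2.0 ceiling used in this post
    alert_mb = 500       # example alert level from the text
    rate = 1.0           # MB per minute, example arrival rate

    to_alert = hours_until(alert_mb, rate)
    to_full = hours_until(capacity_mb, rate)
    print(f"alert fires after {to_alert:.1f} h")
    print(f"store is full after {to_full:.1f} h")
    print(f"time left to react: {to_full - to_alert:.1f} h")
```
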
Prevention is just as essential although not necessarily as easy to implement. Here are some examples of easy and not-so-easy causes or solutions:

  • Journaling has been enabled during some testing that was performed a while ago but no-one switched it off or maybe journaling is part of a reliable messaging mechanism but the code that is supposed to consume the journal messages is not working. Over time the journal queue fills up with duplicate messages until virtual memory is full.

  • Quotas have not been set. MSMQ will stop new messages arriving if it is told to and this is through quota limits - either at the machine or the queue level. Although implementing quotas is very easy - tick the box and set a size limit - this will have a knock-on effect on applications. Does your (or a 3rd party's) application that you use know how to respond when it is informed that the quota has been exceeded? Will it continue regardless, generate an error message or even crash?

    See KnowledgeBase article 899612 "How to set up computer quotas and queue quotas in Microsoft Message Queuing".

  • No throttling within the application. If quotas cannot be implemented for some business reason then it is up to the sending application to ensure it is not causing the MSMQ system to overload. An application, for example, could monitor the queue depth (number of messages) and back off from sending if there were already too many messages.

  • Not using 4GT RAM Tuning (a.k.a. the 3GB switch). It is possible to change virtual memory allocation so that 3GB goes to the application and only 1GB to kernel. This would increase the MSMQ message ceiling to around 2.6GB which may help matters but there's no such thing as a free lunch. Enabling 4GT RAM Tuning will help virtual memory usage but at the expense of kernel memory - you could end up with LESS message capacity after applying the 3GB switch than before. Also, this setting is global so every other application and service will be impacted (positively or negatively) - don't deploy this switch until it has been thoroughly tested.
    Note: The MSMQ service executable (MQSVC.EXE) needs to have the IMAGE_FILE_LARGE_ADDRESS_AWARE bit set.
    Note: Using the /3GB switch temporarily may be a good way to get MSMQ back up and running so you can clear up the messages instead of using the manual method described above.
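
The throttling idea from the list above could look something like this. It is a sketch only: `send` and `get_queue_depth` are hypothetical callables that your application would supply (they are not MSMQ API calls), and the default depth and wait time are arbitrary numbers picked for the example.

```python
import time

def send_with_backoff(send, get_queue_depth, messages,
                      max_depth=10_000, wait_seconds=5.0, sleep=time.sleep):
    """Send each message, pausing while the queue is too deep.

    `send` and `get_queue_depth` are hypothetical callables supplied by
    the application; they are not part of any MSMQ API.
    """
    sent = 0
    for message in messages:
        while get_queue_depth() >= max_depth:
            sleep(wait_seconds)   # back off and let receivers catch up
        send(message)
        sent += 1
    return sent
```

A real application would also want a give-up condition so that it does not wait forever if nothing is draining the queue.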

In the long term, migrate to Windows 2003 as MSMQ 3.0 does not try to store all messages in virtual memory (but still beware of being bitten by problem 4 below).

Also, check out the MSDN article Resource Management in MSMQ Applications.

4    Paged-pool kernel memory is exhausted.

Another big call generator. According to the MSMQ FAQ:

The Mqac driver consumes a few dozen bytes of kernel memory for each queued message, and the size of kernel memory is limited. Practically, for MSMQ 1.0 and Message Queuing 2.0, this limit is approximately 2.5 million messages depending on how much physical memory is available. Windows XP and later can allocate larger pools of kernel memory, allowing Message Queuing 3.0 to store a few more million messages. The limit is much higher for Windows Server 2003 on a 64-bit platform.

If we again have a look in Resource Management in MSMQ Applications (under the Paged and Non-paged Memory section):

Each message consumes approximately 70-80 bytes, on average, of page pool memory.

so every million messages is about 75MB of kernel memory.

Additionally, each queue which contains messages consumes more than 400 bytes of memory for its internal state, of which more than half is kernel memory. On the face of it, even 2,000 queues is going to take up only 0.5MB of kernel memory but it all adds up.

Your next question is "how do I find out how much kernel memory I have?" and you can have an easy/rough answer or a tricky/accurate answer:

  • Easy/rough - run "TMQ STATE" and look at the log file generated, you may see something like:
    Pools limitations (calculated approximately, in KB)
    Paged : limit 307,200 used for 17 %
    Nonpaged : limit 262,144 used for 13 %
    Calculations are wrong (NIY) if you boot with /3GB or /4GB option
     

  • Tricky/accurate - attach a debugger or open a kernel dump and run the !vm command to show something like:
    *** Virtual Memory Usage ***
    ....
    NonPagedPool Usage: 4691 ( 18764 Kb)
    NonPagedPool Max: 65536 ( 262144 Kb)
    ....
    PagedPool Usage: 13217 ( 52868 Kb)
    PagedPool Maximum: 68608 ( 274432 Kb)

These reports are from two different machines. The latter !vm output is from KnowledgeBase article 894067 "The Performance tool does not accurately show the available Free System Page Table entries in Windows Server 2003".
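
If you have saved the !vm output to a text file, a short script can turn it into percentages. The sketch below is written only against the four lines shown above, so treat the pattern as an assumption; real debugger output has many more lines and the layout may differ between versions.

```python
import re

# Matches lines such as "PagedPool Usage: 13217 ( 52868 Kb)"
LINE = re.compile(r"^\s*(\w+)\s+(\w+):\s+\d+\s+\(\s*(\d+)\s*Kb\)", re.IGNORECASE)

def pool_usage(vm_text):
    """Return {pool: (used_kb, max_kb, percent_used)} from !vm output text."""
    used, limit = {}, {}
    for line in vm_text.splitlines():
        match = LINE.match(line)
        if not match:
            continue
        pool, field, kb = match.group(1), match.group(2).lower(), int(match.group(3))
        if field == "usage":
            used[pool] = kb
        elif field.startswith("max"):   # "Max" or "Maximum"
            limit[pool] = kb
    return {p: (used[p], limit[p], 100.0 * used[p] / limit[p])
            for p in used if p in limit}

SAMPLE = """\
NonPagedPool Usage: 4691 ( 18764 Kb)
NonPagedPool Max: 65536 ( 262144 Kb)
PagedPool Usage: 13217 ( 52868 Kb)
PagedPool Maximum: 68608 ( 274432 Kb)
"""

if __name__ == "__main__":
    for pool, (used_kb, max_kb, pct) in pool_usage(SAMPLE).items():
        print(f"{pool}: {used_kb} of {max_kb} KB ({pct:.0f}% used)")
```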

TMQ is available from KnowledgeBase article 887220 "Description of TMQTools utilities for Microsoft Message Queuing"

If we take the paged pool maximum of 274,432 KB and divide by 75 bytes (average kernel memory usage per message), we get roughly 3,750,000 messages as the theoretical maximum that the system can store. On a Windows 2000 server, you would normally run out of virtual memory before you got anywhere near 3,000,000 messages. On Windows 2003, though, virtual memory is no longer an issue so kernel (or more accurately paged pool) memory availability is your new limiter.
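
Here is the same calculation as a minimal sketch, so you can plug in your own paged pool maximum. The 75 bytes per message is the midpoint of the 70-80 byte range quoted above, and the result ignores every other consumer of paged pool, so read it as an upper bound rather than a promise.

```python
def max_messages_in_paged_pool(paged_pool_kb, bytes_per_message=75):
    """Theoretical ceiling if MSMQ were the only paged pool consumer."""
    return paged_pool_kb * 1024 // bytes_per_message

def paged_pool_mb(messages, bytes_per_message=75):
    """Approximate paged pool (in decimal MB) consumed by `messages`."""
    return messages * bytes_per_message / 1_000_000

if __name__ == "__main__":
    print(max_messages_in_paged_pool(274432))   # the !vm example above
    print(paged_pool_mb(1_000_000))             # cost of a million messages
```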

The symptoms for running out of paged pool are more severe than running out of virtual memory. If MSMQ has insufficient virtual memory space to work in then MSMQ suffers. Paged pool, on the other hand, is shared amongst all applications, services and drivers so when the system has run out of paged pool then ALL processes are affected and the system will become unstable.

What can you do?

It depends on where the paged pool memory has gone. As this resource is shared out then any application can use it up, not just MSMQ.

We'll start with MSMQ-as-culprit. If MSMQ has millions of messages and paged pool has run out then you can try out the steps above for virtual memory exhaustion (restore messages in batches, set quotas to keep volume (and therefore message numbers) down, set alerts on message numbers, etc.). MSMQ, though, does try to be a "good neighbour" and it will hibernate (go into "low memory mode") when 80% paged pool is in use as documented in KnowledgeBase article 811308 "MSMQ: How to Increase the Kernel Memory Threshold". The service assumes that it is the cause of the high pool usage and allows only 1 thread to be in use at a time. This prevents MSMQ making things much worse and allows for messages to still be consumed (albeit very slowly). Make sure you set registry value KernelMemThreshold to 95 because, as explained in the article:

Garbage collection is done by the kernel only when the paged pool memory consumption reaches 90 percent. However, Message Queuing stops functioning at 80 percent of paged pool memory consumption. If you set the kernel memory threshold above 90 percent, this makes sure that Message Queuing does not go into "low memory mode" until the Windows Memory Manager starts cleanup

[[Update 8th June 2007]] Note: KernelMemThreshold is NOT required if you have applied the Update Rollup 1 for Microsoft Windows 2000 Service Pack 4 as MSMQ 2.0 has been recoded to not stop working at 80%. Instead, it will rely on the operating system to manage things properly (which, by the way, is how MSMQ 3.0 works so setting KernelMemThreshold will have no effect on Windows XP/2003/Vista/etc.). If you have the rollup and apply the KernelMemThreshold registry change then MSMQ 2.0 will honour the value you have set. [[End of Update]]

More likely, though, something else has used up the paged pool and left MSMQ with resource problems. For example, anything to do with file access can be expensive on kernel memory - having MSMQ running on a machine that also operates File & Print Sharing heavily is probably not going to be a good arrangement. Terminal Services is another no-no because it attempts to maximize system page table entries at the expense of paged pool memory, as documented in KnowledgeBase article 268230 "Scaling Out Versus Scaling Up with Intel Physical Addressing Extensions (PAE)".

Tracking down the consumers of paged pool takes a bit of detective work. Basically with Poolmon (or similar utility) you collect logs of how memory pools are used over time and look for trends. Memory is tagged with a 4-byte word (such as FatV, for example) so that you know what it is used for although half the time this information can be very cryptic. What, for example, is a Fat Vcb stat bucket? Usually sorting on Top 10 (ab)users will show you immediately which tags are eating the most paged pool memory - most tags use a trivial amount and can be ignored - only tags using megabytes need to be looked at. Once you have a list of tags to investigate you need to look at a reference list of tags to see what products they come from and therefore if there is anything you can do about it.
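
Once you have the numbers out of your logs, spotting the trend is simple enough to script. The sketch below makes no assumptions about Poolmon's log format: it expects you to have already turned two snapshots into plain dictionaries of tag to bytes, and the function name and the 1MB cut-off are just choices made for this example.

```python
def top_growers(first, last, limit=10, min_bytes=1_000_000):
    """Rank pool tags by growth between two snapshots.

    `first` and `last` map a pool tag to the paged pool bytes it held at
    the start and end of the monitoring period. Tags still below
    `min_bytes` at the end are ignored as trivial.
    """
    growth = []
    for tag, end_bytes in last.items():
        if end_bytes < min_bytes:
            continue
        growth.append((tag, end_bytes - first.get(tag, 0), end_bytes))
    growth.sort(key=lambda row: row[1], reverse=True)
    return growth[:limit]
```

Each row that comes back is (tag, growth in bytes, final bytes), biggest grower first, which gives you the short list of tags to look up.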

How the system is configured can make a big difference to available memory. To boost kernel memory, you want to set PagedPoolSize=FFFFFFFF as documented in KnowledgeBase article 810507 "Error 0xc00e0027 when you send or receive a Microsoft Message Queuing message".

On the other hand, adding physical RAM reduces paged pool memory quite dramatically. KnowledgeBase article 912376 "How to monitor and troubleshoot the use of paged pool memory in Exchange Server 2003 or in Exchange 2000 Server" says:

Each byte of physical RAM that is installed in a server requires some kernel memory to address and manage it. The more RAM that is installed, the more kernel address space must be reserved for it. Address space may be borrowed from paged pool memory to satisfy this demand.

The bottom line is that MSMQ, just like Exchange in the article, doesn't benefit from adding more physical RAM past the first 4GB.

[[Update 24th February 2009]]  As discussed in FIX: Kernel-pool memory may become exhausted when many clients connect to Message Queuing, "When many clients connect ..., ... Message Queuing may exhaust its kernel-pool memory." The many connections cause an accumulation of Transmission Control Protocol (TCP) buffers in kernel memory and the MaxInSessions registry value can be used to throttle the number to a safe level. [[End of Update]]

 

5    Mismatched binaries

This is an easy one to clean up, as documented in KnowledgeBase article 326447 "'Insufficient Resources to Perform This Operation' Error Message with Message Queuing Queue". I would suggest, though, that applying the security rollup from 2002 is not adequate and you should also apply the much more recent MS05-017 which is documented in KnowledgeBase article 892944 "MS05-017: Vulnerability in MSMQ could allow code execution".

 

6    The message size is too large.

MSMQ (up to and including 3.0) can only send messages as large as 4MB. Anything larger throws an error. Note that this is 4 million bytes and NOT 4 million characters - UNICODE characters are double-byte representations and so you can only send 2 million characters from a standard Windows application. MSMQ logging (as documented in the MSMQ FAQ) allows us to check if the message size exceeds the 4MB limit.  Log entries similar to the following will occur where the Data indicates the hex size of the message:

0x488> Wed Apr 02 11:34:19 2003: DRIVER Error: heap/1235, HR: 0xc000009a, Data-0x560600
0x478> Wed Apr 02 11:34:19 2003: DRIVER Error: Packet/150, HR: 0xc000009a, Data-0x0

0xC000009A means INSUFFICIENT RESOURCES
0x560600 = 5,637,632 bytes in decimal
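
If you have a lot of log entries to check, the hex conversion can be scripted. This is a minimal sketch that assumes the Data field always looks like the example above; the limit is my reading of "4MB" as 4 x 1024 x 1024 bytes, and the function name is made up for the example.

```python
import re

LIMIT_BYTES = 4 * 1024 * 1024   # assumed reading of the "4MB" limit

def data_value(log_line):
    """Return the Data field of a driver log line as an integer, or None."""
    match = re.search(r"Data-0x([0-9a-fA-F]+)", log_line)
    return int(match.group(1), 16) if match else None

line = ("0x488> Wed Apr 02 11:34:19 2003: DRIVER Error: heap/1235, "
        "HR: 0xc000009a, Data-0x560600")
size = data_value(line)
print(size, size > LIMIT_BYTES)

# A Unicode (UTF-16) string needs two bytes per character.
print(len(("x" * 10).encode("utf-16-le")))
```

The last line shows the Unicode point made earlier: ten characters take twenty bytes.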

 

7    The machine quota has been exceeded.

As discussed earlier, you can set quotas on the machine as a whole and individually on a queue. If the machine quota is exceeded by a local application, "Insufficient Resources" is returned and the sending application must deal with this scenario properly. By local, I mean that the application is running on the same machine as where the quota is set and is trying to create a message, either in a local queue or in the outgoing queue destined for a remote machine. Machine quotas (a.k.a. storage limits) are set in Computer Management on the General tab of Message Queuing Properties. Machine-level quotas on remote machines and Queue-level quotas are handled differently. From the MSMQ FAQ:

Q. Are there any fundamental differences between machine quota and queue quota?

A. Yes. When the machine quota is exceeded on a computer, the computer closes its sessions and does not accept new messages. In this case, messages will wait in outgoing queues of source computers, or intermediate routers, making the machine quota a binary method of throttling. When queue quota is exceeded, the destination Message Queuing computer rejects the messages for the particular queue, and returns an exceeded quota NACK (if a NACK has been requested). This means that messages are lost if a queue quota is exceeded.

 

8    Routing problems when opening a transactional foreign queue

From the MSMQ FAQ:

A special case of this error code is when opening a transactional foreign queue on Message Queuing 3.0.
Then, it most likely means that Message Queuing cannot find a route to a bridge server that can forward the message to MQSeries.
There may be two problems:

  • The bridge was not configured correctly and necessary objects (such as Message Queuing routing links and the Message Queuing site gate) were not created.

  • For a new bridge, Message Queuing on the local computer needs to refresh its own routing data. This happens once every 12 hours by default. You can restart the Message Queuing service in order to overcome this delay.

 

9    Lack of disk space.

Pretty self-explanatory. If there is not enough disk space for MSMQ to create files in the Storage directory then "insufficient resources" will be returned.

 

[[I really wished I had picked something simpler to write about - this blog took a DAY  :-) ]]

 

10   Storage problems on mobile devices

[[This update thanks to Biju S Melayil on the MSMQ newsgroups]] 

The default values for MSMQ storage on mobile devices may be too low for the messages you are trying to send (so similar to items 7 and 9 above).
There are 4 registry values you want to check under HKEY_LOCAL_MACHINE\Software\Microsoft\MSMQ\SimpleClient:

  • BaseDir (Directory used for queue information and message storage.)
  • DefaultQuota (Outgoing queue quota; default 0.25MB)
  • DefaultLocalQuota (Incoming queue quota; default 1MB)
  • MachineQuota (Total quota; default 2MB)

These are documented on MSDN.

You will need to restart the MSMQ service to make these take effect.
Also, the default local quota only applies to queues created after you have made the change and not to existing ones.

Note - According to the Windows CE Networking Team WebLog, WinCE processes can only handle 32 MB of RAM per proc, regardless of available memory. (This is being fixed in CE 6.0)  

 

11   Clustering too many MSMQ resources

[[added September 13th 2007]]

It is important to ensure that all the clustered MSMQ resources have adequate kernel memory address space for mapping messages. Each resource has its own instance of the Message Queuing driver (Mqac.sys) which maps files to a 4MB range of the System View Space memory pool (desktop heap for drivers). As the default pool size is 16MB, this allows for 3 MSMQ services (3x4MB=12MB) and 4MB left over for other (non-MSMQ) device drivers - this 4MB remainder may not be enough. Even worse, running a fourth service would mean MSMQ would try and allocate itself all of the memory pool, leaving none for other services to make use of. The results of this are, obviously, unpredictable and you may see "Insufficient Resources" reported. 

Note - the local unclustered MSMQ service will also count against this limit if it is running.

To raise the limit, modify the SystemViewSize registry value (not to be confused with SessionViewSize). The value should be set as follows:

SystemViewSize = 16 + (total number of Message Queuing resources x 4)
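
As a quick sanity check, here is the formula as a minimal sketch. The function name is mine, and the result is in the same units as the formula (the 16 corresponds to the 16MB default mentioned above). Remember to include the local unclustered service in the count if it is running.

```python
def system_view_size(msmq_resources):
    """SystemViewSize = 16 + (number of Message Queuing resources x 4)."""
    if msmq_resources < 0:
        raise ValueError("resource count cannot be negative")
    return 16 + msmq_resources * 4

if __name__ == "__main__":
    for count in (1, 3, 4):
        print(count, system_view_size(count))
```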

SystemViewSize is documented here:

Message Queuing in server clusters
"Memory management for Message Queuing resources in clusters"

Other Reference material:

MSMQ FAQ document
"3.5 Why is there a 4-MB message size limitation?"

Deploying Message Queuing (MSMQ) 3.0 in a Server Cluster
Q. Are there any limitations for multiple MSMQ resources in a server cluster?

[[added October 1st 2007]]
936497 BUG: Error message when you try to send messages to a Message Queuing queue on a computer that is running a 64-bit version of Windows Vista: "404 not found"