We have gotten a number of cases lately involving applications using Outlook 2010's MAPI failing when connecting to Exchange 2010 servers. The primary symptom observed by our customers is that random MAPI calls to Exchange return MAPI_E_CALL_FAILED. At first blush, there is no rhyme or reason as to which calls fail, except that all of the applications which see the problem have multiple threads using MAPI. It turns out the underlying problem is very complicated and we won't be able to fix it in a hotfix (though we do have a long term plan for a solution), so I wanted to get this information out there in case this problem is affecting you.
The root of this problem has to do with how the Exchange MAPI provider (henceforth called just MAPI here) packs remote operations, or rops, into the RPC buffer to send them to Exchange to be processed. In all versions up until Outlook 2003 (and including all versions of the MAPI download), most rops are sent synchronously to the server as soon as the client calls in to MAPI. So if you were to call GetProps and follow along in the debugger, you would see your call go in to emsmdb32, then to rpcrt4, at which point a packet would be sent on the wire. Your thread would then wait for Exchange to respond, then it would parse the response buffer and hand it back to the client. The MAPI_DEFERRED_ERRORS flag can cause multiple rops from the same thread to get packed together in the same buffer, but the key point here is that everything usually happens on the initial calling thread.
Outlook 2007 changed the model for how Outlook’s MAPI handles MAPI operations. Most MAPI operations are now issued asynchronously. MAPI maintains a worker thread which queues up pending MAPI operations from the various client threads and packs them into buffers as needed, resulting in fewer packets being sent between the client and server. So for instance, if one thread issued an operation to read a stream and another issued an operation to set properties at the same time, both operations could be packed into the same buffer and sent to the server. When the results come back, MAPI decodes them and sends them back to any client threads which were waiting on the results. This has the benefit of greatly reducing the amount of data sent across the wire, which is a primary concern for the developers of Outlook’s MAPI. The side effect of this change is that we can now see operations packed together in combinations that were never possible to see with Outlook 2003. This is the first change which leads to the random MAPI_E_CALL_FAILED results.
On the server side, Exchange 2010 made a series of fixes to prevent looping calls which can overwhelm networks and bring down Exchange servers. The key rop here is RopBufferTooSmall, which is the rop the Exchange server can send back to a MAPI client if the server needs to send more data back to the client than is available in the response buffer. This is a signal to the client that it retry the operation, but this time either use a larger buffer, pack fewer operations into the request buffer, or both. Through analysis of numerous hang dumps we had received over the years, the Exchange team identified a number of scenarios where the server would send this response back to the client and the client's response would be to issue the exact same request the server had just rejected. This would result in a loop, with client and server exchanging the exact same packets until one or both was shut down. The key characteristics in these loops were that RopReadStream was the operation for which the result could not fit in the response buffer, that it was the first and usually only operation in the request buffer, and that we were already using the maximum size response buffer. The fix then was to special case when RopReadStream was the first operation in the request buffer. Instead of returning the RopBufferTooSmall rop, we would return the error ecBufferTooSmall (0x47D, or 1149 in decimal). This change, being more picky about how Exchange 2010 will accept packed rops, is the second change leading to random MAPI_E_CALL_FAILED results.
We combine these changes to see how they interact: prior to Outlook 2007, if Outlook’s MAPI issued a request buffer with RopReadStream as the first operation in the buffer, it was most likely also the last operation. A single client thread was waiting synchronously for the response to this rop. If the response buffer was already maximized and the request still could not fit, re-issuing the request could not be expected to work, so Exchange’s error would be the best response, as it avoids the loop. With Outlook 2007 and higher though, this RopReadStream could be packed in the same buffer as rops for other threads. Some of these rops may carry a large payload, which complicates the math Exchange uses to calculate the remaining space in the response buffer. This calculation occurs before Exchange executes the rops (since it would do no good to execute the rops first and discover we have no room for the results) so Exchange has to assume that every rop in the buffer could potentially fail. The upshot is that RopReadStream gets the short end of the stick, and Exchange concludes it does not have enough space for its response. When RopReadStream is not the first operation in the buffer, we send RopBufferTooSmall back to the client so it can break up the operations and try again. But when it’s the first operation in the buffer, the entire packetful of rops fails outright.
And when they fail, since MAPI doesn’t have a targeted error code to match ecBufferTooSmall, everybody gets the default error of MAPI_E_CALL_FAILED.
The Exchange and Outlook development teams both take this issue very seriously and want to correct this. That said, the risk of regression when you start playing with how Outlook packs rops or how Exchange calculates response buffers is incredibly high. The amount of testing we can do for a hotfix does not reduce this risk enough for us to be comfortable approaching this as a hotfix. So the Outlook team has agreed to attempt a fix for this in the next version of Outlook, currently under development. By getting the fix in to Outlook this early in the development cycle, we’ll have plenty of opportunity to test the fix, and also to observe how it affects clients in both dogfood and beta deployments. We will evaluate our options for Outlook 2010 if/when it comes time to consider what goes in to SP2, depending heavily, of course, on how things go in the next version.
What You Can Do
Given the nature of this problem, if your application is multithreaded, uses Outlook’s MAPI, and reads streams, there is a high likelihood you will run in to this problem at some point. Here are the options we’ve identified so far, in no particular order:
- Use the MAPI download instead of Outlook’s MAPI. Of course, this option does not apply if you are dependent on Unicode PSTs or the MAPI<->MIME conversion API.
- Use EWS to communicate with Exchange. This is your best long term resolution, and also the one with the highest development cost. However, if you’ve been looking for an excuse to make the switch, this is your chance.
- Use cached mode. When working in cached mode, all of your calls operate locally against the OST. The methods used to keep the OST in sync do not appear vulnerable to this issue.
- Serialize your MAPI calls. We’ve had some customers who have had success with this. The simplest way to serialize all of your MAPI calls is to use a single thread. If you cannot do this, then you need to isolate your calls which result in reading a stream from calls which are likely to carry a large request payload. One way to do this is to classify your MAPI operations into three classes:
- Calls which do or are likely to result in RopReadStream, such as: ISequentialStream::Read, CopyTo, MAPIToMIMEStm, IStream::Commit, etc.
- Calls which do or can handle a large amount of data, such as: ISequentialStream::Write, GetPropList, GetProps (for more than a handful of properties), SetProps (again, for more than a handful of properties), etc.
- Other calls, such as OpenEntry, SaveChanges, etc
Once your calls are classified, you would then use mutexes or other control mechanisms to ensure you never issue a call from the first class while you’re waiting on the result of a call in either the first or second class. Calls in the third class would be safe to issue indiscriminatingly.
- Insert manual retry logic for all MAPI operations. Since the issue will cause all operations in a packet to fail, it is not sufficient to just retry rops which read streams. But since some MAPI operations will fail internally and swallow the error, it’s also not possible to protect all calls. This is not considered a viable resolution.
Hopefully one of the first three options is available to your application.
Given that this problem is one that is going to happen despite your best efforts to avoid it, it helps to be able to identify the issue. The best diagnostic here is to use Exchange’s RPC Client Access (aka RCA) logs, which are on by default. These logs contain an entry for every time Exchange failed a call with ecBufferTooSmall. The key word to look for in these logs is BufferTooSmallException. You should expect to find several entries with BufferTooSmall, without Exception. These are the normal retry results and can be ignored. Here are a couple of examples of the log entry you need to look for:
2011-03-17T08:23:58.188Z,2,8548,/o=org/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=User,,
RpcDispatch: [BufferTooSmallException] Une exception de type 'Microsoft.Exchange.RpcClientAccess.BufferTooSmallException' a été levée.
2011-09-15T13:59:20.762Z,5,1351,/o=org/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=User,,
RpcDispatch: Exception of type 'Microsoft.Exchange.RpcClientAccess.BufferTooSmallException' was thrown.
You'll find these logs on the Client Access Server (CAS) for Exchange 2010 in the following directory:
%ExchangeInstallPath%LoggingRPC Client Access
Note that the second entry contains more information on the other operations which were included in the buffer. While you don’t need this information to identify the problem, it might be helpful in matching the logs to your own error logs. Here’s how to crank that logging higher (note that this must be done on every CAS for which you want greater logging:
- Locate the %ExchangeInstallPath%Bin folder
- Open Microsoft.Exchange.RpcClientAccess.Service.exe.Config in notepad.
- Near the bottom, there is a line which will look like this:
<Add key=”LoggingTag” Value=”ConnectDisconnect, Logon, Failures, ApplicationData, Warnings” />
To this list, add the word Rops. It should then look like this:
<Add key=”LoggingTag” Value=”ConnectDisconnect, Logon, Failures, ApplicationData, Warnings, Rops” />
- Restart the RpcClientAccess Service.
Now Exchange will log the rops which were in the buffer. By the way, RCA logs are great for troubleshooting other issues as well. Any time Exchange 2010 isn’t returning the results you expect, these logs are a good place to start. Also, I should remind anyone looking at these logs that we do know at least one other reason you might see BufferTooSmallException. I discussed it in my article on Large Multivalued Properties.
This is a complicated issue that can present itself in multiple ways (due to the random nature of which calls fail). Even though I was the engineer who discovered it and figured out what was happening, I still worked two further issues where this issue was the root cause but did not recognize it. In one case I happened to get debug trace that reminded me to check for this issue, and in the other, in desperation when no other troubleshooting was working, I rolled the dice and had the customer check the RCA logs on the off chance this was their issue. It turned out it not only lined up perfectly, but the customer was able to apply the same troubleshooting to other sites experiencing issues they though were unconnected and confirmed the issue was happening there as well. So while we don’t have a fix for this yet, at least we can determine when the problem is happening and stop troubleshooting.
If development informs me of any plans to fix this earlier than the next version of Outlook, I will report them here. However, don’t ask me if or when SP2 is expected. For that, track the Office Sustained Engineering blog or the Update Center for Microsoft Office.