This post was inspired by a case I worked recently. In this case, the customer was using the 5.5 Event Script service to autoaccept meeting requests. They weren't having any problems with their script or scalability. Their problem was that the service would run fine for days at a time, and then suddenly stop handling all incoming meeting requests. The error they would get in the event logs was this not especially helpful event 11:
Event ID: 11
Description: A fatal error (0x80004005)
occurred in an IExchangeEventSink while processing message [Subject = '<subject>
Examining the script logs was no help - they didn't say anything about the error. This case had me stumped for a while. The latest fixes for events.exe were no good. Turning on internal tracing didn't take me too far either.
Time to pull out the big gun
When I finally got a debugger attached to the events.exe process, I found that OpenMsgStore was returning MAPI_E_CALL_FAILED (==0x80004005). Debugging into that, I found that during handling of OpenMsgStore, MAPI has to go to the registry to retrieve some properties from the profile. MAPI's call to RegQueryValueEx was returning ERROR_KEY_DELETED (==1018). The profile we were using in the Event Script service had been deleted out from under us! Why?
The simplistic answer is that right after MAPILogonEx, events.exe calls DeleteProfile. To understand that, we need to digress.
What does DeleteProfile do?
Well, for starters, it doesn't delete the profile. At least not always. Here's what the MSDN has to say on this function:
The IProfAdmin::DeleteProfile method deletes a profile. If the profile to delete is in use when DeleteProfile is called, DeleteProfile returns S_OK but does not delete the profile immediately. Instead, DeleteProfile marks the profile for deletion and deletes it after it is no longer being used, when all of its active sessions have ended.
Let's break this down:
Check if the profile is in use
MAPI maintains a shared memory object (this is CreateFileMapping, not .Shared sections) for interprocess communication and synchronization. In that shared memory object, we keep a linked list of all profiles currently in use. When we log onto a profile with MAPILogonEx, its ref count is bumped up. When the client logs out, the ref count is dropped back down. So checking if a profile is in use is equivalent to looking in the shared memory object to see if the profile has a ref count.
Mark a profile for deletion
MAPI uses a special reg key to mark a profile for deletion. The key is
HKEY_CURRENT_USER\Software\Microsoft\Windows NT\CurrentVersion\Windows Messaging Subsystem\Profiles\Deleted Profiles
To mark a profile for deletion, we write the profile name as a subkey. When MAPI is asked to present a list of existing profiles, such as in the MAPILogonEx dialog or in response to GetProfileTable, profiles listed under Deleted Profiles are left out.
Delete a profile after it is no longer being used
This is the hard one. How does MAPI determine the exact instance a profile is no longer in use? During logoff is not sufficient, since we also need to account for cases where the application using MAPI has crashed. So we have to be cleverer than that. We delete profiles from the registry when:
- DeleteProfiles is called and the profile is not currently in use.
- A session is released and the profile is marked for deletion
- MAPIInitialize is called
The first two cases handle most profile deletions. The third case handles the abnormal situations - an application has crashed or leaked a session pointer. The third case is also where bugs in the implementation cause cross process problems.
Here come the bugs, part 1
The first bug is completely our fault. Suppose two different processes run as the same user. Suppose also that one of the processes uses impersonation to run under the credentials of another user. Note that by default, this does not load the impersonated user's registry hive. Doing that requires a different call. (See KB 259301 for example code that does this). So, both processes will use the same registry hive when accessing HKEY_CURRENT_USER.
The first process to start creates a profile for MAPI. It then calls DeleteProfile after logging on so as not to clutter up the registry with temporary profiles. The second process (which has done the impersonation) now comes in and calls MAPIInitialize. We have a problem with our shared memory now - each process got their own block of shared memory. Our scheme now breaks down. The second process checks its shared memory to see what profiles are in use, finds that no profiles are being used, and proceeds to delete every profile listed under "Deleted Profiles". The first process is now what we call hosed.
When this bug was first reported, we weren't sure how to fix it without doing a total rewrite of this feature. That is, until one of our devs had the brilliant insight that each shared memory section was essentially tracking the profiles in use by a particular NT account. So all we needed to do was reflect that in our use of the registry by appending a CRC of the SID under which MAPI is running to the "Deleted Profiles" string. Now all processes which are running MAPI under the same NT account (therefore accessing the same shared memory) are looking at the same place in the registry, and processes running MAPI under different NT accounts look in different places in the registry.
Here come the bugs, part 2
And yet, even after putting in the fix, we still saw profiles being deleted out from under us. This one isn't completely MAPI's fault. Suppose two different processes are running on the same machine. Both run as the same user and neither uses impersonation. However, one runs in Terminal Services and the other runs in the console.
Same dance as the first bug. When the second process goes to create a shared memory object, it does not get the same shared memory as the first one. We don't get the same shared memory because sessions started under Terminal Service use a different namespace than sessions started from the console. In general this is a good thing, but it blocks MAPI from doing what it's trying to do.
The fix from the first bug isn't effective here because the SID for both processes is the same. Fortunately, the fix here is even simpler - it's even spelled out in the MSDN: prepend "Global\\" to the name of the shared memory. Now all processes running as the same user will use the same shared memory regardless of the Terminal Service session.
Where was I?
This started as an Event Script problem, and digressed into a discussion of DeleteProfile. The Event Script service is built around MAPI. It creates a profile on the fly and calls DeleteProfile immediately after logging on. The customer used Terminal Services to manage their server, and to top it off, Outlook was installed on the machine. So every few days, they'd encounter Bug 1 or Bug 2 and their script was toast. I had them remove Outlook, sent them the patch, and they haven't seen the problem since.
Hey Steve, where's this wonderful patch you speak of?
Thought you'd never ask. Here it is by product:
- Exchange 5.5: Bug 1 and 2 were both fixed in the same patch - KB 818861. If MAPI32.dll is 5.5.2657.50 or greater then you've already got the fix.
- Exchange 2000: This is where Bug 1 was originally reported. KB 817375 fixes it. As far as I can tell, no one's reported Bug 2 on Exchange 2000 and the fix for Bug 2 has not yet been ported to Exchange 2000. If you are seeing this problem against Exchange 2000, please open an issue with PSS.
- Exchange 2003: Bug 1 was discovered in time to roll the fix into RTM. Bug 2 was discovered later. No hot fix was issued, but the fix is in SP1.
- Outlook: Neither of these fixes is in any version of Outlook. However, as far as I know, no customer has reported either issue with Outlook. If you are hitting this issue on Outlook machines, you should also open an issue with PSS.
[8/22/04 8:09PM Minor edit - clarified some text]
[8/25/04 11:04AM Clarified impersonation and added links]