I just finished troubleshooting an issue on a server with six OABs: whenever we generated all of them, one would always fail with Event 9175 (MSExchangeSA). I want to share a small tidbit on how the system attendant obtains its connection objects.
Here is a little information on the connection logic.
- When the system attendant starts up it will attempt to make a connection to the information store so it can get a session object.
- This session object is for the public folder store that this system attendant is associated with. This session object will be used for connecting to the public folder store when we need to access it for OAB, Free/Busy, etc.
- Each time we need to connect to the information store we use the same session object, and we log how many times we connect and how long each connection takes.
Now I needed to figure out why only the last one would fail. I created debug builds of the store and the system attendant so I could output the maximum logging relevant to the problem, and that gave me some information. Looking at the traces, I could see that we would always return the same error.
Dumping out this error I can see that we get ecWrongServer.
# for hex 0x478 / decimal 1144
# 1 matches found for "0x478"
Now the question is why we are returning ecWrongServer. The information store has logic built into it so that when we get redirected to another information store, we detect the change and try to connect to the store we were redirected to. This can happen for multiple reasons; one of them is too many connections in a short time. The system attendant connects to the information store via the RPC interface and calls the following function: emsmdb!CNCT::EcForceRpc(). Through debugging I was able to see that we always got a return value of [0x47f = ECSERVERPAUSE].
Through code review I was able to see that when a client connects to a server it isn't configured for, it gets an error code of ecWrongServer. This error code tells us that this is the wrong server, and we do a query to determine the correct one. Once we have this information we give the client the correct server to connect to so it can attempt to connect again. In certain situations this could lead to a looping connection condition and cause RPC problems. To correct this behavior we added some logic that keeps track of how many ecWrongServer errors have been sent to a particular client.
In addition to this, if a client tries to make RPC calls more than 5 times in under 10 seconds, we automatically return the ecServerPaused error code. This error bubbles back up the stack and is mapped to the following error code: MAPI_E_FAILONEPROVIDER (0x8004011d). In this case, these observations show that there is nothing wrong with the OAB itself; the failure has to do with too many connection attempts.
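To make the throttling rule concrete, here is a minimal Python sketch of a "more than 5 calls in under 10 seconds" check. This is not the store's actual code: the class, the function names, the sliding-window approach, and the choice to count rejected calls toward the limit are all my assumptions for illustration. Only the threshold, the window, and the 0x47f value come from the description above.

```python
from collections import deque

EC_NONE = 0x0
EC_SERVER_PAUSED = 0x47F  # value seen in the debugger output

MAX_CALLS = 5           # more than this many calls...
WINDOW_SECONDS = 10.0   # ...inside this window gets throttled


class ConnectionThrottle:
    """Hypothetical per-client sliding-window counter (illustration only)."""

    def __init__(self, max_calls=MAX_CALLS, window=WINDOW_SECONDS):
        self.max_calls = max_calls
        self.window = window
        self._calls = {}  # client id -> deque of call timestamps (seconds)

    def check(self, client_id, now):
        calls = self._calls.setdefault(client_id, deque())
        # Forget calls that have aged out of the window.
        while calls and now - calls[0] >= self.window:
            calls.popleft()
        # Assumption: every attempt counts, including ones we reject.
        calls.append(now)
        if len(calls) > self.max_calls:
            return EC_SERVER_PAUSED
        return EC_NONE


def simulate_oab_builds(start_times):
    """Return the result code for each connection attempt from one client."""
    throttle = ConnectionThrottle()
    return [throttle.check("system-attendant", t) for t in start_times]


# Six OAB builds kicked off one second apart: only the sixth is rejected.
back_to_back = simulate_oab_builds([0, 1, 2, 3, 4, 5])
print([hex(code) for code in back_to_back])
```

With six attempts inside the same window, the first five succeed and only the last one gets ecServerPaused, which lines up with what I saw: six OABs, and always the last one failing.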
One solution to this problem is to stagger your OABs so they are built at different times and do not hit this logic, which is a safety mechanism to protect the store from RPC DoS-type attacks (legitimate or not).
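As a rough illustration of staggering, the Python sketch below spreads rebuild start times across a set of OABs. The OAB names, the 01:00 start, and the 30-minute gap are made-up values, and the helper is hypothetical: it only computes a plan on paper, so you would still set each OAB's update schedule yourself.

```python
from datetime import datetime, timedelta


def stagger_rebuilds(oab_names, first_start, gap):
    """Return {oab_name: start_time}, each start one 'gap' after the previous."""
    return {
        name: first_start + index * gap
        for index, name in enumerate(oab_names)
    }


oabs = [f"OAB {n}" for n in range(1, 7)]  # six hypothetical OAB names
schedule = stagger_rebuilds(
    oabs,
    first_start=datetime(2000, 1, 1, 1, 0),  # placeholder date, 01:00 start
    gap=timedelta(minutes=30),               # assumed spacing between builds
)

for name, start in schedule.items():
    print(f"{name}: rebuild at {start:%H:%M}")
```

Any gap comfortably larger than the 10-second window would avoid the throttle on paper; in practice the gap should also allow for how long each OAB takes to generate.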