Error 22022: SQLServerAgent Error: Network problems are preventing connection to the Microsoft Exchange Server computer.

Another troubleshooting story to share...

 

Customer was using SQL Server 2000 (build 8.00.2162). They use SQL Agent mail for sending notifications about job completion. They upgraded their Exchange Server 5.5 to Exchange Server 2003, and since then they started experiencing the following problem:

Occasionally they receive error message indicating that connection to Exchange server has been lost. After this no mails are sent by SQL Agent mail, most of the time until they recycle SQL Agent Service, but sometimes it starts working again without manual intervention required. Customer told that the error used to happen 1-2 times a day, mostly during the night.

 

After a few hours of proper job notification sending, the following entry is dropped to SQLAgent log instead of sending notification:

2006-07-06 10:49:38 - ! [LOG] Unable to send job completion notification email to operator 'DBA' for job 'TestMailAfterFailure'

 

Also tested with "Include execution trace messages" option (which gives more detailed information in SQLAgent log):

2006-08-04 12:04:27 - ? [177] Job TestMailAfterFailure has been requested to run by User MyDomain\MyUser
2006-08-04 12:04:29 - ? [182] Job completion for TestMailAfterFailure is being logged to the eventlog
2006-08-04 12:04:29 - ! [LOG] Unable to send job completion notification email to operator 'DBA' for job 'TestMailAfterFailure'
2006-08-04 12:04:29 - ? [184] Job completion for TestMailAfterFailure is being logged to sysjobhistory

Job history clearly shows that notification could not be sent:

The job succeeded. The Job was invoked by User MyDomain\MyUser. The last step to run was step 1 (Send Mail). NOTE: Failed to notify 'DBA' via email.

 

If testing mail sending to the operator in SQL Server Enterprise Manager while problem exists:
After problem occurs (ie. job notifications are not sent), customer tried testing mail sending to operator from SQL SEM, and the following error did pop up:

"Error 22022: SQLServerAgent Error: Network problems are preventing connection to the Microsoft Exchange Server computer."

 

 

I asked customer to collect Time Travel Debugging traces of SQLAgent.exe while he reproed the failure from SQLSEM, and this is what I found:

When you hit the “Test” button on the General tab of the “SQL Server Agent Properties” dialog, from Enterprise Manager, it triggers the execution of an extended stored procedure on SQL Server, which notifies SQL Server Agent the user wants to run that mail profile test, and then SQL Server Agent tries to send the message.

What I saw on the first attempt from the two I asked him to capture is that when SQL Server Agent calls the IMAPIFolder::CreateMessage method implemented in emsmdb32.dll, it gets back an HRESULT 0x80040115  (which is defined by MAPI as MAPI_E_NETWORK_ERROR). What this means is that there has been some sort of network cut which disconnected the MAPI client from the server. I could reproduce this error by unplugging the network cable on my machine and then started working again after connecting it back. But, based on the description of the problem experienced by customer (i.e. sometimes only restarting SQL Server Agent fixes the problem), I assumed that under some specific circumstances MAPI sort of hangs internally and is not able to reconnect client with the Exchange server. At this point, I suggested this should be investigated from networking/messaging specialist to try to determine when and why those cuts were happening and, hopefully, minimize them.

0:000>
eax=627b2210 ebx=00978b70 ecx=017e4fe0 edx=017e4fe0 esi=00978cd8 edi=00740020
eip=41e95d54 esp=0111fec4 ebp=0111ff20 iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000206
semmap!CSemmapThread::Ver1SendMail+0x174:
41e95d54 ff504c call dword ptr [eax+4Ch] ds:0023:627b225c=6273d686 <-- This is the call to emsmdb32!IMAPIFolder::CreateMessage, but I don’t have symbols for that module so that’s why I don’t see symbolic name of the function.

0:000> p
(790.984): C++ EH exception - code e06d7363 (first/second chance not available)
(790.984): C++ EH exception - code e06d7363 (first/second chance not available)
(790.984): C++ EH exception - code e06d7363 (first/second chance not available)
(790.984): C++ EH exception - code e06d7363 (first/second chance not available)
(790.984): C++ EH exception - code e06d7363 (first/second chance not available)
eax=80040115 ebx=00978b70 ecx=017e4fe0 edx=627b20f0 esi=00978cd8 edi=00740020
eip=41e95d57 esp=0111fed4 ebp=0111ff20 iopl=0 nv up ei ng nz ac pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000296
semmap!CSemmapThread::Ver1SendMail+0x177:
41e95d57 8945d8 mov dword ptr [ebp-28h],eax ss:0023:0111fef8=00000000

The function is returning 0x80040115. Since it isn’t a successful HRESULT, then we call Ver1GetLastError which will call IMAPIProp::GetLastError, IMAPITable::GetLastError, IMAPIAddrBook::GetLastError, and IMAPISession::GetLastError in that order until one of them returns details for the error found.

0:000> pc
eax=80040115 ebx=00978b70 ecx=00974a90 edx=01c2c15c esi=00978cd8 edi=00740020
eip=41e95d89 esp=0111fec0 ebp=0111ff20 iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000206
semmap!CSemmapThread::Ver1SendMail+0x1a9:
41e95d89 e8a20d0000 call semmap!CSemmapThread::Ver1GetLastError (41e96b30)
.
. <-- Removed for the sake of clarity. Just until the point where we will call MultiByteToWideChar, where we pass it the error string. At that point we can see what was the error that caused CreateMessage to fail.
.
0:000>
eax=01125808 ebx=00978b70 ecx=01124fa8 edx=01c2c15c esi=00978cd8 edi=00740020
eip=41e96c99 esp=0111fdb8 ebp=0111feb8 iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000206
semmap!CSemmapThread::Ver1GetLastError+0x169:
41e96c99 ff151890e941 call dword ptr [semmap!_imp__MultiByteToWideChar (41e99018)] ds:0023:41e99018=0000a534

0:000> dd esp
0111fdb8 00000000 00000000 01124fa8 ffffffff
0111fdc8 01c2c15c 00000200 0111fe10 627b54d0
0111fdd8 00974a90 627b5310 7801f459 00000000
0111fde8 627388eb 62738921 000003ed 01127e90
0111fdf8 62738941 627b5310 80040115 0111fe94
0111fe08 00000000 80040115 80040115 00000000
0111fe18 00000000 6273822c 80040115 80040115
0111fe28 01127e90 01127e94 01127e98 6273823d

Here we dump the third parameter passed to MultiByteToWideChar, which is the error string:

0:000> da 01124fa8
01124fa8 "Network problems are preventing "
01124fc8 "connection to the Microsoft Exch"
01124fe8 "ange Server computer."

I could also notice that, on a second attempt to test SQL Agent notifications while the trace was being recorded, the call to IMAPIFolder::CreateMessage succeeded and the message was delivered correctly.

Investigating a bit further, I found that at a certain point (build 08.00.0922.00 to be precise. See details here http://support.microsoft.com/kb/833045), the product team fixed this exact same issue for the SQL Server SQLMail implementation (i.e. through the use of xp_sendmail). Unfortunately, the code layer which wraps MAPI for xp_sendmail is different than the code layer which wraps MAPI for SQL Server Agent’s SQLMail functionality. At that time also, there were many customers reporting they were affected by this issue when using xp_sendmail, but none reported the same issue with SQL Server Agent’s SQLMail (maybe because they didn’t use it, maybe because they worked the issue around by calling xp_sendmail as an additional step added to their jobs, or some other reason which I don’t know).

In the end, we put together a business case which justified changing SQL Agent Mail's behavior so that it implemented something similar to what was implemented in xp_sendmail (http://support.microsoft.com/kb/833045), and got this changed in build 8.00.2240. Starting with that build, if the call to IMAPIFolder::CreateMessage, made by SQL Server Agent, fails with MAPI_E_NETWORK_ERROR (0x80040115), it retries as many times as defined in the registry value NumberOfTimesToRetryOnNetworkError, using the retry interval specified in WaitIntervalForRetryOnNetworkError. Those two Registry values are specified under SOFTWARE\Microsoft\MSSQLServer\SQLServerAgent (for a default instance), or SOFTWARE\Microsoft\Microsoft SQL Server\<MyNamedInstance>\SQLServerAgent (for named instances). If the values don't exist in the registry default values are used, which are 1 for NumberOfTimesToRetryOnNetworkError, and 1000 (milliseconds) for WaitIntervalForRetryOnNetworkError.

 

I finish here, hoping it's useful for someone to know this little trick.