Exceptions and Debug Events: Feedback from the OS
This section first briefly introduces exception-related concepts, and then uses examples to demonstrate how to use exceptions to troubleshoot effectively.
An exception is a mechanism for controlling a program's execution flow. Normally, code executes sequentially, as in the following:
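A minimal sketch of such straight-line code, assuming p points to a valid int (the function name is illustrative, not taken from the original sample):

```cpp
#include <cstdio>

// Writes 11 through p, then prints and returns the stored value.
// If p held an invalid address, the write would fault and the
// printf line would never run.
int store_and_print(int* p) {
    *p = 11;
    std::printf("%d\n", *p);
    return *p;
}
```

Calling store_and_print(&x) for a local int x exercises the normal path.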
It should print 11. But what if p points to an invalid memory address? Then the line that assigns a value to *p triggers an access violation exception, and the line that prints may never execute.
For applications, if the behavior does not match expectations, an exception is a likely direct cause, because it is the most obvious and common way the execution flow gets changed. In most cases, troubleshooting a problem amounts to troubleshooting an exception.
In the original Chinese version, I discuss the important role the OS plays in exception handling and dispatching. I also briefly cover how different programming languages build on SEH to support their exception handling mechanisms. I will skip that introduction here because the following two articles cover it all:
A Crash Course on the De
Case study: how to make C++ dump the call stack like C#
For applications written in C# or Java, when an exception occurs, the runtime can dump the call stack where the exception came from. For C++, however, we have to use a debugger to get the call stack. Now the customer wants to achieve a stack dump in C++. Any good ideas?
My solution is to use SEH, relying on the fact that destructors of local variables are executed during stack unwinding when an exception occurs. The sample code worked fine on the VC6 + Win2k3 platform. However, when I retried the sample, the same code behaved strangely on VC2005 + Win2k3 SP1: compiled in debug mode it works fine, but in release mode the application quits silently. I saved the whole story in my MSN blog (English); please refer to:
SEH,DEP, Compiler,FS: and PE format
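The property this approach relies on, that destructors of local variables run while the stack unwinds, can be illustrated with a small portable sketch. It uses a standard C++ exception rather than SEH, and the names FrameMarker, g_unwound, and run_and_collect are hypothetical. The marker records every scope exit, so a real stack dumper would also need to check that an exception is actually in flight.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical log of the frames that were left, innermost first.
static std::vector<std::string> g_unwound;

// Placed as a local variable at the top of each function of interest.
struct FrameMarker {
    const char* name;
    explicit FrameMarker(const char* n) : name(n) {}
    // Runs when the enclosing scope exits, including during unwinding.
    ~FrameMarker() { g_unwound.push_back(name); }
};

void inner() {
    FrameMarker marker("inner");
    throw std::runtime_error("boom");
}

void outer() {
    FrameMarker marker("outer");
    inner();
}

// Triggers the exception and returns the frames recorded on the way out.
std::vector<std::string> run_and_collect() {
    g_unwound.clear();
    try {
        outer();
    } catch (const std::exception&) {
        // Unwinding has already destroyed both markers at this point.
    }
    return g_unwound;
}
```

run_and_collect() returns the frames innermost first, which is the order a stack dump would list them.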
Case study: why Dr. Watson cannot save the dump file
The customer reports that their VC application crashes randomly. To obtain detailed information, the customer registered Dr. Watson so that the next time an exception occurred we would get a dump file. However, when the problem recurred, Dr. Watson saved nothing.
In the Chinese version, I briefly explained what a dump file is and what information we can find in it. Related information can be found at:
Specifying the Debugger for Unhandled User Mode Exce
INFO: Choosing the Debugger That the System Will Spawn
Generally speaking, by setting the AeDebug registry key, we can launch a debugger when an application crashes. If we choose Dr. Watson as the debugger, the default behavior is to generate a dump file.
Back to the case: the customer fails to get the dump file. Possible causes:
1. A bug in Dr. Watson; it is not working properly.
2. The customer's application does not crash; it just exits, for example by calling ExitProcess.
To test point 1, I provided the following sample code:
With the above code, Dr. Watson captured the dump file successfully on the customer's side, so Dr. Watson works fine. It seems that the crash reported by the customer is not really caused by an unhandled exception. Maybe the customer's code calls ExitProcess unexpectedly. Thus, when capturing information, we should not limit ourselves to unhandled exceptions. What we need to find out is how the process disappears: it may be a normal exit, or it may be an unhandled exception.
One possible way to figure this out is to run the application under windbg. However, doing so manually is troublesome, and an automatic way would be better. Windows provides a registry setting that makes an application start under a debugger. With this setting, when the specified process starts, the OS starts the debugger first and passes it the target process and command line; the debugger then starts the target process and debugs it. This option is especially useful when we cannot start the process manually, as with a Windows service, which starts before the user logs on:
How to debug Windows services
Some malicious software abuses this mechanism to launch a process silently. This method is also called IFEO (Image File Execution Options) hijacking in
In the windbg folder, there is a script called adplus.vbs. We can use it to launch windbg and obtain the dump file. Here we will use that script:
How to use ADPlus to troubleshoot "hangs" and "crashes"
Use adplus /? to obtain detailed info.
Based on the above analysis, the detailed actions are:
1. On the customer's machine, create a key named after the problematic process under Image File Execution Options.
2. Under that key, create a string value called Debugger.
3. Set the value's data to C:\Debuggers\autodump.bat
4. Edit C:\Debuggers\autodump.bat as follows:
With this setting, when the application starts, the OS runs the batch file, which launches cscript.exe to execute the adplus.vbs script. The -sc switch of adplus.vbs specifies the target process path, -crash means we monitor for the application's exit, -o specifies the dump output folder, and -quiet disables prompts. We can use notepad.exe as a test to check whether a dump is generated when notepad.exe quits.
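Putting the switches described above together, the batch file could look like the following sketch. It assumes adplus.vbs lives in C:\Debuggers next to the batch file, and that the OS passes the path of the target executable as the first argument (%1), which is how a Debugger value under Image File Execution Options is invoked.

```bat
@echo off
rem Sketch of C:\Debuggers\autodump.bat; the adplus.vbs location is an assumption.
rem %1 is the target executable that the OS hands to the registered debugger.
cscript.exe C:\Debuggers\adplus.vbs -crash -o C:\dumps -quiet -sc %1
```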
With this setting in place, when the problem recurred we got two dump files in the c:\dumps folder, named:
Pay attention to the second filename. The name indicates that a 2nd chance C++ exception did happen. Opening the dump in windbg and checking the call stack shows that the customer's code throws a C++ exception but forgets to catch it. Adding the corresponding catch block fixed the issue.
The solution works, but why can't Dr. Watson get the dump?
Dr. Watson's behavior still confuses me. Since this is an unhandled exception, why can't Dr. Watson capture the dump file? First, I created two different applications to double-check Dr. Watson's behavior:
int _tmain(int argc, _TCHAR* argv[])
int _tmain(int argc, _TCHAR* argv[])
For the first one, Dr. Watson does not save a dump. For the second, it does. It looks like the behavior is related to the exception type.
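Judging from the two message boxes quoted in this section, the first program dies from an unhandled C++ exception and the second from a write through a null pointer. The two functions below are a hedged reconstruction of that difference, not the original listings. The names raise_cpp_exception and write_through_null are hypothetical, and each would be called from its own _tmain without a surrounding try block.

```cpp
// Case 1: a C++ exception that nobody catches. If it escapes main,
// the runtime terminates the process.
void raise_cpp_exception() {
    throw 1;
}

// Case 2: a write through a null pointer, which raises an access violation.
// volatile keeps the compiler from optimizing the faulting store away.
void write_through_null() {
    volatile int* p = nullptr;
    *p = 0;
}
```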
Look at the detailed crash behavior of the two applications above when the Auto value under AeDebug is set to 0. On my side, the crash message boxes are:
Microsoft Visual C++ Debug Library
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
(Press Retry to debug the application)
Abort Retry Ignore
The instruction at "0x00411908" referenced memory at "0x00000000". The memory could not be "written".
Click on OK to terminate the program
Click on CANCEL to debug the program
The behaviors are totally different! And the behavior is related to the compilation mode.
The SetUnhandledExceptionFilter API is used to replace the default unhandled exception handler. Here, when C++ initializes the CRT, it registers the CRT's implementation (msvcrt!CxxUnhandledExceptionFilter). When an unhandled exception occurs, that function checks the exception code. If it is a C++ exception, it shows the first dialog; otherwise it passes the exception on to the default handler (kernel32!UnhandledExceptionFilter) provided by the OS. For the 1st situation, the callstack is:
For the second, it is
For detailed info, please refer to:
Does the above analysis help explain Dr. Watson's behavior? To be honest, I do not think so. I think it is due to Dr. Watson's special handling of different exception types. The detailed research can be found at:
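The decision described earlier for the CRT's filter can be modelled in a few lines. This is an illustrative sketch, not the CRT's source: the enum and function names are hypothetical, while 0xE06D7363 is the exception code the Microsoft C++ runtime uses for thrown C++ exceptions and 0xC0000005 is the access violation code.

```cpp
#include <cstdint>

// Exception codes as they appear in an exception record.
const std::uint32_t kCppExceptionCode    = 0xE06D7363u; // Microsoft C++ exception
const std::uint32_t kAccessViolationCode = 0xC0000005u; // access violation

// Hypothetical names for the two outcomes described in the text.
enum class FilterAction { ShowCrtDialog, ForwardToOsFilter };

// Models the choice: C++ exceptions get the CRT dialog, everything
// else is passed on to the OS default filter.
FilterAction classify_unhandled(std::uint32_t exceptionCode) {
    return exceptionCode == kCppExceptionCode ? FilterAction::ShowCrtDialog
                                              : FilterAction::ForwardToOsFilter;
}
```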
Debug Event – communication between the OS and the debugger
A notification, also called a debug event, is a mechanism for the OS to notify the debugger when something happens. As with exception handling, the OS dispatches the notification when the event occurs, provided a debugger is attached. Unlike an exception, a notification can only be observed by the debugger, not by the target process. Also, there is no distinction between 1st chance and 2nd chance. In windbg's help file, all the notifications are listed in the Controlling Exce
With exceptions and notifications, we can capture the key to an issue.
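From the debugger's side, an exception is just one of several event kinds the OS can report. The sketch below maps the event codes that the Windows SDK defines for the DEBUG_EVENT structure to their names. The constants are redefined locally so the snippet compiles without windows.h, and the function names are illustrative.

```cpp
#include <cstdint>
#include <string>

// Values mirror the dwDebugEventCode constants from the Windows SDK.
enum DebugEventCode : std::uint32_t {
    kExceptionEvent     = 1, // EXCEPTION_DEBUG_EVENT
    kCreateThreadEvent  = 2, // CREATE_THREAD_DEBUG_EVENT
    kCreateProcessEvent = 3, // CREATE_PROCESS_DEBUG_EVENT
    kExitThreadEvent    = 4, // EXIT_THREAD_DEBUG_EVENT
    kExitProcessEvent   = 5, // EXIT_PROCESS_DEBUG_EVENT
    kLoadDllEvent       = 6, // LOAD_DLL_DEBUG_EVENT
    kUnloadDllEvent     = 7, // UNLOAD_DLL_DEBUG_EVENT
    kOutputStringEvent  = 8, // OUTPUT_DEBUG_STRING_EVENT
    kRipEvent           = 9  // RIP_EVENT
};

// Returns the SDK name for an event code, or "UNKNOWN" for anything else.
std::string debug_event_name(std::uint32_t code) {
    switch (code) {
        case kExceptionEvent:     return "EXCEPTION_DEBUG_EVENT";
        case kCreateThreadEvent:  return "CREATE_THREAD_DEBUG_EVENT";
        case kCreateProcessEvent: return "CREATE_PROCESS_DEBUG_EVENT";
        case kExitThreadEvent:    return "EXIT_THREAD_DEBUG_EVENT";
        case kExitProcessEvent:   return "EXIT_PROCESS_DEBUG_EVENT";
        case kLoadDllEvent:       return "LOAD_DLL_DEBUG_EVENT";
        case kUnloadDllEvent:     return "UNLOAD_DLL_DEBUG_EVENT";
        case kOutputStringEvent:  return "OUTPUT_DEBUG_STRING_EVENT";
        case kRipEvent:           return "RIP_EVENT";
        default:                  return "UNKNOWN";
    }
}

// Only exception events carry the 1st chance / 2nd chance distinction.
bool has_chance_distinction(std::uint32_t code) {
    return code == kExceptionEvent;
}
```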
Case study: VB6's version
The customer's VB6 application cannot open a data file created by Access 2003 on the developer's machine. It works fine with data created by Access 97. On other machines, both Access 2003 and 97 files work fine.
The line of reasoning is straightforward. Since the problem occurs on one specific machine, the issue lies in the environment, not the code. Since it depends on the Access version, it should be related to the DAO version. By checking the modules loaded by the EXE, I found that dao350.dll was loaded instead of dao360.dll. The next step is to figure out why dao350.dll gets loaded instead of dao360.dll.
DAO is a COM component, so it is likely created through a COM API. A simple way is to trace the execution of a COM API such as CoCreateInstanceEx with the wt command, like I did in the ShellExecute case. However, if we really try that, the wt command may run for a whole day. A more workable way would be better. Since we would trace until the library loads anyway, why not set a breakpoint at LoadLibrary to check how dao350.dll gets loaded?
Setting a breakpoint on LoadLibrary is not a very good way, because:
1. DLL loading does not necessarily go through LoadLibrary. A native API like ntdll!LdrLoadDll may load the module directly.
2. If there are hundreds of DLLs to be loaded, breaking into LoadLibrary each time is troublesome, even if we set a conditional breakpoint to filter.
The better way is to leverage notifications. During a module load, the OS sends a notification to the debugger. In windbg, we can use wildcards to match and filter the DLL filename, which is easy to operate. First, use the “sxe ld:dao*.dll” command to intercept the module load notification. When the filename matches dao*.dll, the debugger breaks. (Detailed windbg usage is covered in later sections.) The result in the debugger is:
0:008> sxe ld:dao*.dll
ModLoad: 1b740000 1b
eax=00000001 ebx=00000000 ecx=0013e301 edx=00000000 esi=7ffdf000 edi=20000000
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
Checking the parameter of LoadLibraryExW shows:
0:000> du 0013ea40
0013ea40 "C:\Program Files\Common Files\Mi"
0013ea80 "crosoft Shared\DAO\DAO360.DLL"
From the above information, we see:
1. DAO360 is not created by CoCreateInstanceEx. Instead, it is created by CoGetClassObject. Tracing CoCreateInstanceEx would have wasted time.
2. The COM invocation starts from the VB6!_DBErrCreateDao36DBEngine function. We should check that function in detail.
Given the lesson of earlier DLL hell cases, the first thing to check here is VB6.EXE's version, since the function resides in VB6. Compared with a working machine, the working module version is 6.00.9782, while the problematic one is 6.00.8176. Installing VS6 SP6 fixed the issue.
(In the Chinese version, I discussed how to analyze a dump even when it was not captured at the moment the exception first happened. I have to skip that here.)
Exiting proactively on an unhandled exception
In some situations, the developer exits the application proactively when an unhandled exception occurs, instead of waiting for the OS to terminate it. COM+ and ASP.NET use this technique. A Chinese C2C application called taobao wangwang (also named ali wangwang) uses it too. The benefits are:
1. We can define our own UI for the crash.
2. We can save the unhandled exception information for later analysis.
3. We can avoid interference from a debugger, guarantee immediate recycling, and attempt necessary rescue operations such as restarting the process.
It is easy to implement. One way is to use the __try and __except clauses. The other way is to use the SetUnhandledExceptionFilter API. For the study of taobao wangwang, please refer to:
Based on my analysis, taobao uses SetUnhandledExceptionFilter to catch the unhandled exception, and uses the MiniDumpWriteDump API to capture the dump proactively.
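A minimal sketch of the SetUnhandledExceptionFilter plus MiniDumpWriteDump combination follows. It is not taobao's code: the dump file name crash.dmp, the MiniDumpNormal dump type, and the function names are assumptions chosen for illustration, and the non-Windows branch exists only so the snippet compiles elsewhere.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <dbghelp.h>
#pragma comment(lib, "dbghelp.lib")

// Top-level filter: writes a minidump of the current process, then
// lets the process terminate.
static LONG WINAPI write_dump_filter(EXCEPTION_POINTERS* info) {
    HANDLE file = CreateFileW(L"crash.dmp", GENERIC_WRITE, 0, nullptr,
                              CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file != INVALID_HANDLE_VALUE) {
        MINIDUMP_EXCEPTION_INFORMATION mei;
        mei.ThreadId = GetCurrentThreadId();
        mei.ExceptionPointers = info;   // faulting context goes into the dump
        mei.ClientPointers = FALSE;
        MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(), file,
                          MiniDumpNormal, &mei, nullptr, nullptr);
        CloseHandle(file);
    }
    // Terminate the process without showing the OS crash dialog.
    return EXCEPTION_EXECUTE_HANDLER;
}

// Registers the filter; call once early in startup.
bool install_dump_filter() {
    SetUnhandledExceptionFilter(write_dump_filter);
    return true;
}
#else
// Stub: the APIs above exist only on Windows.
bool install_dump_filter() { return false; }
#endif
```

A custom crash UI or a process restart, as listed in the benefits above, would go in the filter before it returns.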
With this technique, it is hard for a debugger to get the crash dump directly. Some additional configuration and windbg commands are necessary:
How To Obtain a Userdump When COM+ Failfasts
How to find the faulting stack in a process dump file that COM+ obtains
How to troubleshoot UnhandledExceptionFilter
According to MSDN, UnhandledExceptionFilter is invoked only if no debugger is attached. Thus we can use UnhandledExceptionFilter to escape the debugger's tracing and protect sensitive code. There are at least two ways to keep a debugger away:
1. The target uses the IsDebuggerPresent API to check whether a debugger is attached. If so, it refuses to execute the sensitive code.
2. Put the sensitive code in a function, and register that function as the unhandled exception filter. To execute the sensitive code, just trigger an exception manually. Because of the design of exception handling, this avoids the debugger's tracing.
The first way is easy to bypass. Look at the implementation of IsDebuggerPresent:
0:000> uf kernel32!IsDebuggerPresent
282 77e64866 8b4030 mov eax,[eax+0x30]
282 77e64869 0fb64002 movzx eax,byte
283 77e6486d c3 ret
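IsDebuggerPresent checks a flag reached through the FS register: the thread structure addressed through FS holds a pointer to the PEB of the current process at offset 0x30, and the flag is the byte at offset 2 of the PEB. In a debugger, we can easily change any register or memory value. Here we just need to set the byte at offset 2 of the PEB to 0 to trick IsDebuggerPresent into returning false.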
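The patch can be modelled without touching a real process. The sketch below declares only the first three bytes of the PEB, which is enough to show that the flag sits at offset 2. PebHead and is_debugger_present_model are hypothetical names, and the real structure is much larger.

```cpp
#include <cstddef>

// First three bytes of the PEB; the rest of the structure is omitted.
struct PebHead {
    unsigned char InheritedAddressSpace;    // offset 0
    unsigned char ReadImageFileExecOptions; // offset 1
    unsigned char BeingDebugged;            // offset 2
};

static_assert(offsetof(PebHead, BeingDebugged) == 2,
              "the debugged flag must sit at offset 2");

// Models the check: read the byte at offset 2 and report it as a boolean.
bool is_debugger_present_model(const PebHead* peb) {
    return peb->BeingDebugged != 0;
}
```

Clearing BeingDebugged in the model corresponds to editing that byte from the debugger.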
For the second way, patching that byte in the PEB does not work, because the decision is based on the result of a kernel call. However, that does not make it impossible. Kwan Kyun Kim provides a way to trick it:
How to debug UnhandleExceptionHandler
Next I will discuss memory, including Heap, Stack, and the lovely heap corruption and pageheap.