Properly collecting a User Mode memory dump is only the first step in uncovering the cause of a crash or hang. The remainder of this post will assume that you have already configured WinDBG correctly and captured a memory dump using the techniques outlined in previous posts.
For the purpose of this posting we will assume the following scenario.
You are a software vendor that has written an automated banking machine application. Several times a day the kiosk is restarted by the customer because the application has crashed. In an effort to identify the cause of the crash, which happens when you are not there, you have used ADPLUS to collect a User Mode memory dump. The memory dump has been copied onto your machine and you are ready to start debugging.
Open the dump file by selecting the “Open Crash Dump…” option under the “File” menu in WinDBG. Browse to the memory dump file and click the “Open” button. After a few moments WinDBG will return control to you and display a prompt similar to “0:000>”, as seen in the bottom centre of the image below. The two numbers are the process number and thread number assigned by the debugger; they are small indexes, not the operating system's process and thread IDs.
Some information is immediately available in the text displayed above the prompt. One important piece is the “x86 compatible” indicator. This tells us the dump was taken from a 32 bit target, so we should be using the 32 bit version of WinDBG, which I am.
At this point we can start debugging. If you are impatient like I am and would prefer responsive feedback throughout your debugging session, you will want to take the hit of loading your symbols up front while you have a coffee.
All commands are entered in the command window to the right of the prompt, which is seen above as “0:000>”. To display the symbol load status, type the lm command and press Enter. The lm (list modules) command displays a list of the modules that are present in the dump.
The first two columns of the output show the start and end address of each module in memory. This is very helpful when solving load address issues and when determining which module contains a specific address. The module name is usually the name of the binary without the extension (e.g. BankingClient.exe would be BankingClient). The right-most column is the location of the symbols that were loaded for the module, or “deferred” if they will be loaded later. Symbols are not loaded up front because WinDBG loads them lazily, only when they are first needed, which spares the user a long start-up time.
To force the debugger to immediately load all of the matching symbols it has access to, execute the “.reload /f” command. The debugger will be flagged as busy until it has finished.
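To make the address lookup concrete, here is a minimal C++ sketch of what we do by eye when reading the lm output: scan the start and end columns for the range that contains the address. The ModuleRange type, the FindModule and SampleModules functions, and all of the addresses are made up for illustration, and the sketch assumes the end address is one past the last byte of the module.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One row of an lm-style listing: start address, end address, module name.
// The end address is treated as exclusive (an assumption of this sketch).
struct ModuleRange {
    std::uint64_t start;
    std::uint64_t end;
    std::string name;
};

// Returns a pointer to the module whose range contains `address`,
// or nullptr if the address does not fall inside any listed module.
const ModuleRange* FindModule(const std::vector<ModuleRange>& modules,
                              std::uint64_t address) {
    for (const ModuleRange& m : modules) {
        if (address >= m.start && address < m.end) {
            return &m;
        }
    }
    return nullptr;
}

// Hypothetical sample rows; these addresses are invented, not taken
// from the dump discussed in this post.
std::vector<ModuleRange> SampleModules() {
    return {
        {0x00400000, 0x0041f000, "BankingClient"},
        {0x77000000, 0x77140000, "ntdll"},
    };
}
```

With the sample rows, looking up 0x00401234 would land in BankingClient, while an address outside every range returns nullptr, which in a real session would suggest the address belongs to heap, stack or an unloaded module.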
The first step is to let the debugger perform its own automated analysis. Keep your version of the debugging tools up to date so that the analysis is current and includes new analysis details. Run the “!analyze -v” command and examine the results.
The error code and its parameters help us understand the failure, but one of the most important output elements is the call stack. A call stack has the most current frame (function call) at the top. From this we can see that a C++ exception was thrown (_CxxThrowException) from our TransferFunds method. To the right of the frame we can see the line of source code that was being executed.
To see the parameters passed to each of the frames for which we have private symbols, we execute the “kPn” command. This displays frame numbers on the left and a separate line for each parameter.
The method with the failure is in frame 02, which is executing the TransferFunds method with an amount of 1000 and an Account object located at 0x001dfa48. To view the contents of the data structure we can execute the dump type command “dt ServerAccount::Account 0x001dfa48”. This displays the type's structure populated with the data found at memory address 0x001dfa48.
We can see that the AccountHolderName is “Bank of Some Place” with a financial institution number of 10, branch number of 5645 and account number of 42.
From reviewing the lines of code leading up to the one that caused the exception I see that the method will generate an error if the account details seen above are not found in an array of accounts named ExternalAccounts.
To find the value of the first element of the ExternalAccounts array we first change the current frame context to be frame 2 by using the “.frame 0x02” command. Once the context has been changed use the “?? ExternalAccounts” command to view the value.
Reviewing the text above shows that there are 2 elements in the array (based on the [2] indexer) and that the elements are ServerAccount::Account types. To view all of the elements in the array use the dump type command “dt -a2 ServerAccount::Account 0x001df76c”. The -a2 parameter indicates the number of elements to display.
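The reason a start address and an element count are enough to display the whole array is that array elements sit back to back in memory, so element i starts at the base address plus i times the element size. A small C++ sketch of that arithmetic follows. The AccountStandIn layout is purely hypothetical (the real size of ServerAccount::Account comes from the symbols), and ElementAddresses is a name invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for ServerAccount::Account. The real layout and
// size are not known from the post; this struct only supplies a size.
struct AccountStandIn {
    char AccountHolderName[32];
    int InstitutionNumber;
    int BranchNumber;
    int AccountNumber;
};

// Computes where each of `count` consecutive elements begins, given the
// address of the first element and the size of one element.
std::vector<std::uint64_t> ElementAddresses(std::uint64_t base,
                                            std::size_t elementSize,
                                            std::size_t count) {
    std::vector<std::uint64_t> addresses;
    for (std::size_t i = 0; i < count; ++i) {
        addresses.push_back(base + i * elementSize);
    }
    return addresses;
}
```

Calling ElementAddresses(0x001df76c, sizeof(AccountStandIn), 2) gives the start of element 0 at the base address and the start of element 1 one element size later.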
Reviewing the contents of the array above we find that the financial institution identifier of 10 was not found in the list.
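Putting the findings together, the failing logic probably looks something like the C++ sketch below. This is my own reconstruction for illustration, not the actual source: only the names TransferFunds, ServerAccount::Account, ExternalAccounts and AccountHolderName come from the dump, while the other field names, the parameter order, the exception type, the IsKnownExternalAccount helper and the contents of the two array entries are all assumptions.

```cpp
#include <stdexcept>
#include <string>

namespace ServerAccount {
// Field names other than AccountHolderName are assumed for this sketch.
struct Account {
    std::string AccountHolderName;
    int InstitutionNumber;
    int BranchNumber;
    int AccountNumber;
};
}  // namespace ServerAccount

// Two entries, matching the element count seen in the dump.
// The values are hypothetical; neither uses institution number 10.
static const ServerAccount::Account ExternalAccounts[2] = {
    {"Hypothetical Bank A", 20, 1111, 1},
    {"Hypothetical Bank B", 30, 2222, 2},
};

// Hypothetical helper: true when the target matches a known account.
bool IsKnownExternalAccount(const ServerAccount::Account& target) {
    for (const ServerAccount::Account& known : ExternalAccounts) {
        if (known.InstitutionNumber == target.InstitutionNumber &&
            known.BranchNumber == target.BranchNumber &&
            known.AccountNumber == target.AccountNumber) {
            return true;
        }
    }
    return false;
}

// Throws when the destination account is not in ExternalAccounts.
// An uncaught throw here would surface as _CxxThrowException on the stack.
void TransferFunds(int amount, const ServerAccount::Account& target) {
    if (!IsKnownExternalAccount(target)) {
        throw std::runtime_error("destination account not found");
    }
    (void)amount;  // the actual transfer is omitted from this sketch
}
```

Calling TransferFunds with 1000 and the “Bank of Some Place” account (institution 10, branch 5645, account 42) throws, because no entry in the array matches. If nothing catches the exception the process terminates, which is consistent with the crash the customer is seeing.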
As can be seen in this example, a developer can quickly identify the cause of an exception by capturing a memory dump and running a few basic debugger commands.