In this post I'm going to explain how to debug an error occurring in Dynamics 365 for Finance and Operations on-premises directly in the on-premises environment, where Visual Studio isn't available, by using a free tool called WinDbg.
This approach gives a fast way to catch exceptions occurring in the environment, identify the call stack, get a more detailed error message (for example, to see inner exceptions), and see the values of running variables at the time of the exception. You can use it not only to debug the AOS itself but for any Windows component that runs .NET code; for example, if SSRS were throwing an exception, you could debug SSRS itself the same way.
It does not give the full X++ debugging experience you would normally have in Visual Studio with the Dynamics development tools installed. I will be making another post soon explaining how to hook up Visual Studio to debug your on-premises instance.
WinDbg is a very powerful debugging tool and can be used in many different scenarios - for example debugging an exception occurring in any Windows software or analyzing memory dumps (also known as crash dumps) from a Windows process.
In this document we'll look at one particular scenario to give an introduction to the tool and how it can be helpful in conjunction with Dynamics 365 for Finance and Operations on-premises to troubleshoot exceptions.
The example scenario here is:
- I have an external application trying to call into Finance and Operations web services
- The call is failing with "Unauthorized" in the calling application
- There is no error in the AD FS event log - AD FS is issuing a token fine, but the AOS is denying the call.
- I want to know why I am "Unauthorized", because it seems the AOS should be allowing me
First, install WinDbg; it is available from the Windows SDK here.
Note: there is a newer version of WinDbg, currently in preview, available in the Windows Store here, but this post only deals with the older, currently released version.
On most of the installer pages you can just click Next, but when choosing which features to install, uncheck everything except "Debugging Tools for Windows", as shown below:
Once the installer completes, you will find WinDbg on your Windows Start menu; both x64 and x86 versions (and ARM and ARM64) will be installed. The rule for debugging .NET code with WinDbg is to match the architecture of WinDbg to the architecture of the process: 32-bit process, 32-bit WinDbg; 64-bit process, 64-bit WinDbg. As we are going to debug the AOS, which is 64-bit, we need to open WinDbg (x64). MAKE SURE to run it as Administrator, otherwise it won't let you attach to the process.
In a typical on-premises environment there will be three AOS instances. When we're debugging, we can't be sure which of the three a request will hit, so we turn off the other two; then we know every request will hit the remaining one, and we can debug that one. There are two options to do that:
1. Shut down the other two AOS machines in Windows.
2. From Service Fabric Explorer, disable the AOS application for the other two AOS instances. If you take this route, check in Task Manager that AXService.exe has actually stopped on both of those AOS machines: I've found that it doesn't always stop immediately, and it can sit there for a while with requests still going to it.
Now that we have the tool installed, we're ready to debug something. In WinDbg, go to "File" -> "Attach to a Process...". A dialog will open showing all the processes currently running on that machine; select "AXService.exe" and click OK. It's easier to find in the list if you select the "By Executable" radio button, which sorts the list alphabetically.
WinDbg is a command-line debugger: at the bottom of the window there is a box where you can enter commands for it to execute, and that's primarily how you get it to do anything.
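While WinDbg is attached, be careful how you close it: in user mode the "q" command ends the debugging session and terminates the target process, which here would be the AOS. A minimal sketch of leaving AXService.exe running when you're finished, using the standard WinDbg detach commands (you need to be broken in to type them, for example via Debug > Break or Ctrl+Break):

```
$$ End the debugging session but leave AXService.exe running
.detach

$$ Or: detach from AXService.exe and quit WinDbg in one step
qd
```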
As we're going to debug .NET code, we'll first load an extension for WinDbg that helps decode .NET-related information from the process. This extension exists on any machine that has the .NET Framework installed. Enter this command and press Enter:
Next, we tell WinDbg to stop the process on a breakpoint when a .NET exception occurs. Because we don't have source code available in an on-premises environment, the easy way for us to set a breakpoint is to base it on exceptions. The WinDbg command to break on an exception is "sxe", and the exception code is "e0434352". We always use the same exception code here, because it is the native Windows exception code that represents all .NET exceptions.
Now we need to let the process run again, because when we attached to it, WinDbg automatically broke into it. You can tell whether the process is running: while it is, the command box shows "Debuggee is running...". To let the process run again, enter "g" (meaning "go").
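A sketch of this step, assuming AXService.exe is hosted on .NET Framework 4.x (where the runtime DLL is clr.dll and sos.dll ships in the same Framework64 folder): ".loadby sos clr" loads SOS from wherever the already-loaded clr.dll lives, and ".chain" lists the loaded extensions so you can confirm it worked. Text after "$$" is a WinDbg comment.

```
$$ Load SOS from the same folder as the loaded .NET 4.x runtime (clr.dll)
.loadby sos clr

$$ Optional: list loaded extension DLLs to confirm sos.dll is in the chain
.chain
```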
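Entered one after the other, the break-on-exception step and the "go" step look like this:

```
$$ Break on first-chance .NET (CLR) exceptions, native code 0xE0434352
sxe e0434352

$$ Resume the AOS; the command box shows "Debuggee is running..." until the next break
g
```

The AOS may throw and handle other .NET exceptions during normal operation, so the first break is not necessarily the one behind your error; check it with "!pe" and enter "g" again if it's unrelated. When you're done, "sxd e0434352" turns off the first-chance break (WinDbg will still break on second-chance, unhandled, exceptions of that type).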
After entering "g", you can see that it is running again:
OK, now we're ready to reproduce our issue. I go to my client application and make the error happen, and then in WinDbg I see this. Note that the client application will seem to "hang"; this is because WinDbg has stopped the AOS on a breakpoint and isn't letting it complete the request:
We can show the exception details with the "!pe" command. This command comes from the SOS extension (sos.dll) we loaded earlier; the "!" prefix denotes a command that comes from an extension. Note that WinDbg is case sensitive on everything you enter.
Here I can see the exception from within the AOS - it's hard to see in the screenshot, so here's the full text:
Exception object: 000002023b095e38
Exception type: System.IdentityModel.Tokens.SecurityTokenInvalidAudienceException
Message: IDX10214: Audience validation failed. Audiences: 'https://ax.d365ffo.zone1.saonprem.com/namespaces/axsf/'. Did not match: validationParameters.ValidAudience: 'null' or validationParameters.ValidAudiences: 'https://ax.d365ffo.zone1.saonprem.com, 00000015-0000-0000-c000-000000000000, https://ax.d365ffo.zone1.saonprem.com/'
I'm not going to explain the example error message in this post, but if you're interested, it is explained here.
Next, we can see the call stack leading to this exception by running "!clrstack". The first time you run this command on a machine where you haven't used WinDbg before, it might spin for a couple of minutes while WinDbg looks for symbols; after that it runs straight away. This command is useful for understanding what the AOS was trying to do when the exception occurred. You don't need all of the source code to make sense of the call stack: most of the time I simply read the method names and make an educated guess about what the code was doing (of course it's not always that simple, but often it is).
The last command for this post shows the running .NET variables associated with the call stack we just saw. It's useful for understanding which values the AOS was running with. As with !clrstack, I simply look through the list of human-readable values for something I recognize; for example, if the exception were in a purchase order process, I'd look for something that resembles a vendor account number or PurchId. This is particularly useful when the value the AOS was running with isn't the value you expected it to be running with.
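When the message alone isn't enough, the inner exceptions mentioned at the start of this post are reachable from the same command. "!pe" is the short form of SOS's "!PrintException", which accepts a "-nested" switch and an optional exception object address. A sketch, using the exception object address from the output above purely as an example:

```
$$ Print the current thread's exception plus any nested (inner) exceptions
!pe -nested

$$ Print a specific exception object by address, e.g. an InnerException address from !pe output
!pe 000002023b095e38
```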
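Two SOS commands fit this description, and which one is meant here is an assumption: "!clrstack -a" adds each managed frame's parameters and locals to the call stack, while "!dso" (short for "!DumpStackObjects") lists the managed objects referenced from the current thread's stack and prints the contents of strings, which is where human-readable values like an account number tend to show up. "!do" ("!DumpObj") then drills into any single object by address. A sketch:

```
$$ Call stack with parameters (-p) and locals (-l) for each managed frame
!clrstack -a

$$ Managed objects referenced from the current thread's stack; string values are printed
!dso

$$ Inspect one object's fields by address, e.g. the exception object reported by !pe
!do 000002023b095e38
```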
That's all for now, happy debugging!