Back in July, I posted a step-by-step flow to troubleshoot performance issues on Dynamics AX 2012. Today I would like to share a similar list, this time for the latest release of Dynamics AX, available on Microsoft Azure through Lifecycle Services (LCS).
Because the new Dynamics AX runs as a cloud service managed by Microsoft, the Service Engineering team constantly monitors system and resource usage on all production instances and is notified of any health or availability incidents. For the complete definition of the Service Level Agreement, see the SLA website for Microsoft Online Services.
As a customer, you can no longer access the production virtual machine directly, nor the Azure SQL database running in the Microsoft subscription. As a consequence, you cannot reuse the same tools as for Dynamics AX 2012, for example SQL Server Management Studio, Performance Analyzer and Performance Monitor. However, full admin rights in the development environment allow you to run these tools there. Expect another blog post on performance tuning with DynamicsPerf 2.0.
Instead, Lifecycle Services provides new tools for administrators to monitor and analyze a live Dynamics AX instance. These tools are deployed automatically in production and run 24x7 as soon as Lifecycle Services deploys the solution. This list is subject to change, and we will update this article as Azure and the Lifecycle Services platform provide new capabilities.
If you are familiar with DynamicsPerf, you will recognize how the new Telemetry dashboard is designed. It collects data (performance counters and ETW events) directly from the live environment and aggregates it, so that you can search reactively for performance issues and also investigate proactively for usage patterns across an application, such as slow queries.
This article focuses only on reactive performance analysis. A future post will focus on the monitoring aspect of LCS Telemetry.
1 / Activity Dashboard for all users' activity
In Lifecycle Services, click Cloud Hosted Environment and then Environment Monitoring.
The Activity dashboard shows all recent active requests. By default, the range is the last 15 minutes, but you can manually filter the timeline and the period to go back in time. The maximum period is 60 minutes.
The red line is SQL utilization and the other lines are sessions, so you can easily see how many users are connected to the instance. If you see a peak in SQL DTU (Database Transaction Unit) usage, drag across the chart to zoom in on the range around the peak.
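In the dashboard you pick the zoom range around a DTU peak by eye. As an illustration only, the sketch below does the same thing on a list of (timestamp, DTU %) samples; the input format, the sample values and the two-minute margin are hypothetical, since the dashboard is used interactively and does not expose data in this shape.

```python
from datetime import datetime, timedelta

def zoom_window(samples, margin=timedelta(minutes=2)):
    """Return (start, end) centred on the sample with the highest DTU value.

    samples: list of (timestamp, dtu_percent) tuples. This input format is a
    hypothetical illustration, not something the dashboard exports as-is.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    peak_time, _peak_dtu = max(samples, key=lambda sample: sample[1])
    return peak_time - margin, peak_time + margin

# Hypothetical one-minute samples with a spike at 10:03.
base = datetime(2016, 1, 1, 10, 0)
samples = [(base + timedelta(minutes=m), dtu)
           for m, dtu in enumerate([12, 15, 14, 88, 40, 13])]
start, end = zoom_window(samples)
print(start.time(), end.time())  # 10:01:00 10:05:00
```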
When you zoom in on a specific time range, the whole dashboard refreshes automatically, which makes the investigation even easier. To disable the zoom, click the "Reset zoom" button.
To drill down into activity per AOS and per user session, click each role you want to analyze. The statistics refresh automatically for the selected session and show what the user was doing.
The Activity Load section shows more detail on the type of activity per session, for example batch, DMF or Management Reporter.
User Activity, the last section of the Activity dashboard, shows the details of all events per session. Select a user session and search for the specific timestamp when an error occurs. You can also export the complete list of events to a CSV file for extensive analysis in Microsoft Excel.
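Once the events are exported to CSV, a short script can complement Excel when you repeat the same analysis often. This is a minimal sketch assuming hypothetical column names (SessionId, Timestamp, EventName) and a hypothetical event name "Error"; check the header row of your actual export and adjust the constants accordingly.

```python
import csv
import io
from collections import Counter

# Hypothetical sample of an exported User Activity CSV. The real export's
# column names and event names may differ; adjust the constants below.
SAMPLE = """SessionId,Timestamp,EventName
s1,2016-01-01T10:00:01,FormOpen
s1,2016-01-01T10:00:05,Error
s2,2016-01-01T10:00:06,FormOpen
s1,2016-01-01T10:00:09,FormClose
"""

SESSION_COL = "SessionId"   # hypothetical column name
EVENT_COL = "EventName"     # hypothetical column name
ERROR_EVENT = "Error"       # hypothetical event name

def summarize(csv_text):
    """Count events per session and list the sessions that logged an error."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    per_session = Counter(row[SESSION_COL] for row in rows)
    with_errors = sorted({row[SESSION_COL] for row in rows
                          if row[EVENT_COL] == ERROR_EVENT})
    return per_session, with_errors

counts, errors = summarize(SAMPLE)
print(counts.most_common())  # [('s1', 3), ('s2', 1)]
print(errors)                # ['s1']
```

To analyze a real export, read the file with `open(path, newline="")` and pass its contents to `summarize`.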
2 / Find Slow Queries in Raw Logs
To search for usage patterns, analyze the Raw Logs from the Activity dashboard.
After you click Raw Logs, choose one of the many available queries. The Slow Queries view shows all statements that take more than one second.
You can access the call stack of the business logic behind each SQL query. If it is customized code, you can work on optimizing it; if it is standard code, you can open a support case. You can also filter Raw Logs by user session and by time range.
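The Slow Queries view already applies the one-second threshold and lets you filter by session and time range. If you copy those rows out for offline analysis, the same filters can be sketched as follows; the QueryEvent fields and the sample data are hypothetical and do not reflect the actual Raw Logs schema.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class QueryEvent:
    # Hypothetical shape of one Raw Logs row; real field names may differ.
    session_id: str
    timestamp: datetime
    duration_ms: int
    sql_text: str

def slow_queries(events, threshold_ms=1000, session_id=None, start=None, end=None):
    """Keep queries slower than threshold_ms, optionally for one session and time range."""
    result = []
    for e in events:
        if e.duration_ms <= threshold_ms:
            continue
        if session_id is not None and e.session_id != session_id:
            continue
        if start is not None and e.timestamp < start:
            continue
        if end is not None and e.timestamp > end:
            continue
        result.append(e)
    # Slowest statements first.
    return sorted(result, key=lambda e: e.duration_ms, reverse=True)

events = [
    QueryEvent("s1", datetime(2016, 1, 1, 10, 0), 250, "SELECT ... FROM CUSTTABLE"),
    QueryEvent("s1", datetime(2016, 1, 1, 10, 5), 4200, "SELECT ... FROM INVENTTRANS"),
    QueryEvent("s2", datetime(2016, 1, 1, 10, 6), 1800, "SELECT ... FROM SALESLINE"),
]
for e in slow_queries(events, session_id="s1"):
    print(e.duration_ms, e.sql_text)  # 4200 SELECT ... FROM INVENTTRANS
```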
3 / SQL Insights for Long-Running Queries
The SQL Now section shows all SQL statements currently running, with their CPU time, number of logical reads and total elapsed time in milliseconds. If you filter by Total Elapsed Time, you can immediately see whether a SQL query has been running for more than a few seconds.
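The same reasoning applies to a SQL Now snapshot. This sketch assumes you have copied the grid into a list of rows with hypothetical field names, and it uses an arbitrary five-second threshold to stand in for "more than a few seconds".

```python
from collections import namedtuple

# Hypothetical representation of one SQL Now row; values are illustrative.
RunningStatement = namedtuple(
    "RunningStatement", "sql_text cpu_ms logical_reads elapsed_ms")

def long_running(statements, min_elapsed_ms=5000):
    """Return statements running for at least min_elapsed_ms, longest first."""
    return sorted(
        (s for s in statements if s.elapsed_ms >= min_elapsed_ms),
        key=lambda s: s.elapsed_ms,
        reverse=True,
    )

snapshot = [
    RunningStatement("SELECT ... FROM CUSTTABLE", 40, 1200, 150),
    RunningStatement("UPDATE ... INVENTSUM", 300, 5400, 12000),
    RunningStatement("SELECT ... FROM SALESLINE", 2500, 880000, 7300),
]
for s in long_running(snapshot):
    print(s.elapsed_ms, s.sql_text)
```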
4 / Detect Blocked Queries with SQL Now
You can troubleshoot SQL issues in real time by viewing the queries that are blocked and the queries that are blocking them. You can also view aggregated lock information for tables that currently have locks on them. Under SQL Insights, click SQL Now and look at Blocking Statements.
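When several sessions are blocked, the useful question is which session sits at the head of the chain, because releasing it usually lets the others proceed. The sketch below is a generic head-blocker search, not a feature of LCS; it assumes a hypothetical mapping from each blocked session id to the session id blocking it, as you might read it from Blocking Statements.

```python
def head_blockers(blocked_by):
    """Return sessions that block others but are not blocked themselves."""
    blockers = set(blocked_by.values())
    return sorted(s for s in blockers if s not in blocked_by)

def blocking_chain(session, blocked_by):
    """Follow a blocked session up the chain to its head blocker."""
    chain = [session]
    seen = {session}
    while chain[-1] in blocked_by:
        nxt = blocked_by[chain[-1]]
        chain.append(nxt)
        if nxt in seen:
            # A cycle means a deadlock rather than a simple blocking chain.
            break
        seen.add(nxt)
    return chain

# Hypothetical snapshot: 61 waits on 55, 70 waits on 61, 72 waits on 55.
blocked_by = {61: 55, 70: 61, 72: 55}
print(head_blockers(blocked_by))       # [55]
print(blocking_chain(70, blocked_by))  # [70, 61, 55]
```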
Note: You can also open the Raw Logs and select the query "All Deadlocks in the system".
5 / Capture and analyze an application Trace
If you can reproduce the issue in production, you can also capture a trace directly from the browser, under Settings.
First, name the trace and then click Start Trace. Choose a meaningful name so that later, once the trace is imported into Trace Parser, you can still recall the context, especially if you capture multiple traces for the same scenario.
Import the trace into the Trace Parser tool in a development environment and start the analysis.
Note: Trace Parser can be installed directly from the K drive of the development virtual machine, under Retail\Services\PerfSDK\Scripts.
6 / Use Debug Mode
In the web browser, add "&Debug=develop" to the URL to enable the Perf Timer feature. This is only useful if you can reproduce the issue in production with repeatable timing. Once the Perf Timer is enabled, you can see the time in milliseconds for every interaction.
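Appending "&Debug=develop" by hand assumes the URL already has a query string. The sketch below adds the parameter either way, using Python's standard urllib.parse module; the host name and the cmp/mi parameters in the example are hypothetical placeholders for your own environment URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def with_debug_develop(url):
    """Return url with Debug=develop in its query string, replacing any existing Debug value."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k.lower() != "debug"]
    query.append(("Debug", "develop"))
    return urlunsplit(parts._replace(query=urlencode(query)))

# Hypothetical environment URL; substitute your own.
print(with_debug_develop(
    "https://myenvironment.example.com/?cmp=USMF&mi=SalesTableListPage"))
# https://myenvironment.example.com/?cmp=USMF&mi=SalesTableListPage&Debug=develop
```

Because the query is decoded and re-encoded, unusual characters in other parameters may come back percent-encoded differently, which should not change their meaning.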
Expand the Performance panel on the right and look for server and client calls. The Perf Timer shows whether a query or a specific method call is causing a performance issue, so you don't have to capture a trace and analyze everything in detail.
7 / Performance Profiler
Another powerful tool, built into Internet Explorer, is the Performance Profiler. Press F12 and open the Performance tab to start profiling. The CPU utilization and visual throughput graphs help you identify slowdowns in the user interface.
Principal Premier Field Engineer