
How to analyze live performance issues with Dynamics 365 for Operations


Back in July, I posted a step-by-step flow for troubleshooting performance issues on Dynamics AX 2012. Today I would like to share the same kind of list, this time for the latest release of Dynamics AX, available on Microsoft Azure through Lifecycle Services.

Because the new Dynamics AX runs as a cloud service managed by Microsoft, the Service Engineering team constantly monitors system and resource usage on all production instances and is notified of any health or availability incident. For the complete definition of the Service Level Agreement, please check the SLA website for Microsoft Online Services.

As a customer, you can no longer access the production virtual machines directly, nor the Azure SQL database, which runs in a Microsoft subscription. As a consequence, you cannot reuse the tools you relied on for Dynamics AX 2012, for example SQL Server Management Studio, Performance Analyzer and Performance Monitoring. However, full admin rights in a development environment still allow you to run these tools there. Expect another blog post on performance tuning with DynamicsPerf 2.0.

However, Lifecycle Services provides new tools for administrators to monitor and analyze a live Dynamics AX instance. These tools are deployed automatically in production and run 24×7 as soon as the solution is deployed by Lifecycle Services. Of course, this list is subject to change, and we will update this article as Azure and the Lifecycle Services platform gain new capabilities.

If you are familiar with DynamicsPerf, you will recognize how the new Telemetry dashboard is designed. It collects data (performance counters and ETW events) directly from the live environment and aggregates it so that you can search reactively for performance issues and also investigate proactively for usage patterns across the application, such as slow queries.

This article focuses only on reactive performance analysis. A future post will focus on the monitoring aspect of LCS Telemetry.

 


1 / Activity Dashboard for all users’ activity

Under Lifecycle Services, click on Cloud Hosted Environment and then Environment Monitoring:

summary

You can see all the recent active requests in the Activity dashboard. By default, the range is the last 15 minutes, but you can manually filter the timeline and the period to go back in time. The maximum period is 60 minutes:

activity

The red line is the SQL utilization and the other lines are the sessions, so you can easily see how many users are connected to that instance. If you see a peak of SQL DTU (Database Transaction Unit), you can drag across the chart to zoom in on the range around the peak:
Select zoom

When you zoom in on a specific time range, the whole dashboard refreshes automatically, which makes the investigation even easier. To disable the zoom, click the “Reset zoom” button:

reset zoom 2

To drill down into the activity per AOS and per user session, click directly on each role you want to analyze. The dashboard automatically refreshes the statistics relevant to the selected session and shows you what the user was doing:

user load

In the Activity Load section, you can see more details on the type of activity per session, for example batch, DMF or Management Reporter:

acti load

 

In User Activity, the last section of the Activity dashboard, you can see the details of all the events per session. Select one user session and search for the specific timestamp at which the error occurred. It is also possible to export the complete list of events to a CSV file for more extensive analysis in Microsoft Excel.

user activity details
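If you prefer a script to Excel, the exported events can also be narrowed down to the seconds around the error. The sketch below is only an illustration: the column names (`Timestamp`, `Session`, `Event`), the timestamp format and the sample rows are assumptions, not the actual layout of the LCS export, so adjust them to the header row of your own file.

```python
import csv
import io
from datetime import datetime, timedelta

# Hypothetical column name and format; adjust to the real export's header row.
TIME_COLUMN = "Timestamp"
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def events_near(csv_text, error_time, window_seconds=30):
    """Return the exported rows logged within window_seconds of error_time."""
    window = timedelta(seconds=window_seconds)
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        logged_at = datetime.strptime(row[TIME_COLUMN], TIME_FORMAT)
        if abs(logged_at - error_time) <= window:
            rows.append(row)
    return rows


# Made-up sample data standing in for an exported file.
SAMPLE_EXPORT = """Timestamp,Session,Event
2017-01-10 09:14:40,s1,FormOpen
2017-01-10 09:15:05,s1,SqlStatement
2017-01-10 09:15:20,s1,Error
2017-01-10 09:17:00,s1,FormClose
"""

error_time = datetime(2017, 1, 10, 9, 15, 20)
nearby = events_near(SAMPLE_EXPORT, error_time)
for row in nearby:
    print(row["Timestamp"], row["Event"])
```

For a real export, read the file with `open(path, newline="")` and pass its contents to `events_near`; the 30-second window is an arbitrary starting point.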


2 / Find Slow Queries in Raw Logs

To search for usage patterns, you can analyze the Raw Logs from the Activity dashboard:

click raw logs2

Right after you click Raw Logs, choose one specific query from the many available. The Slow Queries view shows all statements that take more than one second:

Raw Logs

You can access the call stack of the business logic behind the SQL queries: if it is customized code, you can work on optimizing it; if it is standard code, you can open a support case. You can also filter the Raw Logs by user session and by a specific timeline.

Slow queries
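When the Slow Queries view returns many rows, it can help to group them by statement and rank them by total time, so that the most expensive code path is reviewed first. The following is a minimal sketch of that idea, assuming the rows have already been loaded as dictionaries; the field names `statement` and `duration_ms` and the sample values are hypothetical and do not reflect the real Raw Logs schema.

```python
from collections import defaultdict

SLOW_THRESHOLD_MS = 1000  # the Slow Queries view lists statements over one second


def rank_slow_statements(records, threshold_ms=SLOW_THRESHOLD_MS):
    """Group slow statements and rank them by total time spent."""
    totals = defaultdict(lambda: {"count": 0, "total_ms": 0})
    for record in records:
        if record["duration_ms"] > threshold_ms:
            entry = totals[record["statement"]]
            entry["count"] += 1
            entry["total_ms"] += record["duration_ms"]
    # Highest total duration first: (statement, executions, total milliseconds).
    return sorted(
        ((stmt, e["count"], e["total_ms"]) for stmt, e in totals.items()),
        key=lambda item: item[2],
        reverse=True,
    )


# Made-up sample rows; the statements are placeholders, not real queries.
records = [
    {"statement": "query on CUSTTABLE", "duration_ms": 1800},
    {"statement": "query on INVENTTRANS", "duration_ms": 4200},
    {"statement": "query on CUSTTABLE", "duration_ms": 1500},
    {"statement": "query on SALESLINE", "duration_ms": 300},
]

ranking = rank_slow_statements(records)
for statement, count, total_ms in ranking:
    print(f"{total_ms:>6} ms  x{count}  {statement}")
```

Ranking by total time rather than by the single slowest execution is a design choice: a statement that is moderately slow but runs very often can cost more overall than one rare outlier.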


3 / SQL Insights for Long Running Queries

In the SQL Now section, you can see all the SQL statements currently running, with their CPU time, number of logical reads and total elapsed time in milliseconds. If you filter by Total Elapsed Time, you will see immediately whether a SQL query has been running for more than a few seconds:

current exe statements2


4 / Detect the blocked queries with SQL NOW

You can troubleshoot SQL issues in real time by viewing the queries that are blocked and the queries that are blocking them. You can also view aggregated lock information for tables that currently have locks on them. Under SQL Insights, click on SQL NOW and look at Blocking Statements:

blocking

Note: You can also look at the Raw Logs and filter on the query “All Deadlocks in the system”.
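When several sessions are blocked at once, the statement worth investigating is usually the one at the head of the chain, not every blocked statement behind it. As a rough illustration of that reasoning, the sketch below walks a blocked-to-blocker mapping back to its root. The session ids and the shape of the input are assumptions made for the example; they are not the format shown by SQL NOW.

```python
def root_blockers(blocked_by):
    """Map each root blocker to the sessions waiting behind it.

    blocked_by maps a blocked session id to the session id blocking it.
    """
    waiting = {}
    for session in blocked_by:
        seen = {session}
        current = session
        # Follow the chain until we reach a session that is not itself blocked.
        while current in blocked_by:
            current = blocked_by[current]
            if current in seen:  # cycle: a deadlock, not a plain blocking chain
                break
            seen.add(current)
        else:
            # Loop ended without a break, so current is the head of the chain.
            waiting.setdefault(current, []).append(session)
    return waiting


# Made-up example: 53 waits on 52, which waits on 51; 60 waits on 58.
chains = root_blockers({52: 51, 53: 52, 60: 58})
for blocker, victims in chains.items():
    print(f"session {blocker} is blocking {victims}")
```

Sessions caught in a cycle are deliberately left out of the result, since those are deadlocks and are better reviewed through the deadlock query in the Raw Logs.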


5 / Capture and analyze an application Trace

If you can reproduce the issue in Production, you can also take a Trace directly from the browser under Settings:

Trace

First, provide a name for the trace and then click Start Trace. Remember to provide a relevant name so that later, once it is imported in Trace Parser, you can still remember the context, especially if multiple traces are taken for the same scenario.

TPstart

Import the trace into the Trace Parser tool in a development environment and start the analysis.

TraceParser

Note: Trace Parser can be installed directly from the K drive of the Virtual Machine under Retail\Services\PerfSDK\Scripts.


6 / Use Debug Mode

In the browser, append “&Debug=develop” to the URL to enable the Perf Timer feature. This is only useful if you can reproduce the issue in production with repeatable timing. Once the Perf Timer is enabled, you can see the time in milliseconds for every interaction:

perftimer

Click to expand the Performance panel on the right and look at the server and client calls. The Perf Timer lets you see whether a query or a specific method call is causing a performance issue, so you don’t have to take a trace and analyze everything in detail.
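If you share debug links with colleagues or build them in a test script, the parameter can be added programmatically instead of by hand. This is a small sketch using only the Python standard library; the host name and the other query parameters in the example URL are placeholders, not a real environment.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_debug_flag(url):
    """Return url with Debug=develop set in its query string."""
    parts = urlsplit(url)
    # Drop any existing Debug parameter so the flag is not duplicated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() != "debug"
    ]
    query.append(("Debug", "develop"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URL for illustration only.
debug_url = with_debug_flag("https://contoso.example.com/?cmp=USMF&mi=CustTable")
print(debug_url)
```

Rebuilding the query string rather than concatenating text also covers URLs that have no query string yet, where a leading “&” would be wrong.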


7 / Performance Profiler

Another powerful tool, provided directly by Internet Explorer, is the Performance Profiler. Press F12 and go to the Performance tab to start profiling. You can see CPU utilization and visual throughput to identify slowdowns in the user interface:

F12

 

Regards,

@BertrandCaillet
Principal Premier Field Engineer

Comments (2)
  1. Even if SQL Server is the usual suspect for poor performance in an AX environment, the bottleneck may lie somewhere else: AOS, Citrix, network, client or other software that runs in the same environment. Similarly, even if long-running queries have a considerable impact on AX’s performance, the bottleneck might reside in improper configuration, resources, indexes, workload, design or misuse of the system.

    I have a slightly different routine for troubleshooting the performance of an AX environment. This applies mainly to on-premise environments (I couldn’t comment on the previous post as commenting was closed).

    Step 0: Check the Environment
    The following two steps need to be performed as soon as one becomes responsible for administering the server. Even if they are listed first, they should be performed after checking the critical resources.

    Step 0a: Check Environment’s Configuration
    When dealing with a new environment check first the configuration of SQL Server, database and AOS against the recommendations made by Microsoft. Unfortunately not everybody understands what it means to have an adequately configured environment.

    Step 0b: Check Environment Changes
    Check whether recent changes were made to the environment – (cumulative) service packs or new software installed, etc. Typically this information should be available under Configuration Management reports or at least in a change log for the server.

    Step 1: Check Critical Resources
    If there’s an issue with a critical resource, it most likely needs to be addressed as soon as possible, before attempting any other troubleshooting. As soon as the critical issues have been addressed, one can proceed with further troubleshooting.

    Step 1a: Check Available Space
    Check the actual size of the tempdb, data files and log files. It might happen that your environment is running out of space somewhere. The check should be performed even if monitoring is in place.

    Step 1b: Check the Event Log of SQL Server and AOS
    Critical issues appear in one form or another directly in event logs.

    Step 1c: Check CPU Utilization.
    If the CPU isn’t utilized at all, then the bottleneck is most likely somewhere else, e.g. AOS or Citrix, for those who use Dynamics AX over a Citrix farm. I have met cases in which one or more AOS instances weren’t behaving as expected.

    Step 1d: Check Memory Utilization
    Step 1e: Check Network Utilization

    Step 2: Check for Blocking Queries and Deadlocks

    Step 3: Check Current Workload
    Step 3a: Check the Active Transactions
    There are many cases in which queries have individually acceptable performance, but perform poorly when run as part of a job within a batch. Checking the active transactions provides a first overview of the workload.

    Step 3b: Check the Running AX Jobs
    It might happen that a job takes longer than usual or that 2 or more system/user Jobs overlap.

    Step 3c: Check Growth
    Check the growth of data and log files, and how many transactions were processed during the last hours. A large volume of processed transactions could affect the system’s performance.

    Step 3d: Check System Usage
    Check how many users are in the system. It might happen that the number of active users approaches or exceeds the value used for sizing the system.

    Step 4: Check Index Fragmentation and Tables’ Statistics

    Step 5: Check Long Running Queries in SQL Server and in AX, via SQL Trace

    1. Thank you Adrian for your comment, but this article focuses on the Dynamics AX public cloud using SQL Azure. The checklist you mention is more suited to on-premises solutions or to cases where full access is given, typically in a pre-production environment. You can find such a checklist on our blog as well: https://blogs.msdn.microsoft.com/axinthefield/dynamics-ax-performance-step/

      Thank you,
      @BertrandCaillet

