Using DMVs to Monitor and Terminate Long Running MDX Queries

At one time or other, just about every DBA charged with administering a Microsoft SQL Server Analysis Services server has encountered a long running query that seems to consume machine resources with a voracious appetite. Those queries are typically executed by "Power Users" and almost invariably at a time that other users are attempting to run queries for month end reports resulting in a flury of calls asking if the server is down or encountering a performance issue. It's also rather interesting that the person registering the complaint almost always seems to be in the office of the CFO or CEO. That scenario does happen, and as one might expect, a common request within the Business Intelligence community is for the ability to identify and terminate long running or resource intensive queries that are executing on a Microsoft SQL Server Analysis Services Server. While a query governor would be very useful, unfortunately, Analysis Services doesn't have that type of functionality built into the product. Chris Webb's article on Killing session automatically with SSIS describes one approach to the problem using DMV queries in an SSIS Package. Tim Laqua's article Dealing with long runnign SSAS queries using PowerShell-SSAS is based on the same principle, but uses an unsupported API (Microsoft.AnalysisServices.XMLA.dll).

When a query is executed, significant amounts of memory can be allocated on the server, with much of the data going into a shared cache. The query may also be using data that is already resident in a shared data cache, so the task of determining how much memory is actually being consumed by a given query would be non-trivial. I've tried it before, and was anything but satisfied with the solution that I came up with. The conclusion that I finally reached was that the longer a query executes, especially if it's bottlenecked in Storage Engine, the more resources it's going to consume and the more users it's going to impact and the only viable option was to look at how long a query had been executing. Those efforts, however, lead me to come up with a very simple solution that is based on a single DMV query and an XMLA Cancel command.

The DMV query:

SELECT SESSION_SPID, SESSION_USER_NAME, SESSION_LAST_COMMAND, SESSION_LAST_COMMAND_ELAPSED_TIME_MS FROM $SYSTEM.DISCOVER_SESSIONS ORDER BY SESSION_LAST_COMMAND_ELAPSED_TIME_MS DESC

Returns the user SPID, User Name (I definitely don't want to kill a session opened by the CIO, CFO, or CEO), the last command executed, and the duration of the command in milliseconds. And I get something that looks like the following:

SESSION_SPID SESSION_USER_NAME SESSION_LAST_COMMAND SESSION_LAST_COMMAND_ELAPSED_TIME_MS
4255 NORTHAMERICA\jdesch SELECT measures.members on 0, ([product].[product categories].members,[customer].[customer geography].members) on 1 from [Adventure Works]                                                                                                                                                                    11928 4239 NORTHAMERICA\jdesch SELECT SESSION_CONNECTION_ID, SESSION_USER_NAME, SESSION_LAST_COMMAND, SESSION_LAST_COMMAND_ELAPSED_TIME_MS FROM $SYSTEM.DISCOVER_SESSIONS order by session_last_command_elapsed_time_ms desc 0
3707 NORTHAMERICA\jdesch MDSCHEMA_FUNCTIONS 0
3703 NORTHAMERICA\jdesch MDSCHEMA_FUNCTIONS 0

SPID 4239 is me executing my DMV query, so I don't want to have my connection commit suicide. I am, however, interested in what's happening with SPID 4255. Armed with that information, I can now execute an XMLA cancel command to terminate that particular connection and its associated session by executing the following:

<Cancelxmlns=https://schemas.microsoft.com/analysisservices/2003/engine>

    <SPID>4255</SPID>

    <CancelAssociated>1</CancelAssociated>

</Cancel>

It wasn't a big jump from that to a simple C# application that uses a timer to periodically execute a single very simple DMV query against the server to check for long running queries and cancel sessions associated with queries that exceeded a configurable threshold.