Querying the Dependency Net

OK, ok, so just yesterday I posted that it was easy to determine what queries were being used by the browsers to get the data underlying the view.  Of course it’s easy to get them, but without a teensy weensy bit of documentation, it’s not necessarily easy to understand what the parameters mean or possibly what the results mean.

For instance, take the dependency net – everybody loves the dependency net – it’s cool and shows that “high-level information” that everybody craves.  For example, this picture showing how “Die Hard” is the nexus linking Beverly Hills Cop with the Lethal Weapon family of movies.

image

Whatever.

Anyway, to get this information we call the deceptively simple ARGetNodeGraph function, that is, for Association Rules models, like this:

 CALL System.ARGetNodeGraph('Associate Movies', 60)

With the first parameter being the name of the model and the second being the number of nodes to return.  The function chooses nodes using a heuristic that considers the popularity of the node and balances between inputs and outputs in order to produce a nice result.

That’s easy enough – let’s take a look at the output (truncated for space)

image

Whoa!  That’s a bit more complex.

Basically the output is divided into two sections indicated by the "NODE_TYPE” column.  The “NODE_TYPE” column actually has nothing to do  with a node type and (if I had to guess, which I don’t) I would say that NODE_TYPE was used to reuse names from the MINING_CONTENT schema rowset rather than be the most accurate moniker for the column itself.  NODE_TYPE is actually the ROW type, and has the values 1 or 2.  If the NODE_TYPE is 1, then the row represents a NODE in the graph.  If the NODE_TYPE is 2, the row represents an EDGE in the graph.  All of the other column interpretations depend on the type of row.

For NODE rows (NODE_TYPE=1, for those readers with serious short-term memory issues), the columns describe a node like this:

  • NODE_UNIQUE_NAME_1 – This is an ID used to identify the node in the edges section.
  • NODE_UNIQUE_NAME_2 – The default label for the node
  • MSOLAP_NODE_SCORE – Unused
  • ATTRIBUTE_NAME – The attribute name part of the default label, used to compose and decompose different options for node displays
  • ATTRIBUTE_LONG_NAME – The full attribute name – i.e. if the attribute came from a nested table, it has the name of the nested table, plus the specific attribute name in parenthesis.  Again used for label composition.
  • ATTRIBUTE_VALUE – The value name part of the default label, again used for label composition.

For EDGE rows (NODE_TYPE=2) the columns describe a directed edge like this:

  • NODE_UNIQUE_NAME_1 – The ID (from the NODE rows) of the source of the edge
  • NODE_UNIQUE_NAME_2 – The ID (from the NODE rows) of the destination of the edge
  • MSOLAP_NODE_SCORE – The strength of the edge – I don’t believe there is any specific range
  • Other columns - unused

So, that’s how the dependency net gets created – initially.  There are actually many additional functions used by the dependency net to, as you may say, fill out the graph.

For example, if you click the “Find Node” button in the dependency net browser, the browser issues this call:

 CALL System.ARGetNodes('Associate Movies')

This call returns a result set like the NODE section of the ARGetNodeGraph, except without the NODE_TYPE column, with a row for every possible node – not just the top 60.  The only parameter is the name of the model.

If you select a node that is not already in the graph, this is where it gets a bit interesting.  The browser issues a call like this:

 CALL System.ARAddNodes('Associate Movies', '600', 
     '604;726;648;733;630;700;719;718;130;387;386;712;621;727;670;154;337')

 

ARAddNodes has the following parameters:

  • strModel – the model name, but you already new this
  • strNodesToAdd – a semi-colon delimited list of node ids to add to the graph
  • strNodesInGraph – a semi-colon delimited list of node ids already in the graph

The result set looks like the EDGE section of the result of ARGetNodeGraph without the unused columns and contains only the edges between the nodes identified in strNodesToAdd and those identified in strNodesInGraph.  Note that the node id’s that are used are only those returned from ARGetNodeGraph or ARGetNodes and are not node id’s from the model content schema rowset.

NB When you see the function calls in SQLProfiler, you will get the fully qualified function name, e.g. System.Microsoft.AnalysisServices.System.DataMining.AssociationRules.ARAddNodes.  You can eliminate all the intermediate namespaces and just call System.<function name>.

NB2 There are a set of equivalent stored procedures for Decision Trees, that you can probe by browsing a tree model’s dependency network

NB3 Nope, you won’t find a Naive Bayes version by browsing a NB model’s dep net – that browser was never “updated” to use stored procedures to get dependency network information.  You can use the Visio Data Mining Template and see what functions are called….but their different…..