Getting Data Mining results into SQL Tables

I've been seeing lots of questions about how to get data mining results into SQL tables.  Most times the answers are "use the prediction query builder save button" or "build an SSIS package."  Both of these have issues among them being that the former is really only suited to single-use, small jobs, and the latter has a lot of overhead (not to mention that if you want to use the Data Mining Query Transform you have to have Enterprise Edition).

Luckily, there happens to be a much easier way - and it's one of those "Doh!" moments when you learn about it, because it's that easy.

The way to do it is to simply use linked servers.  Anyone who uses DMX knows to connect to SQL data with OPENQUERY.  There's no reason you can't simply connect to Data Mining data using the same mechanism.

For example, use a query like this to establish a link to an AS server:

 EXEC sp_addlinkedserver 
@server='LINKED_AS', -- local SQL name given to the linked server
@srvproduct='', -- not used 
@provider='MSOLAP', -- OLE DB provider 
@datasrc='localhost', -- analysis server name (machine name) 
@catalog='MovieClick' -- default catalog/database 

Then you can select data using OPENQUERY like this:

 SELECT * FROM 
OPENQUERY(LINKED_AS, 
  'SELECT Cluster() AS [Cluster], ClusterProbability() AS [Prob] 
   FROM [Customers - Clustering]
   NATURAL PREDICTION JOIN
   OPENQUERY([Movie Click],''SELECT * FROM Customers'') AS t')

Then, of course, you can do all kinds of manipulations on it, like finding the average cluster probability of each cluster, right?  Well, almost, the data type returned by the Cluster function is actual text or ntext or something that GROUP BY chokes on, so you have to do some casting first.  Therefore if you want to do that trick, use a query like this:

 SELECT Cluster, AVG(Prob) FROM
(SELECT CAST(Cluster AS Char(30)) AS Cluster, Prob FROM OPENQUERY(LINKED_AS, 
  'SELECT Cluster() AS [Cluster], 
      ClusterProbability() AS [Prob] FROM [Customers - Clustering]
   NATURAL PREDICTION JOIN
   OPENQUERY([Movie Click],''SELECT * FROM Customers'') AS t')
   ) AS t
GROUP BY Cluster

That will give you a nice result showing you, in a way, the affinity of each cluster based on the input set.  That is, if you ran such a query against the training data, you could say that the clusters with a higher probability are "tighter" than the ones with low probabilities.  Anyway, that's besides the point of this post.

In any case, remember to double your single quotes and flatten any nested results and this technique should work just great for getting DMX into SQL.

-J