Data Mining news from the trenches

As we wind down on Beta 3, the data mining team has been getting some good exposure. Several customers have visited our SQL Labs primarily for data mining solutions.

 

One had implemented a web-based cross-sales application using Decision Trees in SQL Server 2000. On their moderate hardware it was taking 3 days to produce recommendations for their millions of customers. Despite the time, they were very happy with the implementation, because the recommendations doubled their sell-through rate. In our labs, we implemented their solution using SQL Server 2005 Decision Trees on a 4-way box and were able to generate recommendations for all of their customers in only 45 minutes! One of the tricks we used was to divide their data into four partitions and use an Integration Services package with four data mining query tasks running in parallel.
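For readers who want to see the shape of the partition-and-parallelize trick, here is a minimal sketch in Python. It is not the Integration Services package we built in the lab; it only illustrates splitting the customer list into four partitions and scoring each one concurrently. The names partition, score_partition and recommend_all are hypothetical, and score_partition is a placeholder for whatever prediction query would actually run against the mining model.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, n):
    """Split items into n contiguous partitions of near-equal size."""
    size, extra = divmod(len(items), n)
    parts, start = [], 0
    for i in range(n):
        # The first `extra` partitions take one additional item each.
        end = start + size + (1 if i < extra else 0)
        parts.append(items[start:end])
        start = end
    return parts


def score_partition(customer_ids):
    """Hypothetical stand-in for one data mining query task.

    A real task would send a prediction query for this slice of
    customers to the server; here we just fabricate a label.
    """
    return {cid: f"recommendation-for-{cid}" for cid in customer_ids}


def recommend_all(customer_ids, workers=4):
    """Score every customer by running one task per partition in parallel."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(score_partition, partition(customer_ids, workers)):
            results.update(partial)
    return results
```

Threads are a reasonable fit in a sketch like this because each task would spend its time waiting on the server, so four tasks can keep a 4-way box busy without any one of them blocking the others.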

 

Another customer had been using Logistic Regression in a different data mining product to create a “data mining scorecard.” We were able to demonstrate how to set the parameters of the Neural Net algorithm to make it behave as Logistic Regression, and, to top it all off, Bogdan wrote a Data Mining stored procedure to generate the scorecard data, which we were able to display and distribute using Reporting Services. Those who download the upcoming Community Preview drop will see that we’ve encapsulated the parameter settings that cause Neural Nets to behave like Logistic Regression in a new algorithm, Microsoft_Logistic_Regression.
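Why can a neural net stand in for Logistic Regression at all? The general reason, sketched below in Python, is that a network with no hidden layer and a single sigmoid output computes exactly the logistic regression function. This is an illustration of that equivalence only, not the product's implementation, and it does not show the actual algorithm parameters, which this post does not name. The functions logistic_regression and neural_net are hypothetical helpers written for the example.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_regression(x, weights, bias):
    """Probability from a plain logistic regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def neural_net(x, layers):
    """Feed-forward pass through sigmoid layers.

    `layers` is a list of (weight_rows, biases) pairs, one per layer,
    where each row of weight_rows feeds one node of that layer.
    """
    activations = x
    for weight_rows, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weight_rows, biases)
        ]
    return activations


# With zero hidden layers, the only layer is the single output node,
# so the network collapses to logistic regression on the same weights.
x = [0.5, -1.2, 3.0]
weights, bias = [0.8, 0.1, -0.4], 0.25
net_output = neural_net(x, [([weights], [bias])])[0]
lr_output = logistic_regression(x, weights, bias)
```

Here net_output and lr_output agree, because the single output node applies the same weighted sum and sigmoid that logistic regression does.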

 

We’ve also gotten some good press. For those new to SQL Server Data Mining, be sure to check out Alexei and Jesper’s article in SQL Server Magazine, Data Mining Reloaded. InformationWeek ran an article on the work of David Heckerman et al. using SQL Server 2005 Data Mining technologies in the search for new AIDS vaccines. Readers of our newsletter will already know about this; we reported it in our December issue.

 

Finally, our old friend Richard Lees has developed a site that has many live samples for SQL Server 2000 OLAP and Data Mining running on a 64-bit Itanium machine. Users of our sqlserverdatamining.com site may recognize the ThinMiner application; it was the inspiration and source for our XMLA Thin Miner sample.