You know it always happens: you can run a demo every day for two years, and then, when you're giving a webcast available to the world, something goes horribly wrong. OK, so it wasn't horribly wrong, but I did manage to flub it; you can check out me tripping over myself here.
I eventually figured out that there were two problems. First, I had found the bug before the webcast, but I fixed it only on my slide and not in the actual code (the code in the webcast is a newer, cleaner version than the one on SQLServerDataMining.com). The problem was that I returned the minimum probability ratio of the input attributes rather than the likelihood of the case.
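To make the distinction between a per-attribute ratio and the likelihood of the whole case concrete, here is a minimal sketch. It assumes a toy clustering model in which each cluster has a weight and independent categorical distributions for each attribute. The model, its discretized states, and both function names are hypothetical, not the code from the webcast. `case_likelihood` scores the whole case. `most_suspicious_attribute` uses one possible definition of a per-attribute ratio (again an assumption) only to point at the field that looks wrong.

```python
from math import prod

# Hypothetical toy model (not the webcast's model): each cluster has a prior
# weight and, per attribute, a categorical distribution over discretized states.
MODEL = [
    {
        "weight": 0.6,
        "dists": {
            "Age": {"18-30": 0.7, "31-60": 0.3},
            "Education Years": {"0-16": 0.9, "17-30": 0.1},
            "Relation": {"Son": 0.5, "Husband": 0.5},
        },
    },
    {
        "weight": 0.4,
        "dists": {
            "Age": {"18-30": 0.2, "31-60": 0.8},
            "Education Years": {"0-16": 0.6, "17-30": 0.4},
            "Relation": {"Son": 0.1, "Husband": 0.9},
        },
    },
]


def case_likelihood(model, case):
    """P(case) = sum over clusters of weight * prod of P(attribute = value | cluster).

    Assumes attributes are independent within a cluster; unknown states get 0."""
    return sum(
        cluster["weight"]
        * prod(cluster["dists"][attr].get(value, 0.0) for attr, value in case.items())
        for cluster in model
    )


def most_suspicious_attribute(model, case):
    """Hypothetical heuristic: for each attribute, divide the case likelihood by the
    best likelihood reachable by changing only that attribute. The smallest ratio
    marks the likeliest culprit; the case's anomaly score is still the likelihood."""
    base = case_likelihood(model, case)
    ratios = {}
    for attr in case:
        # Every state that any cluster knows for this attribute.
        states = set().union(*(cluster["dists"][attr] for cluster in model))
        best = max(case_likelihood(model, {**case, attr: s}) for s in states)
        ratios[attr] = base / best if best > 0 else 1.0
    return min(ratios, key=ratios.get), ratios


odd_case = {"Age": "18-30", "Education Years": "17-30", "Relation": "Son"}
print(case_likelihood(MODEL, odd_case))  # the number to report for the case
culprit, ratios = most_suspicious_attribute(MODEL, odd_case)
print(culprit)  # Education Years
```

The design choice the sketch illustrates is keeping the two numbers apart: the ratio explains *which* field is odd, while the case likelihood says *how* odd the whole case is, so returning one in place of the other gives the wrong answer.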
Second, we currently don't implement smoothing in our clustering prediction, something we will fix in a future Community Technical Preview drop. Smoothing ensures that the predicted probability of a known state never goes to 0. During the demo, I set Age to 25, Education Years to 25, and Relation to Head of House to Son. My data mining enhanced application came back and correctly identified Education Years as the incorrect field. However, when I raised Education Years to 50, it said that Relation to HOH was out of whack. I guess it's true that no one with 50 years of education is likely to be living with his dad, but it's not what I expected. Looking into the details showed that the probabilities involved were on the order of 10^-83, but the probability of “Son” was 0. Zero pretty much trumps 10^-83, and smoothing will mitigate this effect.
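The "zero trumps 10^-83" effect is easy to reproduce with plain arithmetic. This sketch assumes additive (Laplace) smoothing, one common technique; the post doesn't say which smoothing method the clustering algorithm will use. The training counts are hypothetical, and `1e-83` stands in for the tiny probability attached to the 50-years-of-education value (an illustrative assumption).

```python
def smoothed_distribution(counts, alpha=1.0):
    """Additive (Laplace) smoothing: add alpha to every known state's count
    so that no known state ends up with probability 0."""
    total = sum(counts.values()) + alpha * len(counts)
    return {state: (n + alpha) / total for state, n in counts.items()}


def least_likely(probs):
    """Return the attribute whose observed value has the lowest probability."""
    return min(probs, key=probs.get)


# Hypothetical training counts for Relation in one cluster: "Son" was never seen.
relation_counts = {"Son": 0, "Husband": 40, "Wife": 10}
raw = {s: n / sum(relation_counts.values()) for s, n in relation_counts.items()}
smoothed = smoothed_distribution(relation_counts)

tiny = 1e-83  # stand-in for the very small Education Years probability

print(tiny * raw["Son"])  # 0.0: one zero factor wipes out the whole product
print(tiny * smoothed["Son"])  # tiny, but no longer zero

# Without smoothing, Relation looks "more wrong" than Education Years.
print(least_likely({"Education Years": tiny, "Relation": raw["Son"]}))  # Relation
# With smoothing, Education Years is flagged again.
print(least_likely({"Education Years": tiny, "Relation": smoothed["Son"]}))  # Education Years
```

Because `alpha` is added to every known state, the smoothed distribution still sums to 1; larger values of `alpha` pull the distribution further toward uniform.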
Anyway, Raman Iyer on my team has released the latest issue of The Data Miner. This month's issue includes the following topics:
SQL Server Data Mining Wins New Converts in Denmark
SQL 2005 Data Mining Programmability Article on MSDN
Apollo Data Tech Launches Predictive Analytics Solution Based on SQL Server 2005 DM
New Visualization Tools for Clustering and Decision Tree Models
Microsoft Hometown IT Mag Highlights SQL 2005 Data Mining
Data Mining Webcasts: New and Upcoming