I recently posted an earnest plea for beta users to report and file the bugs they found. Well, we got a good one. A customer reported that when they created a cluster model, they got nice clusters, and drillthrough returned about 150 cases in each. However, when they queried against the source data to see which case was in which cluster, all cases came back in Cluster 5. Ouch.
Luckily, the customer was able to supply us with their model and data, or I don’t think we would have figured it out so fast. It turns out that the bug was somewhere we wouldn’t have thought to look. One of the columns in their source data contained all 0s except for one row, which was null. The jury’s still out, but it looks like this triggers a special case in our compression code that effectively randomizes all of the data we read. The bug impacts both data mining and OLAP cubes.
You can try this at home. Create a table with a key column and a value column. Set the value column to all 0s, except have a null thrown in somewhere for good measure (pun intended). Create a cube from this data and, in the measure’s “Source” property, set “NullProcessing” to “Preserve”. Process and browse the cube, and voilà! You’ve got a stinker.
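If you just want the source table for that repro, here is a minimal sketch of it. It uses Python's built-in sqlite3 module as a stand-in for whatever relational source you would really point the cube at, and the table name `repro`, the column names, the row count, and the position of the null are all made up for illustration. It only builds and sanity-checks the data; creating and processing the cube is still up to you.

```python
import sqlite3


def build_repro_table(conn, rows=100, null_key=42):
    """Create a key/value table whose value column is all 0s except one NULL.

    The table name, row count, and null position are illustrative assumptions.
    """
    conn.execute("CREATE TABLE repro (id INTEGER PRIMARY KEY, value INTEGER)")
    conn.executemany(
        "INSERT INTO repro (id, value) VALUES (?, ?)",
        # None is stored as SQL NULL; every other row gets 0.
        [(key, None if key == null_key else 0) for key in range(1, rows + 1)],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
build_repro_table(conn)

# Sanity-check the shape of the data before building a cube on top of it.
zeros = conn.execute("SELECT COUNT(*) FROM repro WHERE value = 0").fetchone()[0]
nulls = conn.execute("SELECT COUNT(*) FROM repro WHERE value IS NULL").fetchone()[0]
print(f"zeros={zeros}, nulls={nulls}")
```

The same two-column shape, a key plus a value that is 0 everywhere except for a single null, is what you would load into your actual source database.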
Thanks to this customer, you won’t see this in the next community preview. Have you seen stinkers like this? Send ’em on and we’ll clean ’em up.