Resource Recommendations - Data Mining

I'm often asked, "What references should I have for <Insert Technology here>?"

For Data Mining, the MUST HAVE "bible" textbook is...

Data Mining with SQL Server 2005

(See the link for ISBN & similar details)

Why do I love this book?

  1. It walks through each data mining algorithm, one per chapter, explaining exactly how the algorithm works, what the advanced options do & the types of problems it is best suited to solve.
  2. It has really good coverage of Data Mining eXtensions (DMX). It makes it really simple & clear how to write your own queries.
  3. It has a very complete chapter on using Data Mining in Analysis Services cubes, and another on all the ways you can use it within Integration Services.
  4. Good coverage of how to incorporate SQL's Data Mining within your code.
  5. Covers how to add your own data mining algorithms as extensions to SQL's Data Mining engine.

It covers pretty much everything that can be done.

On the authors

At the time they wrote this book, Jamie MacLennan led the team of developers who wrote the product. It wasn't a big team, so he knew the code really well. ZhaoHui Tang was the Program Manager; he designed what the product would do. Both guys spent a lot of time liaising with the team in Microsoft Research who led the breakthroughs in Data Mining algorithm design that let SQL Server Data Mining overcome the memory constraints common to other data mining products of that time. In short, if these guys didn't know about it, it's not in the product. So there is no-one on the planet more qualified to write the definitive book on SQL's Data Mining. Fortunately, they did a good job.

Note: I define a "Bible" Textbook as THE book you should have at your fingertips if you are working with that product.

 

Honourable mentions ...

DataMining2008
Reading Jamie's Blog, I discovered that this book is almost ready to go to the printers. I expect that it will become the new "Data Mining Bible"; Jamie says it is improved. But I've not read it, & sometimes the sequel is not as good as the original. So until I do, it stays as a mention.

  • Data Mining Techniques For Marketing, Sales and Customer Support

  • Authors: Michael J. A. Berry & Gordon Linoff
  • Publisher: Wiley Computer Publishing
  • Copyright: 1997
  • ISBN 0-471-17980-9

This book is a great overview of Data Mining & good background in the discipline. It is not associated with any software product. It walks through each of the core types of algorithms, describes the concepts behind them, then compares & contrasts their suitability for different types of business problems. Unfortunately it is out of print, largely because they released a 2nd edition: Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management. While not bad, I didn't think the second edition was as good as the first. It is organised differently & it lost the focus of a chapter for each type of algorithm.
Summary: If you can find an old copy of the original, snap it up. If not, flick through the 2nd edition in a book store & decide for yourself.

"Bible" Web Site ...

SQL Server Data Mining is the semi-official site of the SQL Data Mining Development team.

I don't plan on writing a lot of posts on Data Mining, largely because this site has so much available: whitepapers, code samples, sample data, tutorials, webcasts & similar. All written by people much smarter than I am.

Honourable mentions ...

Another site worth a visit is Richard Lees' EasternMining Web Site. Richard has a passion for performance tuning Analysis Services & Data Mining. His site has quite a few examples of using data (BI & Data Mining) to run & monitor your business. Great place to go for ideas.

Great Add-ins

For business users the "Microsoft SQL Server 2008 Data Mining Add-ins for Microsoft Office 2007" is a must have. It is a handy tool that makes data mining real for them. To ease them into it there is a tab called "Table Analytics". This lets them safely use data mining without being scared away by thinking it is too complex for them.

For developers the Microsoft SQL Server 2008 Datamining Viewer Controls are a godsend. With these you can embed rich data mining visualisation into your application, with almost zero effort. You should also look at the Visio part of the Office add-ins mentioned above, as it makes it easy to view & print the results of really big mining models.

Both are part of the Microsoft SQL Server 2008 Feature Pack, October 2008. They are free to download & there is no additional cost to use them, but they do require a valid SQL Server license (either the user has a SQL CAL or the server is "Per Processor" licensed).

NB: There is an experimental project, Table Analysis Tools (for the Cloud). In theory it is the same as the "Data Mining Add-ins for Office", but the work is done on the Internet somewhere. It offers similar capability, but it is not yet a "shipping product", not even a beta. Use it for free, but do not make it a production part of your business.

Great Blogs

Jamie MacLennan's Blog: Jamie is now Principal Development Manager for the SQL Server Analysis Services team (the Data Mining team is part of his team).

Bogdan Crivat's Blog: Bogdan is a Senior S/W Engineer on the SQL Server Data Mining team.

Thanks

Please give feedback. Is this type of info useful? Did it save you time? What was good? What could be better? Notice any errors? What would you like me to cover? All thoughts, comments & suggestions welcome.
