Programming Collective Intelligence

I tend to read a lot of books, and most of them have a technical focus. Every once in a while, I run across a gem that is timely, coherent, unique, and well written. Programming Collective Intelligence: Building Smart Web 2.0 Applications by Toby Segaran is one of those books. If you've ever wanted to understand how search engines perform their magic, how a site like Amazon.com knows what products to recommend, how spam detection works, or how dating sites predict good matches (among other things), this book is for you. Check out Toby's Web 2.0 Berlin slides for a quick overview.

Many of the algorithms and methods that Toby describes are very complex, but he doesn't assume that you have any special knowledge of data analysis, machine learning, or statistics. Toby does a fantastic job of explaining mathematical concepts in a remarkably straightforward and simple fashion. If you have a programmer's understanding of math, you should do just fine.

The book is full of real-world examples that pull live data from sites like del.icio.us, ZEBO, Kayak, Zillow, HOT or NOT, eBay, Yahoo!, and Facebook. All of the code is written in Python. If you've never written a line of Python code in your life, fear not! Toby explains a few of the less obvious Python constructs and syntax in the Preface, and frankly, almost any developer should be able to follow what's going on. You'll be writing concise Python code in no time.

Chapters include: Making Recommendations, Discovering Groups, Searching and Ranking, Optimization, Document Filtering, Modeling with Decision Trees, Building Price Models, Advanced Classification: Kernel Methods and SVMs, Finding Independent Features, and Evolving Intelligence (cue the ominous music). The algorithms covered include Bayesian classifiers, decision trees, neural networks, support-vector machines (SVMs), k-nearest neighbors (kNN), hierarchical clustering, k-means clustering, multidimensional scaling, non-negative matrix factorization (NMF), simulated annealing, genetic algorithms, and even genetic programming. Don't worry if you've never heard any of these terms. The names may be scary, but the text is extremely approachable and lucid. And you're sure to be a hit at the next geek gathering!

Not only will you come away from this book with a solid understanding of collective intelligence, but you'll also have a powerful set of practical Python routines that you can immediately apply to your own data.

Highly recommended!