Book Review - Hadoop: The Definitive Guide, Third Edition (May 2012)
By Tom White

Copyright 2012
Published by O’Reilly Media Inc
ISBN: 978-1-449-31152-0

Overview
Organized into sixteen chapters and three appendices, Hadoop: The Definitive Guide provides a comprehensive and detailed guide to the Hadoop ecosystem. The first three chapters provide an overview and history of the Hadoop project and introduce its two primary components: HDFS and MapReduce. The following chapters build on that introduction and go into great depth on topics such as architecture, availability, compression, file systems, building MapReduce applications, jobs, and tasks. The author covers these core components in a progressive and digestible format.

In addition to description, technical details, and recommendations, each chapter provides working examples that build chapter upon chapter to illustrate each concept. The examples are well presented and easy to understand. They consistently reuse the data and use cases from previous chapters, so the reader does not need to absorb a new use case for each example, which would distract from the example’s message.

Two chapters are devoted to configuring and operating a Hadoop cluster, and these are augmented by the three appendices, which cover installation and preparation for the example code. Working through the example setup in combination with these chapters should prepare readers for their own Hadoop implementation.

Pig, HBase, and ZooKeeper are addressed in later chapters but are not covered in the same depth as HDFS and MapReduce. Each of these additional tools deserves its own reference, and there are many available. If you have covered the previous chapters, the author’s introductions to these three tools and the accompanying examples will get you started.

The table of contents and index are well executed, and topics are easy to find. The example code is available online, which is something that I always appreciate. Throughout the book the author provides links to the latest online documentation, hints and, best of all, gotchas to look out for.

How I Read This Book
This is a lot of information to take in. If you are new to Hadoop, as I was, I would recommend viewing an overview video or two online to become familiar with the vocabulary before diving into this book. I got a LOT out of this book by tackling it in two passes. First, I read it all the way through but only glanced over the examples, understanding the use cases without getting wrapped up in the Java. It has easily been 10 years since I last developed anything in Java, so I postponed my personal syntax journey. In my second pass through the book I focused on working through and understanding the examples. Reading first, with an uninterrupted flow, let me concentrate on the details and concepts. Following up with a “working the examples” pass allowed me to review and reinforce those concepts with hands-on activities. This “lecture then lab” approach works well for me.

Hadoop: The Definitive Guide is an excellent learning tool and reference for Hadoop. It should be required reading for anyone interested in working with Hadoop.