Free ebook: Introducing Microsoft Azure HDInsight

Greetings! We’re thrilled to share another new free ebook with you: Introducing Microsoft Azure HDInsight, by Avkash Chauhan, Valentine Fontama, Michele Hart, Wee Hyong Tok, and Buck Woody. Here are the download links (and below the links you’ll find an ebook excerpt that describes this offering):

Download all formats (PDF, Mobi, and ePub), along with a link to the companion content hosted by the Microsoft Virtual Academy.

Introduction (excerpt)

Microsoft Azure HDInsight is Microsoft’s 100 percent compliant distribution of Apache Hadoop on Microsoft Azure. This means that standard Hadoop concepts and technologies apply, so learning the Hadoop stack helps you learn the HDInsight service. At the time of this writing, HDInsight (version 3.0) uses Hadoop version 2.2 and Hortonworks Data Platform 2.0.

In Introducing Microsoft Azure HDInsight, we cover what big data really means, how you can use it to your advantage in your company or organization, and one of the services you can use to do that quickly—specifically, Microsoft’s HDInsight service. We start with an overview of big data and Hadoop, but we don’t emphasize only concepts in this book—we want you to jump in and get your hands dirty working with HDInsight in a practical way. To help you learn and even implement HDInsight right away, we focus on a specific use case that applies to almost any organization and demonstrate a process that you can follow along with.

We also help you learn more. In the last chapter, we look ahead at the future of HDInsight and give you recommendations for self-learning so that you can dive deeper into important concepts and round out your education on working with big data.

Who should read this book

This book is intended to help database and business intelligence (BI) professionals, programmers, Hadoop administrators, researchers, technical architects, operations engineers, data analysts, and data scientists understand the core concepts of HDInsight and related technologies. It is especially useful for those looking to deploy their first data cluster and run MapReduce jobs to discover insights and for those trying to figure out how HDInsight fits into their technology infrastructure.


Many readers will have no prior experience with HDInsight, but even some familiarity with earlier versions of HDInsight and/or with Apache Hadoop and the MapReduce framework will provide a solid base for using this book. Introducing Microsoft Azure HDInsight assumes you have experience with web technology, programming on Windows machines, and basic data analysis principles and practices, as well as an understanding of Microsoft Azure cloud technology.

Who should not read this book

Not every book is aimed at every possible audience. This book is not intended for data mining engineers.

Organization of this book

This book consists of one conceptual chapter and four hands-on chapters. Chapter 1, “Big data, quick overview,” introduces the topic of big data, with definitions of terms and descriptions of tools and technologies. Chapter 2, “Getting started with HDInsight,” takes you through the steps to deploy a cluster and shows you how to use the HDInsight Emulator. After your cluster is deployed, it’s time for Chapter 3, “Programming HDInsight.” Chapter 3 continues where Chapter 2 left off, showing you how to run MapReduce jobs and turn your data into insights. Chapter 4, “Working with HDInsight data,” teaches you how to work more effectively with your data with the help of Apache Hive, Apache Pig, Excel and Power BI, and Sqoop. Finally, Chapter 5, “What next?,” covers practical topics such as integrating HDInsight into the rest of your stack and the different options for Hadoop deployment on Windows. Chapter 5 finishes up with a discussion of future plans for HDInsight and provides links to additional learning resources.

Finding your best starting point in this book

The different sections of Introducing Microsoft Azure HDInsight cover a wide range of topics and technologies associated with big data. Depending on your needs and your existing understanding of Hadoop and HDInsight, you may want to focus on specific areas of the book. Use the following table to determine how best to proceed through the book.



Comments (12)
  1. Jonathan Bloom says:

    This is awesome~! Will there be a certification test for HDInsight? Thanks!

  2. Sanjeev Jha (@SQLSANJEEV) says:

    One of the best tweets of the day. Thank you for your time to write this book. Also, thanks for sharing with the community. I can not wait to read.

  3. Thanks for the great book ! Same question as Jonathan, when will be certification on HDInsight ?

  4. Diego Giuseppe says:

    Thank you for making the book available!!

  5. Fredrik Lindedal says:

    This book is great!

  6. G subash says:

    Good one. I am new to this technology. But able to find some useful paragraphs. Very informative.. Thank you for sharing the ebook.

  7. Amit Yadav says:

    Any plans to launch comprehensive book on Big Data using C#, or any certification path for Big Data by microsoft?

  8. Sathish Raghuraman says:

    This book makes a wonderful read. However, I think some of the Powershell commands require updating.

    For example, Wait-AzureHDInsightJob -Subscription $subscriptionName -Job $wcJob -WaitTimeoutInSeconds 3600: -Subscription is deprecated. Would be nice if this was updated.

  9. Maurico Cadena says:

    Great Book!

  11. Miguel Chavez Garcia says:

    Thank you … Great Material

Comments are closed.
