Hadoop adventures with Microsoft HDInsight

What is HDInsight? 

HDInsight is the product name for Microsoft's Hadoop offering, covering both the locally installed distribution and the Hadoop on Azure service. It is Microsoft's 100% Apache-compatible Hadoop distribution, supported by Microsoft. Available both on Windows Server and as a Windows Azure service, HDInsight empowers organizations with new insights on previously untouched unstructured data, while connecting to the most widely used Business Intelligence (BI) tools on the planet.

It is available in two modes:

  • HDInsight as Cloud Service: Cloud Version running on Windows Azure
  • HDInsight as Local Cluster: a downloadable version that runs locally on Windows Server and Windows desktop machines


In this article we will see how to use HDInsight on a local machine.


Where to get it?


What does the Windows installer bring to your machine?


After the installation completes, you will see that the following applications are installed:

  1. Microsoft HDInsight Community Technology Preview Version 1.0.0.0
  2. Hortonworks Data Platform 1.0.1 Developer Preview Version 1.0.1
  3. If you do not change the selected components, Python 2.7.3150 is also installed
  4. The Java and C++ runtimes are also installed if the machine needs them


Once the installer completes, the following shortcuts are set up on your machine:

  1. Hadoop Command Line
  2. Microsoft HDInsight Dashboard
  3. Hadoop MapReduce Status
  4. Hadoop Name Node Status


By default, Hadoop is installed at C:\Hadoop.

If you launch the “Hadoop Command Line” shortcut, you will see the following list of commands:

  • namenode -format: format the DFS filesystem
  • secondarynamenode: run the DFS secondary namenode
  • namenode: run the DFS namenode
  • datanode: run a DFS datanode
  • dfsadmin: run a DFS admin client
  • mradmin: run a Map-Reduce admin client
  • fsck: run a DFS filesystem checking utility
  • fs: run a generic filesystem user client
  • balancer: run a cluster balancing utility
  • fetchdt: fetch a delegation token from the NameNode
  • jobtracker: run the MapReduce job Tracker node
  • pipes: run a Pipes job
  • tasktracker: run a MapReduce task Tracker node
  • historyserver: run job history servers as a standalone daemon
  • job: manipulate MapReduce jobs
  • queue: get information regarding JobQueues
  • version: print the version
  • jar <jar>: run a jar file
  • distcp <srcurl> <desturl>: copy file or directories recursively
  • archive -archiveName NAME <src>* <dest>: create a hadoop archive
  • daemonlog: get/set the log level for each daemon

Or, instead of one of the commands above:

  • CLASSNAME: run the class named CLASSNAME

Most commands print help when invoked without parameters.

Try checking the version:

c:\Hadoop\hadoop-1.1.0-SNAPSHOT>hadoop version
Hadoop 1.1.0-SNAPSHOT
Subversion on branch -r
Compiled by jenkins on Wed Oct 17 22:28:56 PDT 2012
From source with checksum 80f5614dfb0743b569344f051a07b37d
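If you want to capture this information from a script (for example, to record which build a test ran against), the lines printed by "hadoop version" are easy to parse. The following is a minimal sketch, not part of HDInsight: parse_hadoop_version and local_hadoop_version are hypothetical helper names, and the code assumes the output format shown above and that the hadoop command is on PATH, as it is inside the Hadoop Command Line window. It targets Python 3.7 or later, not the Python 2.7 that the installer bundles.

```python
import os
import re
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class HadoopVersionInfo:
    version: Optional[str] = None
    compiled_by: Optional[str] = None
    compiled_on: Optional[str] = None
    checksum: Optional[str] = None


# Patterns for the lines printed by "hadoop version" in the output above.
# The Subversion line is skipped because its fields are empty in this build.
_VERSION = re.compile(r"^Hadoop\s+(\S+)$")
_COMPILED = re.compile(r"^Compiled by\s+(\S+)\s+on\s+(.+)$")
_CHECKSUM = re.compile(r"^From source with checksum\s+([0-9a-f]+)$")


def parse_hadoop_version(output: str) -> HadoopVersionInfo:
    """Extract version, builder, build date and checksum from the output text."""
    info = HadoopVersionInfo()
    for raw in output.splitlines():
        line = raw.strip()
        m = _VERSION.match(line)
        if m:
            info.version = m.group(1)
            continue
        m = _COMPILED.match(line)
        if m:
            info.compiled_by, info.compiled_on = m.group(1), m.group(2)
            continue
        m = _CHECKSUM.match(line)
        if m:
            info.checksum = m.group(1)
    return info


def local_hadoop_version(hadoop_cmd: str = "hadoop") -> HadoopVersionInfo:
    """Run '<hadoop_cmd> version' and parse its standard output (assumes it is on PATH)."""
    result = subprocess.run(
        [hadoop_cmd, "version"],
        capture_output=True,
        text=True,
        check=True,
        # On Windows "hadoop" is a .cmd script, so let cmd.exe resolve it.
        shell=(os.name == "nt"),
    )
    return parse_hadoop_version(result.stdout)
```

Keeping the parsing separate from the subprocess call means the parser can be checked against saved output (such as the session above) on a machine without Hadoop installed.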


If you now launch the “Microsoft HDInsight Dashboard” shortcut, you will see the dashboard running locally.


Launching the “Hadoop MapReduce Status” shortcut opens the MapReduce status page.


Launching the “Hadoop Name Node Status” shortcut opens the NameNode status page.


As you can see, you now have a Hadoop cluster running on your local machine.
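The two status shortcuts open web pages served by the local cluster, so the same check can be scripted. The following Python 3 sketch is an illustration only: it assumes the Hadoop 1.x default web UI addresses (NameNode on port 50070, JobTracker on port 50030), which may differ on your installation, and is_page_up and check_local_cluster are hypothetical helper names.

```python
import urllib.error
import urllib.request

# Assumed Hadoop 1.x default web UI addresses; adjust to match your shortcuts.
STATUS_PAGES = {
    "Name Node Status": "http://localhost:50070/",
    "MapReduce Status": "http://localhost:50030/",
}


def is_page_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP GET on url succeeds with a 2xx/3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status (HTTPError).
        return False


def check_local_cluster(pages=STATUS_PAGES) -> dict:
    """Map each status page name to whether it answered."""
    return {name: is_page_up(url) for name, url in pages.items()}


if __name__ == "__main__":
    for name, up in check_local_cluster().items():
        print(f"{name}: {'up' if up else 'not reachable'}")
```

A page that does not answer usually means the corresponding Hadoop service is not running (or is listening on a different port than assumed here).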

Play with it a little more; my next article will cover this in more detail.

Have fun with Hadoop!