Apache Hadoop on Windows Azure: How the Hadoop cluster is set up on Windows Azure

To set up your Hadoop cluster in Azure, you provide the following information:

  1. Cluster DNS Name
  2. Type of Cluster
    1. Small – 4 nodes – 2 TB disk space
    2. Medium – 8 nodes – 4 TB disk space
    3. Large – 16 nodes – 8 TB disk space
    4. Extra Large – 32 nodes – 16 TB disk space
  3. Cluster login name and Password
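
The cluster types above can be captured in a small lookup table. The following is a minimal sketch in Python; the names `CLUSTER_SIZES` and `disk_per_node_tb` are illustrative only and are not part of the Hadoop on Azure service. It restates the numbers from the list and derives the disk space per node.

```python
# Cluster types as listed above: name -> (node count, total disk space in TB).
# CLUSTER_SIZES and disk_per_node_tb are illustrative names, not a service API.
CLUSTER_SIZES = {
    "Small": (4, 2),
    "Medium": (8, 4),
    "Large": (16, 8),
    "Extra Large": (32, 16),
}


def disk_per_node_tb(size_name):
    """Return the disk space per node, in TB, for a given cluster type."""
    nodes, total_tb = CLUSTER_SIZES[size_name]
    return total_tb / nodes


if __name__ == "__main__":
    for name, (nodes, total_tb) in CLUSTER_SIZES.items():
        print(f"{name}: {nodes} nodes, {total_tb} TB total, "
              f"{disk_per_node_tb(name)} TB per node")
```

Every cluster type works out to 0.5 TB of disk space per node, so the types differ only in the number of nodes.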

 

The cluster setup process then configures your cluster based on your settings, and finally your cluster is ready to accept Hadoop Map/Reduce jobs.

 

If you want to understand how the head node and worker nodes are set up internally, here is some information for you:

 

The head node is actually a running Windows Azure web role. The head node details are as follows:

  • Isotope HeadNode JobTracker 9010
  • Isotope HeadNode JobTrackerWebApp 50030 (Hadoop Map/Reduce Job Tracker)
  • Isotope HeadNode NameNode 9000
  • Isotope HeadNode NameNodeWebApp 50070 (NameNode management)
  • ODBC/HiveServer running on port 10000
  • FTP server running on port 2226
  • IsotopeJS also runs on port 8443 as the interactive JavaScript console.
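
Once you are logged in to the head node, a quick way to confirm these endpoints is to try a TCP connection to each port. The sketch below is only an illustration: the port numbers come from the list above, while the names `HEAD_NODE_PORTS`, `is_port_open` and `check_head_node` are made up for this example, and it assumes you run it on the head node itself so that `localhost` is the right host.

```python
import socket

# Head node endpoints as listed above: label -> port.
# The dictionary and function names are illustrative, not part of the service.
HEAD_NODE_PORTS = {
    "JobTracker": 9010,
    "JobTrackerWebApp": 50030,
    "NameNode": 9000,
    "NameNodeWebApp": 50070,
    "ODBC/HiveServer": 10000,
    "FTP Server": 2226,
    "IsotopeJS": 8443,
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_head_node(host="localhost"):
    """Probe every listed endpoint and return {label: reachable}."""
    return {label: is_port_open(host, port)
            for label, port in HEAD_NODE_PORTS.items()}


if __name__ == "__main__":
    for label, reachable in check_head_node().items():
        state = "open" if reachable else "closed"
        print(f"{label} ({HEAD_NODE_PORTS[label]}): {state}")
```

A port reported as closed only means that nothing accepted the connection from where the script ran; it does not tell you why.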

 

Each worker node is actually a worker role with an endpoint that communicates directly with the HeadNode web role. Here are some important details:

Isotope WorkerNode – X instances are created, depending on your cluster setup

For example, a Small cluster uses 4 nodes; in that case the worker nodes look like this:

  • IsotopeWorkerNode_In_0
  • IsotopeWorkerNode_In_1
  • IsotopeWorkerNode_In_2
  • IsotopeWorkerNode_In_3
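
The naming pattern above can be generated for any node count. This is a small sketch that assumes larger clusters continue the same `IsotopeWorkerNode_In_<index>` pattern, which is an extrapolation from the Small example; `worker_instance_names` is a hypothetical helper name.

```python
def worker_instance_names(node_count):
    """Return instance names following the IsotopeWorkerNode_In_<index> pattern.

    Assumption: indexes start at 0 and run to node_count - 1, as in the
    4-node example above.
    """
    return [f"IsotopeWorkerNode_In_{i}" for i in range(node_count)]


if __name__ == "__main__":
    for name in worker_instance_names(4):
        print(name)
```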

Each WorkerNode gets its own IP address and port, and two ports are used on each node: one for the individual job tracker and one for HDFS management.

 

If you log in remotely to your cluster and check the name node summary at https://localhost:50070/dfshealth.jsp, you will see the same worker node IP addresses as described here:

https://localhost:50070/dfshealth.jsp

Cluster Summary

16 files and directories, 2 blocks = 18 total. Heap Size is 271.88 MB / 3.56 GB (7%)

Configured Capacity : 3.83 TB
DFS Used : 14.46 MB
Non DFS Used : 52.23 GB
DFS Remaining : 3.78 TB
DFS Used% : 0 %
DFS Remaining% : 98.67 %
Live Nodes : 8
Dead Nodes : 0
Decommissioning Nodes : 0
Number of Under-Replicated Blocks : 0
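
If you want to work with these numbers rather than just read them, the summary is easy to parse. The following sketch copies the key/value lines from the output above into a string and turns them into a dictionary; `parse_summary` and `to_tb` are hypothetical helper names, and the conversion assumes binary units (1 TB = 1024 GB).

```python
# The key/value lines are copied from the Cluster Summary shown above.
SUMMARY = """\
Configured Capacity : 3.83 TB
DFS Used : 14.46 MB
Non DFS Used : 52.23 GB
DFS Remaining : 3.78 TB
DFS Used% : 0 %
DFS Remaining% : 98.67 %
Live Nodes : 8
Dead Nodes : 0
Decommissioning Nodes : 0
Number of Under-Replicated Blocks : 0
"""


def parse_summary(text):
    """Split each 'key : value' line into a dictionary entry."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:  # skip lines that do not contain the separator
            result[key.strip()] = value.strip()
    return result


def to_tb(value):
    """Convert a string such as '52.23 GB' to terabytes (assumes 1 TB = 1024 GB)."""
    number, unit = value.split()
    factor = {"MB": 1 / 1024**2, "GB": 1 / 1024, "TB": 1.0}[unit]
    return float(number) * factor


if __name__ == "__main__":
    summary = parse_summary(SUMMARY)
    capacity = to_tb(summary["Configured Capacity"])
    remaining = to_tb(summary["DFS Remaining"])
    print(f"Live nodes: {summary['Live Nodes']}")
    print(f"Remaining (recomputed): {100 * remaining / capacity:.2f} %")
```

The recomputed percentage comes out at roughly 98.7 %, close to but not exactly the reported 98.67 %, because the displayed capacity figures are already rounded. The 8 live nodes are consistent with a Medium cluster.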

If you look at C:\Resources\<GUID>.IsotopeHeadNode_IN_0.xml you will learn more about these details. This is the same XML file that you find on any web or worker role, and the configuration in it will help you a lot in this regard.
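
To skim such a file for endpoint information without knowing its exact schema, you can walk every element and list the attributes whose name mentions a port. This is a schema-agnostic sketch only: it makes no assumption about the element names in the role XML, it may well miss details that are stored as element text instead of attributes, and `find_port_attributes` is a hypothetical helper name.

```python
import sys
import xml.etree.ElementTree as ET


def find_port_attributes(xml_data):
    """Return (tag, attribute, value) for every attribute whose name contains 'port'.

    xml_data may be bytes or str. No particular schema is assumed.
    """
    root = ET.fromstring(xml_data)
    hits = []
    for element in root.iter():
        for name, value in element.attrib.items():
            if "port" in name.lower():
                hits.append((element.tag, name, value))
    return hits


if __name__ == "__main__":
    # Pass the path to the role XML file as the only argument.
    if len(sys.argv) == 2:
        with open(sys.argv[1], "rb") as handle:
            for tag, name, value in find_port_attributes(handle.read()):
                print(f"{tag}: {name}={value}")
    else:
        print("usage: python find_ports.py <path-to-role-xml>")
```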

Keywords: Windows Azure, Hadoop, Apache, BigData, Cloud, MapReduce