Apache Hadoop on Windows Azure Part 8 – Hadoop Map/Reduce Administration from command line in Cluster

01/01/2012

After you create your Hadoop cluster in Windows Azure, you can remote into it to start Map/Reduce administration. Most of the processing logs and HDFS data are already available over ports 50030 (the JobTracker web UI) and 50070 (the NameNode web UI); however, you can also run a number of standard Hadoop commands directly from the command line.

After you log in to your main node, you will see that a Hadoop Command Shell shortcut is already there. It launches the following command:

D:\Windows\System32\cmd.exe /k c:\apps\dist\bin\hadoop.cmd

Once you start the Hadoop Command Shell shortcut, it displays the list of commands you can use.

For example, you can check the name node details by using the “hadoop namenode” command.

If you want to start a data node, just run the “hadoop datanode” command.

Now let’s check whether any jobs are running, using the “hadoop job -list” command:

c:\apps\dist>hadoop job -list

0 jobs currently running

JobId State StartTime UserName Priority SchedulingInfo

Now let me start a Hadoop job, and then we will check the job list again:

c:\apps\dist>hadoop job -list

1 jobs currently running

JobId State StartTime UserName Priority SchedulingInfo

job_201112310614_0004 4 1325469341874 avkash NORMAL NA
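The State and StartTime columns in this listing are raw numbers, so a small script can make them easier to read. The following is a minimal Python sketch, not part of the Hadoop tooling: the function name parse_job_list is hypothetical, and the state table assumes the JobStatus constants of Hadoop 1.x-era builds (1 = RUNNING, 2 = SUCCEEDED, 3 = FAILED, 4 = PREP, 5 = KILLED). StartTime is treated as milliseconds since the Unix epoch.

```python
from datetime import datetime, timezone

# Assumption: state codes follow the JobStatus constants of Hadoop 1.x-era builds.
JOB_STATES = {1: "RUNNING", 2: "SUCCEEDED", 3: "FAILED", 4: "PREP", 5: "KILLED"}


def parse_job_list(output):
    """Parse the text printed by `hadoop job -list` into a list of dicts."""
    jobs = []
    for line in output.splitlines():
        fields = line.split()
        # Job rows start with an ID such as job_201112310614_0004;
        # the summary line and the column header are skipped.
        if len(fields) < 5 or not fields[0].startswith("job_"):
            continue
        job_id, state, start_ms, user, priority = fields[:5]
        jobs.append({
            "job_id": job_id,
            "state": JOB_STATES.get(int(state), "UNKNOWN"),
            # StartTime is in milliseconds; drop the millisecond part.
            "started": datetime.fromtimestamp(int(start_ms) // 1000, tz=timezone.utc),
            "user": user,
            "priority": priority,
        })
    return jobs


# Sample text copied from the listing above.
SAMPLE = """\
1 jobs currently running
JobId State StartTime UserName Priority SchedulingInfo
job_201112310614_0004 4 1325469341874 avkash NORMAL NA
"""

for job in parse_job_list(SAMPLE):
    print(job["job_id"], job["state"],
          job["started"].strftime("%Y-%m-%d %H:%M:%S UTC"))
```

Under these assumptions, the job shown above is in the PREP state and was submitted at 2012-01-02 01:55:41 UTC. In practice the text would come from capturing the output of the “hadoop job -list” command rather than from a hard-coded sample.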

To see the details of this job, pass its JobId to the “hadoop job -status” command:

c:\apps\dist>hadoop job -status job_201112310614_0004

Job: job_201112310614_0004

file: hdfs://10.186.22.25:9000/hdfs/tmp/mapred/staging/avkash/.staging/job_201112310614_0004/job.xml

tracking URL: https://10.186.22.25:50030/jobdetails.jsp?jobid=job_201112310614_0004

map() completion: 1.0

reduce() completion: 1.0

Counters: 23

Job Counters

Launched reduce tasks=1

SLOTS_MILLIS_MAPS=19420

Launched map tasks=1

Data-local map tasks=1

SLOTS_MILLIS_REDUCES=15591

File Output Format Counters

Bytes Written=123

FileSystemCounters

FILE_BYTES_READ=579

HDFS_BYTES_READ=234

FILE_BYTES_WRITTEN=43645

HDFS_BYTES_WRITTEN=123

File Input Format Counters

Bytes Read=108

Map-Reduce Framework

Reduce input groups=7

Map output materialized bytes=189

Combine output records=15

Map input records=15

Reduce shuffle bytes=0

Reduce output records=15

Spilled Records=30

Map output bytes=153

Combine input records=15

Map output records=15

SPLIT_RAW_BYTES=126

Reduce input records=15
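The counters in this output are grouped under section headers (Job Counters, FileSystemCounters, and so on), with each counter printed as name=value. As an illustration only, the Python sketch below collects them into a nested dictionary; the function name parse_counters is hypothetical, and the parsing assumes the layout shown above, where anything after the “Counters:” line that lacks an “=” is a section header.

```python
def parse_counters(status_output):
    """Group the counters printed by `hadoop job -status` by section header."""
    counters = {}
    group = None
    in_counters = False
    for raw in status_output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Counters:"):
            # Everything after this line is either a header or a counter.
            in_counters = True
            continue
        if not in_counters:
            continue
        if "=" in line:
            # Counter line: split on the last '=' so names stay intact.
            name, _, value = line.rpartition("=")
            counters.setdefault(group, {})[name] = int(value)
        else:
            # Section header such as "Job Counters".
            group = line
            counters[group] = {}
    return counters


# A shortened sample taken from the output above.
SAMPLE_STATUS = """\
Job: job_201112310614_0004
map() completion: 1.0
reduce() completion: 1.0
Counters: 23
Job Counters
Launched reduce tasks=1
SLOTS_MILLIS_MAPS=19420
Launched map tasks=1
FileSystemCounters
HDFS_BYTES_READ=234
HDFS_BYTES_WRITTEN=123
Map-Reduce Framework
Map input records=15
Reduce output records=15
"""

result = parse_counters(SAMPLE_STATUS)
for section, values in result.items():
    print(section, values)
```

With the counters in a dictionary, simple checks become one-liners, for example comparing result["Map-Reduce Framework"]["Map input records"] against the number of lines expected in the input file.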

Because a new job has been started, you will also see output appearing in the datanode window.

Keywords: Windows Azure, Hadoop, Apache, BigData, Cloud, MapReduce
