Apache Hadoop on Windows Azure Part 8 – Hadoop Map/Reduce Administration from command line in Cluster

After you created your Hadoop cluster in Windows Azure, you can remote into it to start the Map/Reduce administration. Most of the processing log & HDFS data is already available over port 50030 and 50070 however, you can run bunch of standard Hadoop commands directly from command line.

 

After you login to your main node, you will see Hadoop Command Shell shortcut is already there which launches the command as below:

 

D:\Windows\System32\cmd.exe /k c:\apps\dist\bin\hadoop.cmd

 

Once you start the Hadoop Shell shortcut you will see the list of commands you can use as below:

 

 

For example you can check the name node details by using “Hadoop namenode” command:

 

If you want to start a datanode you just run “Hadoop datanode” command:

 

 

Now let’s check if any jobs are running using command “hadoop job –list”

 

c:\apps\dist>hadoop job -list

0 jobs currently running

JobId State StartTime UserName Priority SchedulingInfo

 

Now let me start a Hadoop Job and then we will check the job list again:

 

c:\apps\dist>hadoop job -list

1 jobs currently running

JobId State StartTime UserName Priority SchedulingInfo

job_201112310614_0004 4 1325469341874 avkash NORMAL NA

 

c:\apps\dist>hadoop job -status job_201112310614_0004

 

Job: job_201112310614_0004

file: hdfs://10.186.22.25:9000/hdfs/tmp/mapred/staging/avkash/.staging/job_201112310614_0004/job.xml

tracking URL: https://10.186.22.25:50030/jobdetails.jsp?jobid=job\_201112310614\_0004

map() completion: 1.0

reduce() completion: 1.0

Counters: 23

        Job Counters

                Launched reduce tasks=1

                SLOTS_MILLIS_MAPS=19420

                Launched map tasks=1

                Data-local map tasks=1

                SLOTS_MILLIS_REDUCES=15591

        File Output Format Counters

                Bytes Written=123

        FileSystemCounters

                FILE_BYTES_READ=579

                HDFS_BYTES_READ=234

                FILE_BYTES_WRITTEN=43645

                HDFS_BYTES_WRITTEN=123

        File Input Format Counters

                Bytes Read=108

        Map-Reduce Framework

                Reduce input groups=7

                Map output materialized bytes=189

                Combine output records=15

                Map input records=15

                Reduce shuffle bytes=0

                Reduce output records=15

                Spilled Records=30

                Map output bytes=153

                Combine input records=15

                Map output records=15

                SPLIT_RAW_BYTES=126

                Reduce input records=15

 

As a new job has been started you will also see data coming out at datanode windows as well:

 

Keywords: Windows Azure, Hadoop, Apache, BigData, Cloud, MapReduce