How to troubleshoot MapReduce jobs in Hadoop

When writing MapReduce programs you are bound to hit problems such as infinite loops, crashes inside map or reduce tasks, and jobs that never complete. Here are a few things that will help you isolate these problems:


Map/Reduce Log Files:

All MapReduce job activity is logged by default in Hadoop. By default, log files are stored in the logs/ subdirectory of the HADOOP_HOME directory. The log file names follow the pattern hadoop-username-service-hostname.log. The most recent data is in the .log file; older logs have their date appended to the name.
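To get a quick overview of that directory, a short program such as the one below can help. This is a minimal illustrative sketch, not part of Hadoop; it assumes the HADOOP_HOME environment variable is set and simply separates the current .log files from the rotated, dated ones.

    import java.io.File;

    // Minimal sketch: list $HADOOP_HOME/logs and separate the current
    // .log files from rotated logs (which have a date appended).
    // Assumes the HADOOP_HOME environment variable is set.
    public class ListHadoopLogs {
        public static void main(String[] args) {
            File logsDir = new File(System.getenv("HADOOP_HOME"), "logs");
            File[] entries = logsDir.listFiles();
            if (entries == null) {
                System.err.println("No logs directory at " + logsDir);
                return;
            }
            for (File f : entries) {
                if (f.isFile() && f.getName().endsWith(".log")) {
                    // Current log file: the most recent data lives here.
                    System.out.println("current: " + f.getName());
                } else if (f.isFile() && f.getName().contains(".log.")) {
                    // Rotated log: the date is appended after the .log suffix.
                    System.out.println("rotated: " + f.getName());
                }
            }
        }
    }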


Log File Format:

hadoop-username-service-hostname.log

  • The username in the log filename is the account under which Hadoop was started. This is not necessarily the account you use to run your programs: on Windows, for example, the Hadoop service may be started under one account while you log on to the machine with another.
  • The service name identifies which Hadoop daemon wrote the log. The following daemons each write their own log, which is important when debugging a whole Hadoop installation (see the parsing sketch after this list):
    • jobtracker
    • namenode
    • datanode
    • secondarynamenode
    • tasktracker
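If you are sorting through many of these files, the filename can be split back into its parts programmatically. The sketch below is illustrative only (the username hduser and hostname node-01 are made up); it matches the service portion against the known daemon names because hostnames may themselves contain hyphens.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Sketch: split a log filename of the form
    // hadoop-username-service-hostname.log into its three parts.
    public class ParseLogName {
        private static final Pattern LOG_NAME = Pattern.compile(
            "hadoop-(.+?)-(jobtracker|namenode|datanode|"
            + "secondarynamenode|tasktracker)-(.+)\\.log");

        public static void main(String[] args) {
            // Hypothetical example filename.
            Matcher m = LOG_NAME.matcher("hadoop-hduser-tasktracker-node-01.log");
            if (m.matches()) {
                System.out.println("user:     " + m.group(1));
                System.out.println("service:  " + m.group(2));
                System.out.println("hostname: " + m.group(3));
            }
        }
    }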


For the Map/Reduce process, the tasktracker logs provide details about the individual tasks that ran on each datanode. Any exception thrown by your MapReduce program will be recorded in the tasktracker logs.
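Because everything you log from your own code ends up in these task logs, it pays to instrument your mappers and reducers. The mapper below is a hypothetical sketch (the class name and the parsing logic are made up for illustration) showing one way to log progress and rethrow failures so they surface in the task logs.

    import java.io.IOException;
    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Sketch of a mapper instrumented for debugging. Logged messages and
    // uncaught exceptions end up in the task logs described above.
    public class DebuggableMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {

        private static final Log LOG = LogFactory.getLog(DebuggableMapper.class);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            LOG.info("processing record at byte offset " + key.get());
            try {
                // Hypothetical processing: parse each input line as a number.
                long parsed = Long.parseLong(value.toString().trim());
                context.write(new Text("value"), new LongWritable(parsed));
            } catch (NumberFormatException e) {
                // Log enough context to find the bad record later, then
                // rethrow so the failure is recorded for this task attempt.
                LOG.error("bad record at byte offset " + key.get() + ": " + value, e);
                throw e;
            }
        }
    }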


The userlogs Subdirectory:

Inside the HADOOP_HOME\logs folder you will also find a subdirectory named userlogs. In it there is a further subdirectory for every MapReduce task running in your Hadoop cluster. Each task records its stdout and stderr to two files in its own subdirectory. If you are running a multi-node Hadoop cluster, the logs you find here are not centrally aggregated: to work out what went wrong you need to check each TaskNode's logs/userlogs/ directory and assemble the full log history yourself.
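If you do this often, the gathering step can be scripted. The following is a rough, hypothetical sketch that walks a single node's logs/userlogs/ tree and prints every task's stdout and stderr; on a multi-node cluster you would run it on each TaskNode and merge the output yourself.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.stream.Stream;

    // Sketch: dump the stdout and stderr files of every task attempt
    // found under $HADOOP_HOME/logs/userlogs on this node.
    public class CollectUserLogs {
        public static void main(String[] args) throws IOException {
            // Assumes the HADOOP_HOME environment variable is set.
            Path userlogs = Paths.get(System.getenv("HADOOP_HOME"), "logs", "userlogs");
            try (Stream<Path> paths = Files.walk(userlogs)) {
                paths.filter(p -> {
                         String name = p.getFileName().toString();
                         return name.equals("stdout") || name.equals("stderr");
                     })
                     .forEach(p -> {
                         // The parent directory name identifies the task attempt.
                         System.out.println("== " + p.getParent().getFileName()
                                 + "/" + p.getFileName() + " ==");
                         try {
                             Files.readAllLines(p).forEach(System.out::println);
                         } catch (IOException e) {
                             System.err.println("could not read " + p + ": " + e);
                         }
                     });
            }
        }
    }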