Collecting logs from an Apache Storm cluster in HDInsight


When you run an Apache Storm topology on a multi-node Storm cluster, the different components of the topology write to different log files, and those files are saved on different nodes depending on where each component is running. In this post I will discuss the log files that are available in a Storm cluster, where they are stored, and how you can collect them.

In a Storm cluster there are three main types of logs:

  1. Nimbus log
  2. Supervisor logs
  3. Worker process logs

Nimbus log:

Nimbus runs on the head node, and the Nimbus logs are stored on the active head node in the logs folder under the Storm home directory, at C:\apps\dist\storm-<version>\logs. To get the Nimbus log, connect (RDP) to the head node and copy the file.
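Once you are on the head node, a small script can show you what is in that folder before you copy anything. The following is only a minimal sketch: the function name list_storm_logs and the "nimbus" file-name prefix are assumptions for illustration, and you pass in the Storm home directory yourself because the version part of the path differs from cluster to cluster.

```python
from pathlib import Path


def list_storm_logs(storm_home, prefix=""):
    """Return the files in <storm_home>/logs whose names start with `prefix`.

    `storm_home` is the Storm home directory on the node you are connected to.
    `prefix` is a hypothetical filter (for example "nimbus"); the actual log
    file names on your cluster may differ.
    """
    logs_dir = Path(storm_home) / "logs"
    if not logs_dir.is_dir():
        # Nothing to list if the folder does not exist on this node.
        return []
    return sorted(
        p for p in logs_dir.iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )
```

Calling list_storm_logs with the Storm home path and prefix="nimbus" returns the matching files sorted by name, which makes it easy to see whether more than one file needs to be copied.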

Supervisor logs:

Each worker node runs a supervisor daemon that handles the work assignments from Nimbus, so each worker node has its own set of supervisor logs. The supervisor logs are stored in the logs folder under the Storm home directory on each worker node. They are saved as a chain of 102 MB files, so you may see multiple supervisor log files.

To collect the supervisor logs you need to RDP to each worker node. Once you have connected to the head node, you can RDP from there to the worker nodes by specifying workernode0, workernode1, and so on. If you are not sure which node you are on while collecting logs, you can always run "hostname" from a command line to find out.
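Because every worker node has files with the same names, it helps to tag each copied file with the node it came from. The sketch below does that with the machine's host name, which is the same value the "hostname" command prints. The function name collect_logs, the destination folder, and the "<host>_<file>" naming scheme are all assumptions for illustration, not part of HDInsight or Storm.

```python
import shutil
import socket
from pathlib import Path


def collect_logs(logs_dir, dest_dir, hostname=None):
    """Copy every file in `logs_dir` to `dest_dir`, prefixed with the host name.

    `hostname` defaults to this machine's name (what "hostname" prints).
    The "<host>_<file>" naming scheme is a hypothetical convention.
    """
    host = hostname or socket.gethostname()
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(logs_dir).iterdir()):
        if src.is_file():
            target = dest / f"{host}_{src.name}"
            shutil.copy2(src, target)  # copy2 also keeps the file timestamps
            copied.append(target)
    return copied
```

Run on workernode0, this would produce files such as workernode0_supervisor.log in the destination folder, so logs gathered from several nodes do not overwrite each other.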

Worker process logs:

Worker processes are JVM processes, and each one writes to its own worker log. These logs are also stored on the worker nodes, in the same location as the supervisor logs. Worker processes run on predefined ports on the worker nodes. A worker process belongs to a specific topology, and all the executors (threads) and tasks within the same worker process log to the same file. Like the supervisor logs, worker process logs are saved as a chain of 102 MB files, so you may see multiple worker process log files for the same port.

You can RDP to the worker nodes and collect the worker process logs, but there are more convenient ways to view and download them: a) the Storm UI, opened through the Azure portal, and b) Visual Studio.
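When a logs folder holds several files per port, grouping them by port makes it clearer which files belong to the same worker process. The sketch below assumes file names of the form worker-<port>.log, with rolled-over files carrying a numeric suffix such as worker-<port>.log.1. That naming pattern and the function name group_worker_logs are assumptions, so check the actual names on your cluster and adjust the pattern if needed.

```python
import re
from collections import defaultdict

# Assumed naming pattern: "worker-<port>.log" plus an optional ".<n>" suffix
# for rolled-over files. Adjust this if your cluster names files differently.
WORKER_LOG = re.compile(r"^worker-(\d+)\.log(?:\.(\d+))?$")


def group_worker_logs(file_names):
    """Group worker log file names by port, ordered by their numeric suffix.

    The file without a suffix is listed first, followed by .1, .2, and so on.
    Names that do not match the assumed pattern are ignored.
    """
    groups = defaultdict(list)
    for name in file_names:
        match = WORKER_LOG.match(name)
        if match:
            port = int(match.group(1))
            index = int(match.group(2)) if match.group(2) else 0
            groups[port].append((index, name))
    return {
        port: [name for _, name in sorted(items)]
        for port, items in sorted(groups.items())
    }
```

Given the names in a worker node's logs folder, the result maps each port to the chain of files for the worker process on that port, while supervisor logs and other files are skipped.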

Viewing and downloading Worker process logs from Storm UI

After selecting your HDInsight Storm cluster in the Azure portal, click the "Storm Dashboard" button at the bottom to open the Storm Dashboard.

Now click the "Storm UI" link at the top to open the Storm UI from the Storm Dashboard.

Once you are on the Storm UI page, under "Topology summary", click the topology for which you want to view or download the worker process logs.

This should take you to the details page for that topology. Under the "Spouts (All time)" and "Bolts (All time)" sections, click the spout or bolt whose logs you want to view. This should take you to the details page for that spout or bolt.

 

Now, under the "Executor (All time)" section, you should see a line for each executor (thread), with a port number shown as a hyperlink.

Click the hyperlinked port number and it should open the log file for the worker process that was running on that port. For example, if the first line shows port 6703 on workernode0, clicking 6703 opens the log of the worker process that was running on port 6703 on workernode0. You can download the whole file by clicking the download link.

Viewing and downloading Worker process logs from Visual Studio

There is a similar option to view and download the worker process logs from Visual Studio. If you haven't tried creating a Storm topology in Visual Studio yet, check this Azure document. In Visual Studio, connect to your Azure account in Server Explorer and then navigate to your Storm cluster under HDInsight. Right-click the cluster name and select "View Storm Topologies".

This should show you the list of topologies that are running on your Storm cluster.

Select the topology for which you want to view or download the worker process logs. You should see the same view as in the Storm UI, and from there you can navigate to the worker process logs and view or download them by following the steps described in the previous section.

Conclusion

To collect the Storm worker process logs you do not need to connect to each worker node; you can easily view or download them from the Storm UI or Visual Studio. For the Nimbus and supervisor logs, however, you still need to connect (RDP) to the respective nodes.

Hopefully this post has given you some idea of the different logs you can collect from a Storm cluster, which Storm component writes to which log file, where those files reside, and how to collect them.
