Coming from a SQL background, you probably depend on SQL Profiler traces to understand what is going on in SQL Server: who is running a long, unoptimized query, which query is scanning through all partitions, and who is filtering on non-indexed columns. So when you move to the HDInsight world, you may wonder whether there is something like SQL Profiler for tracking bad jobs. Yes, there is, as part of the Apache Ambari project. As per the wiki, the whole objective of Ambari is to make Hadoop management easy by providing a mechanism for managing and monitoring Hadoop clusters through its easy-to-use UI. In this post I will show how to use Ambari to find long-running or resource-intensive jobs.
Objective: How to find a long-running or resource-intensive Hive query
- Open the following link for your HDInsight cluster (https://<HDInsightClusterName>.azurehdinsight.net/)
- Enter the username (admin) and password
- A successful login opens the Ambari dashboard, which looks something like this:
- Click YARN > Quick Links > Active Head Node > Resource Manager UI
Or, if you don't want to go through the whole hassle, just open https://<HDInsightClusterName>.azurehdinsight.net/yarnui/hn/cluster and, in the popup, enter the admin username and its password. This opens the Resource Manager UI, which provides all the information about jobs (running, pending, finished, etc.) on the HDI cluster.
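If you would rather script this check than click through the UI, the YARN Resource Manager also exposes the same job list over its REST API (`/ws/v1/cluster/apps`). The snippet below is only a rough sketch: it assumes the HDInsight gateway serves that path at the cluster URL with the same admin credentials via basic authentication, and that your Hadoop version reports the `allocatedMB` and `elapsedTime` fields. The function names `fetch_running_apps` and `heaviest_apps` are made up for this illustration.

```python
import base64
import json
import urllib.request


def fetch_running_apps(cluster_name, user, password):
    """Return the RUNNING YARN applications as a list of dicts."""
    # Assumption: the gateway forwards /ws/v1/cluster/apps to the Resource Manager.
    url = (f"https://{cluster_name}.azurehdinsight.net"
           "/ws/v1/cluster/apps?states=RUNNING")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={
        "Authorization": f"Basic {token}",
        "Accept": "application/json",
    })
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # YARN returns {"apps": null} when no application matches the filter.
    return (payload.get("apps") or {}).get("app", [])


def heaviest_apps(apps, min_elapsed_hours=0.0, top=5):
    """Keep apps running at least min_elapsed_hours, sorted by memory in use."""
    min_ms = min_elapsed_hours * 3600 * 1000  # elapsedTime is in milliseconds
    long_running = [a for a in apps if a.get("elapsedTime", 0) >= min_ms]
    long_running.sort(key=lambda a: a.get("allocatedMB", 0), reverse=True)
    return long_running[:top]
```

Passing the result of `fetch_running_apps(...)` to `heaviest_apps(..., min_elapsed_hours=1)` would give the jobs that have been running for over an hour, biggest memory consumer first, which is roughly what sorting the columns in the Resource Manager UI gives you.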
- In the following screenshot you can see multiple columns for checking jobs. In this case I have one specific job which is consuming:
- A - Memory - 2.2 TB
- B - % of Queue - 64%
- C - % of Cluster - 60%
- In this case the filtered job may be the problem query that is causing other jobs to wait in the queue, as it is consuming 60% of the cluster's resources and 2.2 TB of the total 3.75 TB allocated for this cluster. I can look further into the job by clicking its job ID (Job ID > Logs, then pull the query from the DAG there) and check what the query looks like and what it is doing under the hood: is it scanning TBs of partitions, filtering on non-partitioned fields, or something else?
- As in SQL Server, you can kill this job from the Resource Manager itself. Click on the job and at the top you will find a button for killing it. For example, this job has been running for the last 7 hours; you can kill it by clicking Kill Application.
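The Kill Application button also has a REST counterpart: the YARN Resource Manager accepts a PUT to `/ws/v1/cluster/apps/{appid}/state` with the body `{"state": "KILLED"}`. The sketch below only builds that request. Whether the HDInsight gateway forwards PUT calls to this path with admin basic authentication is an assumption I have not confirmed here, and `build_kill_request` is a hypothetical helper name, so treat it as a starting point rather than a tested recipe.

```python
import base64
import json
import urllib.request


def build_kill_request(cluster_name, app_id, user, password):
    """Build (but do not send) a request asking YARN to kill one application."""
    # Assumption: the gateway exposes the Resource Manager REST API at this path.
    url = (f"https://{cluster_name}.azurehdinsight.net"
           f"/ws/v1/cluster/apps/{app_id}/state")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    body = json.dumps({"state": "KILLED"}).encode()
    return urllib.request.Request(url, data=body, method="PUT", headers={
        "Authorization": f"Basic {token}",
        "Content-Type": "application/json",
    })
```

To actually send it you would pass the returned object to `urllib.request.urlopen(...)`. Keeping the build step separate makes it easy to print and double-check the application ID before killing anything.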
Hope this comes in handy.
Feel free to add comments; maybe in the next post I can take a deep dive into each log and action.