Often we have several jobs running on our HDInsight clusters that have tight timeline requirements. These requirements could be expressed in terms of how long a job takes to start, how long it runs, the latest time by which it should complete, and so on. Oozie allows defining these SLA requirements in workflows and coordinators to make such metrics easier to monitor. With SLA monitoring enabled, Oozie actively checks the state of these jobs and sends a notification when an SLA is met or missed.
Three metrics that are tracked by Oozie are:
- Start Time
- End Time
- Job Duration
For more details on Oozie SLA, please refer to the Oozie documentation.
Configuring Oozie SLA
Oozie SLA monitoring requires configuring Oozie JMS with ActiveMQ, so that the notifications published by Oozie can be consumed and used to trigger further actions such as sending emails. It also requires changes to oozie-site.xml to enable Oozie to publish these notifications. These steps can either be performed manually, as detailed in the sections below, or run through an automation script that configures Oozie SLA with a single command. Both options are explained below.
If you are configuring Oozie SLA on an already running cluster, you can do so either from the Azure portal, as described in the next section, or by logging in to the headnode and running the automation script.
Script Action based installation from the Azure portal
You can also use script actions to install Oozie SLA from the Azure portal, either during cluster creation or after the cluster has been created.
During Cluster Creation
- Start creating a cluster as described at Create Hadoop clusters in HDInsight.
- Under Optional Configuration, on the Script Actions blade, click add script action and provide the details of the script action as shown below:
|Name||Oozie SLA Installation|
|Head/Worker||Check only Headnode|
|Parameters||<CLUSTER_ADMIN_USERNAME> <CLUSTER_ADMIN_PASSWORD> <CLUSTER_NAME> (Ex: admin DummyPassword oozieslasample)|
- Click Save to save the configuration and continue with cluster creation.
On a Running Cluster
If you are configuring Oozie SLA on an already running cluster, use the steps described in Applying Script action on a running cluster, with the same property values as defined above.
Installation from within the cluster [Headnode]
The automation script is also hosted on GitHub and can be used to configure Oozie SLA from the cluster's headnode.
Run the commands below on the headnode to do so.
If you prefer to perform these steps manually for better control, they are detailed below.
Steps to Configure JMS using ActiveMQ
- Create the directory /opt/ActiveMQ:
  sudo mkdir -p /opt/ActiveMQ
- Download ActiveMQ for your OS from the link below and extract it in the /opt/ActiveMQ directory:
  sudo tar -xvf apache-activemq-5.14.3-bin.tar.gz
- Give the directory appropriate permissions:
  sudo chmod 775 /opt/ActiveMQ
  sudo chown root:root /opt/ActiveMQ
- Go to the bin directory of the extracted archive and start the daemon as the root user:
  sudo ./activemq start
Oozie Config Changes
Log in to Ambari and make the following changes to the Oozie configuration:
- Add oozie.services.ext property in oozie-site.xml to include the following services.
- Add the event handlers property
- Set Oozie Scheduler threads to 15 [Optional]
- Add JMS Properties
- Add the JMS topic name
- Save all the settings and restart Oozie.
Your modified ext property should look similar to this:
Add the properties below in Custom oozie-site:
Event listeners (oozie.service.EventHandlerService.event.listeners):
<value>org.apache.oozie.jms.JMSJobEventListener,org.apache.oozie.sla.listener.SLAJobEventListener,org.apache.oozie.jms.JMSSLAEventListener,org.apache.oozie.sla.listener.SLAEmailEventListener</value>
Topic prefix (oozie.service.JMSTopicService.topic.prefix):
<value></value> (empty value; this can be used to add a prefix to the topic set in oozie.service.JMSTopicService.topic.name, for example: oozie.)
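For reference, the settings listed above might look as follows in XML form. This is only a sketch based on the property names in the Apache Oozie documentation, not a dump of a real cluster's configuration. The service list (including SLAService), the broker URL `tcp://localhost:61616` (ActiveMQ's default port, assuming the broker runs on the same headnode), the topic name value `default=${username}` (Oozie's stock default) and the file name `oozie-sla-site-snippet.xml` are all assumptions. Any services already present in oozie.services.ext should be kept, with these appended to them.

```shell
# Write a reference copy of the SLA/JMS properties to a scratch file.
# The file name is hypothetical; in Ambari, enter each name/value pair
# under Custom oozie-site rather than editing oozie-site.xml by hand.
cat > oozie-sla-site-snippet.xml <<'EOF'
<configuration>
  <!-- Append these to the services already listed in oozie.services.ext -->
  <property>
    <name>oozie.services.ext</name>
    <value>org.apache.oozie.service.EventHandlerService,org.apache.oozie.service.JMSAccessorService,org.apache.oozie.service.JMSTopicService,org.apache.oozie.sla.service.SLAService</value>
  </property>
  <!-- Event handlers -->
  <property>
    <name>oozie.service.EventHandlerService.event.listeners</name>
    <value>org.apache.oozie.jms.JMSJobEventListener,org.apache.oozie.sla.listener.SLAJobEventListener,org.apache.oozie.jms.JMSSLAEventListener,org.apache.oozie.sla.listener.SLAEmailEventListener</value>
  </property>
  <!-- Scheduler threads (optional) -->
  <property>
    <name>oozie.service.SchedulerService.threads</name>
    <value>15</value>
  </property>
  <!-- JMS connection properties; the broker URL is an assumption -->
  <property>
    <name>oozie.jms.producer.connection.properties</name>
    <value>java.naming.factory.initial#org.apache.activemq.jndi.ActiveMQInitialContextFactory;java.naming.provider.url#tcp://localhost:61616;connectionFactoryNames#ConnectionFactory</value>
  </property>
  <!-- JMS topic name and (empty) prefix -->
  <property>
    <name>oozie.service.JMSTopicService.topic.name</name>
    <value>default=${username}</value>
  </property>
  <property>
    <name>oozie.service.JMSTopicService.topic.prefix</name>
    <value></value>
  </property>
</configuration>
EOF

# Sanity check: list the property names that were written.
grep -o '<name>[^<]*</name>' oozie-sla-site-snippet.xml
```

The heredoc delimiter is quoted (`'EOF'`) so that the shell leaves `${username}` untouched for Oozie to resolve.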
Sample workflow with SLA monitoring enabled
Below is a sample workflow that shows SLA monitoring in action
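As a minimal sketch of what an SLA-enabled workflow could look like, the commands below write a workflow.xml and a matching job.properties into a scratch directory. Everything named here is an assumption made for illustration: the directory `sla-sample`, the app name `sla-sample-wf`, the shell action that simply sleeps, and the placeholder values in job.properties. The `sla:info` element names come from the `uri:oozie:sla:0.2` schema, and the thresholds are deliberately tight so that a two-minute job misses them.

```shell
# Create a scratch application directory (hypothetical name).
mkdir -p sla-sample

# Workflow definition: one shell action that sleeps for two minutes,
# followed by an <sla:info> block declaring the expected timings.
cat > sla-sample/workflow.xml <<'EOF'
<workflow-app name="sla-sample-wf" xmlns="uri:oozie:workflow:0.5" xmlns:sla="uri:oozie:sla:0.2">
  <start to="sleep-node"/>
  <action name="sleep-node">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>sleep</exec>
      <argument>120</argument>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
  <!-- SLA thresholds are measured relative to the nominal time -->
  <sla:info>
    <sla:nominal-time>${nominalTime}</sla:nominal-time>
    <sla:should-start>${1 * MINUTES}</sla:should-start>
    <sla:should-end>${2 * MINUTES}</sla:should-end>
    <sla:max-duration>${1 * MINUTES}</sla:max-duration>
    <sla:alert-events>start_miss,end_miss,duration_miss</sla:alert-events>
    <sla:alert-contact>${alertContact}</sla:alert-contact>
  </sla:info>
</workflow-app>
EOF

# Job properties: every value in angle brackets is a placeholder to replace.
cat > sla-sample/job.properties <<'EOF'
nameNode=<DEFAULT_FS_URI>
jobTracker=<RESOURCE_MANAGER_HOST>:<PORT>
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/sla-sample
nominalTime=2017-03-01T10:00Z
alertContact=<YOUR_EMAIL_ADDRESS>
EOF
```

After copying the directory to the cluster's default storage, the job could be submitted with something like `oozie job -oozie <OOZIE_URL> -config sla-sample/job.properties -run`.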
Once this workflow has run, on an SLA miss you will receive an email similar to this, provided email notification is configured
Further, if you look at your Oozie UI, you will see a new SLA tab.
If you search for your job there, a result similar to this will show its SLA status
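The same status can also be fetched without the UI through Oozie's SLA REST endpoint (`/v2/sla`). The helper below is a small sketch that only builds the query URL. The function name `sla_query_url` is made up for this example, and the base URL `http://localhost:11000/oozie` and app name `sla-sample-wf` are assumptions; substitute the values for your cluster.

```shell
# Build an SLA query URL for Oozie's v2 REST API.
#   $1 = Oozie base URL
#   $2 = application name to filter on
sla_query_url() {
  printf '%s/v2/sla?timezone=GMT&filter=app_name=%s\n' "$1" "$2"
}

# Example values (assumptions: Oozie listening on its default port 11000
# on this host, and a workflow named sla-sample-wf).
sla_query_url "http://localhost:11000/oozie" "sla-sample-wf"
```

Passing the printed URL to `curl` should return the SLA summary as JSON. Quote the URL on the command line because it contains `&`; depending on the client, the filter value may also need to be URL-encoded.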
PS: Feel free to drop in your questions and provide any feedback in the comments section.