Configure Alerts for a Hung/Stuck SQL Agent Job step


SQL Server does a decent job alerting us of SQL Agent Job failures/ Success

But, what if we  want to know if a job is hung, been running for a long time and has been stuck?

When my customer approached me with the below problem I realized that the Job history table in the MSDB does not have entries for Job or Job Steps that are hung or in Transit.

It only adds an entry to the Job History table after the Job has completed or Failed.

 

We had the following SQL Agent Job in our environment
1. We were using an SSIS package with an WMI Event Watcher Task to continuously watch for arrival of new files. Each day the new file would arrive at 3AM

2. After this there were more steps in the Job execution which would process the file..etc

The Job had steps 1 to 8

The problem was that it was getting hung on the 2nd  step which was the SSIS Package execution. This was to process the file after the WMI Event Watcher Task determined that the file had arrived. It was getting hung “intermittently” with no specific pattern , even after the file has arrived.

Each time the application user would raise a complaint that they did not recieve a file for  processing their Daily report which would then alert the SQL team. They would find the Job stuck executing the Step 2.

To fix it they would check if the file had arrived and then jump to step3.

We needed a proactive way to alert us if the Job step was stuck. Hence we used the below approach

Find your Step_id and Job_Id from

Use msdb
go
select * from dbo.sysjobsteps

 

We created a new SQL Agent Job for msdb database to alert us:

declare @run_date int
set @run_date = (select TOP 1 run_date from dbo.sysjobhistory where step_id = 2 and job_id='XXXXXXX-XXX-XXXX-XXXX-XXXXXXX' ORDER BY run_date DESC)
--Getting the latest run_date of the Job step
declare @return_value int = 0;

declare @d1 int = (SELECT YEAR(GETDATE()) * 10000 + MONTH(GETDATE()) * 100 + DAY(GETDATE()))
-- Getting todays date in the same format as the last run_date

declare @run_time int = (select TOP 1 run_time from dbo.sysjobhistory where step_id =2 and job_id='XXXXXX-XXX-XXXX-XXXX-XXXXXXX' ORDER BY run_date DESC)
--Getting the latest run_time of the Job Step

declare @threshold_time int = 50000
--Declare a threshold time at which you want to be alerted. Example Job step was to complete at 3am ,but even at 5AM it hasnt posted a success message
--Here 5am is your threshold_time

if @d1 > @run_date and @run_time < @threshold_time SET @return_value = 1 ;
-- If the Job History doesnt have run_date record for Today and if Job History shows that the latest run_time is less than the threshold_time
--it means the Job Step hasn't run or is probably stuck , which means we should be alerted

IF (@return_value) > 0
begin
exec msdb.dbo.sp_send_dbmail
@profile_name = 'Default',
@recipients = 'me@example.com',
@subject = 'warning step 2 is stuck'
end

To setup Database Mail you can refer the following steps We now get alerted when our job step is stuck! Hope this helps if someone has the same issue. If you have any more ideas on how to approach this please do share in the comment section below:)


Comments (1)

  1. jacob jerry says:

    nice articles Pooja 🙂

Skip to main content