Troubleshooting EPM 2007 Queue issues

Thanks to Sharry, here are some useful steps for troubleshooting EPM 2007 queue issues:

1. Use the Manage Queue page (Server Settings -> Queue -> Manage Queue) to look at correlations (the CorrelationUID column helps here) and see why a certain correlation is blocked. If the queue is still working for other jobs/entities, restarting it is usually not necessary; a restart is only needed when nothing is processing. If you cannot see any problems and the queue is still working, your filters on the Manage Queue page are probably not right, so check them. The “By Project” filter works nicely for looking at the queue job history of projects. For other correlations, use CorrelationUID.

2. Look first for the Failed and Blocking states; those are the jobs that are “blocking” others on the same correlation (again, use the CorrelationUID to see which jobs are affected). You can retry these jobs if the error looks recoverable (such as a lost network or database connection), or you can cancel them. Canceling with the default settings cancels the entire correlation, so make sure you know what data you could lose by doing so.

3. Are jobs stuck in the “Getting Enqueued” state? If so, open WinProj again on the machine of the user who submitted the job to see whether WinProj continues sending the project. If that doesn’t work, you will need to cancel the jobs in the “Getting Enqueued” state. Note that this effectively means the save from WinProj never happened, and the data will need to be saved again. The same thing happens when you blindly restart the queue, but doing it this way at least means you know what is being lost.

4. Look at the error (click the link in the Error column) to get an idea about why the failure occurred. Sometimes you can correct the problem and re-save/re-submit your job.

5. Start comparing Event Logs to what you’ve found on the Manage Queue page. Look for errors around the same time as failed jobs in the queue.

6. Check the SharePoint logs (usually located in C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\LOGS). Use the same technique as in #5: look for errors around the same time as failed jobs in the queue.
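
The time matching in steps 5 and 6 can be tedious by hand, so here is a minimal Python sketch of the idea. It is only an illustration: it assumes the log lines are tab-separated with the timestamp in the first column, and the timestamp format, the function name entries_near, and the sample lines are all assumptions made up for this example. Adjust them to whatever your logs actually contain.

```python
from datetime import datetime, timedelta

# Assumption: timestamp is the first tab-separated field, in this format.
TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M:%S.%f"


def entries_near(log_lines, failure_time, window_minutes=5):
    """Return (timestamp, line) pairs logged within the window around failure_time."""
    window = timedelta(minutes=window_minutes)
    matches = []
    for line in log_lines:
        first_field = line.split("\t", 1)[0].strip()
        try:
            stamp = datetime.strptime(first_field, TIMESTAMP_FORMAT)
        except ValueError:
            continue  # header or continuation line with no parsable timestamp
        if abs(stamp - failure_time) <= window:
            matches.append((stamp, line.rstrip("\n")))
    return matches


if __name__ == "__main__":
    # Made-up sample lines, not real log output.
    sample = [
        "Timestamp\tProcess\tLevel\tMessage",
        "02/14/2008 10:02:11.10\tsampleproc\tMedium\tRoutine message",
        "02/14/2008 10:14:58.45\tsampleproc\tHigh\tSample error text",
        "02/14/2008 11:30:00.00\tsampleproc\tHigh\tUnrelated later error",
    ]
    # Time of the failed job, as read from the Manage Queue page.
    failed_at = datetime(2008, 2, 14, 10, 15, 0)
    for stamp, line in entries_near(sample, failed_at):
        print(stamp, line)
```

Take the time of each failed job from the Manage Queue page, pass it in as failure_time, and widen or narrow window_minutes until the surrounding errors become readable.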

Once you clear the blocking job(s), the queue should immediately resume processing that correlation and pick up where it left off.
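
To make the idea of a blocked correlation concrete, here is a small Python model of it. This is not Project Server code or its API; it is a simplified sketch that assumes jobs in a correlation run in order and that one failed job holds up everything behind it. The field names (correlation_uid, position, name, state) and the state strings are hypothetical and chosen only for the example.

```python
from collections import defaultdict

# Hypothetical state label for a job that failed and is holding up its correlation.
BLOCKING_STATES = {"FailedBlocking"}


def blocked_correlations(jobs):
    """Map each blocked correlation UID to (blocking job, jobs waiting behind it)."""
    by_correlation = defaultdict(list)
    for job in jobs:
        by_correlation[job["correlation_uid"]].append(job)

    report = {}
    for uid, group in by_correlation.items():
        group.sort(key=lambda j: j["position"])  # jobs run in order within a correlation
        for index, job in enumerate(group):
            if job["state"] in BLOCKING_STATES:
                report[uid] = (job, group[index + 1:])
                break  # only the first blocker matters
    return report


if __name__ == "__main__":
    # Made-up jobs: correlation "A" is blocked, correlation "B" is healthy.
    jobs = [
        {"correlation_uid": "A", "position": 1, "name": "Save", "state": "Success"},
        {"correlation_uid": "A", "position": 2, "name": "Publish", "state": "FailedBlocking"},
        {"correlation_uid": "A", "position": 3, "name": "Reporting", "state": "Waiting"},
        {"correlation_uid": "B", "position": 1, "name": "Save", "state": "Success"},
    ]
    for uid, (blocker, waiting) in blocked_correlations(jobs).items():
        names = [j["name"] for j in waiting]
        print(f"{uid}: blocked by {blocker['name']}, waiting: {names}")
```

In this model correlation B keeps processing while A is stuck, which is why a full queue restart is rarely needed. The waiting list is what you stand to lose if you cancel the blocked correlation with the default settings.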