MOSS 2007: Hung Timer Service on a Newly Built or Hardware Upgraded Server


Overview

In MOSS 2007, there is a known issue where the Windows SharePoint Services Timer service enters a deadlock after being started. If you try to stop it, you’ll get a message from the Service Control Manager (SCM) that the service is not responding. The main symptom is that your SharePoint timer jobs will not run, and since most administrative tasks depend on timer jobs to complete, you will not be able to get your new server fully functional. You may also see this problem if you add more CPUs to an existing server. Other symptoms include solutions not getting deployed and alerts not firing.

Validation

To validate whether you’re running into the problem that this post describes, try to stop the Windows SharePoint Services Timer service from the Services console. If you are affected, you’ll get the following message:

“Could not stop the Windows SharePoint Services Timer service on Local Computer.

Error 1053: The service did not respond to the start or control request in a timely fashion”

Cause

This issue occurs if your server has more than 16 cores available for OWSTimer. The number of heaps that OWSTimer creates on startup increases with the number of CPU cores available to it. The problem is that another thread in OWSTimer monitors excessive heap creation and reports it to the ULS logs. By default, the threshold is 32, which means that if more than 32 heaps are created in OWSTimer, the monitoring thread will try to report it to the ULS logs. If the server has 24 cores, for example, 48 heaps get created in OWSTimer at startup, so the monitoring thread tries to write to the ULS logs. The result is a deadlock: OWSTimer is the process that initializes the ULS logging infrastructure, and because that infrastructure has not been initialized yet, the monitoring thread just sits there waiting for it to become available.
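
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the two-heaps-per-core ratio is an assumption inferred from the 24-core example (24 cores, 48 heaps), and the function names are hypothetical, not part of SharePoint.

```python
# Illustrative arithmetic only; these names are hypothetical, not SharePoint APIs.
HEAPS_PER_CORE = 2           # assumption inferred from the 24 cores -> 48 heaps example
DEFAULT_WARN_THRESHOLD = 32  # default threshold described in this post


def heaps_at_startup(cores: int) -> int:
    """Estimate how many heaps OWSTimer creates for a given core count."""
    return cores * HEAPS_PER_CORE


def exceeds_threshold(cores: int, threshold: int = DEFAULT_WARN_THRESHOLD) -> bool:
    """True when the estimated heap count is above the warning threshold."""
    return heaps_at_startup(cores) > threshold


if __name__ == "__main__":
    for cores in (8, 16, 24):
        print(cores, heaps_at_startup(cores), exceeds_threshold(cores))
```

Under that assumption, 16 cores sit exactly at the threshold (32 heaps) and do not trigger the report, while anything above 16 cores does.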

Too much detail for most SharePoint Administrators – you just need the solution, right? That’s in the next section.

Solution

The first step to resolve this problem is to determine the number of CPU cores that are available to OWSTimer (the Timer service). Even though Task Manager should show you this information, I have seen many folks struggle to get the right number, especially when hyper-threading is involved. Here is a more reliable way to find the number of cores being used by OWSTimer on your SharePoint server:

1. Download Process Explorer from http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx

2. Run Process Explorer on the server and find the OWSTimer.exe process

3. Right click on OWSTimer and select “Properties”

4. Select the “Environment” tab

5. Check the value of the NUMBER_OF_PROCESSORS variable. This is the number of cores OWSTimer is using
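
As a rough cross-check, the same variable can be read from a script. The sketch below assumes Python is available on the server, and the helper names are hypothetical. It reads NUMBER_OF_PROCESSORS from the environment of the process running the script, which may not match the environment of OWSTimer.exe, so Process Explorer remains the way to confirm what OWSTimer itself sees.

```python
import os
from typing import Mapping, Optional


def cores_from_environment(env: Mapping[str, str]) -> Optional[int]:
    """Return NUMBER_OF_PROCESSORS as an int, or None if missing or not numeric."""
    raw = env.get("NUMBER_OF_PROCESSORS", "").strip()
    return int(raw) if raw.isdigit() else None


def needs_heap_setting(cores: Optional[int], limit: int = 16) -> bool:
    """True when the core count is above the 16-core limit described in this post."""
    return cores is not None and cores > limit


if __name__ == "__main__":
    # Reads this script's own environment, not the environment of OWSTimer.exe.
    cores = cores_from_environment(os.environ)
    print("NUMBER_OF_PROCESSORS:", cores)
    print("Registry change needed:", needs_heap_setting(cores))
```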

If OWSTimer is using more than 16 cores, add the following registry key on the affected server:

1) Open Registry Editor

2) Move to the following key:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions

3) Right click on "Web Server Extensions" and click [New] - [Key]

4) Name the new key created in step 3 as "HeapSettings"

5) Ensure the following key is created:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\HeapSettings

6) Right click on "HeapSettings" key and click [New] - [DWORD value]

7) Name the DWORD value created in step 6 "LocalHeapWarnCount"

8) Double click on "LocalHeapWarnCount"

9) The "Edit DWORD Value" dialog will open. In [Value data], enter double the number of CPU cores (e.g. if the number of CPU cores is 24, enter "48")

10) Reboot the MOSS server.
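
If you would rather not type the value by hand, the same setting could be captured in a .reg file. The Python sketch below only builds the file text; the helper name heap_settings_reg is hypothetical, the key path and value name are the ones from the steps above, and importing a .reg file instead of using Registry Editor interactively is my own assumption about an alternative workflow, not something the original steps require.

```python
KEY_PATH = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools"
    r"\Web Server Extensions\HeapSettings"
)


def heap_settings_reg(cores: int) -> str:
    """Build .reg file text that sets LocalHeapWarnCount to double the core count."""
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    warn_count = cores * 2  # "double the number of CPU cores", as in step 9
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY_PATH}]",
        # .reg files store DWORD values as eight hexadecimal digits.
        f'"LocalHeapWarnCount"=dword:{warn_count:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(heap_settings_reg(24))
```

Note that the .reg format writes the value in hexadecimal, so 48 appears as dword:00000030. The reboot in step 10 still applies after importing the file.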

This should allow OWSTimer to run normally without any further issues. Hope this helps!

Happy SharePointing!

