The other day I came across a very common but intriguing scenario of SQL Server service NOT starting. Initially when I came across this issue, I felt that this would not take much of my time but interestingly it did not turn out as I thought!
This was actually the SQL Server resource which was not coming ONLINE. As usual the DBA said NOTHING changed, it just happened!!
One of the first things we normally do is to start the service from the SQL Server Configuration Manager/Services console so that we can isolate the issue. In this case we found that the service was failing to start with the below errors:-
Windows could not start the SQL Server (MSSQLSERVER) service on Local Computer.
Error 1053: The service did not respond to the start or control request in a timely Fashion.
Generic error, you don’t expect much from the GUI anyways!! Went for the SQL Server Errorlogs and Application/System Event viewer logs. We did not find the Errorlog created in the SQL Server folder nor did any errors get logged in the Application/System logs.
Commonly, in such scenarios, the start-up parameters are goofed up but in this case it was not.
So the next logical step was to start the service in the command prompt in console mode:-
C:\Windows\system32>"C:\Program Files\Microsoft SQL Server\MSSQL10_50.R2\MSSQL\Binn\sqlservr.exe" -sR2 –c
It did show any message indicating that the service in not even initialising. It appeared that it was sort of in a hung state. Strange!!
Then I pulled out the next tool from my toolkit: - PROCESS MONITOR a.k.a PROCMON (I hope most of you get the reason why I put that in CAPS for those of you who did not ..read on)
So I collected a PROCMON while running the service in command prompt. I filter it on SQLServr.exe and here is what I find.
I see that SQL Server accesses the ERRORLOG file but continues to perform a delete operation on files like ERRORLOG.321234533, ERRORLOG.321234532, ERRORLOG.321234531..Why would SQL do this and why these weird files names??
Scratching my head for some time and looking at the PROCMON output for some more time I figured it out. I see that before SQL attempts this file deletion it accesses the below registry key
Event Class: Registry
Path: HKLM\SOFTWARE\MICROSOFT\Microsoft SQL Server\MSSQL10_50.R2\MSSQLServer\NumErrorLogs
But now what is NumErrorLogs and when would that be set??
Remember there is an option in SQL Server where you can customise the Number of Errorlogs to be retained, by default its 6.
To confirm my understanding, I set this to 99 and collected a PROCMON on my test machine.
I see the similar symptoms of the Errorlogs being attempted to be deleted.
In short, what is happening is that SQL Server is internally is attempting to delete/rename the Errorlog files and arrange it so that only 99 or less Errorlog files are retained!!! Now since the registry key is set to such a high value SQL Server is looking for the Errorlogs and decrementing the number every time and hence it SQL Server appears to be hung!!!
But if you notice, the maximum value which can be set here is 99. So how does the NumErrorLogs registry key has such a high value??
No points for cracking that. The possible options for that happening:-
- Manual intervention of changing this registry key
- Running something like below in SQL Server
EXEC xp_instance_regwrite N'HKEY_LOCAL_MACHINE', N'Software\Microsoft\MSSQLServer\MSSQLServer', N'NumErrorLogs', REG_DWORD, 321234533
So finally, we go ahead and change the registry key back to the default value and BINGO SQL Server starts up!! Issue resolved!!
Disclaimer: The information in this weblog is provided "AS IS" with no warranties, and confers no rights. This weblog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my opinion. Inappropriate comments will be deleted at the authors discretion. All code samples are provided "AS IS" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular.