SQL Server Agent failing to start because "The EventLog service has not been started";

(This post has been extended with SQL Server Agent failing to start because "The EventLog service has not been started" (Second Part) )

One customer reported to us that their SQL Server 2000's SQL Server Agent services, recently started to fail when trying to come up running, automatically after a reboot of the operating system hosting the SQL.

When that happened, the following messages were dumped into SQL Server Agent's log (SQLAGENT.OUT):

2007-11-04 20:23:01 - ! [241] Startup error: Unable to initialize error reporting system (reason: The EventLog service has not been started)
2007-11-04 20:23:21 - ? [098] SQLServerAgent terminated (normally)

Customer claimed that even after creating a dependency on EventLog service for SQL Server Agent service, the error continued to show up after every machine reboot.

It happened to be a bug in SQL Server Agent's code, and here's a brief explanation of what it is:

As part of SQL Server Agent's startup tasks, it initializes a Logging component, which will report errors, warnings, or informational messages to agent's log file (SQLAGENT.OUT) and, if it's running on any non Win9x platforms, to the Event Log (by using the EventLog service Windows APIs). So, as part of that initialization process, it has to make sure that the EventLog service is started, because if it isn't, then the agent cannot function properly and so it exits with the error described above.

The function which checks whether the EventLog service is running or not, calls EnumServicesStatus twice with SERVICE_ACTIVE as the value for its dwServiceState parameter (so it will only enumerate services that are in the following states: SERVICE_START_PENDING, SERVICE_STOP_PENDING, SERVICE_RUNNING, SERVICE_CONTINUE_PENDING, SERVICE_PAUSE_PENDING, or SERVICE_PAUSED).

The first call to that API passes NULL to the lpServices parameter and zero for cbBufSize, that to only determine the size of the buffer required to hold the list of services in such states at the moment of the call. Later, it allocates as many bytes as the pcbBytesNeeded parameter reflected that were needed, and finally it calls EnumServicesStatus again. If at the time of the second call to EnumServicesStatus there are more services in any of the states mentioned earlier, than during the first call to EnumServicesStatus, then it returns FALSE and GetLastError returns ERROR_MORE_DATA.

The function implemented in SQL Server Agent to check whether EventLog is running or not, doesn't take into account such "exceptional" condition and simply returns a failure as if the EventLog service wouldn't be running.

A fix would be required that would rewrite the mentioned function, so that it retries the calls to EnumServicesStatus everytime the second call of the loop returns FALSE and GetLastError returns ERROR_MORE_DATA. But since that fix doesn't meet the bar at this point as to be addressed via a hotfix, I better give you a couple of workarounds you could implement in the event you encounter yourself face to face with this problem.

As it was already mentioned earlier, a simple dependency rule on the Service Control Manager doesn't suffice. So, your two options are:

  1. Use LoadOrder by Bryce Cogswell to set the order in which SCM must start your services, and set SQL Server Agent to be the last one in the list. With this method there's still a minimal chance that the issue can be seen.
  2. Use the following script (provided by my colleague Johann "Julien" Weskandt) so that SQL Server tries to start the SQL Agent service after a given delay.

use master
go
create proc spStartAgent as
-- Version 1.0 by JohannWe, EMEA GTSC, Microsoft customer support
set nocount on
declare @inst sysname, @cmd nvarchar(1000), @rc int, @msg nvarchar(1000)
set @msg=null
set @inst=convert(sysname, serverproperty('InstanceName'))
if @inst is null
set @cmd=N'net start SQLServerAgent'
else
set @cmd=N'net start SQLAgent$' + ltrim(rtrim(@inst))
create table #res (id tinyint not null identity, msg nvarchar(1000))
insert #res(msg) exec @rc=master.dbo.xp_cmdshell @cmd
if @rc=0
return 0
else
if @rc=2
begin
select @msg=msg from #res where id=3 and msg like '%NET HELPMSG 2182%'
if @msg is not null -- we get error 2182, if SQLAgent was already started
return 0
else
begin
raiserror ('An error was raised while starting SQLAgent. See System Event log for details!' , 19, @rc) with log
return 1
end
end
go
create proc spStartAgentDelay as
declare @t char(8), @rc int
set @t='00:01:00' -- Edit to change waiting time!
waitfor delay @t
exec @rc=spStartAgent
go
-- Register as startup proc:
declare @rc int
exec @rc=sp_procoption 'spStartAgentDelay', 'startup', 'on'
select ReturnCode=@rc
select IsStartup=objectproperty (object_id('spStartAgentDelay'), 'ExecIsStartup')
go

Stay tuned.