Upgrade for SQL Server 2008 can fail if you have renamed the 'sa' account

UPDATE TO THIS POST:

We now have a fix for this problem as documented at

968829  FIX: Error message when you try to upgrade an
instance of SQL Server 2005 to SQL Server 2008: "Wait on the Database
Engine recovery handle failed. Check the SQL Server error log for potential
causes" https://support.microsoft.com/default.aspx?scid=kb;EN-US;968829

However, because this is setup, you will need to use the setup slipstream process to apply the fix as part of running setup (Note this problem does not happen in SQL Server 2008 R2). You can read more about the setup slipstream process at:

https://blogs.msdn.com/b/psssql/archive/2009/03/17/how-to-fix-your-sql-server-2008-setup-before-you-run-setup-part-ii.aspx

 

 

Gail Shaw, a SQL MVP, presented me with a new issue for SQL Server 2008 I had not heard of before. It was posted at the following web site:

https://www.sqlservercentral.com/Forums/Topic560965-391-1.aspx

This customer had renamed the 'sa' account as many others have done to avoid login attacks on the 'sa' account. An example of a posting that talks about how to do this can be found at:

https://blogs.msdn.com/sqltips/archive/2005/08/27/457184.aspx

One comment here before I get into the real problem. The following section in the SQL Server 2008 documentation is incorrect:

https://msdn.microsoft.com/en-us/library/cc280562.aspx

It says that you will have the option to rename the 'sa' account during installation or upgrade but in fact that feature is not in the final SQL Server 2008 product.

So onto the details of the problem....

If you have renamed the 'sa' account and attempt an upgrade to SQL Server 2008, during the upgrade process you will encounter the following error in the form of a dialog box:

image

I'm including the error text here as well so it can be found with searches on the web:

Wait on the Database Engine recovery handle failed. Check the SQL Server error log for potential causes.

Unfortunately the meaning of the message is not obvious but the direction of what to do will point to the problem (look at the ERRORLOG). This message is actually generic in the sense that it means the SQL Server Engine failed to start as part of installation. The reason for its failure to start can be found in the SQL Server ERRORLOG.

If you click OK, the upgrade will continue until you are presented with a final screen confirming there was a failure:

image

If you hit next, you can select the Summary log for setup which shows the following at the top:

Overall summary:
  Final result:                  SQL Server upgrade failed. To continue, investigate the reason for the failure, correct the problem, and then repair your installation.
  Exit code (Decimal):           -2068643839
  Exit facility code:            1203
  Exit error code:               1
  Exit message:                  SQL Server upgrade failed. To continue, investigate the reason for the failure, correct the problem, and then repair your installation.
  Start time:                    2008-09-04 15:08:17
  End time:                      2008-09-04 17:19:26
  Requested action:              Upgrade

Later down in the summary log you see this:

Detailed results:
  Feature:                       Database Engine Services
  Status:                        Failed: see logs for details
  MSI status:                    Passed
  Configuration status:          Failed: see details below
  Configuration error code:      0x4BDAF9BA@1306@24
  Configuration error description: Wait on the Database Engine recovery handle failed. Check the SQL Server error log for potential causes.
  Configuration log:             C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\20080904_150530\Detail.txt

Anytime someone tells me SQL Server will not start and hands me an ERRORLOG, I always start at the bottom and work my way up. this technique works well here as the bottom of the ERRORLOG looks like this:

2008-09-04 15:42:43.83 spid7s      Executing msdb.dbo.sp_syspolicy_create_purge_job
2008-09-04 15:42:44.23 spid7s      Error: 515, Severity: 16, State: 2.
2008-09-04 15:42:44.23 spid7s      Cannot insert the value NULL into column 'owner_sid', table 'msdb.dbo.sysjobs'; column does not allow nulls. INSERT fails.
2008-09-04 15:42:44.26 spid7s      Error: 912, Severity: 21, State: 2.
2008-09-04 15:42:44.26 spid7s      Script level upgrade for database 'master' failed because upgrade step 'sqlagent100_msdb_upgrade.sql' encountered error 515, state 2, severity 16. This is a serious error condition which might interfere with regular operation and the database will be taken offline. If the error happened during upgrade of the 'master' database, it will prevent the entire SQL Server instance from starting. Examine the previous errorlog entries for errors, take the appropriate corrective actions and re-start the database so that the script upgrade steps run to completion.
2008-09-04 15:42:44.28 spid7s      Error: 3417, Severity: 21, State: 3.
2008-09-04 15:42:44.28 spid7s      Cannot recover the master database. SQL Server is unable to run. Restore master from a full backup, repair it, or rebuild it. For more information about how to rebuild the master database, see SQL Server Books Online.

The last error just indicated we cannot recover master but the reason for the recovery problem has nothing to do with a transaction. Error 912 is an indication of a failure to run an upgrade script, in this case sqlagent100_msdb_upgrade.sql. What failed in the script? The messages above this provide the clues:

1) Error 515 indicates an INSERT tried to insert a NULL value into the msdb.dbo.sysjobs table

2) The message above this indicates we are executing a procedure sp_syspolicy_create_purge_job

The problem can be found in the upgrade script in this stored procedure with the following T-SQL code fragment:

DECLARE @jobId BINARY(16);
EXEC @ReturnCode =  msdb.dbo.sp_add_job
        @job_name=@job_name,
        @enabled=1,
        @notify_level_eventlog=0,
        @notify_level_email=0,
        @notify_level_netsend=0,
        @notify_level_page=0,
        @delete_level=0,
        @owner_login_name=N'sa',
        @job_id = @jobId OUTPUT;
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback;

As you can see this code was written to call sp_add_job with an owner_login_name = 'sa'. But if you have renamed 'sa', then sp_add_job will not find the login and attempt to insert a NULL SID into the sysjobs table.

The natural reaction should be to just rename 'sa' back and then re-run the upgrade or repair the installation. There is only one problem here. SQL Server will shutdown each time until these scripts are successfully run so there does not appear to be a way to get into the engine to rename 'sa' back.

I really don't want customers to feel they need to hack this upgrade script and although the posting on sqlcentral.com has a clever way to get around this (creating a Windows account called 'sa'), here is the process you should use and what we will document in an upcoming KB article on this subject:

1) Start SQL Server using the command line parameters /f and /T3608. One way to do this is from a command prompt like the following:

net start mssqlserver /f /T3608

2) This should allow SQL Server to start and avoid any of the upgrade scripts.

3) Connect to the server using sqlcmd.exe with the Dedicated Admin Connection. Be sure to ONLY run the necessary commands to rename back the 'sa' account. On my machine I ran a query like this:

sqlcmd -E -A -Q"alter login [sys-admin] with name = sa"

4) Shutdown SQL Server

5) Now run the Repair feature of setup through the SQL Server Installation Center.

6) The remaining part of upgrade should proceed and complete.

Once you have completed this procedure, you are free to rename the 'sa' account again.

Anything Else...

What about the SQLAgent job that was created as part of the upgrade? This job is designed to purge the history for Policy Based Management jobs. If you rename 'sa' again, what will happen and what should you do about it?

The answer to this question is not simple, so let me try to walk you through it:

- If you don't rename 'sa' after the upgrade or don't plan to, the you don't need to know the rest of these details. Please read on to the section titled There's More.... below.

- If you plan to rename 'sa', then I recommend you change the owner of this job to a sysadmin login that you provision. Since a role or group can't own a job, it must be a specific login that is a sysadmin. This will avoid any future problems.

- If you don't do this, when this job runs, it may fail with the following error in the Job History:

The job failed. The owner (sa) of job syspolicy_purge_history does not have server access.

- If you don't want to provision a sysadmin account to resolve this, there is another less elegant solution. You can avoid this error by running the following set of T-SQL commands to refresh the SQL Agent job cache:

EXECUTE msdb.dbo.sp_sqlagent_notify @op_type = N'J', @job_id = @job_id, @action_type = N'U'

where @job_id is the job_id for the syspolicy_purge_history job. You can find this job_id using the following query:

select * from msdb.dbo.sysjobs where name = 'syspolicy_purge_history'

So on my server, I ran the following script to update the owner account name for this job in the job cache:

EXECUTE msdb.dbo.sp_sqlagent_notify @op_type = N'J', @job_id = '58AF91D3-3ADB-4381-B1C2-45F31CB12AD0', @action_type = N'U'

The job then ran successfully when scheduled.

- The failure of this job can also occur even if you don't upgrade but rename 'sa' after a new installation. Follow the same steps above (either change the owner to a provisioned sysadmin account or use the script I've provided) to allow the job to run successfully. In fact, the problem encountered here can occur for any job you create and then rename the owner login.

For the future we are thinking about how we should create the job to ensure the right owner exists even if you decide to rename the 'sa' account.

Note that disabling the 'sa' account has no bearing on this upgrade problem or whether this job will run successfully.

There's More....

Now the SQL MVPs are extremely sharp. Gail Shaw pointed out that the account 'sa' is hard-coded in other places in the script sqlagent100_msdb_upgrade.sql. What about these situations and what issues could they cause?

One of these is a fragment of the script to drop and recreate a stored procedure for DBMail:

PRINT ''
PRINT 'Creating procedure sysmail_update_profileaccount_sp...'
IF (EXISTS (SELECT *
            FROM msdb.dbo.sysobjects
            WHERE (name = N'sysmail_update_profileaccount_sp')
            AND (uid  = SUSER_SID('sa'))
              AND (type = 'P')))
  DROP PROCEDURE dbo.sysmail_update_profileaccount_sp

As you can see from this query, the IF EXISTS will be false since 'sa' does not exist. This means the existing procedure from SQL Server 2005 will not be dropped. Which means the CREATE PROCEDURE that follows this T-SQL statement will fail as you can see in the ERRORLOG:

2008-09-10 15:36:43.30 spid6s      Creating procedure sysmail_update_profileaccount_sp...
2008-09-10 15:36:43.58 spid6s      Error: 2714, Severity: 16, State: 3.
2008-09-10 15:36:43.58 spid6s      There is already an object named 'sysmail_update_profileaccount_sp' in the database.

So this seems like a bad situation. The existing 2005 procedure is supposed to get dropped and a modified version created in its place, but that didn't happen.

Well because of the main problem I've described with the Policy Based Management (PBM) procedure, the solution to resolve that problem "resolves" this problem. When you rename 'sa' back and run repair, this script runs again, properly drops the procedure, and creates the new SQL Server 2008 version. OK, I'll admit we got lucky on this one.

We are currently researching a few other references to 'sa' in the upgrade script and I'll post an update at this point in the blog in this section when that research is complete.

 

Bob Ward
Microsoft

logo-header-sql08-dg