SharePoint and Mirroring: Taking action on failovers

You might have seen the database mirroring whitepaper for SharePoint. It mentions a great article from the SQL folks on alerting on database mirroring events. One might think that monitoring the WMI events in SQL is a good or maybe the only way to take action on failovers. However, I propose a better method: Query the [Master].sys.database_mirroring view via a task or process. Here's the query:

Select mirroring_role from sys.database_mirroring where mirroring_role is not null

If the server is principal each database will return 1. If mirror, each database will return 2. Unfortunately, this isn't the only state you have to handle. Because each SQL server can be both the mirror for one database and principal for another, you have to monitor for this situation or SharePoint will be most unhappy. The query to do this is:

Select count(distinct mirroring_role) from sys.database_mirroring where mirroring_role is not null

If this query returns more than 1, you know that you have what we call a database mirroring split situation and you need to fail over some of the databases to return to a good state. However, you can't simply failover each principal database because it's assumed that the partial failover occurred for a reason. Take for instance, the situation where one of your content databases resides on a storage array that has failed. In that situation, you would not want to fail the database back over to the original node as it would not come online until the array is repaired. To handle that situation, you want to know the original state of SQL before running your failover logic. If the role is principal, but one of the databases has failed over, you want to fail over the rest of the databases. However, if the node is the mirror, you want to ignore the situation and let the principal handle it. This means your application or process to handle failovers needs to be stateful. I've been storing this state in a user created table called roles in the MSDB database, but I will be moving this to the registry as this seems like a more robust way to handle it.  

BTW.. I experimented with triggering off of the database mirroring events. I found that there were quite a few events that fired when a database fails over. It was confusing determining how to use this information. Additionally, the events didn't tell me the information I wanted to know. I ended up having to query the sys.database_mirroring table regardless. After spending some time on this, I decided it was best just to create a windows service that queries the table as described above. Additionally, I created a simple SQL agent job that tried to do the same thing as the service. Unfortunately, my SQL skills are pretty poor as I prefer C#, but it's the thought that counts right? This TSQL sample will create a SQL agent job that runs once a minute and writes the current state to a table called roles residing in MSDB. It will also attempt to handle a database mirroring split exception. This is where the code breaks down as I noticed that instead of failing over the database to the mirror, it fails it back over to the principal. Anyway's, here you go. Have fun. Oh.. and as always this code is unsupported and is provided as a sample for you to create your own failover mechanism. I left a gotcha in the code so that you'll have to clean it up before you can use it. ha HAAA! as Phil Seben says.

    1: USE [msdb]
    2: GO
    3: /****** Object:  Job [Mirroring_State_Monitor]    Script Date: 10/15/2007 11:46:50 ******/
    4: BEGIN TRANSACTION
    5: DECLARE @ReturnCode INT
    6: SELECT @ReturnCode = 0
    7: /****** Object:  JobCategory [[Uncategorized (Local)]]]    Script Date: 10/15/2007 11:46:51 ******/
    8: IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
    9: BEGIN
   10: EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
   11: IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
   12:  
   13: END
   14:  
   15: DECLARE @jobId BINARY(16)
   16: EXEC @ReturnCode =  msdb.dbo.sp_add_job @job_name=N'Mirroring_State_Monitor', 
   17:         @enabled=1, 
   18:         @notify_level_eventlog=0, 
   19:         @notify_level_email=0, 
   20:         @notify_level_netsend=0, 
   21:         @notify_level_page=0, 
   22:         @delete_level=0, 
   23:         @description=N'No description available.', 
   24:         @category_name=N'[Uncategorized (Local)]', 
   25:         @owner_login_name=N'domain\username', @job_id = @jobId OUTPUT
   26: IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
   27: /****** Object:  Step [Check State]    Script Date: 10/15/2007 11:46:52 ******/
   28: EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Check State', 
   29:         @step_id=1, 
   30:         @cmdexec_success_code=0, 
   31:         @on_success_action=1, 
   32:         @on_success_step_id=0, 
   33:         @on_fail_action=2, 
   34:         @on_fail_step_id=0, 
   35:         @retry_attempts=0, 
   36:         @retry_interval=0, 
   37:         @os_run_priority=0, @subsystem=N'TSQL', 
   38:         @command=N'
   39: -- This script monitors the state of the mirror partners and takes action to fix any problems
   40: -- This script uses a user created table [MSDB].dbo.role to store the state of the individual node
   41: -- Principal = 1; Mirror = 2;
   42: --
   43: -- Identify if we have a split state
   44: -- Get the role from the role table and make sure we don''t have more than 1 database role
   45: IF (Select r.role from [MSDB].dbo.role r ) = 1 AND (Select count(distinct mirroring_role) from sys.database_mirroring where mirroring_role is not null) > 1
   46:     BEGIN
   47:         --Generate list of principal databases
   48:         DECLARE @principalDB as table(dbname varchar(255) NOT NULL);
   49:         --Insert the names of the principal databases into the table variable
   50:         INSERT INTO @principalDB SELECT d.name
   51:         from sys.database_mirroring s, sys.databases d
   52:         WHERE s.database_id = d.database_id and s.mirroring_role = 1;
   53:         --Enumerate through the principal databases and fail over
   54:         WHILE (Select count(*) from @principalDB) > 0
   55:             BEGIN
   56:             --Get the first database name from the table
   57:             DECLARE @dbName as varchar(100)
   58:             SET @dbName = (SELECT Top 1 dbname from @principalDB)
   59:             --Create a string to store the failover command
   60:             Declare @cmdString as varchar(255)
   61:             Set @cmdString = ''USE MASTER; ALTER DATABASE '' + @dbName + '' SET PARTNER FAILOVER;''
   62:             --Execute the command string to failover the databases
   63:             EXECUTE (@cmdString);
   64:             --Delete the database from the list since it has aleady been failed over
   65:             DELETE FROM @principalDB WHERE dbname = @dbName
   66:             END
   67:         --Finally, update the role to mirror
   68:         Update [MSDB].dbo.role SET role = 2
   69:     END
   70: ELSE IF (Select r.role from [MSDB].dbo.role r ) = 1 
   71: AND (Select count(distinct mirroring_role) from sys.database_mirroring where mirroring_role is not null) = 1
   72: AND (Select count(mirroring_role) from sys.database_mirroring where mirroring_role = 1) = 0
   73:     -- This server is principal in name only but should be the mirror
   74:     -- Update the role to mirror
   75:     Update [MSDB].dbo.role SET role = 2
   76: IF (Select r.role from [MSDB].dbo.role r ) = 2 
   77: AND (Select count(distinct mirroring_role) from sys.database_mirroring where mirroring_role is not null) = 1
   78: AND (Select count(mirroring_role) from sys.database_mirroring where mirroring_role = 2) = 0
   79:     -- This server is mirror in name only but should be the principal
   80:     -- Update role to principal
   81:     Update [MSDB].dbo.role SET role = 1
   82:  
   83: -- We are done
   84: -- Other potential states exist, but we don''t need to perform any action.
   85:     Update [MSDB].dbo.role SET role = 2
   86: IF (Select r.role from [MSDB].dbo.role r ) = 2 
   87: AND (Select count(distinct mirroring_role) from sys.database_mirroring where mirroring_role is not null) = 1
   88: AND (Select count(mirroring_role) from sys.database_mirroring where mirroring_role = 2) = 0
   89:     -- This server is mirror in name only but should be the principal
   90:     -- Update role to principal
   91:     Update [MSDB].dbo.role SET role = 1
   92:  
   93: -- We are done
   94: -- Other potential states exist, but we don''t need to perform any action.', 
   95:         @database_name=N'master', 
   96:         @flags=0
   97: IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
   98: EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
   99: IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
  100: EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Schedule 1', 
  101:         @enabled=1, 
  102:         @freq_type=4, 
  103:         @freq_interval=1, 
  104:         @freq_subday_type=4, 
  105:         @freq_subday_interval=1, 
  106:         @freq_relative_interval=0, 
  107:         @freq_recurrence_factor=0, 
  108:         @active_start_date=20071015, 
  109:         @active_end_date=99991231, 
  110:         @active_start_time=0, 
  111:         @active_end_time=235959
  112: IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
  113: EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
  114: IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
  115: COMMIT TRANSACTION
  116: GOTO EndSave
  117: QuitWithRollback:
  118:     IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
  119: EndSave: