Management Servers turn gray when they are removed from the AMSRP

Recently I worked on an issue in SCOM 2012 SP1 where management servers that were removed from the All Management Servers Resource Pool (AMSRP) turned gray. Once a management server (MS) was removed from the AMSRP, errors appeared in the Operations Manager event log on that MS and an alert was generated.

So why remove MSs from the AMSRP? One reason is to dedicate a group of MSs to a specific task, such as network monitoring or SDK connections. In this case, however, it was for a data center failover scenario. The general recommendation is no more than 5 ms of latency between MSs, but what if you need to fail over to a remote data center with little to no downtime, and the link to that data center has more than 5 ms of latency? In this design, SQL Server 2012 AlwaysOn handles the database failover. For the MS failover, the MSs in the failover data center are removed from the AMSRP and are not assigned any agents. In a failover, we remove the main data center's MSs from the AMSRP, add the failover data center's MSs back to the AMSRP, and finally reassign the agents to the MSs in the failover data center.
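The failover sequence described above boils down to a pool membership swap followed by agent reassignment. The following is a minimal Python sketch of that bookkeeping under stated assumptions: `plan_failover`, `FailoverPlan`, and all server and agent names are hypothetical, nothing here calls a SCOM API, and the round-robin agent assignment is an assumption, since the procedure does not say how agents are spread across the failover MSs.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import cycle


@dataclass
class FailoverPlan:
    new_amsrp_members: list[str]  # AMSRP membership after the failover
    agent_moves: dict[str, str]   # agent name -> failover MS it is reassigned to


def plan_failover(amsrp_members, primary_dc_ms, failover_dc_ms, agents):
    """Compute AMSRP membership and agent moves for a manual DC failover.

    Hypothetical model only: names are plain strings; nothing talks to SCOM.
    """
    if not failover_dc_ms:
        raise ValueError("no failover management servers available")
    # Step 1: remove the main data center's MSs from the AMSRP.
    drop = set(primary_dc_ms)
    remaining = [ms for ms in amsrp_members if ms not in drop]
    # Step 2: add the failover data center's MSs, skipping any already members.
    new_members = remaining + [ms for ms in failover_dc_ms if ms not in remaining]
    # Step 3: reassign agents to the failover MSs. Round-robin is an
    # assumption; the original procedure does not specify a distribution.
    targets = cycle(failover_dc_ms)
    moves = {agent: next(targets) for agent in agents}
    return FailoverPlan(new_members, moves)


plan = plan_failover(
    amsrp_members=["MS1", "MS2"],
    primary_dc_ms=["MS1", "MS2"],
    failover_dc_ms=["DR-MS1", "DR-MS2"],
    agents=["agentA", "agentB", "agentC"],
)
print(plan.new_amsrp_members)  # ['DR-MS1', 'DR-MS2']
print(plan.agent_moves)
```

In practice each of these steps is performed in the console or with the OperationsManager cmdlets; the sketch only makes the order of operations and the resulting state explicit.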

In theory this works, but we noticed that the MSs turned gray when they were removed from the AMSRP. After further investigation, we found the cause and resolved the issue. The steps to reproduce and resolve it are detailed below.

Steps to Reproduce Issue

  1. Change the AMSRP to manual membership using PowerShell (Get-SCOMResourcePool -DisplayName "All Management Servers Resource Pool" | Set-SCOMResourcePool -EnableAutomaticMembership $false)
  2. In the Operations Manager console, go to Administration\Resource Pools
  3. Select the All Management Servers Resource Pool and click Properties
  4. In Pool Membership, select the MS you want to remove, click Remove, and then click Save

Steps to Resolve Issue

  1. In the Operations Manager console, go to Administration\Run As Configuration\Accounts
  2. Modify any accounts that are distributed only to the AMSRP so that they are also distributed to the MS you removed from the AMSRP. If you added that MS to a new resource pool, you can distribute the accounts to that pool instead. In my case, two accounts needed to be modified: the “Data Warehouse Report Deployment Account” and the “Global Service Monitor Run As Account Configuration”. You can also cross-reference the Event ID 1108 entries in the Operations Manager event log on the removed MS to see which accounts it can no longer resolve.
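Why the removed MS loses these accounts becomes clearer if Run As distribution is modeled as a set of targets, where each target is either an MS or a resource pool. The following Python sketch is a hypothetical model, not the SCOM SDK: `covered`, `accounts_to_fix`, and `redistribute` are illustrative names, and the data is made up apart from the two account names mentioned above. An account distributed only to the AMSRP stops reaching an MS as soon as that MS leaves the pool; adding the MS itself, or a pool that contains it, to the targets restores coverage.

```python
from __future__ import annotations

AMSRP = "All Management Servers Resource Pool"


def covered(ms: str, targets: set[str], pools: dict[str, set[str]]) -> bool:
    """True if an account distributed to `targets` reaches `ms`.

    A target is either an MS name or a resource pool name; a pool target
    reaches every MS currently listed as a member of that pool.
    """
    return any(t == ms or ms in pools.get(t, set()) for t in targets)


def accounts_to_fix(ms, distributions, pools):
    """Accounts whose distribution no longer reaches `ms`, sorted by name."""
    return sorted(a for a, targets in distributions.items()
                  if not covered(ms, targets, pools))


def redistribute(ms, distributions, pools):
    """Return a copy of `distributions` with `ms` added wherever it is missing."""
    fixed = {}
    for account, targets in distributions.items():
        new_targets = set(targets)
        if not covered(ms, targets, pools):
            new_targets.add(ms)
        fixed[account] = new_targets
    return fixed


# Hypothetical state: MS2 has just been removed from the (now manual) AMSRP.
pools = {AMSRP: {"MS1"}}
distributions = {
    "Data Warehouse Report Deployment Account": {AMSRP},
    "Global Service Monitor Run As Account Configuration": {AMSRP},
    "Example Account (hypothetical)": {AMSRP, "MS2"},
}
print(accounts_to_fix("MS2", distributions, pools))
fixed = redistribute("MS2", distributions, pools)
print(accounts_to_fix("MS2", fixed, pools))  # []
```

Adding the MS to the targets mirrors the first option in step 2; distributing to a new pool that contains the MS would work equally well in this model, because `covered` checks pool membership.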

Error Details

Event ID 1108

An Account specified in the Run As Profile "Microsoft.SystemCenter.Omonline.OutsideIn.RunAsProfile.Configuration" cannot be resolved. Specifically, the account is used in the Secure Reference Override "SecureOverride92d3e014_93cb_d967_17b4_8c5eb18ae099".

This condition may have occurred because the Account is not configured to be distributed to this computer. To resolve this problem, you need to open the Run As Profile specified below, locate the Account entry as specified by its SSID, and either choose to distribute the Account to this computer if appropriate, or change the setting in the Profile so that the target object does not use the specified Account.

Management Group: Litware
Run As Profile: Microsoft.SystemCenter.Omonline.OutsideIn.RunAsProfile.Configuration

Event ID 1108

An Account specified in the Run As Profile "Microsoft.SystemCenter.DataWarehouse.ActionAccount" cannot be resolved. Specifically, the account is used in the Secure Reference Override "SecureOverride0bc452d6_7bf2_17cc_a183_5aa213df34e6".

This condition may have occurred because the Account is not configured to be distributed to this computer. To resolve this problem, you need to open the Run As Profile specified below, locate the Account entry as specified by its SSID, and either choose to distribute the Account to this computer if appropriate, or change the setting in the Profile so that the target object does not use the specified Account.

Management Group: Litware
Run As Profile: Microsoft.SystemCenter.DataWarehouse.ActionAccount
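When many 1108 events are logged, collecting the profile and override names by hand gets tedious. A minimal Python sketch, assuming the event messages have been exported as plain text (for example, copied out of Event Viewer) and that their wording matches the samples above; `unresolved_profiles` is a hypothetical helper name.

```python
from __future__ import annotations

import re

# Matches the first sentence of an Event ID 1108 message; \s+ tolerates
# line wrapping introduced when the message is copied out of the log.
EVENT_1108 = re.compile(
    r'Run\s+As\s+Profile\s+"(?P<profile>[^"]+)"\s+cannot\s+be\s+resolved\.'
    r'\s+Specifically,\s+the\s+account\s+is\s+used\s+in\s+the\s+'
    r'Secure\s+Reference\s+Override\s+"(?P<override>[^"]+)"'
)


def unresolved_profiles(event_text: str) -> dict[str, list[str]]:
    """Map each Run As profile named in 1108 messages to its override IDs."""
    found: dict[str, list[str]] = {}
    for m in EVENT_1108.finditer(event_text):
        found.setdefault(m.group("profile"), []).append(m.group("override"))
    return found


sample = (
    'An Account specified in the Run As Profile '
    '"Microsoft.SystemCenter.DataWarehouse.ActionAccount" cannot be resolved. '
    'Specifically, the account is used in the Secure Reference Override '
    '"SecureOverride0bc452d6_7bf2_17cc_a183_5aa213df34e6".'
)
print(unresolved_profiles(sample))
```

The profile names it returns can then be looked up under Administration\Run As Configuration\Profiles to find the accounts whose distribution needs to change.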

Alert

System Center Management Health Service Unloaded System Rule(s)

The System Center Management Health Service 56D4A306-C0F1-598F-77FF-8E4258FA3060 running on host TFS-Lit.Litware.com and serving management group with id {453EC1E1-382C-DD51-5EB3-81798AD7F7D2} is not healthy. Some system rules failed to load.

State Change

“Secure Reference overrides cannot be resolved” on the MS Object