I wanted to go through and outline some of the changes we made for MCF since our last release. The things I really wanted to accomplish in redesigning MCF was to make it more integrated with SCOM 2007 than it was in MOM 2005, to make necessary performance improvements and maintain at least the same level of functionality as we had before, while keeping with the model-based approach of SCOM 2007.
Integration with SCOM 2007
With the significant changes that took place between these releases, we had an opportunity to integrate MCF more closely with the product than it was before. The first step here, was getting rid of resolution state as the mechanism to mark alerts for forwarding. Instead, the alert object has a special nullable field (ConnectorId) that represents the identifier of the connector that currently "owns" this alert.
There are a few ways to mark alerts for forwarding. In the UI, you can right-click on an alert and there will be a fly-out menu with available connectors so you can explicitly mark an alert for forwarding. Programmatically, the ConnectorId field on a MonitoringAlert object is settable and once set can be applied by calling the Update() method on the alert. Lastly, we have introduced a concept of connector subscriptions that are exposed via the UI and programmatically (MonitoringConnectorSubscription) that allow users to "subscribe" to a particular set of alerts by using criteria that should be marked for a particular connector.
Another aspect of more tight-knit integration is the fact that 100% of MCF functionality is available via the managed SDK. While we still ship with a web-service (using WCF) and a proxy to communicate with it, the recommendation is to use the SDK to do MCF functionality unless you are not running on windows; this provides a richer object model and a better programming experience. The namespace in the SDK that servers MCF functionality is Microsoft.EnterpriseManagement.ConnectorFramework and the main object the provides the root of much of the functionality is ConnectorFrameworkAdministration which can be retrieved off a ManagementGroup object via GetConnectorFrameworkAdministration().
The last aspect of integration is the fact that a connector is modeled in SCOM 2007. While we aren't planning on shipping anything that provides health information for a connector out of the box (time constraints), there is an ability for customers and partners who write connectors to develop a health model for their connector. In a nutshell, creating a connector creates a new instance of Microsoft.SystemCenter.Connector. In order to develop a health model around a new connector, a user should extend this class with their own and develop a health model targeted at the new class. Once this is complete, the built-in setup process will need to be augmented by the connector author to discover their connector as an instance of their new class. Assuming the model is developed and working correctly, the connector object will now have state like anything else SCOM is managing.
In 2005 we have many performance issues at scale due to the fact that a lot of data is copied from place to place. In the MOM to MOM Connector scenario in particular, we physically moved alerts from bottom tiers to the top tier, replicating the data and keeping it in sync.
The first change for performance was with respect to tiering. We no longer copy alerts from bottom tiers to the top tier, but instead make available alerts from multiple tiers on the top tier. A SCOM administrator can setup tiers in the UI and make them available to connectors. Once available, MCF provides special "tiered" versions of various methods that aggregate data across these tiers the administrator setup. For instance, calling GetMonitoringAlertsForTiers() will loop through all available tiers and aggregate the results, returning alerts from each. The recommended approach for tiering, however, is to manage the calls to each tier from within the connector code itself as this allows more robustness in terms of error handling and allows for parallel processing for multiple tiers. There is much more to talk about with respect to tiering and how to accomplish this, but it is not in the scope of this article. If there is interest, I will put together a more detailed post on tiering.
The next improvement for performance has to do with how we process alerts and alert changes. In MOM 2005, we distinguished between "New" and "Updated" alerts and actually kept a cached copy of each change to an alert in a pending list for each connector. If an alert changed 3 times, we would have three independently acknowledgeable changes and three copies of the alert in the cache. This caused performance problems when dealing with large scale deployments. In order to remedy this problem, in SCOM we manage alerts for connectors in place. The alerts in the alert table have a timestamp associated with non-connector related changes (so you don't get an update to an alert you changed as a connector) and that timestamp is checked against the current bookmark of a connector when retrieving alerts. Acknowledging data is now a function of moving a connectors bookmark. In other words, when a connector retrieves 10 alerts, when they are done processing these alerts, they should acknowledge receipt and processing by calling AcknowledgeMonitoringAlerts() with the maximum time of the alerts in the batch that was retrieved. This will ensure no data is lost.
Important Note: - Since SQL datetime is not particularly granular, it is possible that two alerts with the same datetime timestamp get split between two SELECT calls. Even worse, when acknowledging across tiers, the system times on the machines will not be the same which might cause data loss by acknowledging with a time from one alert on one tier to another tier that has a time slightly in the past. In order to remedy this we actually have a built-in "delay" when retrieving alerts so that we don't get the newest alert but instead get alerts older than 30 seconds. This actually causes some "weird" behavior in that if you mark an alert for forwarding and immediately call GetMonitoringAlerts() you won't get anything, even though the alert is marked. This value is configurable in the database. This query should do it, pending any changes to the property name at RTM: (The SettingValue is the number of seconds of "buffer" you want in GetMonitoringAlerts calls. It needs to be at least 1)
UPDATE GlobalSettings SET SettingValue = 1 WHERE ManagedTypePropertyId in
(SELECT ManagedTypePropertyId FROM ManagedTypeProperty WHERE ManagedTypePropertyName = 'TierTimeDifferenceThreshold')
Other Random Things
- We no longer support discovery data insertion via the web-service. This can be done as described in my earlier post.
- We support batching (and in fact enforce it with a maximum size of 1000), however the batch size is an "approximation" as ties in the alert timestamp might require more than the requested number of alerts to be returned. Also, batch size for tiered methods is the total across all tiers, not per tier.
- The MonitoringConnector object in the SDK has an Initialized property and a Bookmark property available for reading. The web-service has GetConnectorState() and GetConnectorBookmark() for the same purpose.
- Alert insertion is not supported. Please see this post and this one.
- Alert history retrieval is not done as part of the alert retrieval. Via the SDK there is a GetMonitoringAlertHistory() method on the alert. Via the web-service we have GetMonitoringAlertHistoryByAlertIds() for the same purpose.
- We added a way to generically get alerts via the web-service: GetMonitoringAlertsByIds(). This is available in the SDK in a variety of methods.
- When updating alerts via the web-service, the connector is implicit. When updating alerts via the SDK on behalf of a connector, it is important to use the overload that accepts a MonitoringConnector object to indicate it is the connector that is changing the alert.