Through rules, SCOM provides a powerful system for capturing specific events from the event log of Windows Azure applications and notifying various groups of critical events through SCOM alerts. Events are stored in the SCOM database, where they can be analyzed in views and reports. This walkthrough will demonstrate how to create a rule that captures critical events from Windows Azure role instances, how to set up alerting, and how the monitoring works after the setup work is done.
If you need a primer on installation of SCOM 2007 R2 along with Windows Azure monitoring, then please see my blog post here. This is one of several posts in a progressive “soup to nuts” series that will get you up to the point where you are ready to implement event log alerting using SCOM.
Very special thanks go out to my colleagues Brian Wren, Steve Harmon, and Fred Lee, who were instrumental in providing their various areas of technical expertise towards making this document as accurate and informative as possible.
Setting up Rules in SCOM
For this first part of setting up rules, I am essentially following guidance which can be found on TechNet at http://technet.microsoft.com/en-us/library/gg508734.aspx. To begin, we will launch the SCOM Console and select the Authoring view. Within this view, we will navigate to the Rules node under Management Pack Objects. You will see a plethora of rule targets, as seen below. You can collapse the targets to find the specific Azure targets we’re looking for, but there are so many it’s better to simply apply a scope to narrow what you see in this view.
Choose the Scope button in the toolbar, which will display the dialog Scope Management Pack Objects. From here, type “azure” in the Look for: textbox, and also check the View all targets radio button (since Azure is not currently considered a “common” target). In the list box below, check Windows Azure Role Instance as well as Windows Azure Role Instance followed by the name of your hosted service in parentheses (i.e., Windows Azure Role Instance (<your hosted service name>)), as seen below.
Choose the OK button, which will yield a nicely scoped down view, as seen below. I have expanded out the node for my role instance, which is where we will add the new rule. We won’t add the rule from here however, but will do that in the Authoring Console. Once we have added the rule, we will see the rule here once we refresh the view.
In order to create the rule, the first thing we need to do is to switch to the Administration view, and then select the Management Packs node. Find your management pack in the right-hand pane, right-click, and select the Export Management Pack… menu option. Go ahead and export your management pack to the desired folder.
This next step is only necessary if you have the SCOM 2007 R2 Cumulative Update 4 (CU4) installed, as there is a bug in the reference to the Microsoft.SystemCenter.Library library. Minimize the SCOM Console, and navigate to the folder where you saved the management pack. Open the management pack with Notepad and search for the number “61.” You simply want to change the “61” to “0” in the version number under the Microsoft.SystemCenter.Library reference and save the management pack. Once you do this, you will be able to successfully open the management pack from the Authoring Console.
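As a rough illustration of the edit described above, the reference in the exported management pack XML might look something like the fragment below. This is a sketch, not a copy of the actual file: the Alias value and the leading version digits are assumptions, and only the trailing “61” comes from this walkthrough, so check the values in your own file.

```xml
<!-- Before the edit (Alias and leading version digits are assumed for illustration) -->
<Reference Alias="SystemCenter">
  <ID>Microsoft.SystemCenter.Library</ID>
  <Version>6.1.7221.61</Version>
  <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
</Reference>

<!-- After the edit: only the trailing 61 becomes 0 -->
<Reference Alias="SystemCenter">
  <ID>Microsoft.SystemCenter.Library</ID>
  <Version>6.1.7221.0</Version>
  <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
</Reference>
```

Leave every other reference in the file untouched.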
Open the Authoring Console, navigate to the management pack you just saved, and open the file as seen below.
Once you have opened the management pack, you should see a view similar to below. If you see a bunch of cryptic rule names beginning with MomUIGeneratedRule<some nasty looking Guid>, then you will probably want to select Tools | Options from the menu and select the Use display names option, as seen below.
Notice from the screenshot below that we’re beginning with a clean installation, which by default has some performance counters already defined out of the box. We’re going to add our event alerting rule to our management pack that we created, which will be the container for all of our Azure objects in this series of documents. From the Actions pane on the far right, select the New link, and then from the popup menu, select the Custom Rule… item.
You will then be presented with a dialog asking for a unique identifier for your rule. Here, I just modified what was NewRule at the end of the ID to EventAlertingRule as specified in the TechNet article. Select OK to proceed.
You will be presented with a dialog as seen below. In the General tab, enter a name for your alert. In the Target: dropdown, select the Windows Azure Role Instance item that is followed by your hosted service name in parentheses.
Next, select the Modules tab. Select the Create… button next to the Data Sources list box, which will display the Choose module type dialog, as seen below. Type “windows azure” in the Look for: textbox to narrow down the module type items in the list box below. Select the Windows Azure Role Instance Event Log Collection Simple Data Source item. Enter a name in the Module ID text box at the bottom, and select OK.
After selecting OK, your rule dialog should look as below.
Now select the Edit… button next to the Data Sources list box. Enter values for IntervalSeconds and TimeoutSeconds, which are typically 300 and 60 respectively. Select the OK button.
Next to the Condition Detection list box, select the Create… button, where you will choose a module for detection of the condition you desire. As seen below, type in something like “system.exp” to narrow the list box items down. Select System.ExpressionFilter. Enter a name in the Module ID textbox and select the OK button.
Now we will create the selection criteria for our events. Select the Edit… button next to the Condition Detection list box. You will now see the Expression Filter dialog, as seen below. On this dialog, select the Configure… button.
In the Configuration dialog, select the Insert button (or alternatively, the arrow next to it, and then select Expression from the dropdown). For the Parameter Name field, type “EventNumber”, which is the parameter in the event we are targeting (note that this field can be a little cantankerous, so be sure to double check what you have typed if the scroll bar appears). Select “Equals” for the Operator field. In the Value field, type “100”. Basically, the condition on which this rule will fire will be any events collected in SCOM that have the Event ID set to 100, which is typical for user-generated events. Since we will generate our own event source later, there won’t be any conflicts with other Event IDs because it is the source coupled with the identifier that uniquely identifies each event. Your dialog should look as below. When you’re done, select the OK button to dismiss the dialog.
The Expression Filter dialog will now look like the following. Select the OK button.
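For readers who prefer to check the result in the management pack XML rather than in the dialog, the condition configured above should correspond roughly to the fragment below. This is a sketch under assumptions: the module ID, the System! alias, and the UnsignedInteger type attributes are illustrative, and the Authoring Console may emit slightly different values.

```xml
<!-- Hypothetical module ID and alias; EventNumber, Equal, and 100 come from the steps above -->
<ConditionDetection ID="EventFilter" TypeID="System!System.ExpressionFilter">
  <Expression>
    <SimpleExpression>
      <ValueExpression>
        <!-- The parameter of the collected event to test -->
        <XPathQuery Type="UnsignedInteger">EventNumber</XPathQuery>
      </ValueExpression>
      <Operator>Equal</Operator>
      <ValueExpression>
        <!-- The Event ID our application will write -->
        <Value Type="UnsignedInteger">100</Value>
      </ValueExpression>
    </SimpleExpression>
  </Expression>
</ConditionDetection>
```

If the rule never fires, a mistyped parameter name in the XPathQuery element is a likely culprit, so it is worth comparing against this shape.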
Next, we will select the Create… button next to the Actions list box. Type “system.” to narrow down the results, and select System.Health.GenerateAlert. This will be our action if the condition we set above is met for any event. Enter a Module ID in the textbox at the bottom, and select the OK button.
This is a bit strange, but now we need to select the Edit… button next to the Actions list box to select the priority and severity of the alert whose Event ID matched our criteria above. Select the Configure… button on the Configuration dialog.
In the Configuration dialog, enter an alert name, and set the Priority as well as Severity. What really matters here is the severity, so you can leave it set at Critical. Leave the alert description as is, since we want the event description, but will have to do a bit of work to get at it (which will be explained shortly).
You now have the finished alerting rule. Select OK to dismiss the dialog.
Our Rules list should now look like the following.
The last step is to export the management pack back to SCOM so we can begin using our new event log alert rule. From the top-level menu, select Tools | Export MP to Management Group, and follow the normal prompts.
Let’s now close the Authoring Console, return to the SCOM Console, and refresh the page for Rules. If we expand out the rules for our role instance, we will now see the Event Log alert rule we just created, as seen below.
We are now done setting up our rule. The next step is setting up notifications so we can receive alerts from SCOM when the rule detects an actionable event.
Setting up Notifications in SCOM
Next, we need to set up our alerting mechanism in the SCOM Console, which entails setting up channels for different types of notifications, subscribers who will subscribe to particular channels, and subscriptions that connect subscribers to channels based on specific criteria. From the SCOM Console, select the Administration view, and navigate to the Channels node under Notifications, as seen below. From the Channels node, select the New channel | E-Mail (SMTP)… menu item.
You will now be presented with the E-Mail Notification Channel wizard. Depending on the type of channel, it will seed the fields with relevant values for you, as seen below. Substitute with your own desired channel name and description, and then select the Next button.
I am presuming that a valid SMTP server has already been provided by your system administrator. If this has not been provided, and you need a good tutorial to set this up on your SCOM server, take a look here. Be sure to test out your SMTP server with telnet to verify you can receive e-mail from SCOM. Below is my successful telnet session that was launched from the command line with the command telnet localhost 25. You can use anything for the return address on the mail from: line that is syntactically valid, but it is recommended that you use a valid DNS name so there is a lower likelihood your test e-mail will end up being blocked by a spam filter or routed to your junk e-mail folder. You may want to use a valid operations administrator e-mail address here that will serve as the return address for e-mail notifications, which we will set up shortly.
Please note that for a lab environment you can use the IIS SMTP service, but for enterprise environments you will want to point this to your approved SMTP application relay server (whether that is an Exchange Hub Transport, Edge Transport, or other production mail system), as these systems should have the proper controls in place for relaying legitimate messages from approved hosts after checking for spam, antivirus, and so on.
Moving on, the next page of the E-Mail Notification Channel wizard will allow you to enter information for your SMTP server(s). In order to add servers, select the Add button.
You will now see the Add SMTP Server dialog. Enter the SMTP server name, and modify the port number and authentication method based on your standards.
For the return address, you will want to enter a valid email address in case a subscriber who is not an operations administrator wants to inquire about a specific e-mail notification. When you’re done entering the return address, select the Next button to proceed to the next page of the wizard.
On the final page of the wizard, you can modify to suit, but I would suggest leaving the default for now and selecting the Finish button to complete setting up your channel. You can modify later as needed.
We will now set up the subscribers. Right-click on the Subscribers node and select New subscriber… from the popup menu, as seen below.
You will now be presented with the Notification Subscriber Wizard. On this first page (which will default to the domain name of the currently logged on user), enter a free form description for a specific subscriber, or alternatively use the ellipsis (…) button to select a specific subscriber’s domain name from Active Directory. We will see how to add addresses and notification schedules for this subscriber in the last page of the wizard. Select the Next button.
On the next Schedule Notifications page, I accept the default, but you have the option here of only accepting notifications during certain times. Select the Next button.
On the final Subscriber Addresses page, you can now add one or more addresses for this subscriber, specifying various ways in which they may receive notifications, whether that is through E-mail, IM, SMS, or the execution of a command. Select the Add button.
Once you have selected the Add button, you are presented with the Subscriber Address sub-wizard. On this page, enter a friendly name that provides a description of this particular delivery address for this particular subscriber (e.g., Joe @ Work – IM, Joe @ Home – SMS). Enter the address name and select the Next button.
In the next page of the sub-wizard, you will select the channel type, such as e-mail, and then you will enter the appropriate delivery address in the textbox below. For example, if you selected the text message option, you would enter a phone number. You will see visual validation checking to ensure you have entered the delivery address in the correct format.
In the last page of the sub-wizard, as seen below, you can set a schedule for when you receive notifications for each delivery address. For example, you may only want to get e-mail notifications during business hours, but you may want SMS messages outside of business hours up until 9 p.m. on weekdays. Leave this as is if you want to be notified 24×7, or set the values as desired using multiple criteria, as seen below. When you’re done, select the Finish button.
You will be brought back to the original wizard with your subscriber address now added. As seen below, I added a work e-mail address for business hours, and a cell number for receiving text messages after hours. Select the Finish button when you are done.
Next, we will set up the subscriptions that will bring together subscribers and channels with the criteria required for sending notifications. Right-click the Subscriptions node and select New subscription… from the menu, as seen below.
You will now see the Notification Subscription Wizard. Enter a name for the subscription. As seen below, I am currently setting up an Azure-specific subscription. Select Next.
In the next wizard page, we define the subscription criteria. In the screenshot below, I will have two criteria. The first criterion is that I want alerts raised by any instance of a specific group or groups, such as one or more Azure applications. The second criterion is that I only want alerts generated by my applications of a specific severity.
In the criteria description area at the bottom of the Criteria dialog, select the specific link. You will see a group search dialog as seen below. You can filter here to find your particular Azure groups, and then you would select the Add button to add your group to the Selected groups: list at the bottom. Add as many groups as desired, and when you’re done select the OK button.
Next, we want to select the specific link on the Criteria page, where we will be presented with the Alert Type dialog as seen below. I have selected only critical alerts from Azure applications for notification. Select the OK button when done.
When you’re done, your Criteria page should look similar to the following. Select the Next button to proceed to the Subscribers page.
On the Subscribers page, we will select the subscribers who will be notified by this subscription. Select the Add button to add a subscriber.
On the subscriber search page, you can select from the subscribers you previously configured. A filter is provided if you need to narrow down your search. Search on your subscribers, and add them to the subscriber list by choosing the Add button. When you are done adding subscribers, select the OK button.
When you’re done, you will see your list of subscribers and the channels on which they are able to receive notifications. Select Next to move to the next page.
The channels page may be a bit confusing, as you have already selected the channels for your individual subscribers, but that only means these are their available channels, not necessarily the ones that would be used for any actual subscription. What this page allows you to do is to select or selectively filter which channels you wish to use for this particular subscription. You will select the Add button, as seen below, to begin adding desired channels for this subscription.
Below is the channel search dialog, where you can search on the channels that you added previously. If you didn’t add the channels under the Channels node, then they won’t be listed here. When you’re done, select the OK button.
When you’re done, you will see the list of selected channels for this subscription, as seen below. You can select options for delaying notifications on this page. When you’re done, select the Next button.
You’re now finished creating your subscription, and can recap your work as seen below. Select the Finish button and you’re done setting up notifications.
Alert Generation and Monitoring
We’re done setting up our rules, and we’re done setting up notifications, but there is more work to do to properly generate the alerts from Azure applications that we subsequently want SCOM to collect from Azure storage and monitor. It would appear all we need to do is generate event alerts in our Azure applications in the typical manner on a server or workstation, using the EventLog.WriteEntry method from System.Diagnostics. Unfortunately, it’s not that simple. The problem is that if we generate an event in an Azure application, the event description won’t be propagated to Azure storage (see diagram below to understand the overall flow), since interpretation of the event will fail within the event system of your web or worker role. The problem itself is not with Azure, but with Windows Server 2008 (Enterprise Version), which is the host virtual environment for Azure role instances. A little background information will help us to understand how we can resolve this problem.
When an event is generated by an application running on the Windows operating system (Windows Server, Windows 7, Windows XP, etc.), it must be associated with an event source. Typically, a developer would specify “Application”, as seen in the code snippet below:
System.Diagnostics.EventLog eventLog = new System.Diagnostics.EventLog();
eventLog.Source = "Application"; // Writing to existing event log source.
eventLog.WriteEntry("Your message here.", System.Diagnostics.EventLogEntryType.Error, 100); // 100 is the Event ID.
Note that the Event ID (third) parameter of the WriteEntry method is 100, which matches our rule discussed previously. If a developer uses the code above and then looks in the Event Log after generating an event, they will certainly find their event in the Application log. However, they will see that an error is generated in the detail of the event, but it is of no real consequence because the desired message is still displayed in the detail after the error message, as seen below.
Without getting too deep in the details, the reason for the error is that the event system in Windows expects each event to be associated with a particular event source. The source contains an EventMessageFile registry key that points to a specific .dll or executable, which is responsible for mapping parameter strings into placeholders defined in the message file description string (e.g., the description string “File %1 contains %2, which is an error” has two parameters that must be mapped). This file defines how the event system should display the data in Event Viewer. In the case above, our event is not properly mapped for the Application source, so an error is displayed. Let’s now look in the registry at the Application source in the Application event log, as seen below, to gain more insight.
The code snippet above is mapped to the Application source, and from our available API, we have no way of knowing the correct way to format our string to allow wevtapi.dll to map it correctly. Now let’s look at the event source immediately below the Application source, named Application Error, as seen below. This source is mapped to an EventMessageFile key. If we were creating our own event source from scratch, we would need to generate the appropriate event message file to source and display our event, and map it to the EventMessageFile key as seen below. This event message file would include a description string that contains insertion string placeholders. The event itself would include data elements that map to the placeholders. Fortunately we can avoid this and leverage the generic .NET event message file, which will map our message string properly. We will cover this in detail later.
In short, our problem is that, since we don’t have a mapping that works for the events we generate, the event system doesn’t know how to interpret the event data. This has a consequence for an Azure application: events generated in the role instance must be propagated to Azure storage, and because the event system doesn’t know how to interpret our event message (technically, the event description), our message text does not get propagated as the event description field in Azure storage.
Our approach to solving this problem will be to use the command line executable EventCreate.exe to generate an event source, and we will write to this source. There are other methods I won’t mention here, but this is the simplest method to implement. What this will entail is setting up a startup task in our roles that will generate the event source we will use when each role instance spins up. To begin, within our Visual Studio project for our Azure application, we will create a startup folder in each Azure role that will generate events. We need to do this for each role because each role will be mapped to its own instance (or instances) of Windows Server 2008.
Under the startup folder, you will create a command file, which must be created with a text editor such as Notepad. Do not create the command file directly from Visual Studio since, as described in this article on creating startup tasks, text files created in Visual Studio are saved by default in a format that includes a byte order mark, making it impossible to execute them as batch files. Also, be sure to set the Copy to Output Directory property on the command file to “Copy if newer” (as seen below).
In Notepad below, I have created a command file named ConfigureEvents.cmd, and added the EventCreate command necessary to create our own source. You can see the support article here on creating “custom” events (actually, not fully custom). In essence, what we are doing when we execute the EventCreate command upon startup of our Azure role is to create a new event source that will accept our events and interpret our message correctly as the event description. Reading parameters from left to right, we have specified the Application event log, the level is Information, the ID is 999, the Source Name is “SCOM Source,” and the description follows the /D switch.
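In case the screenshot is not available, here is a minimal sketch of what ConfigureEvents.cmd could contain, reconstructed from the parameters listed above. The description text after /D and the final EXIT line are assumptions rather than the original file contents, so substitute your own.

```bat
REM ConfigureEvents.cmd: registers the "SCOM Source" event source when the role starts.
REM /L = event log, /T = event type, /ID = event ID, /SO = source name, /D = description.
REM The /D text is a placeholder, not the wording used in the original command file.
EVENTCREATE /L Application /T INFORMATION /ID 999 /SO "SCOM Source" /D "SCOM event source created by startup task."

REM Assumption: always report success so that a failure here does not keep the role from starting.
EXIT /B 0
```

Running the command a second time simply writes another Information entry, so it should be safe for the task to run on every startup.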
Let’s now return to Visual Studio and walk through what is required in our application. In the code below, the event log source has now been modified to match the name of the event source specified in the startup task.
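As a sketch of what that change might look like, the helper below writes an error event against the source created by the startup task. The class and member names (AzureEventAlert, SourceName, AlertEventId, WriteError) are hypothetical and not from the original application; only the source name “SCOM Source” and Event ID 100 come from this walkthrough.

```csharp
using System.Diagnostics;

// Hypothetical helper; the names are illustrative, not from the original sample.
public static class AzureEventAlert
{
    // Must match the /SO value passed to EventCreate in the startup task.
    public const string SourceName = "SCOM Source";

    // Must match the EventNumber value configured in the SCOM rule.
    public const int AlertEventId = 100;

    // Writes an Error entry using the custom source; the source is assumed to
    // have been registered already by the startup task.
    public static void WriteError(string message)
    {
        EventLog.WriteEntry(SourceName, message, EventLogEntryType.Error, AlertEventId);
    }
}
```

A role would then call AzureEventAlert.WriteError("Your message here.") wherever it needs to raise an alert. Keeping the source name and Event ID as constants in one place makes it harder for the code, the command file, and the SCOM rule to drift apart.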
Next, the service definition file must be modified in each role that requires event alerting with a Startup section that launches our startup task, as seen below. Do not miss this step, or your role will not execute the command we need on startup.
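A Startup section along these lines would do the job. This is a sketch under assumptions: the role name WebRole1 is hypothetical, the path assumes the command file sits in the startup folder described earlier, and the elevated execution context is my assumption that creating an event source requires administrative rights.

```xml
<!-- ServiceDefinition.csdef fragment; WebRole1 is a hypothetical role name -->
<WebRole name="WebRole1">
  <Startup>
    <!-- Runs the command file before the role starts; "simple" waits for it to finish -->
    <Task commandLine="startup\ConfigureEvents.cmd" executionContext="elevated" taskType="simple" />
  </Startup>
  <!-- ... the rest of the role definition stays as it is ... -->
</WebRole>
```

Repeat the Startup section in every role that writes events, since each role runs on its own instances.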
For illustration purposes, below is a screenshot of a Remote Desktop Connection (RDC) session into the running Azure role instance to illustrate what the startup task with the EventCreate command will produce. As a result of the startup task, we see that our event source has been added, and that it is mapped to the EventCreate.exe file, which will correctly interpret our events.
If we next launch the Event Viewer in the RDC session, we can verify that the EventCreate command has generated one entry at the Information level, as specified in our command file. So now that we have verified this event log entry, we can begin generating events in the Azure application.
Now we’re ready to test this out. From my Azure event log test application (which was modified from an excellent article by Mike Kelly), I generated an error event, as seen below.
I then went to the RDC session to verify the event was written from the web role, as seen below (note it is easier to visualize the data in XML view).
Now we have to wait a few minutes for the event to be transferred from the role instance to Azure storage, and for SCOM to collect the event. SCOM is now waiting for events to show up in Azure storage, and in our case will raise an alert of Critical severity for any event that matches our rule. We will learn about the alerts by monitoring the Active Alerts node in the SCOM Console (which updates itself periodically), as well as through the e-mail notifications. Below is a screenshot of the alert presented in the SCOM Console.
We can double-click on the alert to see more detail, particularly in the Alert Context tab of the alert detail dialog, as seen below, where we can view the event XML. Note that the message we sent when we generated our event is in the EventDescription element towards the bottom, where it should be. This element is what would be missing if we didn’t follow the steps prescribed in this section. We would, however, see the message in the EventData/Data element, but SCOM isn’t looking there for the message, and most operations folks wouldn’t take kindly to hunting for it. Additionally, the e-mail notifications would be frustrating because the alert message would be missing there as well.
Based on our subscriber setup previously, we would have received an e-mail notification. Below is an example in Outlook 2010 of an event log alert e-mail sent from SCOM.
That wraps up what is a very long document on Event Log Alerting and Monitoring. As you can tell, this is not trivial, but with some patience you will have this set up successfully, and will be able to take advantage of a key capability in SCOM 2007 R2 for monitoring your Azure applications.