Capturing a profile in production based on ETW events

ETW has many uses. Filtering allows you to get only the events you want and an out-of-process consumer can listen for those filtered events. ETW can also be used for profiling native and managed code. It can sample stacks and mark context switches. That data can be combined with other ETW providers such as the CLR provider which gives you GC events, among other things. ETW profiling can also use circular buffers so that you can leave it on while waiting for an issue to occur and then save the profile for analysis. This trick is especially useful in production where you don't want to make code changes or recycle processes or impact the performance of the process.

The profiler of choice is PerfView. If you search Channel 9 you'll find all kinds of great videos on how to use PerfView for analysis of performance problems. PerfView's command line options allow you to start and stop profiles with separate calls. There is also built-in support to stop collection on performance counters, slow HTTP requests, long GC collects, and events in the Application event log matching a regular expression pattern. However, let's say you are getting an occasional WCF exception in production. Rather than write a message to the Application event log, you could just listen for the event directly.

I put together a C# solution for this using the EventTraceWatcher helper class which is an excellent tool put together by Daniel Vasquez Lopez. The main code for this is very straightforward:

using Samples.Eventing;
using System;
using System.Diagnostics;
using System.Threading;

namespace CircularTrace
{
    class Program
    {
        private static ManualResetEvent waitForEvent = new ManualResetEvent(false);
        static void Main(string[] args)
        {
            try
            {
                RunProcess("PerfView.exe", "start -AcceptEULA -NoGui -Circular:250 -logfile:perfview_start.log -ThreadTime");

                // Wait for a WCF exception to occur
                Guid RewriteProviderId = new Guid("c651f5f6-1c0d-492e-8ae1-b4efd7c9d503");

                using (EventTraceWatcher watcher = new EventTraceWatcher("WcfErrors", RewriteProviderId))
                {
                    watcher.EventArrived += delegate(object sender, EventArrivedEventArgs e)
                    {
                        waitForEvent.Set();
                    };

                    // Start listening
                    watcher.Start();
                    waitForEvent.WaitOne();
                }

                RunProcess("PerfView.exe", "stop -AcceptEULA -NoGui -logfile:perfview_stop.log");
            }
            catch (Exception exc)
            {
                Console.WriteLine(exc);
            }
        }

        private static void RunProcess(string executable, string arguments)
        {
            Process process;
            ProcessStartInfo startInfo;

            process = new Process();
            startInfo = new ProcessStartInfo(executable, arguments);
            startInfo.UseShellExecute = false;
            process.StartInfo = startInfo;
            process.Start();
            process.WaitForExit();
        }
    }
}

This starts PerfView with a 250MB circular buffer. The AcceptEULA and NoGui parameters are probably not necessary with the logfile parameter specified. In the middle the wait is implemented by watching the WCF ETW provider on an existing session named WcfErrors. That session must be setup prior to running this tool.

Here are instructions on how to create the session, which are the same as Daniel included in his blog:

  1. Open the Performance Monitor tool (should be under Administrative Tools)
  2. Locate Data Collector Sets -> Event Trace Sessions in the tree on the left
  3. Right-click on Event Trace Sessions and picked New -> Data Collector Set
  4. Name the set “WcfErrors”, use manual setup, and hit next
  5. Next to Providers, click Add and find the provider "Microsoft-Windows-Application Server-Applications"
  6. Under Properties select Level and hit Edit
  7. Pick win:Error and hit OK
  8. Click Next twice
  9. Select "Open properties for this data collector set" and hit Finish
  10. Go to the Trace Session tab
  11. Change the Stream mode to Realtime
  12. Hit OK
  13. Start the WcfErrors session

With the session created you should be able to run the tool. When it finishes, PerfView will create a zip file and a summary file. If necessary PerfView will generate the NGEN PDBs and store them in a symbols folder in the zip file.

The only filter I apply above is on the level, which I set to Error. But you can be more specific by filtering to a certain channel and set of keywords. EventTraceWatcher unfortunately does not give you the whole event, just a dictionary of properties for the event. So if you wanted to filter in your code, then you would have to edit the EventTraceWatcher code to expose the rest of the event's properties.