Windows high speed logging: ETW in C#/.NET using System.Diagnostics.Tracing.EventSource

In my introduction to EventSource  posting and my posting of the EventSource specification, I tried to give you the 'quick start' for using EventSource to generate ETW events from C# (or any .NET language).   

In this posting I would like to back a bit and motivate the 'why' behind event tracing for windows (ETW) and EventSource in particular. 

What is the problem ETW / EventSource are trying to solve?

Simply put, EventSource was designed to solve the 'in the field monitoring/diagnostic' problem.    It can be useful for client applications, but its real value tends to be on the server side.   Diagnosing issues on servers is particularly hard because

  1. The servers are handling real user requests, so you don't have the luxury of disrupting users by attaching a debugger and stepping through code.
  2. Often the problems tend to only reproduce under load, which means they tend to be non-deterministic, so you don't even know when/where to attach.
  3. Within even a single process, server applications tend to be doing dozens of requests simultaneously, making it difficult to separate the processing of a particular request
  4. The problems are often performance issues, which if they are not dramatic, are hard to isolate from everything else going on
  5. You tend to have many servers, which means that even if you log, the volume of data requires that have automation for parsing the logs.

Thus for 'in the field' diagnosis, logging is the 'obvious' solution to getting the information you need.   What would an 'ideal' logging mechanism look like

  1. It would have zero overhead when it was turned off (pay for play)
  2. It would be very fast when it was on
  3. It would purturb the normal workings of the program as little as possible, even processor scheduling decisions (e.g. it would not take locks or cause extra context switches ...)
  4. It would caputure very accurate timestamps that work across processors so that races could be diagnosed. 
  5. It would allow extra data to be logged along with the fact that the particular event happen (e.g. file names URLs)
  6. That extra data would be STRONGLY TYPED, and VERSIONED, so that the 'readers' of the log can 'count on' the data coming back.  This allows robust AUTOMATIC processing to be built.
  7. It would start with a wealth of events provided by the operating system (Processes, Threads, CPU, context switches, Page faults, memory, network I/O Disk I/O ...)
  8. It would be extensible, so that user components could add their high level information to the information provided by the operating system.

Well, this list pretty much describes the attributes of 'Event Tracing for Windows (ETW).    Fundamentally ETW is simple a strongly typed (schematized), extensible logging system built into the Windows Operating System.  

How fast is ETW?   Well, that is actually super-easy to answer with ETW itself because each event comes with its own high-accuracy timestamp.   Thus you you can the EventSource demo and simply compare the timestamps of back-to-back events to see how long an event takes.    On my 5 year old 2Ghz laptop I could log 10 events in 16 usec so it take 1.6 usec to log an event.   Thus you can log 600K events a second.   Now that would take 100% of the CPU, so a more reasonable number to keep in your head is 100K.    That is ALOT of events.     The implementation does not take locks in the logging path and any file writes are handled by the operating system asynchronously so there is little impact to the program other than the CPU it took to actually perform the logging.

Unfortunately, making the system strongly typed, extensible and versionable, comes with some complexity.  To properly use ETW logging you can't just call some logging API, you also have to create a description of all the events you could possibly generate.  This description is called the Manifest and has traditionally been created by authoing an XML file.   This manifest needs to be published so that consumers of the log files can get at it.    While in theory generating and publishing this manifest is reasonably straightforward, in practice it has been a stumbling block (It is not well integrated into a normal development process). 

This is the problem that System.Diagnostics.Tracing.EventSource solves.   It gives you the advantages of ETW (strong typing, versioning, extensibility) without the pain of explicitly constructing a manifest.    Literally in just a line of code you can define a new event.  For example I can define a Load Event which takes two pieces of payload (the image base and name),

sealed class MinimalEventSource : EventSource
{
    public void Load(long ImageBase, string Name) { WriteEvent(1, ImageBase, Name); } 

    public static MinimalEventSource Log = new MinimalEventSource();  

}

Using it is even easier.   You simply call the event method with the payload items

MinimalEventSource.Log.Load(10, "MyFile");

That's it, you are logging fully functional ETW events, getting all the benefits listed above.  Notice that the user never needed to create an XML manifest file.  Instead of forcing the user to explicitly generate a manifest, EventSource looks at the class declaration of its super-class (in this case MinimalEventSource) and constructs a manifest from that.   It logs this manifest information into the log file (just like any other event), which insures that any reader of the file has the information it needs.

By default, ETW events are not on, they need to be turned on by an ETW 'controller'.   The easiest way to turn on ETW EventSources is with the PerfView tool.  See the EventSource demo for details of doing this.  

So in summary

  1. Diagnosing functionality/performance problems in the field typically needs a high performance logging mechanism. 
  2. Event Tracing for Windows (ETW) has the architectural characteristics of an ideal logging mechanism, especially for server applications.   It is light weight, strongly typed, and extensible. 
  3. Historically creating ETW providers has been difficult because of the complexity of creating and publishing the manifest (which is the key to the strong typing)
  4. System.Diagnostics.Tracing.EventSource makes logging ETW events from C# trivial. 

So if you are writing an application where you need to do 'in-the-field' functionality/performance diagnosis, you should be instrumenting your application with ETW events with EventSource.  See the EventSource demo for how to get started.   

Next time I will discuss more details of how ways and ETW controller can selectively turn your events on and off. 

Vance