Windows high speed logging: ETW in C#/.NET using System.Diagnostics.Tracing.EventSource


In my introduction to EventSource  posting and my posting of the EventSource specification, I tried to give you the ‘quick start’ for using EventSource to generate ETW events from C# (or any .NET language).   

In this posting I would like to back a bit and motivate the ‘why’ behind event tracing for windows (ETW) and EventSource in particular. 

What is the problem ETW / EventSource are trying to solve?

Simply put, EventSource was designed to solve the ‘in the field monitoring/diagnostic’ problem.    It can be useful for client applications, but its real value tends to be on the server side.   Diagnosing issues on servers is particularly hard because

  1. The servers are handling real user requests, so you don’t have the luxury of disrupting users by attaching a debugger and stepping through code.
  2. Often the problems tend to only reproduce under load, which means they tend to be non-deterministic, so you don’t even know when/where to attach.
  3. Within even a single process, server applications tend to be doing dozens of requests simultaneously, making it difficult to separate the processing of a particular request
  4. The problems are often performance issues, which if they are not dramatic, are hard to isolate from everything else going on
  5. You tend to have many servers, which means that even if you log, the volume of data requires that have automation for parsing the logs.

Thus for ‘in the field’ diagnosis, logging is the ‘obvious’ solution to getting the information you need.   What would an ‘ideal’ logging mechanism look like

  1. It would have zero overhead when it was turned off (pay for play)
  2. It would be very fast when it was on
  3. It would purturb the normal workings of the program as little as possible, even processor scheduling decisions (e.g. it would not take locks or cause extra context switches …)
  4. It would caputure very accurate timestamps that work across processors so that races could be diagnosed. 
  5. It would allow extra data to be logged along with the fact that the particular event happen (e.g. file names URLs)
  6. That extra data would be STRONGLY TYPED, and VERSIONED, so that the ‘readers’ of the log can ‘count on’ the data coming back.  This allows robust AUTOMATIC processing to be built.
  7. It would start with a wealth of events provided by the operating system (Processes, Threads, CPU, context switches, Page faults, memory, network I/O Disk I/O …)
  8. It would be extensible, so that user components could add their high level information to the information provided by the operating system.

Well, this list pretty much describes the attributes of ‘Event Tracing for Windows (ETW).    Fundamentally ETW is simple a strongly typed (schematized), extensible logging system built into the Windows Operating System.  

How fast is ETW?   Well, that is actually super-easy to answer with ETW itself because each event comes with its own high-accuracy timestamp.   Thus you you can the EventSource demo and simply compare the timestamps of back-to-back events to see how long an event takes.    On my 5 year old 2Ghz laptop I could log 10 events in 16 usec so it take 1.6 usec to log an event.   Thus you can log 600K events a second.   Now that would take 100% of the CPU, so a more reasonable number to keep in your head is 100K.    That is ALOT of events.     The implementation does not take locks in the logging path and any file writes are handled by the operating system asynchronously so there is little impact to the program other than the CPU it took to actually perform the logging.

Unfortunately, making the system strongly typed, extensible and versionable, comes with some complexity.  To properly use ETW logging you can’t just call some logging API, you also have to create a description of all the events you could possibly generate.  This description is called the Manifest and has traditionally been created by authoing an XML file.   This manifest needs to be published so that consumers of the log files can get at it.    While in theory generating and publishing this manifest is reasonably straightforward, in practice it has been a stumbling block (It is not well integrated into a normal development process). 

This is the problem that System.Diagnostics.Tracing.EventSource solves.   It gives you the advantages of ETW (strong typing, versioning, extensibility) without the pain of explicitly constructing a manifest.    Literally in just a line of code you can define a new event.  For example I can define a Load Event which takes two pieces of payload (the image base and name),

sealed class MinimalEventSource : EventSource
{
    public void Load(long ImageBase, string Name) { WriteEvent(1, ImageBase, Name);

    public static MinimalEventSource Log = new MinimalEventSource();  

}

Using it is even easier.   You simply call the event method with the payload items

MinimalEventSource.Log.Load(10, “MyFile”);

That’s it, you are logging fully functional ETW events, getting all the benefits listed above.  Notice that the user never needed to create an XML manifest file.  Instead of forcing the user to explicitly generate a manifest, EventSource looks at the class declaration of its super-class (in this case MinimalEventSource) and constructs a manifest from that.   It logs this manifest information into the log file (just like any other event), which insures that any reader of the file has the information it needs.

By default, ETW events are not on, they need to be turned on by an ETW ‘controller’.   The easiest way to turn on ETW EventSources is with the PerfView tool.  See the EventSource demo for details of doing this.  

So in summary

  1. Diagnosing functionality/performance problems in the field typically needs a high performance logging mechanism. 
  2. Event Tracing for Windows (ETW) has the architectural characteristics of an ideal logging mechanism, especially for server applications.   It is light weight, strongly typed, and extensible. 
  3. Historically creating ETW providers has been difficult because of the complexity of creating and publishing the manifest (which is the key to the strong typing)
  4. System.Diagnostics.Tracing.EventSource makes logging ETW events from C# trivial. 

So if you are writing an application where you need to do ‘in-the-field’ functionality/performance diagnosis, you should be instrumenting your application with ETW events with EventSource.  See the EventSource demo for how to get started.   

Next time I will discuss more details of how ways and ETW controller can selectively turn your events on and off. 

Vance

Comments (13)

  1. Barry Kelly says:

    What you write seems to imply that you don't think dtrace is a better solution?

    I think ETW is kind of over-engineered on one hand, yet limited and inflexible on the other. Dtrace seems way better to me.

  2. Vance Morrison says:

    I am not making (or implying) anything about DTrace or how it compares to ETW.   For those who don't know, DTrace is the analog to ETW on Unix operating systems.    It is simply not an option on Windows platforms so such a comparison can only be useful to the implementors of each system since end users simply don't have a choice.    For what it is worth, I do think that ETW could benefit from some of the features of DTrace, but I also believe that both existing systems are more than good enough for the vast majority of diagnostic and performance investigations.    

    The weakest link frankly is user education about what capabilities are available, and I am trying to fix that at least for people working on windows operating systems (which are the only people who would probably care to read my blog).  

  3. Randhir says:

    Thanks Vance for explaining this and making ETW palatable for general .Net public.

    Could you pls compare EventSource with TraceSource? I understand EventSource would be much more performant but do you see a place for TraceSource usage as well?

    Simply asking, if you are coding a version 1 of your product, would you only use EventSource or combination of EventSource and TraceSource?

    -Randhir

  4. vancem says:

    Sorry for the delay.  EventSource was designed so that it could be a complete replacement for TraceSource.   Thus in an ideal world, a complete new code base would only use EventSource.   The reason is that it is safer/easier (at the logging sites you ONLY specify what is actually needed (only the logger, event  name and payload, no 'events ids, or logging levels etc).   Moreover, EventSources (through EventListeners can do everything that TraceSources can do (in fact it is very straightforward to build a bridge where EventSource data gets written to TraceListeners (and thus any 'back end' infrastructure based on TraceListeners will 'just work'.  

  5. Sreeni says:

    I tried to split the producer and consumer in EventSource demo. Interesting thing is, EventListener.EnableEvents (myListener0.EnableEvents(MinimalEventSource.Log, EventLevel.Verbose)) is taking EventSource's instance as input. As producer's instance and consumer's instance are different, consumer/listener is not receiving any events. Could you share a sample that splits producer (trace sender) and consumer (trace listener) into two different applications? I am able to successfully do this with TraceViewer. But not with EventTrace.

  6. vancem says:

    As Sreeni has discovered, EventSource and EventListener are designed to work in the same process (actually the same Appdomain in the same process),   Thus you can't have a EventSource in one process and communicate with it with an EventListener in another process.  That was not the goal of EventListener.   EventSource was designed to hook nicely into the Window ETW system, however. what if you wanted strongly typed logging but for whatever reason did not wish to use ETW?   For that you would need a 'hook' that allows you to get at the all the events generated by your EventSources, so that you could send the log event data wherever you wanted (file, database, console …) this is what EventListener does.  

    For cross-process logging of events, you have two possibilities

    1) Use the built in support EventSource has for ETW, and use ETW as your cross-process communication mechanism.  The is effectively what the example blogs.msdn.com/…/and-end-to-end-etw-tracing-example-eventsource-and-traceevent.aspx does (this example happens to do it in one process for simplicity, but hopefully it is obvious how you would make two processes).  

    2) Use EventListener in the process with the EventSources to get the eventSTream, and build your own cross-process communication path you want (e.g a pipe).  Note that it non-trivial to keep the events 'strongly typed' so I don't recommend this unless option (1) can't be made to work (e.g. you have a phone or slate app and you want to log your events on your hosting machine).    

  7. Thanks a lot for bringing this topic up! These posts make a lot to finally convince one of our customers that ETW is the way to go for tracing (the perf figures and other stuff look very compelling).

  8. Stygen says:

    Thanks .. this really helped me in getting direction

  9. Karthik says:

    HI,

    Is there a way we can have the traces written to a log file as we do in tracelisteners?

    Can you please share the configuration for the same?

    Regards,

    Karthik

  10. vancem says:

    You don't turn ETW on or off with configuration file, you use a tool (e.g like PerfView) to indicate if logging should be on or off, and then look at the data with that tool.     I mention the demo of doing this.  .  

    You can also do this programmatically, which allows you to make your app send its logging data wherever you want, including a file, and this can be controlled via a config file if that is what you ant.   The semantic logging application block has code that does that

    blogs.msdn.com/…/more-support-for-eventsource-and-strongly-typed-logging-the-semantic-logging-application-block.aspx

    Finally it is a pretty easy piece of code to route EventSource data to TraceSources, which then you can control using TraceListeners (and config files), if that is what you desire.

  11. Pepe Lu says:

    Hi Vance,

    We have a service to collect log files customers needed and upload them to a cloud storage. Currently we are using flat files to log and our service will scan it every 5 minutes (We have 100K level log files on disk, and scanning in windows takes too much time). But this will definitely bring in 5 minutes latency for some important logs. Now we want to create a real-time channel using ETW. All seems good but we MUST make sure the logs(tracing) never get lost even if our service is not installed. Can ETW achieve that?

    I have one idea to implement my own trace listener to call APIs exposed by our service. If it fails, just write to disk like what TextWriterTraceListener does. Does it make sense? In your post above, I see you are comparing the performance of using trace listener and EventSource. How much they differ? Do you have a estimation?

  12. vancem says:

    You can certainly use an In-process EventListener that logs the data locally and/or forwards it to your cloud (and you have the option of trying to make that work in process, or have the uploading be a separate process on the collection machine).    If you look on the web for 'Semantic Logging Application Block' you will find some code that you may want to reuse in making such a listener.  

    As far as the performance difference between EventSource/EventListeners and TraceSource/TraceListeners, are concerned, I don't have definitive data.   It depends on scenario (what events are turned on (or not)), as well as what exactly is in the payload, and how much string manipulations before passing it to TraceSource and how much manipulation you do in the actual logging.      In general the EventSource code path to ETW is highly optimized, but that is not as true for the EventListener path (but it is not bad, it is probably comparable to TraceListener).   If logging is happening < 100 events /sec the perf is probably not that important one way or the other (and for monitoring you should be trying to keep it < 10 /sec ideally).    

    Note that it is pretty easy/straightforward to conduct a performance test (See 'PerfVIew' posts).