Event Tracing for Windows (ETW)


    A lot of work in performance tuning is
organizational.  There’s only so much work one can do with a
profiler and a single module.  A good example is the Registry —
we can attach profilers to the Registry access routines and optimize
them until they run as smooth as silk, but performance will still be
impacted if you do thousands of Registry accesses per second.  For
many problems, the cause is systemic: several components in a chain of
command that are individually well-tuned, but didn’t expect to call
each other in a huge chain.  A good example of that is DirectShow
— no matter how skillfully crafted an individual filter is, if the
mean path in a DirectShow graph is ten filters deep (with memory
management between each one for passing buffers of audio or video
around), latency is going to be high.



    More often than not, the best solution is simply
logging.  Log when filters are instantiated or connected, log when
Registry accesses are made, etc..  You want to mark high-level
concepts, and try to get a picture for what’s going on with the system
as a whole.  This works fine if you only have one application that
has to log… but more often that not, these systemic problems have
hundreds of files involved, most of which aren’t coded by you!  If
every programmer performs their logging in a different way, it can be a
nightmare to combine all those logs together, mixing different types of
timestamps and different methods of delivery, and get a single ordered
log of what happened over time.  Of course, that’s exactly what we
need… and that’s where Event Tracing for Windows, or ETW for short,
comes in.



    ETW is, at its core, a unified system for one-way
packetized I/O managed by the Windows kernel, built for logging. 
Every use of ETW has three participants in it — the controller, the
provider, and the consumer:

  • A provider is an module (DLL/EXE) doing something worth logging. 
    Most of the time, it runs without logging; it can, however, be
    “enabled” by a controller, at which point it recieves a handle from the
    kernel and starts logging “events” to that handle.  An event is
    an arbitrary struct (binary block) of data, the only condition being
    that it start with a 48-byte header.  This header contains a
    timestamp and identifying information.


  • A controller controls the actual act of logging.  The controller
    can ask the kernel to start a logging session, creating a handle and
    specifying that the kernel should take any events delivered to that
    handle and save them to a file.  (That file is usually on a hard
    drive, although we occasionally save them to RAM drives to ensure
    minimal interference.)  The controller can also enable and disable
    logging by providers, passing them a handle to log to.


  • A consumer reads events out of a file created by a logging session and
    parses them.  (It is also technically possible to have a consumer
    directly attach to a logging session’s handle and retrive events in
    real-time, but this is rare.)

    So, why use this system over your own homebrew system?

  • Uniformity.  If you’re debugging systemic problems involving
    multiple components, and all the involved components use ETW, you can
    have them all deliver their information to a single log file with
    uniform, steady timestamps, and write a single application that parses
    them all.


  • Speed.  ETW is extremely fast for providers to use, since all the
    I/O is handled by the kernel instead of by your module.  It
    typically takes only 1500-2000 cycles, depending on settings, to
    deliver an event and return to your code.  One can easily deliver
    thousands of events per second even on ancient machines.  We’ve
    achieved 20,000 events per second while only using 5% CPU load
    on a P3 500MHz
    (Yes, we have machines that old in our
    perf testing labs — not everyone who uses Longhorn will be using a
    modern machine!)



  • Consistency.  With fprintf() or other homebrew systems, logging
    tends to be very slow and intrusive and is thus usually compiled
    in.  With ETW, logging is extremely fast; furthermore, since
    logging is turned on by a controller and is usually off by default, you
    can actually leave the ETW events in final shipping code!  If
    problems are found in the field, send the tester an app that starts a
    trace and turns on the provider, then read it later.  Many, many
    components in Longhorn will ship as ETW providers.



  • Reliability
    .  ETW isn’t a new thing — it’s actually been in the
    OS and actively used since Win2K, and has been constantly refined since
    then.  Furthermore, ETW is available in both user-mode apps and
    kernel components.  (The latter access it through a
    MJ_SYSTEM_CONTROL IRP.)  This leads to…



  • OS cooperation
    .  The Windows kernel can provide many highly useful
    events via ETW for diagnosing performance problems.  Find out when
    and where disk I/Os, registry accesses, hard faults, and other
    performance problems happen!  More on this later…

    I’ll start discussing the actual APIs in the next
entry — those whose curiosity has been piqued can jump into the MSDN
documentation
, which is not very good IMO but better than
nothing.

Comments (10)

  1. Simon says:

    Hi!

    What about managed code (C#, …) support? What are the plans?

    Thanks

    Simon.

  2. ryanmy says:

    At this time, to my knowledge, there are no shipping managed libraries for ETW. Most of the advice I’ve seen for C# and VB.NET regarding ETW is to define a set of equivalents for the C structures, and P-Invoke the unmanaged Win32 APIs.

    However, there could be something in the pipeline. I will verify with the ETW team. 🙂

  3. Universalis says:

    The trouble with ETW (beautiful though it is) is that it only comes with Win2K+ and WinXP (and later versions).

    In a corporate environment the proportion of Win2K/WinXP users is high (often 100%) but in the wider world the proportion is around 10%, with most of the remaining 90% using just "Windows" (though some of the more sophisticated users will be able to tell you that it’s "Microsoft Windows").

    So application vendors either have to say to potential users "go away and don’t come back until you’ve discovered what version of Windows you’re using" (most potential customers will only understand the first two words) or they have to write their applications so that they run on all versions, not just the modern members of the NT series. In which case they have to duplicate the EWT functionality anyway. In which case, why maintain two sets of functions that essentially do the same thing?

    This is the application software vendor’s equivalent of you keeping a P3 in the testing labs!

  4. ryanmy says:

    Yep — if you’ve committed to supporting 9x, you can’t use ETW without supporting either two codebases or two EXEs with conditional compilation. The same is true of overlapped I/O, half of the Unicode APIs out there, any non-trivial GDI app, or multithreaded WinMM APIs.

    That’s an unfortunate reality of application development, and one that I’ve been bit by as a pre-MSFT developer. I’m a very big fan of backwards compatibility and supporting old APIs, but I don’t think that this always extends to making new innovations available for the old OSes. MSLU is mostly user-space and was doable; however, ETW is a kernel service that would be very difficult to port back to a non-NT architecture, so it probably won’t happen.

    The main gain of ETW comes from OS cooperation — namely, that the OS itself can provide information about what it’s doing, and automatically combine that with events from multiple providers across multiple processes into a single, time-ordered, high-performance log with extremely accurate timestamps. More importantly, all the I/O is done out of process, so ETW’s impact upon your application’s performance when enabled is minimal, contrasted with fprintf().

    I think this is good enough reason to at least use ETW on NT. One can either conditionally compile it, or use GetProcAddress() to detect its presence and, if so, use ETW with a private process logger instead of one’s homebrew facility. At least then, the logging is less intrusive on NT kernels, and functional but more intrusive on older OSes.

  5. Visual Studio Team System

    Bill Sheldon from InterKnowlogy has an item in the June 3rd edition of…