Collecting the boot-time events over the network.

Today I want to talk about what I actually do at work. I work on the service called "Setup and Boot Event Collector". It was included in the previous Server Technical Previews but disclosed only to the partners. Now it has been officially announced at the Ignite conference and will be generally available in the next preview. There will be an official blog with some official posts about it, but for now I want to describe it in a more unofficial way (and by the way, you're welcome to ask questions here).

Have you seen how Linux, when it boots, prints all these interesting bits of information on the console? And on headless machines you can get it to use the serial port as the console and get it all remotely. Have you ever wished that you could get the same from Windows, so that when things fail, you know exactly what went wrong? Now this wish has been answered. Even better, you can get this information directly over the network.

How it works: The information about the Windows start-up is sent in the form of ETW events. Incidentally, you can get them even on previous versions of Windows if you connect the kernel debugger and configure the installed image just right. The boot event collector does much the same thing but in a more convenient and secure way: there is no debugger, one collector can receive and record the data from many machines, and the provided PowerShell cmdlets help with configuring the image just right. Then you can read the collected events with Message Analyzer or any other tool and find out what is going on during the boot or setup (or really at any other time, if you want to configure the image in a more custom way).

Now a short intro on how to get it working. You can install it in TP2 as well, though the PowerShell commands have changed a bit since then (and will change some more before the final release). We have a manual that was shared with the partners in the previous previews, but I'm not sure yet how it will be generally distributed.

To install the collector you enable the optional feature "Setup And Boot Event Collection". It can be done through the Server Manager/Control Panel or from the command line through dism (or through the PowerShell commands):

dism /online /enable-feature /featurename:SetupAndBootEventCollection
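For the PowerShell route, a minimal sketch using the standard DISM cmdlets could look like this. It assumes the feature name is the same one dism takes above, and it needs an elevated session.

```powershell
# Run from an elevated PowerShell session on the machine that will be the collector.
Enable-WindowsOptionalFeature -Online -FeatureName SetupAndBootEventCollection

# Confirm that the feature state is now Enabled.
Get-WindowsOptionalFeature -Online -FeatureName SetupAndBootEventCollection |
    Select-Object FeatureName, State
```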

That puts the binaries, the configuration files and the PowerShell scripts onto the machine (it can be a physical machine or a VM). The service gets started but its initial configuration is empty, so it does nothing. The PowerShell commands include "Sbec" in their names, so you can get the list of them with

PS> help *Sbec*

By the way, the path for the release of the proper help for the PowerShell cmdlets is a bit of a mystery to me. So far, if I go to the URL for it, it says that this document hasn't been released yet. But there is a trick: if you look in c:\Windows\System32\WindowsPowerShell\v1.0\Modules\BootEventCollector\BootEventCollector.psm1, you can find the descriptions of the functions right in the file.
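Here is one way to pull those descriptions out without opening the file in an editor. This is only a sketch: Get-ModuleFunctionName is a helper name I made up for illustration, and the search for .SYNOPSIS and .DESCRIPTION assumes the module uses comment-based help, which may not hold for every build.

```powershell
# Hypothetical helper (not part of the module): list the functions declared in a script file.
function Get-ModuleFunctionName {
    param([Parameter(Mandatory = $true)][string]$Path)
    Select-String -Path $Path -Pattern '^\s*function\s+([\w-]+)' |
        ForEach-Object { $_.Matches[0].Groups[1].Value }
}

# Module path as given in the text.
$module = 'C:\Windows\System32\WindowsPowerShell\v1.0\Modules\BootEventCollector\BootEventCollector.psm1'

if (Test-Path -Path $module) {
    # Names of all the functions defined in the module file.
    Get-ModuleFunctionName -Path $module

    # Help markers with the two lines that follow each one
    # (assumes comment-based help is present in the file).
    Select-String -Path $module -Pattern '\.SYNOPSIS|\.DESCRIPTION' -Context 0,2
}
```

Once you know a function's name, Get-Help with that name and -Full will show whatever help text PowerShell can extract from the file.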

The data files for the collector live in c:\ProgramData\Microsoft\BootEventCollector. The subdirectory Config contains the configuration files, Etl is intended for the saved event logs, and Logs is for the logs of the collector itself. The collector does its normal logging through ETW as well, and you can see it in the Event Viewer under Applications and Services Logs -> Microsoft -> Windows -> BootEvent-Collector. You can also switch the logging to a file if you want, and there are additional status log files.
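If you prefer the command line to the Event Viewer, something along these lines should show the same things. It is a sketch: the wildcard is my guess at how the channel names are formed, based on the Event Viewer path above, so no exact log name is hard-coded.

```powershell
# Data directory named in the text; run this on the collector machine.
$root = 'C:\ProgramData\Microsoft\BootEventCollector'

# Show the Config, Etl and Logs subdirectories (and anything else that is there).
Get-ChildItem -Path $root -Directory | Select-Object Name, LastWriteTime

# Find the collector's event log channels by wildcard (the pattern is an assumption).
$logs = Get-WinEvent -ListLog '*BootEvent-Collector*' -ErrorAction SilentlyContinue
$logs | Select-Object LogName, LogType, IsEnabled, RecordCount

# Print the newest entries from the enabled admin/operational channels only;
# analytic and debug channels need different read options, so they are skipped.
$logs |
    Where-Object { $_.IsEnabled -and $_.LogType -in 'Administrative', 'Operational' } |
    ForEach-Object { Get-WinEvent -LogName $_.LogName -MaxEvents 20 -ErrorAction SilentlyContinue } |
    Format-List TimeCreated, Id, LevelDisplayName, Message
```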

In the Config directory you can find: Active.xml - the currently active configuration (it's best not to edit it directly but to use the PowerShell commands to change the configuration, because then the collector will keep the configuration change history for you), Empty.xml - the empty configuration, in case you want to return to it, and Example.xml - essentially the description of all the possible configuration settings as comments in a configuration file, along with examples.
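Looking at these files without modifying them is safe. The sketch below only reads them; it assumes nothing about the XML schema beyond the files being well-formed XML, and the limit of 60 lines is arbitrary.

```powershell
# Config subdirectory of the collector's data directory, as described in the text.
$config = 'C:\ProgramData\Microsoft\BootEventCollector\Config'

# Example.xml documents the settings in comments, so plain text is the useful view.
Get-Content -Path (Join-Path $config 'Example.xml') | Select-Object -First 60

# Load the active configuration read-only and show its top-level structure.
[xml]$active = Get-Content -Path (Join-Path $config 'Active.xml') -Raw
$active.DocumentElement.Name
$active.DocumentElement.ChildNodes | ForEach-Object { $_.LocalName }

# Check whether the active configuration still matches the empty one.
$diff = Compare-Object (Get-Content (Join-Path $config 'Active.xml')) `
                       (Get-Content (Join-Path $config 'Empty.xml'))
if ($diff) { 'Active.xml differs from Empty.xml' } else { 'Active.xml matches Empty.xml' }
```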

I won't go into the details of the configuration here; it's a separate subject that we can look at later. For now, suppose you've configured the collector.

Then you go and configure the target machines (they're really the sources of events, but we've kept the terminology consistent with the debugger). The event collection uses a small separate network stack borrowed from the kernel debugger (KD-NET), so it starts working very early in the Windows boot process, way before the normal networking starts. That comes with some inherited caveats, though. The list of NICs supported by that small networking stack is shorter than the list covered by the normal drivers (but the typical popular NICs are covered). And the stack adds overhead on the NIC it uses, compared to the normal driver.

Just like KD-NET, the targets for event collection get configured with the address and port of the collector, the secret key for communications, and the information about which events to send. The PowerShell commands that come with the collector feature help with this configuration. You can use them to configure WIM and VHD images, or to run the configuration on the target machines through PowerShell remoting, or you can copy the scripts to a target machine and run them there locally. They can also be used to configure network boot and network setup through ADK/WDS.

The transport part gets configured with the command Set-SbecBcd, and the selection of the events with Set-SbecAutologger (it provides a reasonable default set of events from the kernel, system logs and setup). The events from the kernel and system services normally stop being forwarded over the network once the boot reaches the point where the event logging service has started, meaning that from then on the events can be collected locally. The setup events keep being forwarded. But all this is configurable and can be changed if you have different preferences.
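Since the parameters of these two commands are still changing between previews, I'd rather not quote a command line that may be stale by the time you read this. A safer sketch is to ask the installed module what it accepts; the module name here is taken from the module path mentioned earlier.

```powershell
# Load the collector's module (name taken from its directory under Modules).
Import-Module BootEventCollector

foreach ($name in 'Set-SbecBcd', 'Set-SbecAutologger') {
    $cmd = Get-Command -Name $name -ErrorAction SilentlyContinue
    if (-not $cmd) {
        # The commands have been renamed between previews, so this can happen.
        Write-Warning "$name was not found in this build; try: help *Sbec*"
        continue
    }

    # Full help text, or at least the auto-generated syntax if no help was written.
    Get-Help -Name $name -Full

    # Just the parameter names, for a quick overview.
    $cmd.Parameters.Keys | Sort-Object
}
```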

After the target has been configured, it needs to be rebooted (or booted for the first time from a VHD image). And then the events will come.

The targets can be physical machines or VMs. But with VMs, both the system inside the VM and the Hyper-V host must run Win10. Win8 Hyper-V hosts actually can support the KD-NET protocol stack in the VMs, but they do it in a way that's rather painful and difficult to configure, so they're not officially supported for the event collection. We'll see if they ever will be, but going to a Win10 host makes things much easier.

In the next installment, I plan to show an example of simple diagnostics.