Windows Error Reporting (WER) for developers

Windows Error Reporting is the replacement for Dr. Watson on OSs newer than Windows XP. It monitors failures and collects useful information that can be sent to a server to be analyzed (if the user allows it). This feature helped Microsoft fix a lot of bugs: thanks to the reports received, Microsoft was able to prioritize bugs (based on the number of hits or other severity measures) and to solve a lot of problems. What is really cool is that any developer can request the reports for his/her product.

From Bill Gates at PDC 2003: “..whenever an application or the system malfunctions, you get the ability to send a report back to Microsoft. We get a lot of those reports, and we've created very good data-management systems to go in and look at those things, and therefore understand what drivers aren't reliable. We allow anyone who has an application that runs on Windows to sign up and get the reports that relate to their application, and we've got winqual.microsoft.com where people can do that.”

Here’s how WER works: when a process crashes, WER collects the data and sends it to a server (if the user allows it; by default WER asks for consent). For non-Microsoft programs, this server is Winqual (Windows Quality Online Services). On the server side, based on the parameters of the crash, either a new bucket is created to hold the error or the report is added to an existing bucket (which means someone has hit this issue before). A developer can analyze the failure; if a solution can’t be found from the collected data, he/she can request additional information (a dump, registry key values, etc.). He/she can also add a message explaining the failure. The next time the same type of crash happens, the server displays the message the developer set and asks for the additional information if necessary. Once the developer has a solution, the server can offer it at the next crash (see diagram below). Read more about error reports collection and classification.

WER Flow 
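To make the bucketing idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration of the flow described above, not how Winqual is actually implemented: the field names (app, app_version, module, offset) and the BucketStore class are made up for this example.

```python
from collections import defaultdict


def bucket_key(report):
    """Build the bucket identity from the crash parameters (hypothetical field names)."""
    return (report["app"], report["app_version"], report["module"], report["offset"])


class BucketStore:
    """Toy server-side store: one bucket per distinct set of crash parameters."""

    def __init__(self):
        self.buckets = defaultdict(list)  # bucket key -> list of reports
        self.responses = {}               # bucket key -> message set by the developer

    def add(self, report):
        """File a report and return (bucket key, developer message or None)."""
        key = bucket_key(report)
        self.buckets[key].append(report)  # creates the bucket on first hit
        return key, self.responses.get(key)

    def set_response(self, key, message):
        """Attach the message shown to users who hit this bucket again."""
        self.responses[key] = message

    def hits(self, key):
        return len(self.buckets[key])

    def top(self, n=3):
        """Buckets ordered by hit count, the simplest way to prioritize bugs."""
        counts = [(key, len(reports)) for key, reports in self.buckets.items()]
        return sorted(counts, key=lambda item: item[1], reverse=True)[:n]
```

The first report with a given set of parameters creates the bucket and gets no message back; once the developer calls set_response for that bucket, every later report with the same parameters receives the message.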

On Vista and above, the user of the crashed program can go to Control Panel -> Problem Reports and Solutions (or run wercon.exe) to see which failures happened on the machine.

Problem Reports and Solutions

Here, you can check for solutions and see identified problems.

Problem Reports and solutions - View problems

You can look at the parameters for each issue and see the bucket in which the report was categorized.

Problem Reports and Solutions - Report Parameters

As a developer, you can go to Winqual and register to receive reports for failures. But there are other ways you can take advantage of WER. You can configure WER to send the reports to one of your own servers, so you can look at them directly. For example, imagine you have a couple of machines you want to monitor. You can use System Center Operations Manager (SCOM) with Agentless Exception Monitoring (AEM) to transfer all reports to a monitoring server instead of sending them to Winqual. You don’t even have to use Active Directory integration and group policies; you can manually configure the WER registry keys to specify the CorporateWERServer (with the port and the security options you prefer).
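If you go the manual route, a small script can generate a .reg file so the same settings can be imported on every monitored machine. The sketch below is in Python and only builds the text of the file. Besides CorporateWERServer, the value names (CorporateWERPortNumber, CorporateWERUseSSL, CorporateWERUseAuthentication) are the ones I remember from the WER settings documentation, so please verify them for your Windows version. The server name and port in the usage note are placeholders.

```python
WER_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting"


def corporate_wer_reg(server, port, use_ssl=True, use_auth=True):
    """Return the text of a .reg file that points WER at a corporate server.

    The value names other than CorporateWERServer are recalled from the WER
    settings documentation (an assumption to double check).
    """
    if not server or any(ch in server for ch in '"\\'):
        raise ValueError("server must be a plain host name")
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{WER_KEY}]",
        f'"CorporateWERServer"="{server}"',
        f'"CorporateWERPortNumber"=dword:{port:08x}',           # REG_DWORD, hex
        f'"CorporateWERUseSSL"=dword:{int(bool(use_ssl)):08x}',
        f'"CorporateWERUseAuthentication"=dword:{int(bool(use_auth)):08x}',
        "",
    ]
    return "\r\n".join(lines)
```

For example, print(corporate_wer_reg("wer.example.local", 1273)) produces text you can save as a .reg file and import with regedit; review it before importing, since it writes under HKEY_LOCAL_MACHINE.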

Another way to take advantage of WER is to save the reports locally. For the rest of the article, I will assume that WER is enabled (the default setting).

As mentioned above, WER tries to collect as little data as possible and asks for more only if necessary. But this behavior can be configured. If you need a dump collected at every crash, you can set HKEY_CURRENT_USER\Software\Microsoft\Windows\Windows Error Reporting\ForceQueue to 1 (or HKLM\Software\Microsoft\Windows\Windows Error Reporting\ForceQueue=1, to apply the setting globally). This will force a dump to be generated and included in the report. The reports are usually saved under %localAppData%\Microsoft\Windows\WER, in one of two directories: ReportArchive when a server is available, or ReportQueue when it is not. From here, the data is transferred to the server. Another way to look at the contents of a generated report is to use wercon.exe (as explained above). If you want to keep the data locally, just set the server to a non-existing machine (HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\CorporateWERServer=NonExistingServer).
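Here is a small Python sketch of both halves of this: turning ForceQueue on with the standard winreg module, and listing what ended up in the two report directories. It assumes each report is stored as a subfolder of ReportArchive or ReportQueue, which is what I see on my machines; the function names are my own.

```python
import os
import sys
from pathlib import Path

WER_SUBKEY = r"Software\Microsoft\Windows\Windows Error Reporting"


def set_force_queue(enabled=True, machine_wide=False):
    """Write ForceQueue under HKCU (or HKLM, which needs admin rights). Windows only."""
    if sys.platform != "win32":
        raise OSError("the WER registry settings only exist on Windows")
    import winreg  # imported here so the module still loads on other platforms

    root = winreg.HKEY_LOCAL_MACHINE if machine_wide else winreg.HKEY_CURRENT_USER
    with winreg.CreateKeyEx(root, WER_SUBKEY, 0, winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "ForceQueue", 0, winreg.REG_DWORD, 1 if enabled else 0)


def list_reports(wer_root=None):
    """Return {"ReportArchive": [...], "ReportQueue": [...]} with report folder names.

    Assumes one subfolder per report. A missing directory yields an empty list.
    """
    if wer_root is None:
        wer_root = Path(os.environ["LOCALAPPDATA"]) / "Microsoft" / "Windows" / "WER"
    wer_root = Path(wer_root)
    found = {}
    for store in ("ReportArchive", "ReportQueue"):
        folder = wer_root / store
        if folder.is_dir():
            found[store] = sorted(p.name for p in folder.iterdir() if p.is_dir())
        else:
            found[store] = []
    return found
```

On a Windows box you would call set_force_queue() once, reproduce the crash, and then call list_reports() to see where the report landed.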

Now what if you want to generate reports on conditions other than a crash? Maybe you are monitoring the event log or some performance counters like CPU or memory and you want to see what happens when the conditions are violated. Or you want to collect more data than just a dump (for example, your application has a trace file you want to save, or you want to copy event log entries, etc.)? Well, you’re in luck, because WER has an API you can use. With this API, whenever your desired conditions are met, you can:

- Create a report with WerReportCreate

- Take a dump with WerReportAddDump

- Add other files of interest with WerReportAddFile

- Set up to 10 parameters with WerReportSetParameter that can be used to categorize the failure (the faulting program, the faulting function, stack trace, whatever you consider useful to investigate the issue)

- Submit the report with WerReportSubmit and then close it with WerReportCloseHandle
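The steps above can be sketched from Python with ctypes. Treat this as a rough sketch under a few assumptions rather than reference code: the numeric constants are the values I recall from werapi.h and should be checked against your SDK headers, the event type name in the usage note is made up, and the report handle is released with a separate WerReportCloseHandle call. The WER calls only run on Windows; build_parameters is plain Python.

```python
import ctypes
import sys

# Values as recalled from werapi.h (assumption: verify against your SDK).
WER_REPORT_NONCRITICAL = 0   # WerReportNonCritical
WER_DUMP_TYPE_MINIDUMP = 2   # WerDumpTypeMiniDump
WER_FILE_TYPE_OTHER = 5      # WerFileTypeOther
WER_CONSENT_NOT_ASKED = 1    # WerConsentNotAsked
WER_SUBMIT_QUEUE = 0x4       # WER_SUBMIT_QUEUE
WER_MAX_PARAM_COUNT = 10     # parameters P0..P9


def build_parameters(pairs):
    """Number (name, value) pairs as P0..P9 and reject more than 10."""
    pairs = list(pairs)
    if len(pairs) > WER_MAX_PARAM_COUNT:
        raise ValueError("WER accepts at most 10 parameters per report")
    return [(index, str(name), str(value)) for index, (name, value) in enumerate(pairs)]


def _check(hresult, call):
    if hresult < 0:  # a failed HRESULT has the high bit set
        raise OSError(f"{call} failed with HRESULT 0x{hresult & 0xFFFFFFFF:08X}")


def _load_wer():
    wer = ctypes.WinDLL("wer.dll")
    H, W, I, U = ctypes.c_void_p, ctypes.c_wchar_p, ctypes.c_int, ctypes.c_uint32
    wer.WerReportCreate.argtypes = [W, I, H, ctypes.POINTER(H)]
    wer.WerReportSetParameter.argtypes = [H, U, W, W]
    wer.WerReportAddDump.argtypes = [H, H, H, I, H, H, U]
    wer.WerReportAddFile.argtypes = [H, W, I, U]
    wer.WerReportSubmit.argtypes = [H, I, U, ctypes.POINTER(I)]
    wer.WerReportCloseHandle.argtypes = [H]
    return wer


def submit_custom_report(event_type, pairs, extra_files=(), with_dump=True):
    """Create, fill, and queue a WER report for the current process (Windows only)."""
    if sys.platform != "win32":
        raise OSError("WER is only available on Windows")
    parameters = build_parameters(pairs)
    wer = _load_wer()
    kernel32 = ctypes.WinDLL("kernel32.dll")
    kernel32.GetCurrentProcess.restype = ctypes.c_void_p

    report = ctypes.c_void_p()
    _check(wer.WerReportCreate(event_type, WER_REPORT_NONCRITICAL, None,
                               ctypes.byref(report)), "WerReportCreate")
    try:
        for index, name, value in parameters:
            _check(wer.WerReportSetParameter(report, index, name, value),
                   "WerReportSetParameter")
        if with_dump:
            process = kernel32.GetCurrentProcess()
            _check(wer.WerReportAddDump(report, process, None, WER_DUMP_TYPE_MINIDUMP,
                                        None, None, 0), "WerReportAddDump")
        for path in extra_files:
            _check(wer.WerReportAddFile(report, str(path), WER_FILE_TYPE_OTHER, 0),
                   "WerReportAddFile")
        result = ctypes.c_int(0)
        _check(wer.WerReportSubmit(report, WER_CONSENT_NOT_ASKED, WER_SUBMIT_QUEUE,
                                   ctypes.byref(result)), "WerReportSubmit")
        return result.value  # a WER_SUBMIT_RESULT value
    finally:
        wer.WerReportCloseHandle(report)
```

A monitoring loop could then call something like submit_custom_report("MyAppPerfAlert", [("Counter", "CPU"), ("Threshold", "90")], extra_files=[r"C:\logs\myapp.trace"]) when a threshold is crossed; both the event name and the file path are placeholders.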

One thing you must keep in mind (this was not obvious to me at all, and I actually got burned by it) is that WerReportAddDump suspends the threads one by one when taking a dump. That can’t guarantee a consistent view of memory: while one thread is already suspended, the threads that have not been suspended yet keep running and can change memory or do other damage. This is especially troublesome if your application has a lot of threads. If you need a consistent view, it’s your responsibility to suspend all threads yourself. It is also a good idea to call this function from a separate process rather than from the process being dumped.
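One way to follow both pieces of advice is a small helper process that freezes every thread of the target before the dump is taken and resumes them afterwards. Below is a minimal Python/ctypes sketch of that idea using the Toolhelp snapshot API together with OpenThread, SuspendThread and ResumeThread. The helper names are mine, it only works on Windows, and it is not airtight: a thread created between the snapshot and the suspension loop will be missed.

```python
import ctypes
import os
import sys
from contextlib import contextmanager

TH32CS_SNAPTHREAD = 0x00000004
THREAD_SUSPEND_RESUME = 0x0002
INVALID_HANDLE_VALUE = ctypes.c_void_p(-1).value


class THREADENTRY32(ctypes.Structure):
    _fields_ = [
        ("dwSize", ctypes.c_uint32),
        ("cntUsage", ctypes.c_uint32),
        ("th32ThreadID", ctypes.c_uint32),
        ("th32OwnerProcessID", ctypes.c_uint32),
        ("tpBasePri", ctypes.c_int32),
        ("tpDeltaPri", ctypes.c_int32),
        ("dwFlags", ctypes.c_uint32),
    ]


def _kernel32():
    if sys.platform != "win32":
        raise OSError("thread suspension via kernel32 only works on Windows")
    k = ctypes.WinDLL("kernel32.dll", use_last_error=True)
    H, D = ctypes.c_void_p, ctypes.c_uint32
    k.CreateToolhelp32Snapshot.argtypes = [D, D]
    k.CreateToolhelp32Snapshot.restype = H
    k.Thread32First.argtypes = k.Thread32Next.argtypes = [H, ctypes.POINTER(THREADENTRY32)]
    k.OpenThread.argtypes = [D, ctypes.c_int, D]
    k.OpenThread.restype = H
    k.SuspendThread.argtypes = k.ResumeThread.argtypes = k.CloseHandle.argtypes = [H]
    k.SuspendThread.restype = k.ResumeThread.restype = D
    return k


def thread_ids(k, pid):
    """Return the IDs of all threads that belong to process `pid`."""
    snapshot = k.CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0)
    if snapshot in (None, INVALID_HANDLE_VALUE):
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        entry = THREADENTRY32()
        entry.dwSize = ctypes.sizeof(THREADENTRY32)
        ids = []
        more = k.Thread32First(snapshot, ctypes.byref(entry))
        while more:
            if entry.th32OwnerProcessID == pid:
                ids.append(entry.th32ThreadID)
            more = k.Thread32Next(snapshot, ctypes.byref(entry))
        return ids
    finally:
        k.CloseHandle(snapshot)


@contextmanager
def suspended(pid):
    """Suspend every thread of another process for the duration of the block."""
    if pid == os.getpid():
        raise ValueError("run this from a helper process, not from the target itself")
    k = _kernel32()
    handles = []
    try:
        for tid in thread_ids(k, pid):
            handle = k.OpenThread(THREAD_SUSPEND_RESUME, False, tid)
            if not handle:
                continue  # thread already exited or access was denied
            if k.SuspendThread(handle) == 0xFFFFFFFF:  # (DWORD)-1 means failure
                k.CloseHandle(handle)
                continue
            handles.append(handle)
        yield len(handles)
    finally:
        for handle in reversed(handles):
            k.ResumeThread(handle)
            k.CloseHandle(handle)
```

The helper would wrap its WerReportAddDump call in a with suspended(target_pid): block, passing a handle to the target process; the finally clause makes sure the target is resumed even if the dump fails.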

Ok, so now you know how to use WER to always generate dumps on failure, to send the reports to a server you specify or queue them locally, and to build custom reports. I hope you'll find this info useful for debugging and monitoring issues related to your apps.