After reporting a non-responsive program to Windows Error Reporting, why does the process spawn a suspended child process?

A customer observed that when they try to close a program as not responding, Windows Error Reporting kicks in, which is not unexpected. But what is unexpected is that a new process is created that is a child of the original process (as reported by Process Explorer), and the child is suspended. "Why does werfault.exe create this child process?"

This suspended child process is a snapshot of the original. Windows Error Reporting creates this snapshot and uses the snapshot to generate the error report. The original process is allowed to continue executing so that it can exit (and possibly restart) normally.

The snapshot process does not have any running threads, but it has a copy of the original process's virtual memory, handles, thread IDs, stacks, and other information necessary to create an error report. Generating an error report takes time, and Windows Error Reporting uses a snapshot so that the original process can get on with exiting.
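As a loose analogy only, the "frozen copy while the original carries on" idea can be seen with POSIX fork(), which one of the commenters below compares to this mechanism. The sketch is not how Windows Error Reporting is implemented, and it runs only on POSIX systems. The function name, the pipe handshake, and the status strings are invented for illustration. Unlike the real snapshot, the forked child here does run one thread, just long enough to report what it still sees.

```python
import os
from typing import Tuple


def snapshot_analogy() -> Tuple[str, str]:
    """Fork a child, change the parent's data, and return
    (value the parent sees, value the child still sees)."""
    state = {"status": "not responding"}

    go_r, go_w = os.pipe()    # parent -> child: "I have changed my memory"
    out_r, out_w = os.pipe()  # child -> parent: what the child still sees

    pid = os.fork()
    if pid == 0:
        # Child: stands in for the snapshot. Its memory is a copy of the
        # parent's memory as it was at the moment of the fork.
        try:
            os.close(go_w)
            os.close(out_r)
            os.read(go_r, 1)  # wait until the parent has modified its copy
            os.write(out_w, state["status"].encode())
        finally:
            os._exit(0)       # never fall through into the caller's code

    # Parent: stands in for the original process, which carries on.
    os.close(go_r)
    os.close(out_w)
    state["status"] = "exiting"
    os.write(go_w, b"x")
    child_view = os.read(out_r, 1024).decode()
    os.waitpid(pid, 0)
    os.close(go_w)
    os.close(out_r)
    return state["status"], child_view


if __name__ == "__main__":
    parent_value, child_value = snapshot_analogy()
    print("original process sees:", parent_value)
    print("forked copy still sees:", child_value)
```

The parent's later change does not show up in the child, which is the property that lets a report be generated from the copy while the original moves on.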

Bonus chatter: This new behavior means that you don't have to wait for Windows Error Reporting to do its thing before it restarts the application. The "process seeing its own dead body" problem is mitigated by making sure that the snapshot doesn't own any resources. When programs look for already-executing copies of themselves, it's usually done by looking for windows or named kernel objects. Sometimes it's done by recording the process ID of the first instance somewhere, and having the second copy look it up. But the snapshot process owns no windows or kernel objects, and its process ID is not the one that got recorded, so it is comparatively unlikely to be mistaken for the real thing.
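The process-ID approach can be sketched roughly as follows. This is a minimal illustration, not code from any real program: the file name app.pid, the helper names, and the PID values are all hypothetical, and a real implementation would also need to deal with stale files and races between instances.

```python
import tempfile
from pathlib import Path
from typing import Optional


def read_recorded_pid(pid_file: Path) -> Optional[int]:
    """Return the PID recorded by the first instance, or None if the file
    is missing or does not contain a number."""
    try:
        return int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return None


def record_first_instance(pid_file: Path, pid: int) -> None:
    """The first instance writes its own PID where later copies can find it."""
    pid_file.write_text(str(pid))


def is_recorded_instance(pid_file: Path, candidate_pid: int) -> bool:
    """True only if candidate_pid is the PID the first instance recorded."""
    return read_recorded_pid(pid_file) == candidate_pid


def demo() -> None:
    with tempfile.TemporaryDirectory() as folder:
        pid_file = Path(folder) / "app.pid"  # hypothetical location
        original_pid = 4242                  # made-up PID of the first instance
        snapshot_pid = 5150                  # made-up PID of the snapshot process
        record_first_instance(pid_file, original_pid)
        print(is_recorded_instance(pid_file, original_pid))  # True
        print(is_recorded_instance(pid_file, snapshot_pid))  # False


if __name__ == "__main__":
    demo()
```

Because the snapshot has a different process ID from the one that was written down, a lookup like this does not match it.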

Comments (16)
  1. Stuart says:

    One scenario where this behavior can be unhelpful (I think, if I’m understanding it right) is if the system you’re running on is memory-constrained, especially if the application that crashed was a memory hog (or crashed because of a memory leak). Trying to start up a new copy while you’re keeping the memory contents of the old process around – and not only that but actively trying to scavenge through them – seems like it’d create a swapping nightmare in that situation, no?

    (Not coincidentally, I’m running on a crummy old laptop that’s in this situation and it did seem to end up in swap hell any time WER activated – until I disabled WER anyway)

    1. skSdnW says:

      The Windows Internals book says the new address space is a copy-on-write clone of the original. This feature is called Process Reflection.

        1. JAS says:

          It’s been a long time since Mark said anything interesting at all. You almost wonder if Sysinternals were bought up and he was put on Azure just to shut him up and dumb down the community which is already beginning to starve for reference material on the Windows OS.

    2. Doug says:

      You are correct that WER does use system resources. The resources used depend on how much data WER decides to keep from the dead process. WER tries hard to minimize impact, but on a resource-constrained system or when handling a crash from a memory-hog, you very well might notice WER causing a slowdown. You’ll have to make the decision: WER’s actions now will hopefully help get this crash fixed in the future, and submitting a report is a vote to get the issue fixed. Is that worth the slowdown I am experiencing?

      (And yes, WER reports are definitely used to track down issues. Note that there are some issues that only occur on slower systems or systems with less memory, and if you don’t report them, they’ll likely never get fixed.)

      1. Tilmann Krueger says:

        Oh my god, yes, the WER memory dumps are used!
        We ask for the crash dumps every time a user reports a crash! This stuff is a gold mine (from a developer’s perspective)!

        I fixed countless bugs and attributed countless other crashes to graphics drivers, virus scanners and whatnot!

  2. Gee Law says:

    If the program checks for processes with the same executable path, it will be very, very upset. However, this already fails in some cases, such as when someone starts your process suspended and dies before resuming it (it happens sometimes), in which case you’re stuck until someone helpfully terminates the suspended process. It seems the number of programs that do this is negligible, as Windows has made the change.

  3. IanBoyd says:

    You get the crash. You realize what it is. You fix the bug, and try to recompile the application…

    File in use.

    “No it’s not, the program is gone!”

    But its zombie corpse clone is still there…

    1. gdalsnes says:

      “snapshot process owns no windows or kernel objects”
      So it should not hold files open.

      1. alegr1 says:

        The snapshot needs to keep the memory-mapping sections alive, including the executables backed by the file objects. The real issue here is that WER should not try to do postmortem on a process which runs under debugger.

        1. Someone says:

          I don’t think so: When any located memory page is “cloned” by having a copy-on-write reference from the page table of the new process (as I read here), there is no need to have memory-mapping sections *as such* in the cloned process. Whatever the original process is doing afterwards with any page, the clone will keep the original.

  4. Yuhong Bao says:

    Is this similar to how fork() is implemented in Interix?

    1. skSdnW says:

      Yes, Mark talks a bit about that in the video I linked to.

  5. gdalsnes says:

    “a snapshot of the original” link is broken for me.
    What does it mean that the clone has a copy of the handles? Does it increase the reference count on the kernel object the handle refers to? If no, it does not really copy the handles, it copies raw data in memory (because the kernel object may be gone at any moment). If yes, won’t this mess things up completely? (files opened with shareNone access etc.)

    1. ZLB says:

      It shouldn’t matter. The ‘ghost’ process doesn’t have any threads and won’t run any code.

      Once you have the crash-dump on another computer to perform a post-mortem, the handles are meaningless anyway.

  6. poizan42 says:

    The first link should be (but the links inside the article are also broken right now…)

Comments are closed.
