Why is CreateToolhelp32Snapshot returning incorrect parent process IDs all of a sudden?


A customer reported a problem with the Create­Toolhelp32­Snapshot function.

From a 32-bit process, the code uses Create­Toolhelp32­Snapshot and Process32­First/Process32­Next to identify parent processes on a 64-bit version of Windows. Sporadically, we find that the th32Parent­Process­ID is invalid on Windows Server 2008. This code works fine on Windows Server 2003. Here's the relevant fragment:

#include <windows.h>
#include <tlhelp32.h>
#include <vector>

std::vector<int> getAllChildProcesses(int pidParent)
{
 std::vector<int> children;

 HANDLE snapshot = CreateToolhelp32Snapshot(
    TH32CS_SNAPPROCESS, 0);
 if (snapshot != INVALID_HANDLE_VALUE) {
  PROCESSENTRY32 entry;
  entry.dwSize = sizeof(entry); // weird that this is necessary
  if (Process32First(snapshot, &entry)) {
   do {
    // Collect every process whose recorded parent ID matches.
    if (entry.th32ParentProcessID == pidParent) {
     children.push_back(entry.th32ProcessID);
    }
   } while (Process32Next(snapshot, &entry));
  }
  CloseHandle(snapshot);
 }
 return children;
}

(The customer snuck another pseudo-question in a comment. Here's why it is necessary.)

One of my colleagues asked what exactly was "invalid" about the process IDs. (This is like the StackOverflow problem where somebody posts some code and says simply "It doesn't work".)

My colleague also pointed out that the th32Parent­Process­ID is simply a snapshot of the parent process ID at the time the child process was created. Since process IDs can be recycled, once the parent process exits, the process ID is left orphaned, and it may get reassigned to another unrelated process. For example, consider this sequence of events:

  • Process A creates Process B.
  • Process A terminates, thereby releasing its ID for reuse.
  • Process C is created.
  • Process C reuses Process A's process ID.

At this point, Process B will have a th32Parent­Process­ID equal to Process A, but since the ID for Process A has been reused for Process C, it will also be equal to Process C, even though there is no meaningful relationship between processes B and C.
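The sequence above can be replayed in a toy model. This is only a sketch: ToyProcess, ToyProcessTable, apparentParent, and replayPidReuse are hypothetical names, the process IDs are made up, and nothing here calls a Windows API. It shows that a parent ID recorded once at creation ends up pointing at whichever process owns that ID now.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Toy model, not a Windows API: live processes keyed by process ID.
struct ToyProcess {
    std::string name;
    std::uint32_t parentPid;  // recorded at creation and never updated
};

using ToyProcessTable = std::map<std::uint32_t, ToyProcess>;

// Names whichever live process currently owns the child's recorded parent
// ID. Returns an empty string if the child is unknown or if no live
// process has that ID.
std::string apparentParent(const ToyProcessTable& table, std::uint32_t childPid)
{
    auto child = table.find(childPid);
    if (child == table.end()) return std::string();
    auto parent = table.find(child->second.parentPid);
    if (parent == table.end()) return std::string();
    return parent->second.name;
}

// Replays the four steps with made-up process IDs (100 and 200).
std::string replayPidReuse()
{
    ToyProcessTable table;
    table[100] = ToyProcess{"A", 4};
    table[200] = ToyProcess{"B", 100};  // Process A creates Process B.
    table.erase(100);                   // Process A terminates; ID 100 is free.
    table[100] = ToyProcess{"C", 4};    // Process C is created and reuses ID 100.
    return apparentParent(table, 200);  // B's recorded parent ID is still 100.
}
```

In this model, replayPidReuse names Process C as the apparent parent of Process B, even though C never created B.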

If Process B needs to rely on its parent process ID remaining assigned to that process (and not getting reassigned), it needs to maintain an open handle to the parent process. (To avoid race conditions, this should be provided by the parent itself.) An open process handle prevents the process object from being destroyed, and in turn that keeps the process ID valid.

There is another trick of checking the reported parent process's creation time and seeing if it is more recent than the child process's creation time. If so, then you are a victim of process ID reuse, and the true parent process is long gone. (This trick has its own issues. For example, you may not have sufficient access to obtain the parent process's creation time.)
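A minimal sketch of the creation-time check follows. The names parentPredatesChild and getCreationTime are hypothetical. The Windows-only part assumes Windows Vista or later, because it requests PROCESS_QUERY_LIMITED_INFORMATION, and it simply fails when the caller lacks access to the process, which is the limitation noted above.

```cpp
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Portable core: creation times expressed as 64-bit counts of 100-nanosecond
// intervals, the unit FILETIME uses. A true parent must already have existed
// when the child was created, so a reported parent that is newer than the
// child can only be a later process that reused the ID.
bool parentPredatesChild(std::uint64_t parentCreated, std::uint64_t childCreated)
{
    return parentCreated <= childCreated;
}

#ifdef _WIN32
// Reads the creation time of the process that currently owns this ID.
// Returns false if the process cannot be opened (for example, access
// denied, or no process has that ID) or if the query fails.
bool getCreationTime(DWORD pid, std::uint64_t& created)
{
    HANDLE process = OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, FALSE, pid);
    if (!process) return false;

    FILETIME creationTime, exitTime, kernelTime, userTime;
    BOOL ok = GetProcessTimes(process, &creationTime, &exitTime,
                              &kernelTime, &userTime);
    CloseHandle(process);
    if (!ok) return false;

    // Combine the two 32-bit halves of the FILETIME into one 64-bit value.
    ULARGE_INTEGER combined;
    combined.LowPart = creationTime.dwLowDateTime;
    combined.HighPart = creationTime.dwHighDateTime;
    created = combined.QuadPart;
    return true;
}
#endif
```

A caller would read the creation time for both the reported parent ID and the child ID, then pass the two values to parentPredatesChild. If either read fails, the check is inconclusive rather than negative.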

After a few days, the customer liaison returned with information from the customer. It looks like all of the guidance and explanation provided by my colleague either never made it to the customer, or the customer simply ignored it.

The customer wants to detect what child processes are spawned by a particular application, let's call it P. We built a special version with extra logging, and it shows that the PROCESS­ENTRY32.th32Parent­Process­ID for wininit.exe and csrss.exe were both 0x15C, which is P's process ID. This erroneous reporting occurs while P is still running and continues after P exits. Do you think it's possible that process 0x15C was used by some other process earlier?

Yes, that's possible. That is, in fact, what my colleague was trying to explain.

It isn't clear why the customer is trying to track down all child processes of process P, but the way to do this is to create a job object and put process P in it. You can then call Query­Information­Job­Object with Job­Object­Basic­Process­Id­List to get the IDs of the processes in the job: P itself plus, by default, the processes it has spawned since it joined the job.
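The job object approach might look roughly like the sketch below. The helper names (withoutRoot, createJobContaining, listJobProcessIds) and the 1024-entry default capacity are assumptions made for illustration. The sketch treats any failed query, including a buffer that is too small, as a plain failure. Only processes that P starts after it has been placed in the job are captured, so P would normally be added before it gets a chance to spawn anything.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// Portable helper: the job's ID list includes the root process itself,
// so remove it to leave only the processes it spawned.
std::vector<std::uintptr_t> withoutRoot(std::vector<std::uintptr_t> ids,
                                        std::uintptr_t rootPid)
{
    ids.erase(std::remove(ids.begin(), ids.end(), rootPid), ids.end());
    return ids;
}

#ifdef _WIN32
// Creates an unnamed job and places the given process in it.
// Returns NULL on failure; the caller closes the returned handle.
HANDLE createJobContaining(HANDLE process)
{
    HANDLE job = CreateJobObjectW(nullptr, nullptr);
    if (job && !AssignProcessToJobObject(job, process)) {
        CloseHandle(job);
        job = nullptr;
    }
    return job;
}

// Retrieves the IDs of the processes currently in the job.
// maxIds is an assumed upper bound chosen for this sketch.
bool listJobProcessIds(HANDLE job, std::vector<std::uintptr_t>& ids,
                       DWORD maxIds = 1024)
{
    // The structure ends in a variable-length array of ULONG_PTR IDs.
    std::vector<unsigned char> buffer(
        sizeof(JOBOBJECT_BASIC_PROCESS_ID_LIST) + maxIds * sizeof(ULONG_PTR));
    auto* list =
        reinterpret_cast<JOBOBJECT_BASIC_PROCESS_ID_LIST*>(buffer.data());

    if (!QueryInformationJobObject(job, JobObjectBasicProcessIdList, list,
                                   static_cast<DWORD>(buffer.size()), nullptr)) {
        return false;
    }
    ids.assign(list->ProcessIdList,
               list->ProcessIdList + list->NumberOfProcessIdsInList);
    return true;
}
#endif
```

The resulting list covers every process in the job, which means grandchildren as well as direct children. Passing it through withoutRoot with P's process ID leaves just the descendants.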

Comments (28)
  1. laonianren says:

    You can get the parent process creation time from HKEY_PERFORMANCE_DATA.  This is accessible by everybody.  Or at least it used to be by default; I haven't checked on recent versions of Windows.

  2. Torrin says:

    Yep, sometimes folks send a question and then continue to look and come up with the issue on their own.  By the way, I think there is a closing parenthesis missing on the inner if of the do/while statement.

  3. MacIn173 says:

    One issue with job objects is once you assign process to a job, you can't revert this assignment. And you can't reassign process to another job. Once I have had similar issue: Process A (service) was creating processes B1..Bn. For each, a job object was created to keep track of child processes for B1..Bn so once A needs to close Bn, it would be able to close it with descendants. But here's the trick: once A exits (there's need to restart service), there's no way to get those B1..Bn back. Problem is once A is closed, job objects (let's call them JobB1..JobBn) are closed as well (no longer listed in Object Manager). But processes B1..Bn still marked as processes inside of job objects JobB1..JobBn. While JobB1..JobBn does no longer exist. And now A can't create new job objects and put B1..Bn in those, since these processes are marked as being in job already.

    It looks like once Job object is closed during process shut down, processes inside of a job doesn't get their "BelongsToJobXYZ" fields cleared. There's some logic obvious: while A from my example would be restarted, Bn can spawn child processes that wouldn't get into a job. But now these B1..Bn are stuck in the middle of nowhere, we can't access these JobB1..JobBn since these no longer exist anyway. You can get a handle to Bn, but no way to put it into a job again. Luckily, there's workaround for this (thnx to TechNet forums).

  4. James says:

    > but the way to do this is to create a job object and put process P in it. You can then call Query­Information­Job­Object with Job­Object­Basic­Process­Id­List to get the list of child processes.

    Except that until Windows 8, job objects cannot be nested.  Maybe the child process you want to monitor uses jobs itself.  Or maybe it currently doesn't, but since you want to be a responsible programmer who wants to be reasonably future-proof, you have to assume that someday a future version of that child program might use jobs.  Or maybe the child program could be user-specified.  Or maybe you control that child program and can guarantee that it won't use jobs, but in that case, you don't need jobs anyway since you could modify the child program to notify the parent via other mechanisms.  For software intended to run on versions of Windows prior to Windows 8, I just don't think jobs are very useful for this purpose.

  5. size matters says:

    Concerning the dwSize fields: Would there be any downside to just initializing them in a constructor in the windows headers:

    struct PROCESSENTRY32 {

     DWORD dwSize;

     [...]

     PROCESSENTRY32() : dwSize(sizeof PROCESSENTRY32) {}

    };

    Might need some #ifdefs for C++ around. But I don't see how it would be any worse than not initializing?

  6. Myria says:

    Another issue with parent process IDs: the parent process ID is merely the process ID whose attributes the new process inherits.  It is not necessarily the process ID that *created* the new process.  A process can create a new process with a different "parent" than itself, assuming sufficient authority.

    This functionality has existed since at least NT 4.0, though I suspect actually the beginning of NT (3.1).  However, it was only accessible through undocumented APIs until Windows Vista exposed it in its Win32 API UpdateProcThreadAttribute function (PROC_THREAD_ATTRIBUTE_PARENT_PROCESS).

  7. Joshua says:

    The parent process ID got freed and reused, but the child still had the parent process id ?!?

    That behavior is not only wrong but stupid wrong. When the process dies, reparent its children to its parent.

    [Windows does not maintain a process tree. Indeed, the Windows kernel doesn't care about the "parent" at all. (Prior to Windows NT 4.0, it didn't even bother keeping track. The only reason it keeps track is so it can return it when you call Create­Toolhelp32­Snapshot!) The "parent" has no special status aside from being the source of inherited handles and other collateral. You are trying to impose the Unix process model onto Windows. -Raymond]
  8. Mike Caron says:

    @Joshua: I suspect you have not fully thought out the implications of what you are suggesting.

  9. John Ludlow says:

    The only time I've needed to hunt down all children of a process was to kill them. The application was a build server, and when a build needed to be aborted, the build's root process and all its children should be killed.

    Of course, in that case PID reuse wouldn't be an issue, since the actual parent process is still alive. You just have to be careful to kill the parent last.

  10. boogaloo says:

    @John Ludlow No, reuse is still an issue in your example. Even though your process isn't dead, it may have the id of a process that previously started some child processes before exiting. Therefore you are essentially killing random processes.

    "Of course, in that case PID reuse wouldn't be an issue, since the actual parent process is still alive. You just have to be careful to kill the parent last"

  11. Joshua says:

    Might as well document it as "does not work" then. The use caveat cannot be reliably overcome.

  12. Cesar says:

    @AndyCadley: Windows has a POSIX subsystem. Working "like Unix" is more or less what the POSIX standard is for.

  13. mikeb says:

    @Cesar:

    I thought the POSIX subsystem stopped being a supported feature a while back.  And even if Windows does have a POSIX subsystem, wouldn't you need to be running under that subsystem in order to assume that you'd get a POSIX process model?

  14. Killer{R} says:

    /*but the child still had the parent process id ?!?*/

    Child doesn't have parent process id, the only thing child has is its own process id.

    /*Might as well document it as "does not work" then*/

    But sometimes it works. Furthermore there're way to check if you've got really parent id process (just compare both processes creation times) or even proactively ensure that this will work (I will not disclose details, its a bad practice ;) )

  15. AndyCadley says:

    Joshua: Except it always works. It doesn't work *like Unix*, but Windows isn't Unix nor is it supposed to be.

  16. silent murder says:

    @size matters. That would only be safe if you could guarantee your source file would be processed by c++ compilers. But since the langs have so much overlap there's no way the standard Windows headers could enforce it. So your source might as well use a non standard pragma to warn of the need to init the size if no __cplusplus or put the init in a conditional which is blech.

  17. boogaloo says:

    @Joshua The documentation is pretty clear that it "does not work".

    msdn.microsoft.com/.../ms682489(v=vs.85).aspx

    When taking snapshots that include heaps and modules for a process other than the current process, the CreateToolhelp32Snapshot function can fail or return incorrect information for a variety of reasons

  18. Random832 says:

    What purpose, exactly, does the parent process ID field supposedly fulfill, even internally? All of the reasons you shouldn't rely on it seem to me like they should also be reasons the kernel shouldn't rely on it.

    [It is not used internally. The only reason it's there is so that it can be reported by CreateToolhelp32Snapshot, and CreateToolhelp32Snapshot reports it only for compatibility with the 16-bit TOOLHELP.TaskFirst function. -Raymond]
  19. Random832 says:

    Which also makes me wonder what @Mike Caron meant by "the implications of what you are suggesting." - surely assigning it to the ID of some living process that has _some_ relationship to the process, can't possibly be _worse_ than having it end up being some unrelated process that happened to pick up a used process ID. If we're accepting that "process whose ID is the parent process ID of this process" has absolutely no meaning at all, then reassigning an arbitrary value to the parent process ID shouldn't do any harm. I can't even begin to imagine that there are _any_ "implications" that aren't made even worse by the current behavior.

    [If you reassign it, then you can't tell whether the original parent has exited. You end up attributing things to the wrong process. Instead of "Inherited from a process that no longer exists", you say "Inherited from XYZ" where XYZ is not the process the value was inherited from. -Raymond]
  20. skSdnW says:

    The console/terminal does care about the process hierarchy for its Ctrl+C handling and process groups but it is probably not using the parent process id's. Maybe it has its own internal information in CSRSS?

  21. alegr1 says:

    @MakIn173:

    To keep Job Objects alive, duplicate the job handle into its first process.

  22. Joshua says:

    @skSdnW: It seems to be using a stack of who opened console input for reading.

  23. Joshua says:

    [If you reassign it, then you can't tell whether the original parent has exited. You end up attributing things to the wrong process. Instead of "Inherited from a process that no longer exists", you say "Inherited from XYZ" where XYZ is not the process the value was inherited from. -Raymond]

    Then either zero it out or make it possible to get the start time for all processes as any user. If you come back and say the latter is a reasonable security restriction while not addressing the former at all (getting a garbage value) then you demonstrate the lines of though that end up with stuff requiring admin on Windows where it would not on other operating systems.

    [Basically, the whole Toolhelp family of functions is for 16-bit compatibility only. Don't expect anything great to come of it. -Raymond]
  24. 640k says:

    Then why is Toolhelp still supported on 64-bit windows, which cannot execute 16-bit apps?

    [See the story of why we still have registry keys that exist for backward compatibility with four programs written in 1993. -Raymond]
  25. Alan says:

    I think you got yourself some mismatched { } in your code example there...

  26. Dave Harris says:

    @size_matters: Concerning the dwSize fields: Would there be any downside to just initializing them in a constructor in the windows headers...

    Just having the one constructor would get in the way of aggregate initialisation.

       PROCESSENTRY32 entry = { sizeof PROCESSENTRY32 };

    In this case you could add a 1-argument constructor, but in general you don't know how many members the caller wants to initialise this way. Also, you have lost the zero-initialisation that this idiom does.

  27. Yuhong Bao says:

    [Basically, the whole Toolhelp family of functions is for 16-bit compatibility only. Don't expect anything great to come of it. -Raymond]

    *Win9x compatibility.

  28. ParentProcessID can be useful for diagnostics - Sysinternals Process Explorer and Process Monitor both rely on it.  And because of that, UAC elevation actually changes the PPID of newly-elevated processes (a LocalSystem service actually launches the elevated process, but then changes the PPID to that of the original requesting process).  Just be aware of its limitations and whether the PPID matches the current process that has that PID (e.g., compare the start times of the two processes).  Like I said, it's *helpful* with diagnostics but it's not authoritative, and it doesn't imply that Windows has or should have a *nix-like process model.  If you need to track or manage groups of processes, use job objects.

Comments are closed.
