The case of the redirected standard handles that won’t close even though the child process has exited (and a smidge of Microspeak: reduction)

A customer had a supervisor process whose job is to launch two threads. Each thread in turn launches a child process, let's call them A and B, each with redirected standard handles. They spin up separate threads to read from the child processes' stdout in order to avoid deadlocks. What they've found is that even though child process A has exited, the threads responsible for monitoring the output of child process A will get stuck in ReadFile until child process B also exits.

The customer further reported that if they add a brief Sleep call between creating the thread that launches child process A and creating the thread that launches child process B, then the problem goes away.

The customer attached a small sample program which demonstrated the same issue and asked for advice on how to diagnose and fix the problem.

First of all, it was great of the customer to include a small sample program that demonstrates the problem. This is an important step in troubleshooting, which goes by the Microspeak term reduction.

re·duc·tion. n. The process of simplifying a bug to the smallest scenario that still reproduces it.

For source code, this usually takes the form of a small sample program. For Web pages, this means removing irrelevant styles, script, and HTML. In both cases, the reduction can be substantial. (You'd be surprised how big Web pages are nowadays.)

Reduction is so important that our defect tracking database has a special field: Reduced by.

I took a look at the sample program and didn't see anything obviously wrong with it. One of my colleagues, however, was able to use his psychic powers to determine the problem without even reading the code!

I'll bet $10 that you're launching processes in parallel, specifying TRUE for bInherit­Handles but not using a PROC_THREAD_ATTRIBUTE_HANDLE_LIST. You create your pipe handles inheritable and give them to your children. The problem is that if thread 1 is in the middle of setting up these inheritable handles for child process A, and thread 2 calls Create­Process for child process B, then child process B will accidentally inherit the handles intended for child process A. As a result, child process B unwittingly holds open the pipe handles you gave to child process A. Reads from a pipe will not return EOF until all writers have closed the handle, so the visible effect is that the monitoring thread for child process A will not complete its read until child process B also exits.

Another possibility is that the child processes are launching their own child processes which are inadvertently inheriting the pipe handles.

(Turns out the first guess was right on the money.)
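
The EOF rule at the heart of that diagnosis is easy to reproduce in miniature. What follows is a minimal sketch in Python on a POSIX system (an assumption; the customer's program used Win32 pipes and ReadFile), in which `os.dup` stands in for the stray copy of the write handle that child process B inherited. The names `eof_visible` and `demo_stray_writer` are made up for this illustration.

```python
import os
import select


def eof_visible(read_fd, timeout=0.2):
    """Return True if a read on read_fd would report EOF right now."""
    ready, _, _ = select.select([read_fd], [], [], timeout)
    if not ready:
        # No data and no EOF: a blocking read would hang here.
        return False
    return os.read(read_fd, 1) == b""


def demo_stray_writer():
    read_end, write_end = os.pipe()

    # Stand-in for the copy of the write handle that child B inherited by accident.
    stray_copy = os.dup(write_end)

    # "Child A exits": the intended writer closes its handle.
    os.close(write_end)
    eof_after_a_exits = eof_visible(read_end)

    # "Child B exits": the stray copy finally goes away.
    os.close(stray_copy)
    eof_after_b_exits = eof_visible(read_end)

    os.close(read_end)
    return eof_after_a_exits, eof_after_b_exits
```

Calling `demo_stray_writer()` reports no EOF after the first close and EOF only after the second, which is the same shape as the monitoring thread for child process A waiting on child process B.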

The solution is to use the technique we discussed a few years ago: Use the PROC_THREAD_ATTRIBUTE_HANDLE_LIST to control explicitly which handles are inherited by specific child processes.
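
The idea of naming exactly which handles each child receives can be sketched outside of Win32 as well. The example below is an analogy, not the Win32 code: it assumes Python on a POSIX system and uses the `pass_fds` parameter of `subprocess.Popen`, which plays a role similar to the handle list. The function name and the child command lines are hypothetical.

```python
import os
import subprocess
import sys


def run_with_explicit_fds():
    # Pipe descriptors are non-inheritable by default in Python 3.4 and later.
    read_end, write_end = os.pipe()

    # Child B: long-running, launched while the pipe exists, given no extra descriptors.
    child_b = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        # Child A: receives exactly one extra descriptor, named explicitly.
        child_a = subprocess.Popen(
            [sys.executable, "-c",
             "import os, sys; os.write(int(sys.argv[1]), b'done')",
             str(write_end)],
            pass_fds=(write_end,),
        )
        os.close(write_end)  # the parent drops its own copy

        chunks = []
        while True:
            data = os.read(read_end, 4096)
            if not data:  # EOF: every writer has closed the pipe
                break
            chunks.append(data)
        os.close(read_end)
        child_a.wait()

        # EOF arrived even though child B is still alive.
        b_still_running = child_b.poll() is None
        return b"".join(chunks), b_still_running
    finally:
        child_b.kill()
        child_b.wait()
```

Because child B was never handed the write end, the read loop finishes as soon as child A exits, regardless of how long child B keeps running.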

If the client application must run on versions of Windows prior to Windows Vista, then they can use the workaround described in the linked article: Manually serialize the calls which set up and then launch the child processes, so that handle inheritance management for a child process doesn't start until the previous one has completed.
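
The serialization workaround can be sketched the same way. This is again a Python-on-POSIX analogy under stated assumptions, not the customer's code: `close_fds=False` stands in for passing TRUE for bInheritHandles, and `launch_lock`, `launch_and_read`, and `run_pair` are hypothetical names. The point is that everything from marking the handle inheritable to closing the parent's copy happens under one lock.

```python
import os
import subprocess
import sys
import threading
import time

# Hypothetical name: one lock shared by every thread that launches a child.
launch_lock = threading.Lock()


def launch_and_read(name, child_code, results):
    with launch_lock:
        # No other launch can overlap this window, so no other child can
        # pick up the inheritable write end created here.
        read_end, write_end = os.pipe()
        os.set_inheritable(write_end, True)
        child = subprocess.Popen(
            [sys.executable, "-c", child_code, str(write_end)],
            close_fds=False,  # inherit every inheritable descriptor
        )
        os.close(write_end)

    # Reading happens outside the lock, so the children still run in parallel.
    with os.fdopen(read_end, "rb") as pipe:
        data = pipe.read()  # returns once every writer has closed
    results[name] = (data, time.monotonic())
    child.wait()


def run_pair():
    quick = "import os, sys; os.write(int(sys.argv[1]), b'A')"
    slow = "import os, sys, time; time.sleep(2); os.write(int(sys.argv[1]), b'B')"
    results = {}
    threads = [
        threading.Thread(target=launch_and_read, args=("A", quick, results)),
        threading.Thread(target=launch_and_read, args=("B", slow, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With the lock in place, the reader for the quick child finishes well before the slow child exits. Only the setup and launch are serialized; the children themselves still overlap.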

Comments (44)
  1. Joshua says:

    What a LOVELY mess. The same thing can happen in the UNIX world. Not spawning processes from child threads seems to be the preferred solution. I've seen radical solutions such as closing all handles >2 in the child process.

    Incidentally, I'll bet System.IO.Process.Start is vulnerable to this, and the .NET docs give absolutely no clue. At least my code will eventually terminate in that case.

  2. xor88 says:

    Handle inheritance is as dirty as fork is. In my opinion a historic accident. If resources are to be shared/passed that should be explicit.

  3. GregM says:

    "If resources are to be shared/passed that should be explicit."

    It was at least partially explicit, in that you had to both mark the handles as inheritable and pass the flag that says to inherit handles.  It is now, or can be, completely explicit, as described in the post.

  4. Deduplicator says:

    Neither handle inheritance nor fork are dirty tricks. But there's a huge impedance mismatch between the common use case and what they are actually good for.

    That is the reason we have explicit handle inheritance and vfork.

  5. Henri Hein says:

    "You'd be surprised how big Web pages are nowadays."

    No, I would not, but I agree that people whose job description does not involve taking them apart in order to troubleshoot how a product messed up something in one of them would.  My jaw has seriously been in danger of dislocating while looking through sources of even some mainstream sites (domains elided to protect the innocent).

  6. xor88 says:

    Reading the vfork manpage I am disgusted by what hacks are in place in Linux. Starting a process seems to be a fork+exec call. Fork cloning the process, exec ripping everything out. I am at a complete loss why that would be a good model.

    There might be historic reasons for that, but today this is just so awful. Unbearable hacks.

    What about a simple CreateProcess call. Seems good to me.

  7. Joshua says:

    @xor88: vfork+exec implements CreateProcess easily. See spawn* family of calls. When the model changes again (like say the explicit handle inheritance), it is possible to fix as a library in userspace.

  8. dave says:

    Well, yeah: vfork was a hack to cover up an inability to implement fork sufficiently efficiently (BSD, I think); it is not surprising that vfork is ugly.   I'm no expert, but I think fork can be reasonably efficient with copy-on-write pages in your VM subsystem.

  9. Joker_vD says:

    @xor88: Well, at least with fork+exec you can easily detect the fact that the program failed to start because some shared libraries were missing (exec will return ENOENT). With CreateProcess? If you pass it the right flags, the error dialog "DLL not found" won't show up, and all you can do is to do Sleep/GetExitStatus.

  10. xor88 says:

    I'm objecting to the notion that we want child processes to be filled with uncontrolled junk. This is not a basis for a dependable system. State separation is a fundamental tool that we use to make systems reliable.

    You have no idea what fork will copy into the child because you do not know what the parent contains. In the presence of libraries and user code (e.g. Apache websites or shell extensions) you have no idea what resources are open, whether they can be used in the context of another process or whether they will leak by being memcpy'd into the child. The child will now keep all memory pages and handle-referenced objects live. It will interfere with the same handles that the parent uses.

    It is so bad that I'm feeling the need to write down all these filthy thoughts so I can forget.

  11. Hi. Normally, I do not stop by just to say "your post was cool" but since I am failing to resist making this comment, I'd like to add something to it: This post is cool and what makes it coolest post today is the fact that it is the only blog post I read today that does not have "Windows 8.1" in its title.

  12. 640k says:

    Reduction is time consuming, and that's exactly the reason for PAYING for support. If an ISV needs to do this himself/herself, there's not very much use for expensive MSDN support.

  13. Bdell says:

    So… what do you do if it was the other possibility?

  14. Roger says:

    @xor88 CreateProcess takes 10 parameters with other variants taking more.  There is a design decision here.  A call that takes every parameter that could possibly be useful (plus -Ex versions in the future), or a call that does virtually nothing to setup the child, and in that child you then set whatever parameters matter before executing the intended binary.  The former approach isn't future proof, but is a single call.  The latter is very future proof but requires a little more cognitive overhead to code.  It does however use the same system calls – eg setting cwd.  The standard C library handles the simple cases anyway (exec*, spawn*).

    Ultimately Windows and Unix took the approach that best matched their process semantics.  Ultimately I think the Unix approach is better and simpler, but realistically Windows would never be able to do things that way.

  15. Roger says:

    @xor88: the child isn't filled with uncontrolled junk (unless your code contains uncontrolled junk).  Importantly you can decide what to keep and what to close.  You can change user ids and make security checks.  You can set resource limits.  The amount of things you can do vastly outnumbers the 10 parameters of CreateProcess.

    It is true that in the olden days file descriptors could run rampant.  However for many years you can open or set a close on exec flag for them so they are automatically not inherited by executed binaries.

    The case of fork without exec is far less interesting since it is the same binary running.

    [News flash: Your code contains uncontrolled junk. Maybe you use a library that uses a pipe to communicate with another process. You then fork and oops, you didn't close the pipe in the child process. Now the pipe handle is stuck open. Or maybe the library opens a database and then you fork and oops, you didn't dbm_close in the child process. Now the database is stuck open. -Raymond]
  16. Neil says:

    Old versions of the Unix C shell got very confused if you invoked them without all of their standard handles.

  17. Deduplicator says:

    Child configuring itself: So the child already knows all there is to know about what it should do? Is it psychic, or do you recompile it every time before executing?

    Child knows what's superfluous: Dangerous thought, and makes optional additional resources quite a dangerous and unreliable idea. Also, you are sure the child is 100% reliable and trustworthy, if you do not trust the parent to get its thing right even if you designed and built it yourself?

    CreateProcess is the pinnacle of achievement: Why is there CreateProcessAsUser, CreateProcessWithLogon, CreateProcessWithToken, and such an elaborate scheme for extending some of their parameters? All of this can be done with fork/exec.

    Everything added is garbage: Nice that you chuck everything you cannot imagine on the garbage heap. Double points if we take the "newer" replacement for fork in Linux, clone, which is only about 10 years old.

    old Unix C shell: Well, that's a bit suboptimal, though happens even in windows. GIGO rules still.

  18. xor88 says:

    @Roger any configuration you can make with the fork model you can make with the CreateProcess model as well. Just have the child configure itself. Only the child knows what preconditions it needs to have established to run correctly. If you want to spawn a calc.exe, the parent cannot decide which handles must be present and so on. Only the owner of calc.exe can do that reliably.

    Some things can be reliably configured by the parent, such as priority and security. That's possible with CreateProcess as well. I do not spot a single thing that you cannot do with CreateProcess that you can with fork and *should* be doing with fork. Fork only supports more invalid use cases.

  19. Deduplicator says:

    "reduction": Isn't that a normal meaning for informatic? Strength reduction, solving by reduction to a well-understood problem, reducing a problem to its essence?

    [Yes, it's a perfectly normal name. Just remember to use that name instead of a synonym like "simplification" or "minimization". -Raymond]
  20. Kevin says:

    @Raymond: In the Unix world, if a library has a pipe it doesn't want duplicated, it should create it with O_CLOEXEC, in which case nobody has to explicitly close it.  It's hardly libc's fault you're using a poorly-written library that doesn't do that.

    [I suspect there are a lot of poorly-written libraries. Also, what if you are the library? How do you ensure that the app that is using your library is correctly managing CLOEXEC? -Raymond]
  21. Crescens2k says:


    The problem is, badly written libraries are a fact of life. This also shows that there is a lot of uncontrolled junk in an application. Also, from what I read, Raymond wasn't bashing libc, he was mentioning that when you write a non trivial application that relies on components you don't control, then you are open to this kind of problem.

    Another thing to remember, from things that I have read, O_CLOEXEC was added somewhere around 2.6.23, that is around the 2008 era. You are also assuming that people would know about this flag and are using it. From some of the communities that I hang around in, you see the whole uptime competitions going on, where people compete/show off their uptimes. These can go into years. There is also the issue that Linux kernel upgrades are non trivial at times. So it is possible that someone is using an outdated version because it works. Kind of like the people using outdated versions of Windows.

  22. Vlad says:

    I had exactly the same problem as that customer. Unfortunately, processes in my application started at "random" times, and it took me several months to find the source of the problem.

    There are 2 sides of this problem:

    1. You start new process and it inherits handles when another thread creates process

    2. Another thread creating process and inherits your handles

    As I see it, PROC_THREAD_ATTRIBUTE_HANDLE_LIST fixes only the first problem. If you are using a plugin or library that creates processes and is unaware of PROC_THREAD_ATTRIBUTE_HANDLE_LIST, you have exactly the same problem. So it works only when all parts of the system use it. Removing the requirement that a handle must have the inheritable attribute when using PROC_THREAD_ATTRIBUTE_HANDLE_LIST would fix all of these problems.

  23. Jerome says:


    I think you are right. I do just this in .Net in my own GUI that wraps ffmpeg for video conversions. (And hide the consoles while reading the redirected output for my dialogs progress bars. I have one dialog for each conversion.) Sometimes the dialogs hang inexplicably, and I had no idea what could be going wrong. This could be related…

  24. Medinoc says:

    Vlad said my thoughts exactly: The function's requirement that handles still be marked inheritable means the problem stays present unless the whole code (that creates child processes with handle inheritance) uses the fix.

  25. Deduplicator says:


    – If you have uncontrolled junk in your application, you've already lost. GIGO applies for spawning too.

    – On UNIX/Linux you have to control handles and privileges for your child. IMHO that's easier done directly than with PROC_THREAD_ATTRIBUTE_HANDLE_LIST. You can customize the child's environment even more. Replacing (v)fork with clone gets you more isolation features, if you distrust your new process too.

    – No overcommit necessary unless you insist on fork and your parent has significantly more modifiable memory than your child needs. In that case use vfork/clone, like good libraries do. You must not stomp all over memory, but you only wanted to start a child anyway, right?

    – BTW, if you distrust your child, you could replace fork/vfork with clone on Linux (no idea about other Unixes) and gain many more isolation features.

    [Any application that supports plug-ins has uncontrolled junk. So you're saying that all those apps have already lost? Sounds like you are making an argument for "No app in their right mind would support plug-ins." In which case, I'm going to call you on it if you ever ask that a program add a plug-in feature. -Raymond]
  26. alegr1 says:

    fork() is only marginally safe when your process is only running your own single thread. When you run multiple threads scheduled by kernel, which can be in different states (including inside the kernel), you cannot fork them safely.

    If you have a global mutex, and one of the process' threads owns that mutex, there is no way to fork it safely. Same for other kernel objects that should not be owned by multiple processes.


  27. Ens says:

    There are some performance reasons for the fork/exec model.  There are also cases where some operations have to be performed using information only the caller knows, but which must be provided early in the execution of the new process, and fork provides that.

    There are a couple problems that fork has that make me prefer CreateProcess-like semantics the vast majority of the time:

    1.  The "uncontrolled junk" problem.  If I'm not vetting absolutely everything, it's a problem of unknown magnitude.

    2.  It makes overcommit almost necessary since you have to double your memory commitment on a fork(), even though copy-on-write means the memory is not actually duplicated most of the time.  And overcommit means you need an OOM-killer, and OOM-killers are awful for robustness.  There are a few legitimate use-cases for it outside of fork, notably short-lived VMs for web services (which is basically a higher-level "fork" that runs on almost arbitrary base OSes).

    This said, with phones and tablets, automatically terminating background processes is becoming common even without overcommit, so the robustness battle might be lost anyway.  But it irks the programmer in me that you can do everything right and get all the right promises and *still* crash in a situation that it is not called a bug.

  28. Deduplicator says:

    @alegr1: Thanks for the shouting. Doesn't change anything though.

    Fork() is completely safe even if you run multithreaded. It forks the process, duplicating only the executing thread, leaving everything else untouched. If your process was ok without fork, it is still ok with it. Nothing hackish in sight.

  29. Joker_vD says:

    @Deduplicator: "It forks the process, duplicating only the executing thread, leaving everything else untouched." That's only because on Linux threads are simulated with processes. So a 5-threaded process is actually 6 processes, and obviously the thread (which is actually a process) can fork only itself. But there are other Unices.

  30. alegr1 says:


    >Fork() is completely safe even if you run multithreaded. It forks the process, duplicating only the executing thread, leaving everything else untouched. If your process was ok without fork, it is still ok with it. Nothing hackish in sight.

    So it drops all other threads in the duplicated process? What if a dropped thread was holding a process-wide lock? What if the remaining thread depends on other threads? For what the duplicated process will be good anyway, other than immediately calling exec()?

  31. Kevin says:

    @Joker_vD: "That's only because on Linux threads are simulated with processes."

    Don't be ridiculous.  Linux has had real pthreads for ages.

  32. Joker_vD says:

    @Kevin: 8 years is not "ages". Heck, it is just a year older than Vista.

  33. Deduplicator says:

    @Raymond: Hopefully, the plugins even if crap are at least marginally well-behaved. Otherwise you cannot do anything reliably at all, spawning or forking notwithstanding. GIGO (you yourself advocated killing the process fast if basic program assumptions are violated, as far as i can recall. (DrWatson))

    @Joker: Actually, it's one process (according to one definition) or five processes (according to another definition). That Thread-Ids are system-wide and threads are historically light-weight processes is neither here nor there.

    @alegr1: If you fork, you obviously cannot do anything depending on the non-existing other threads making any headway. You are responsible for not depending on them to do so. Just as you are expected not to flush any buffers twice and trusted not to do anything else stupid. Only you can know if you can use the new process directly as is, or have to set up for calling exec.

    What could a clone be useful for? Anything you could use threads for, with the added benefit of avoiding most synchronization overhead, conceptual and runtime.

  34. Joshua says:

    @alegrl1: The consequence is if you are multithreaded anywhere, then you can't allocate memory in the child process before exec(), which means there is no more a good reason to not use vfork() instead of fork.

  35. Deduplicator says:

    @Joshua @alegrl1: Even if the parent is multithreaded and you use fork(), you can allocate memory before exec(), just not with any userland mechanic already in use. So allocating some new pages is ok, and running your own allocator on them is as well. But what the hell are you still doing to set up your child process? All that number-crunching should already be done, to avoid needless complexity. And yes, so you might use vfork() as well.

  36. Joshua says:

    @Deduplicator: The general form of the handle reassignment logic requires MAX_HANDLES (historically 1024 on UNIX) short ints of space and must be executed in the child process. Please note nobody uses the constant in usermode anymore as it is changed by recompiling the kernel (server kernel has much bigger MAX_HANDLES). We can prove that substituting the largest handle referenced for MAX_HANDLES always works; however this means the allocation is no longer a constant. Unless coding in C99 or asm, this now means heap (C++ doesn't have C99's dynamic array on the stack.)

    Before you start to think MAX_HANDLES is too small, remember that it's per process and only file and socket handles count. Mutex handles are the pointers to the mutexes in userspace heap, so they have unlimited numbers. Despite the fact that mutexes use userspace buffers, they do not crash the kernel if they get overwritten by garbage. Your program; however, might be killed by the kernel if you allow this.

    As for leaked database connections, etc. There's an easy way to clean them up that requires allocating a buffer and calling the syscall version of readdir (the standard library opendir tries to call malloc) so that handles may be enumerated.

  37. GWO says:

    @Joker_vD : Eight years is long enough ago to make the present tense incorrect.  For example "George Bush is President of the The United States" is an incorrect statement, even though it was true 8 years ago.

  38. Chris says:

    @algre1 – if a dev forks from multithreaded code, I'd hope they'd have half a clue what they're doing. The threading library, pthreads, has library calls to define how fork should be handled – see pthread_atfork() for example (although not perfect, it's a starting point). Writing thread-safe code is always an interesting exercise in anything but simple cases. Handling fork() correctly is just another tax that should not be different from having to handle locks and other cross-thread communication correctly. Of course, I'm sure someone will argue that pthread_atfork() itself is a hack to fix the hack that is fork, but I'll leave that to other people.

  39. Gabe says:

    GWO: The problem is that saying "Linux has had this feature for ages" is almost like saying "Windows has had this feature for ages" where this feature was introduced in Vista. Whenever Raymond posts some code that uses an API that's been available since Vista, people complain that it isn't supported in XP.

    Since most of my clients are still using XP (and have no definitive plans to upgrade), a feature that's been in Windows for ages (nearly 7 years) will be unavailable to me for another year or two. So in 2014 or 2015, when my last client has moved off of XP, they will be on Win7. At that point, I won't be able to use any APIs newer than 2009 (which will be 5 or 6 years old then).

  40. alegr1 says:

    >GWO: The problem is that saying "Linux has had this feature for ages" is almost like saying "Windows has had this feature for ages"

    Then Windows had real threads since the dinosaurs walked the Earth.

  41. Bdell says:

    Medinoc: If a lock is held at fork time, any invariants on the cloned state protected by the lock are probably broken, so it's dangerous to do anything with it (to you, not your parent). In multithreaded programs, it's mainly only safe to exec after forking (no printf, malloc, etc). I imagine anything specified to be async-signal-safe could be used safely, but I don't know for certain.

  42. Joshua says:

    @Medinoc: Only that calling thread is forked to prevent chaos. Thread Ids are global. The mutexes are jammed if owned by any other thread (consequence: you can release mutexes from the wrong thread–the system can't check).

    Oh, and don't take Raymond's suggestion to call dbm_close() in the child. This will rollback any pending transaction in the parent (same reason you don't flush buffers).

    @Bdell: Indeed, anything async-signal-safe is safe. Anything is safe in single-threaded programs provided you flush buffers first (what do you think fflushall() is for?). This is why library functions that spawn threads are clearly marked.

  43. Medinoc says:

    Wait, *n*x mutexes are process-local objects, right? If so, I don't see the problem with forking even while a mutex is owned: Duplicate the whole process (ALL of its threads, not just the calling one) and you now have two processes with identical state. Each with its own mutex and the thread owning it.

    However, one is to be more cautious for GLOBAL mutually exclusive resources (semaphores?), but I don't know *n*x enough.

  44. Joker_vD says:

    And after reading this discussion, I am now even more sure than before that threads are not the abstraction I would like to use in daily multitask programming. Locks, mutexes, buffers and flushes, jeez. I don't even want to know these gory implementation details of task dispatching and message passing.

Comments are closed.
