Theorize if you want, but if the problem is right there in front of you, why not go for the facts?

On an internal discussion list, somebody asked a question similar to this:

My program launches a helper program. This helper program does different things based on the command line parameters, but the way I'm using it, it just prints its results to the console and exits. When I launch this program and wait for it to exit, my wait never completes. Why isn't the helper program exiting? Here's the code that I'm using to launch the helper process...

It wasn't long before people chimed in with their suggestions.

Have your main program call exit() at the end.

If you're redirecting stdout, you may be forgetting to drain the pipe. Otherwise, if the program generates too much output, the pipe fills and the helper program blocks writing to it.
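The pipe-draining theory is easy to picture with a small sketch. This is Python rather than the asker's launch code (which isn't shown here), and the child is a hypothetical stand-in that writes more output than a typical pipe buffer holds. The point is only that the parent has to keep reading while it waits.

```python
import subprocess
import sys

# Hypothetical stand-in for the helper: writes ~1 MB to stdout,
# far more than a typical OS pipe buffer holds.
CHILD_CODE = "import sys; sys.stdout.write('x' * 1_000_000)"


def run_helper_and_drain():
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
    )
    # communicate() keeps reading stdout until EOF and then waits for exit.
    # Calling proc.wait() here instead, without reading, risks a deadlock:
    # the child blocks on a full pipe while the parent blocks on the child.
    output, _ = proc.communicate()
    return proc.returncode, len(output)


if __name__ == "__main__":
    print(run_helper_and_drain())
```

If the parent redirects stdout but never reads it, a chatty helper can look exactly like a helper that refuses to exit.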

Alas, that didn't help. Whether I redirect the output or not, the helper process still hangs.

Eventually I had to step in.

I can't believe I'm reading this discussion. It's like watching a room full of child psychologists arguing over why Billy is crying. They're all expounding on their pet theories, and none of them bothers to ask Billy, "Why are you crying?"

Connect a debugger to the helper process. See why it's crying.

You can sit around developing theories all you want, but since the problem is right there in front of you, the quickest way to figure out why the helper process isn't exiting is to connect a debugger and look at it to see why it's not exiting.

This is like the software version of the black crayons story.

Comments (29)
  1. Rob H says:

    There’s an underappreciated book called Debugging by David Agans. (It’s about debugging systems in general, not software in particular, but completely applicable.) My favorite chapter heading is "Quit Thinking and Look," which sounds exactly like the advice needed on your list.

  2. nathan_works says:

    Alas, until Billy is around 2 or so, you can’t exactly get an answer from him on why he’s crying. And sometimes no meaningful answer until maybe he’s 4.

    Fortunately, the set of things that can make babies cry isn’t too large: { hungry, dirty diaper, tired, lost a toy, bonked something, "just being fussy" } Though I wish my psychic debugging skills were stronger at times, as the list grows with the child’s age.

  3. blah says:

    Been there, done that. It helps to close stdin before waiting for the demise of the child process.
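A minimal sketch of the stdin suggestion, in Python and under the assumption that the helper reads its standard input until end-of-file (the child code below is a hypothetical stand-in, not the program from the original question):

```python
import subprocess
import sys

# Hypothetical helper: reads stdin until EOF, then reports how much it saw.
CHILD_CODE = "import sys; data = sys.stdin.read(); print(len(data))"


def run_helper_closing_stdin(payload: bytes):
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    proc.stdin.write(payload)
    # Closing our end of the pipe delivers EOF to the child. Without this,
    # the child's read() never returns and the wait() below blocks forever.
    proc.stdin.close()
    reported = proc.stdout.read()
    returncode = proc.wait()
    return returncode, reported.strip()


if __name__ == "__main__":
    print(run_helper_closing_stdin(b"hello"))
```

Whether this applies to any particular hang is something only a look at the stuck process can confirm.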

  4. Chucky says:

    @Nathan_works: seriously dude, Raymond wrote "child psychologists" and NOT "baby psychologists." I interpreted the phrase "child psychologists" to mean that a child was involved, a child that is capable of listening, speaking, and answering questions.

    Raymond’s advice stands.  Attach a debugger to the helper program and see what it is doing.

  5. Programmerman says:

    Raymond, you are very wise.  Suddenly, I realize I’ve been a fool about a similar problem I’ve been having in my environment.  Thanks for the brainkick, I needed it.

  6. mvadu says:

    What if the helper program is a precompiled release version of a third-party program?

    Seriously, after removing Vista SP1 RC and installing the Vista SP1 release version, if I click on Customize in Network and Sharing Center on my machine, I get nothing. No windows. MS Support closed the issue as "Unresolved". I tried attaching WinDbg to the Explorer process hosting the applet and I could see that it is loading two DLLs, nothing else. How should I proceed?

  7. I’ll bite.  Why wasn’t the helper process exiting?

    [There was no reply. That’s one of the sad things about mailing lists – when people solve their problem they frequently forget to report the answer back to the list. -Raymond]
  8. Alexandre Grigoriev says:

    It’s sad to know there are such clueless programmers at Microsoft. Though I see their results all the time. Like Visual Studio 2005 SP1, which takes hours to install. Or a system update that only replaces a 200KB module but takes a few minutes.

  9. codekaizen says:

    Jumping to theorizing when you have the data is an exasperating behavior, but it has roots in an appropriate response. When you have a black-box system (for example, this dead-ish desktop system sitting to my left), sometimes your only way to get into it is to guess-and-check. It’s a standard troubleshooting approach when you can’t see the details of what is wrong.

    The interesting bit comes when you have better tools, but either don’t know how to use them or forget to. I’ve seen a number of times when, even after I showed someone how to attach the debugger and run through the stacks of executing threads, they go back to the guess-and-check approach. My theory (ha, ha) is that the mental effort required to master the new tools and methods is greater than the payout of the old, comfortable habit, since eventually the latter works.

  10. LS says:

    Billy was crying because he had a helper program that wouldn’t exit properly and nobody would tell him the answer…

  11. "It’s sad to know there are such clueless programmers at Microsoft. Though I see their results all the time."

    This is a perfect example of the sort of logic failure that this post is about. If you want to dislike Microsoft, fine, but the examples you cite have nothing to do with the subject of this post. First of all, every programming shop has sub-standard programmers. Second, you don’t know why a given update takes a long time to install. It might be because of clueless programmers, or it’s more likely that the problem is a lot more complicated than you think it is. Several of Raymond’s posts describe the complexities that exist in Windows that aren’t immediately obvious, but that have really good reasons for existing.

    Do us a favor and don’t start pulling theories out of the air until you have a clue what you’re saying.

  12. zmx says:

    Raymond didn’t give us enough background info about why "Billy" was sending the email to the mailing list in the first place.

    Who owns the "helper program"? If "Billy" owns it, then he should debug it himself, instead of crying in front of others.

    Could it be that the helper program is owned by the people on the mailing list? If so, then it makes sense for "Billy" to send a "help needed" email to the list (though proactively debugging others’ code is a good thing).

    Or is the "helper program" owned by a 3rd party? In that case, Billy should look elsewhere.

    Anyway, Raymond doesn’t clarify which of the above is the case. Or perhaps "Billy" didn’t clarify.

  13. Jonathan says:

    You can see a surprising amount of detail with a process dump (which the Vista task manager will take for you, yay!), windbg, the public MS symbol server, and the command ~*k.

    I’ve gotten into the habit of capturing hung processes before I kill them, just so I know afterwards. And I’ve found several culprits this way. For example, my previous laptop would sometimes have processes hung for no reason, and you couldn’t kill them either. A dump would always show a thread going into some WinMM call, and disappearing into the kernel with some IOCTL. That led me to a questionable sound driver, which probably didn’t handle hibernation well. This would manifest in wmplayer.exe, iexplore.exe (usually with flash on there too), or explorer.exe (the volume icon thing).

    As Raymond said more than once: Don’t be helpless.

  14. Tim says:

    Yup, been there. Sometimes it seemed like an endless parade of people asking for help without looking at what it was doing. One person was honest, at least. He told me that it was easier and quicker for me to get the answer than for him to figure it out.

  15. Mark Steward says:

    mvadu: proceed as Raymond suggests. Find the right CLSID in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\NetworkList\Profiles, and run the following command line in a debugger:

    rundll32 pnidui.dll,NwCategoryWiz {CLSID} 1

    This is the helper process. Check out debugging help (or send me an email) if you’re still stuck.

  16. @Alexandre Grigoriev: Really, you believe Microsoft should have no programmers that engage in this kind of stuff? Could you share with us the 100% foolproof method you use at your company to ensure that all of your developers are 100% on the ball at all times and never engage in any senseless activity?

  17. Yuhong Bao says:

    Yep, ideally I’d say that before you blame a problem on anything, debug it properly first!

    (and this applies to all OSes, not just Windows)

    Except that not everyone is a programmer.

  18. Ulric says:

    Replies like "Have your main program call exit() at the end." always really anger me. People have a lot of knowledge, but don’t seem to have the deduction skills to go with it.

  19. Alexandre Grigoriev says:

    A friend working at a certain pow(10., 100) company told me that they have mandatory code review before check-in. I wonder if Microsoft has the same. Even senior developers have their stupid moments, and code review, even for "insignificant" changes, helps quality enormously.

    Replying to "steveshe", a junior programmer should have his/her code/changes reviewed by a senior programmer before checkin. Even if you don’t mandate code reviews for senior programmers, you should for juniors.

    These policies would help to avoid many disasters.

  20. Alexandre Grigoriev says:

    And regarding the exit() function: a properly designed program has no use for it. Putting an exit() call in a library, though, should be grounds for dismissal.

    The only proper place for an exit() call is inside ASSERT() and the like, which don’t go into the production build. And even there I would prefer TerminateProcess(GetCurrentProcess()). I mean unconditional seppuku for the process.

  21. Ulric says:

    Here’s the funny thing about the exit() comment

    "Have your main program call exit() at the end"

    If it’s the end of program, you don’t need to call exit().  

    If there is no end of the program, there’s your problem.

  22. Yuhong Bao says:

    Yep, ideally I’d say that before you blame a problem on anything, debug it properly first!

    (and this applies to all OSes, not just Windows)

    There is a reason why I say that. Too many people are quick to blame a problem on something without even attempting to debug the problem. For example, blaming something on Vista.

    Except that not everyone is a programmer.

    Which is unfortunate.

  23. Lars Viklund says:

    Notable is that exit() is quite a dangerous function to call if the program in question is a C++ program.

    To quote part of 18.3 p8 in the C++ standard: "(Automatic objects are not destroyed as a result of calling exit().)"

    Basically, that means that if main contains any non-POD stack objects, or if you’re somewhere other than main, you will get nasty side effects if you rely on RAII to do proper cleanup.

  24. nathan_works says:

    Jonathan —

    Don’t be so sure it’s the sound driver.

    I spent a month trying to track why our IRPs (what IOCTL gets translated to in the kernel level) were corrupted. I cleaned up a ton of code, but it didn’t make a difference. Finally started stopping other services. Ah-hah, this !@#$@ network driver was trying to re-use IRPs that it had released (IIRC, you can queue your own IRPs for reuse, if you want..).

    I’m ashamed it took me so long, since all the finger pointing was "our problem", and figured the customer had already done the "update all drivers" and "selectively turn off services to help debug" steps. Got a big ding on my review for taking so long.

  25. Daniel Colascione says:

    You can take the Python approach: throw an exception that makes the program exit when it’s caught at the top level. The exception will properly destroy any objects on the stack as it works its way up the call stack.
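A short sketch of what that looks like in Python itself, where sys.exit() works by raising SystemExit. Note that Python's cleanup here runs through context managers (or finally blocks) rather than destructors; the Resource class and function names below are hypothetical, made up for illustration.

```python
import sys

cleanup_log = []


class Resource:
    """Hypothetical resource whose cleanup must run before the program exits."""

    def __init__(self, name):
        self.name = name

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        cleanup_log.append(f"released {self.name}")
        return False  # do not swallow the exception


def deep_in_the_call_stack():
    with Resource("inner"):
        # Raises SystemExit instead of terminating the process on the spot.
        sys.exit(3)


def main():
    with Resource("outer"):
        deep_in_the_call_stack()


def run():
    # Stand-in for the interpreter's top level, which normally catches
    # SystemExit and turns its code into the process exit status.
    try:
        main()
    except SystemExit as e:
        return e.code


if __name__ == "__main__":
    print(run(), cleanup_log)
```

Both resources are released, innermost first, before the exit code reaches the top level.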

  26. Isaac Lin says:

    Lars Viklund: In any case, a program ought to handle unexpected terminations. So if the program has acquired any resources that can survive its death, it should clean them up either on startup (e.g. lock files) or register with the appropriate OS cleanup mechanisms (e.g. System V semaphores).

  27. RaymundoChennai says:

    Rather than just make your suggestion, once again you act like a douchebag (and then make your suggestion).

  28. KenW says:

    @RaymundoChennai: Once again, you open your mouth and nothing but bovine feces emerges.

    If you don’t like Raymond’s posts, don’t read them. Either way, shut up and let the adults talk in peace.

Comments are closed.