My application seems to hang. What do I do? – Part 1


Defining “hang” is a good place to start.


 


When people say “hang” they could mean all sorts of things. When I say “hang” I mean the process is not making progress – the threads in the process are either blocked (e.g. deadlocked, or not scheduled because of threads from other processes) or executing code (madly) but not doing useful work (e.g. an infinite loop, or busy spinning for a long time). The former uses no CPU while the latter uses 100% of a CPU. When a UI developer says “hang” they could mean “the UI is not getting drawn”, so essentially they mean the UI threads are not working – other threads in the process could be doing lots of work, but since the UI is not getting updated the application appears hung. So clarifying what you mean when you say “hang”, which requires you to look at your process and its threads, is the first step.
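
To make the two kinds of “not making progress” concrete, here is a minimal sketch in Python (not from the original post; the function names are made up for illustration). It measures how much CPU time the process consumes while a thread is blocked versus while it is busy spinning. The exact numbers depend on the machine, so treat them as indicative only.

```python
import threading
import time

def cpu_seconds_used(work):
    """Run work() and return the process CPU time it consumed, in seconds."""
    start = time.process_time()
    work()
    return time.process_time() - start

def blocked(duration=0.2):
    # Waiting on an event that is never set: the thread is blocked, the
    # scheduler does not run it, so it consumes almost no CPU.
    threading.Event().wait(timeout=duration)

def spinning(duration=0.2):
    # Busy loop: the thread is always runnable and burns CPU the whole time
    # without doing any useful work.
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        pass

blocked_cpu = cpu_seconds_used(blocked)
spinning_cpu = cpu_seconds_used(spinning)
print(f"blocked:  {blocked_cpu:.3f}s of CPU")
print(f"spinning: {spinning_cpu:.3f}s of CPU")
```

Both calls take about the same wall-clock time, but only the spinning one shows up as CPU usage, which is why the CPU column is a useful first clue.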


 


If you start Task Manager (taskmgr.exe) it shows you how much CPU each process is currently using. If you don’t see a CPU column you can add it by clicking View\Select Columns and checking the “CPU Usage” checkbox.


 


Note that even if you have multiple CPUs, the CPU usage shown is at most 100. Let’s say you have 4 CPUs and your process has one thread that’s running and taking all the CPU it can: you will see 25 in the CPU column for your process, since that one thread can use at most one full CPU at any given time.


 


The CPU usage for a process is calculated as the sum of the CPU usage of all the threads that belong to the process. Threads are what get to run on the CPUs. They get scheduled by the OS scheduler, which decides when to run which thread on which processor. I won’t cover the details here – the Windows Internals book by Russinovich and Solomon covers them.
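
The arithmetic behind the CPU column can be sketched as follows. This is only an illustration of the “sum over threads, scaled to 100 across all CPUs” idea described above, not how Task Manager is actually implemented; `process_cpu_column` and its parameters are hypothetical names.

```python
def process_cpu_column(thread_cpu_fractions, cpu_count):
    """Approximate the CPU column (0-100) for one process.

    thread_cpu_fractions: for each thread in the process, the fraction of
    one CPU it used during the sampling interval (0.0 to 1.0).
    cpu_count: number of CPUs on the machine.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    for fraction in thread_cpu_fractions:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("a thread can use at most one full CPU")
    # Sum the per-thread usage, then scale so that all CPUs busy == 100.
    total = sum(thread_cpu_fractions)
    return min(100.0, 100.0 * total / cpu_count)

# One thread spinning flat out on a 4-CPU machine.
print(process_cpu_column([1.0], cpu_count=4))            # 25.0
# Two threads flat out plus one half-busy thread.
print(process_cpu_column([1.0, 1.0, 0.5], cpu_count=4))  # 62.5
```

So on a 4-CPU machine a process stuck in a single-threaded infinite loop shows up as 25, not 100.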


 


If you see your process is taking 0 CPU, that would explain why it’s hung (for as long as the CPU stays at 0) – no threads are getting to run in your process! The next thing to look at is the CPU usage of other processes. If you see one or more other processes taking up all the CPU, that means the threads in your process simply don’t get a chance to run – this is because the threads in those other processes are of higher priority (or temporarily of higher priority due to priority boosting) than the threads in your process. The possible causes are:


 


1) There are threads marked as low priority which acquired locks that other threads in your process need in order to run, and the low priority threads are preempted by other (normal or high) priority threads from those other processes. This happens when people mistakenly use low priority threads to do “unimportant work” or “work that doesn’t need to be done in a timely fashion” without realizing that it’s nearly impossible to avoid taking locks on those threads. I’ve heard many people say “but I am not taking a lock on my low priority threads”, which is not a valid argument because the APIs you call or the OS services you use can take locks in order to run your code – allocating on the native NT heap can take locks; even triggering a page fault can take locks (which is not something an application developer can control in their code).


 


2) The threads in your process are of normal priority but those other processes have high priority threads – this should be relatively easy to diagnose (and unless some process is simply a bad citizen this rarely happens) – you can take a look at what those processes are doing (looking at their threads’ callstacks is a good place to start).
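
Cause 1 above can be illustrated with a toy simulation. This is a deliberately simplified sketch, not a model of the real Windows scheduler: it assumes a single CPU, strict priority scheduling with no priority boosting, and one shared lock. The `SimThread` class, the `simulate` function and the thread names are all made up for the example.

```python
from dataclasses import dataclass

@dataclass
class SimThread:
    name: str
    priority: int             # larger number = higher priority
    work: int                 # CPU ticks still needed
    start: int = 0            # tick at which the thread becomes ready
    needs_lock: bool = False  # True if all of its work runs under the shared lock

def simulate(threads, ticks):
    """Strict-priority scheduling on one CPU; returns who ran on each tick."""
    lock_owner = None
    timeline = []
    for tick in range(ticks):
        runnable = []
        for t in threads:
            # A thread that needs the lock cannot run while someone else holds it.
            waiting_on_lock = (
                t.needs_lock and lock_owner is not None and lock_owner is not t
            )
            if t.start <= tick and t.work > 0 and not waiting_on_lock:
                runnable.append(t)
        if not runnable:
            timeline.append("idle")
            continue
        current = max(runnable, key=lambda t: t.priority)
        if current.needs_lock:
            lock_owner = current       # acquire (or keep) the lock
        current.work -= 1
        if current.work == 0 and lock_owner is current:
            lock_owner = None          # release the lock when finished
        timeline.append(current.name)
    return timeline

threads = [
    # Low priority thread in your process; takes the lock first.
    SimThread("low", priority=1, work=2, needs_lock=True),
    # Normal priority thread in your process; needs the same lock.
    SimThread("normal", priority=2, work=2, start=1, needs_lock=True),
    # CPU-bound thread from another process; never touches the lock.
    SimThread("hog", priority=3, work=6, start=1),
]
timeline = simulate(threads, ticks=10)
print(timeline)
```

In this run the “normal” thread is ready at tick 1 but does not get to run until “hog” has finished and “low” has had a chance to release the lock. During that whole stretch your process uses 0 CPU while another process uses all of it, which is exactly the picture Task Manager would show.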


 

That’s all for today. Next time I will talk about other hang scenarios and techniques to debug them.

Comments (16)

  1. Joe says:

    These tools: http://www.sysinternals.com/Utilities.html

    .. recently acquired by Microsoft, can be very useful to see what (if anything) an application is doing when it appears to have hung. In particular FileMon will show you what it is accessing on disk.

  2. "My application seems to hang. What do I do?"

    Press Ctrl-Break (or kill -3 on unix) to get a thread dump so I can see what my threads are doing, what monitors they have locked, etc.

    oh wait, you weren’t talking about Java…

    I hope in Part 2 you’ll tell us the .NET equivalent of this. After source-availability, it’s what I miss most when comparing .NET to Java…

  3. Luke, I think you missed the point – this is not specific to managed code at all. You can attach a debugger and get a dump (for both managed and unmanaged processes) or get the callstacks for all threads…that’s not a problem at all.

  4. Luke, in addition to maoni’s comment, you can use performance monitor (perfmon.exe) to get far more information than Java’s hard to decipher ctrl-brk dump.

  5. Jamie Gordon says:

    The first two things I do if I find a ‘hung’ app is to check the processor usage (to see if I am looking for an infinite loop or a deadlock) and then attach a debugger to take a look at each thread to see which one is at fault. It might be overkill to start with the debugger but it’s so powerful there’s virtually nothing you can’t do or find out about the app with it.

  6. Jamie, you are right – deadlock or infinite loop are the easiest cases to deal with and you should look for them first. I was going to cover deadlock next but perhaps I should have mentioned it in the first part.

  7. Roland Kaufmann says:

    Use Process Explorer instead of Task Manager, and you can see the CPU usage of each individual process. Install a debug server and you can even inspect each thread in a process to see what it is doing right now. (Of course, you could take a non-intrusive mini-dump as well, but I guess that is the topic of later installments).

  8. Note that the purpose of my article is mainly to talk about the possible causes of a hang, not so much to focus on tools. TaskMgr will show you individual CPU usage as well; Process Explorer is also a good tool, but you need to download it while TaskMgr comes with the OS, which was the main reason why I mentioned TaskMgr.

    Perhaps I should say "Use TaskMgr or your favorite tool to get the CPU usage".

  9. Belliappa says:

    My system hangs if I go to internet. It will work only if I disable all startup and work in selective startup mode. Please solve my problem

  10. nativecpp says:

    "deadlock or infinite loop are the easiest cases to deal with and you should look for them first. "

    What are the toughest cases in term of hang?  memory leaks or memory retention (in .NET) ??

    To me, infinite loop is the easiest one since you can see which thread is doing it.

    deadlock is more difficult and is likely the flaws in sync logic.

  11. nativecpp: for example, the cases I listed in the blog entry are harder. I think you might be thinking about infinite hangs – the types of hang I am talking about include both infinite and non-infinite hangs. A process could just be not doing anything for a few seconds, which is harder to debug than a process that’s hung infinitely.

  12. nativecpp says:

    I see. I guess if either case 1/2 occurs, I probably would do a hang dump and look at the call stack.

    Thanks for the clarification.

  13. Right, that can help sometimes. Sometimes looking at a dump doesn’t help with debugging hangs that last only a few seconds and occur randomly because a dump only shows you what the process is doing at that specific point in time; it doesn’t give you an idea of what the process has been doing during the few seconds it was hung; very often you see perfectly legal callstacks when you break into a process that hangs. It may not be the culprit since you don’t know what the process was doing before this; or the callstacks would become illegal if they are executed too often.

    I tell people to look at the callstacks a few times during the hang – if possible. But it’s often not, because either they can barely break into the process in time (because the CPU is already at 100%) or the hang doesn’t repro anymore as soon as they look at it under the debugger, since the hang was due to timing (how the threads get scheduled).

  14. nativecpp says:

    One of the bugs I fixed was that the server (IIS) wasn’t busy and yet the response was slow.  If you are talking about a process that is not responsive for a few seconds and then goes back to normal, that’s tough. I would still attach windbg and look at the call stack at that particular period of time. But it definitely is ‘hard’.

    I assume that you would be showing us some example, I hope.

  15. Last time I talked about the hang scenario where your process is taking 0 CPU and the CPU is taking by
