Why does Explorer generate a page fault every two seconds?


If you fire up Task Manager and look at the page fault count, you'll see that even when nothing is going on, Explorer takes a page fault every two seconds. Isn't this bad? I thought you weren't supposed to poll.

Here's an interesting experiment: Change your update speed to High. Wow, the page fault rate quadruples to a page fault every half second. At this point, you should start suspecting some sort of Heisenbehavior, that is, that the behavior of the system is changing due to your act of observing it.

The page faults are coming from the CPU meter in the notification area. At each update, Task Manager sets a new icon into the notification area, and Explorer resizes it from the default icon size (which is the size of the icon that Task Manager hands it) to the notification icon size. To obtain the best quality image, the taskbar uses the LR_COPYFROMRESOURCE flag. This means that the window manager goes back to taskmgr.exe to locate the best match, which in turn triggers a soft page fault. It's a soft page fault since the information is already in the cache (after all, we access it every two seconds!), so no actual disk access occurs. But it still shows up as a page fault, and that makes some people nervous.
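
In rough code, the resize that the taskbar performs looks something like this. This is a minimal sketch, not Explorer's actual source; hNewIcon stands for the icon handle that was passed to Shell_NotifyIcon:

    // Sketch only. LR_COPYFROMRESOURCE tells the window manager to go back
    // to the icon's original module (here, taskmgr.exe) for the best-matching
    // image; touching that mapped file is what produces the soft page fault.
    HICON hSmall = (HICON)CopyImage(
        hNewIcon,                       // icon handed to Shell_NotifyIcon
        IMAGE_ICON,
        GetSystemMetrics(SM_CXSMICON),  // notification icon width
        GetSystemMetrics(SM_CYSMICON),  // notification icon height
        LR_COPYFROMRESOURCE);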

What could Task Manager do to avoid triggering this false alarm and freaking people out? Well, when it calls Shell_NotifyIcon, it could pass icons that were loaded at the size GetSystemMetrics(SM_CXSMICON) by GetSystemMetrics(SM_CYSMICON). That way, when the notification area makes a copy of the icon, it won't need to be resized since it's already at the correct size.
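
Here's a minimal sketch of that fix (not taskmgr's actual code; the resource ID IDI_CPUMETER and the variables hInstance and hwnd are hypothetical):

    // Load the icon at the small-icon size up front, so the copy the
    // notification area makes is already the right size and no resize
    // (and no trip back to the .exe) is needed.
    HICON hIcon = (HICON)LoadImage(
        hInstance, MAKEINTRESOURCE(IDI_CPUMETER), IMAGE_ICON,
        GetSystemMetrics(SM_CXSMICON),
        GetSystemMetrics(SM_CYSMICON), 0);

    NOTIFYICONDATA nid = { sizeof(nid) };
    nid.hWnd   = hwnd;      // window that owns the notification icon
    nid.uID    = 1;         // arbitrary per-window icon identifier
    nid.uFlags = NIF_ICON;
    nid.hIcon  = hIcon;
    Shell_NotifyIcon(NIM_MODIFY, &nid);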

Now, there's really nothing wrong with the soft page faults aside from all the time the shell team has to spend explaining to people that nothing is actually wrong. Next time, we'll look at the wrong way of avoiding the soft page faults. Even though it's the wrong way, the exercise is still instructive.

[Raymond is currently away; this message was pre-recorded.]

Comments (31)
  1. Karellen says:

    OK, my knowledge of how the MMU and various levels of cache work isn’t that great – so can someone enlighten me on this…?

    "To obtain the best quality image […] the window manager goes back to taskmgr.exe to locate the best match, which in turn triggers a soft page fault […] since the information is already in the cache"

    Why does this even trigger a soft page fault? Given that we access all the data every two seconds and the computer is otherwise idle, shouldn’t the data and all MMU mappings still be valid? What would be forcing this data out such that it needs reloading from somewhere else in RAM?

    Does a context switch require the MMU state (TLB cache?) to be completely flushed? So it’s the context switch from explorer to taskmgr that’s causing the problem?

  2. Per Larsen says:

    Is there a good way of figuring out what’s causing excessive page faults in your own code – other than commenting stuff out until the page fault rate falls? IOW, are you aware of any public tools that can automatically pinpoint source code areas that cause excessive paging?

  3. Karellen says:

    Per > The cachegrind and massif tools in valgrind[0] might be able to do what you want. If you can run your code under Wine on Linux and you follow the instructions on the "Wine and Valgrind" page[1] then that could help you figure out where the most memory is being allocated and used.

    It won’t quite tell you what’s causing the most paging, but as a first approximation the most paging will probably be coming from the code that causes the most cache misses.

    Disclaimer: I’ve not tried to run valgrind under Wine, so can’t personally vouch for how easy it actually is to set up, or how good it is in that environment.

    [0] http://valgrind.org/info/tools.html

    [1] http://wiki.winehq.org/Wine_and_Valgrind

  4. microbe says:

    I agree with Karellen. This is a very weird explanation.

    A page fault happens when you try to access a page that is not there, which could happen the first time you try to look it up. But the page stays there until it’s evicted from memory by, for example, swapping, which doesn’t look like the case here at all.

    The only possibility is that every time you do this, it’s at a different (unmapped) address. That sounds very strange.

  5. David Walker says:

    This is interesting!  And I’m not being sarcastic.

    I would guess that the way to avoid those soft page faults is to cache all of the possible CPU meter icons (there are only, what, 8 of them, or 16, or 10, or something) at the right size.  Or is that what Raymond already said?

    Or maybe that’s the wrong way.  :-)

    Why does the notification area need to make a copy of the icon, if it’s already at the right size?  It could point to the original.  (Of course, the bits get copied into the display buffer eventually.)

    I’m curious what the answer is.

  6. mh says:

    Certainly interesting, but I’d be very concerned at the amount of noise it would generate if I was using Task Manager to get a quickie overview of page faults!  It should be possible to generate the icon image on the fly, purely in software, and without the round trip to taskmgr.exe, or am I talking through my nethers?

  7. Gabe says:

    It sounds like Explorer’s LoadIcon call just maps in taskmgr.exe, accesses the page with the icon (causing the page fault), and closes it. Since LoadIcon has no business keeping the file mapped, it takes a fault every time it gets called.

    The page fault could be avoided by having the taskbar cache all the icons, or having taskmgr send in the correct size of icon in the first place.

    Why not have the taskbar cache the icons? Because there’s no point — the memory manager caches them for you by keeping that page of taskmgr.exe in RAM. Making them a permanent part of Explorer’s memory footprint would just make a hard page fault more likely. In other words, optimizing out the soft page faults (which are free) would make it more likely that you would get hard page faults (which are expensive).

  8. Erzengel says:

    David: And if the program frees the original icon?

    It makes a copy so it can guarantee that it will have the icon exactly as given, in a valid state, for the entire lifetime it needs it.

    That’s why it’s possible for notification icons to hang around after their creating program has crashed (thereby not allowing it to remove the icon); explorer has a copy of the icon for itself, and doesn’t check with the application until you mouse over it. If it didn’t keep a copy, what would have happened when said program crashed and explorer tried to redraw the icon? Would explorer explode?

  9. alegr says:

    Adding to Gabe’s explanation:

    1. Task Manager requests the icon update every cycle. It has a few ready-made icons with different "fill" levels, to show CPU usage. This is why Explorer can’t just use a cached icon. It certainly uses a cached icon if it just needs to redraw it.

    2. When you hover the mouse over the icon, Explorer tries to send a message to your notification HWND. If your program is dead and the HWND is no longer valid, Explorer removes the icon from the notification area.

    3. A soft page fault happens because when you map a file into the process address space, all of its pages are initially marked as not-present. When a page fault occurs, the memory manager tries to bring the page in. If the page is already in the cache, it’s mapped into the process without an actual disk read. That’s a soft page fault.
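
    A minimal sketch of that sequence (error handling omitted; the file path is purely illustrative). The first touch of the mapped view is what shows up as the page fault:

        // The mapping starts with all pages marked not-present; the first
        // access faults, and if the page is already in the system cache it
        // is wired up without any disk I/O. That is the soft page fault.
        HANDLE hFile = CreateFile(TEXT("taskmgr.exe"), GENERIC_READ,
                                  FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
        HANDLE hMap = CreateFileMapping(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
        const BYTE *view = (const BYTE *)MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 0);
        BYTE first = view[0];   // first touch: not-present -> (soft) page fault
        UnmapViewOfFile(view);
        CloseHandle(hMap);
        CloseHandle(hFile);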

  10. microbe says:

    Gabe: makes sense.

    Alternatively it could use read/write instead of mmap.

  11. KenW says:

    I’m amazed at the number of people here who felt the need to read and comment on today’s entry, but apparently didn’t feel the need to read yesterday’s. If you read the following:

    http://blogs.msdn.com/oldnewthing/archive/2008/08/20/8880062.aspx

    It specifically explains why the icon is not loaded from the cache:

    "When you pass the LR_COPYFROMRESOURCE flag to the CopyImage function, the window manager goes back to the original icon source to create the copy you requested rather than blindly stretching the pixels of the icon you passed in."

    This also explains the soft page fault issue.

  12. Mike Diack says:

    It still sounds to me as though Task Manager could/should have been better written to avoid this.

    Mike

  13. mh says:

    It’s not really a problem though, more a case of something that one needs to be aware of when analysing Task Manager data (particularly if naively using Task Manager to get a true picture of performance).  It would be nice if taskmgr.exe could have an option to filter out its own data, but that’s probably something that’s already been considered and rejected for one reason or another.

  14. Karellen says:

    KenW > No, I got that, but I still don’t see why a (soft) page fault would happen.

    Explorer has the icon it shows. When it gets a new icon, it’s going to free the old icon and allocate space for the new one. But those allocations should both be the same size, and if malloc()/free() (or whatever API calls are responsible) are even semi-sensible, they should end up, after a while, reusing the same chunk of memory for the icon each time. If the app creates the new icon before freeing the old one, it will probably end up using the same two blocks of memory, alternating between them each cycle. (Does that make sense?)

    Remember, malloc()/free() are strictly userspace calls. free() won’t release memory back to the OS; it will just create a hole in the heap to be reused by the next appropriately sized malloc(). The OS and the MMU know nothing about whether an application thinks a bit of memory is in use or not.

    So the memory locations used for the destination icons in explorer should probably always be in the cache, as should the memory for the source icons in taskmgr, as should their page table entries.

    Shouldn’t they?

    So if everything should always be in the cache, as this is on an otherwise idle system, why are the soft page faults happening?

  15. Karellen says:

    Doh! Except Gabe’s explanation that it uses mmap() (sorry, CreateFileMapping()) to map the memory from the other process does explain that. mmap()/munmap() are OS calls and would create page faults.

    Must. Read. All. Comments. Next. Time.

  16. Michael says:

    Per Larsen: pfmon from the Resource Kit may be helpful. It doesn’t provide call stacks, which can be annoying if you just see a hard page fault listed at something like memset. One thing to keep in mind is that it attaches itself as a debugger and won’t detach on getting a Ctrl-C, so when pfmon is killed, the monitored process is killed as well. At least in the version I’ve used.

  17. AC says:

    So a soft fault doesn’t sound like a big deal. Anybody know why taskman shows this but not hard faults (which are a big deal)?

  18. fo says:

    AC: It does show them.  Why do you think it only shows soft faults?

  19. Erzengel says:

    AC: The problem isn’t that it doesn’t show hard page faults (it does), the problem is that it shows hard page faults + soft page faults as a single number, so we have no way of knowing (from taskman) how many of those "page faults" we actually care about.

  20. Jorge Coelho says:

    If you think two soft page faults per second is bad, try writing a program that uses GDI+ to update a couple of bitmaps every second. We’re talking HUNDREDS of page faults per second here!

    My question is: don’t soft page faults include some kind of performance overhead, although nowhere near as bad as hard page faults?

  21. Godzila says:

    "My question is: don’t soft page faults include some kind of performance overhead, although nowhere near as bad as hard page faults?"

    In a word? No, soft page faults are free and incur no overhead.

  22. David Walker says:

    @Erzengel: Several good points there.  I wouldn’t want Explorer to explode!

  23. Gabe says:

    Soft page faults are hardly free (about as expensive as a system call), but I would guess they are between 1000 and 1000000 times faster than hard faults.

  24. Antti Huovilainen says:

    Is there a way to find out why Explorer causes 99 extra page faults per second? I’m guessing some handler, but how to find out which?

  25. Dean Harding says:

    @Antti Huovilainen: Uninstall them one-by-one until it stops happening? You could use something like autoruns to disable them (so that it’s easier to re-enable them later).

  26. Igor Levicki says:

    Just to clarify: a soft page fault isn’t free performance-wise. It carries the same penalty as a hard page fault, minus the disk latency. That means it should still be avoided at all costs. Remember, page faults degrade overall system performance, not just the performance of the single program that causes them.

  27. David Walker says:

    Gabe: I don’t know what the phrase "x times faster" means when you’re talking about duration and not speed.  Does it mean "takes 1/x the time"?

    A mainframe Syncsort ad from many years ago, published in Computerworld:  "’Our sort jobs now run in 120% less time than before, thanks to Syncsort!’, gushes Bob Whatshisname, system programmer for MegaCorp."

  28. Jorge Coelho says:

    Igor Levicki: "That means it should still be avoided at all costs."

    Guess somebody forgot to say that to the MS team responsible for GDI+.

    It’s a bit of a pain explaining to your users that the millions (literally!) of page faults they are seeing are actually not your application’s fault but rather Microsoft’s, that you can do nothing about it (unless they want to live without alpha transparency effects, etc…) and that it doesn’t really hurt performance (much!). :-(

  29. Gabe says:

    David Walker: When something goes twice as fast, it takes half as long. Does it make more sense to say that the hard page fault is 1000x slower than the soft page fault? I would tend to think that foo is X times slower than bar means the same as bar is X times faster than foo.

  30. Igor Levicki says:

    Jorge, then do not use GDI+ — roll your own.

Comments are closed.