Shutdown Is No Time For Spring Cleaning


I think my current performance pet peeve is shutdown.  It comes in assorted flavors, and they all seem to have the same kind of problem.  Sometimes we’re stuck with it… but maybe we shouldn’t be?


This is one time when our basic training, which normally I love so much, tends to let us down.  We were all taught to clean up our own messes; programming-wise, that means freeing your resources after you are done with them.  But this backfires in the shutdown case.


Many times I watch as I hit the [X] close button on some application and my poor computer starts to swap as the application goes about paging in vast amounts of its code, then dutifully walking all its data structures (more paging) and giving them back to the operating system.  My reaction to this, in a word:  ACK!!!!


When your application is ordered to shut down, the last thing you should do is enumerate every piece of memory you have ever allocated and systematically give it back to the operating system.  Your program has a death sentence, and soon your resources are going back to the operating system whether you like it or not.  What you must do is touch the minimum possible amount of memory necessary to get to a nice, safe, stable state and then exit as quickly as possible.  Abandoning your memory like this gives the operating system the best chance to get your process unloaded while swapping in the least amount of memory and causing the least impact to the system’s disk and memory caches.
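
As a rough illustration, here is a minimal C# sketch of a close path that touches as little memory as possible. The `FastShutdown` class, the `SaveMinimalState` helper, and the idea that one small string is all the state worth keeping are assumptions made up for this example; a real application has to decide for itself what its safe state is.

```csharp
using System;
using System.IO;

static class FastShutdown
{
    // Hypothetical helper: persist the one small piece of state that must survive.
    // Writing to a temp file and then swapping it in is intended to leave the
    // previous file usable if the process dies in the middle of the write.
    public static void SaveMinimalState(string path, string state)
    {
        string tmp = path + ".tmp";
        File.WriteAllText(tmp, state);
        if (File.Exists(path))
            File.Replace(tmp, path, null);   // swap the new contents over the old file
        else
            File.Move(tmp, path);
    }

    // Called when the user closes the application.
    public static void Shutdown(string path, string state)
    {
        SaveMinimalState(path, state);
        // No walking of caches, lists, or trees to release them one by one:
        // the operating system reclaims the whole address space at process exit.
        Environment.Exit(0);
    }
}
```

Note that `Environment.Exit` still raises `AppDomain.ProcessExit`, so any handlers registered there run before the process goes away; the sketch only shows the application's own code declining to do a teardown pass.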


In short, shutdown is no time for spring cleaning.


And why all this cleaning anyway?  Many people report that they have all these important resources that need flushing and so forth.  They couldn’t possibly get to a safe state without considerable work, but usually that in itself is symptomatic of assorted problems.  Any application that has important data to manage almost certainly needs to be tolerant of power failure, and if that’s the case, then when the user makes important edits they likely should be automatically saved to a durable location.  In fact, at any given moment, probably only a few seconds’ worth of data should be at risk.  If your application has been idle for any length of time it should be fully saved.  Even if the user hasn’t chosen to save their work, it’s still effectively saved somewhere so that you could restore it in its current “dirty” state.
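
One way to picture "effectively saved somewhere" is an append-only edit journal, sketched below in C#. The `EditJournal` class and its one-edit-per-line format are hypothetical, invented for this example. It also assumes edits contain no newlines, and it ignores the torn final line that a crash in mid-write could leave behind (a real journal would want a length prefix or checksum per record).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical edit journal: every edit is appended and flushed, so the file on
// disk always reflects the current "dirty" state, whether or not the user saved.
sealed class EditJournal : IDisposable
{
    private readonly FileStream _stream;

    public EditJournal(string path)
    {
        _stream = new FileStream(path, FileMode.Append, FileAccess.Write, FileShare.Read);
    }

    // Append one edit and push it through the OS buffers to the disk.
    public void Record(string edit)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(edit + "\n");
        _stream.Write(bytes, 0, bytes.Length);
        _stream.Flush(true);   // true = flush to disk, not just to the OS cache
    }

    // On startup (before opening the journal for writing), read back whatever
    // edits survived the previous run.
    public static List<string> Recover(string path)
    {
        var edits = new List<string>();
        if (!File.Exists(path)) return edits;
        foreach (string line in File.ReadLines(path))
        {
            if (line.Length > 0) edits.Add(line);
        }
        return edits;
    }

    public void Dispose()
    {
        _stream.Dispose();
    }
}
```

With something like this in place, startup calls `Recover` and replays the edits, and the shutdown path has nothing left to flush. Flushing on every edit is the simplest version; batching the flush every few seconds trades a small window of risk for much less I/O.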


So if you have to do all this work to be resilient to power failures, then take advantage of that logic to simplify your shutdown paths.  Your users will thank you.

Comments (19)

  1. Peter Ritchie says:

    In .NET, without your shutdown event processing code being coupled to every possible reference type, are you suggesting something like this:

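    static void SystemEvents_SessionEnding(object sender, Microsoft.Win32.SessionEndingEventArgs e)
    {
        if (e.Reason == Microsoft.Win32.SessionEndReasons.SystemShutdown)
        {
            // FailFast requires a message argument; there is no parameterless overload.
            Environment.FailFast("Session ending: system shutdown");
        }
    }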

  2. I once worked on a storage service where we made an explicit decision not to do *anything* at shutdown, precisely because we *did* need to be reliable.  It turns out that you (hopefully) shut down a service a lot more often than you lose power or crash – so there’s no better way to test that abnormal termination works properly than to treat every shutdown as an abnormal termination.  

    Oh, and it’s faster that way, too. 🙂

  3. ricom says:

    I would really really like it if

    Environment.FailFast();

    Was the correct answer for many more applications and services.  If you take your abnormal shutdown responsibilities very seriously then I think it can be.

  4. ptaron says:

    How do you do this without sacrificing correctness?

    1) not all resources are memory (e.g. SQL connections, RPC handles, shared memory, sockets, etc.)

    2) most leak detection schemes rely on the fact that apps will clean up their messes, so that the set of uncleaned resources corresponds to the set of leaked resources.

    I can see letting memory get leaked this way, but other things? Seems like a bad idea…

  5. ptaron says:

    Also, I would argue that if an app has a lot of memory to clean up at well-defined points in time (e.g. shutdown, recycle, update, etc.), that application should use a memory pool instead of malloc/free. That allows the memory to be freed in one operation instead of many small ones.

  6. Lasse Karlsen says:

    I think my "favorite" application in this regard is Borland Delphi 2007. When closed, the BDS.exe file stays in memory and uses cpu (and interestingly enough, increases its memory size temporarily) for anywhere between 1 and 2 minutes.

    Couple that with a fairly unstable IDE which you really just need to close down periodically, and the fact that the program refuses to load if it’s already in memory, and you got yourself a party.

  7. Adrian says:

    I agree in general, but in real life, I would only make these optimizations if the spring cleaning became a true performance problem.

    As a previous commenter pointed out, doing your spring cleaning helps find leaks and other bugs.

    Another problem I’ve seen the "die and let the OS reclaim the resources" approach cause is when what was an "application" becomes a "feature" within a larger application.  Now, to avoid leaks, you have to go back and write all the clean-up code that you elided before.

    Consider a GUI app, where the main window doesn’t do any cleanup on shutdown.  Later, the app is redesigned such that the "main" window is relegated to being an MDI child within another frame.  The fast shutdown scheme is now a liability as you experience leaks every time the now-child window is instantiated and destroyed.

    The RAII pattern, which is otherwise so useful, makes it very hard to skip clean-up on shutdown.  I’m not sure I’m willing to give up RAII until I encounter a specific situation where my "unnecessary" cleanup is a performance problem.

  8. monsur says:

    I’ve had Firefox crash on me a few times where it just completely disappears in the blink of an eye.  The speed at which it crashes is spectacular: there is no error message, no thrashing, just *poof* it’s gone.  But what’s interesting about this is that when I start Firefox back up, it asks me if I want to restore my previous session, at which point I’m back up and running where I left off.  I wish their shutdown process could use whatever magic that crash does!

  9. Ovidiu says:

    You want crash-only software (http://www.usenix.org/events/hotos03/tech/candea.html).

    The reason why this is hard to do in the real world is simple: software components. You need each component participating in a system to have dual cleanup modes: fast and thorough. Currently a lot of components are neither 🙂

  10. KK says:

    Environment.FailFast() is not a good choice. According to MSDN, "The FailFast method writes a log entry to the Windows Application event log using the message parameter, creates a dump of your application, and then terminates the current process."

    As you can see, it causes the system to:

    1. write a log entry in event log.

    2. CREATE A DUMP of your application.

    Creating a dump may take a long time if your application has allocated a lot of memory.

  11. ricom says:

    Even failfast doesn’t really fail fast.

  12. ShayEr says:

    So WM_QUIT == ::ExitApplication(0)?

  13. David Weller says:

    Rico,

    If you’ll recall, early versions of Longhorn had the famous "Kill" option (including the cute skull icon) in the right click context on the taskbar.  Wish that would have stayed in the final version 🙁  Perhaps it had too much of an X11 feel to it? 🙂

  14. Sam says:

    For .NET, System.Diagnostics.Process.Kill() is fairly quick and thorough.

  15. Dave says:

    IMO the required shutdown semantics depend on the application.

    For a browser, offline editor, non-critical application, etc., a fail-fast shutdown approach may be appropriate, but a heart monitor, industrial controller, long-running service, etc. all have reliability requirements that argue for absolutely correct resource acquisition/release and shutdown behavior.

    In the apps I’ve worked on I usually write as much code for error handling and clean up as I do for feature support. Failure mode analysis is part of our design process.

  16. Dave:  A heart monitor?  I want my heart monitor to recover from a power outage just as quickly as it recovers from an ordinary shutdown…actually, it should probably recover even faster from a power outage, if possible. 🙂  That implies to me that it cannot depend on orderly shutdown.

    Now, I suspect most heart monitors are relatively stateless, so it’s not an issue…but maybe not….

    Doing proper failure analysis and error handling is not the same as deciding to do lots of explicit cleanup in response to errors.  Usually it’s the opposite – it’s about designing the system to be resilient to the cases where explicit cleanup was *not* done.  All the shutdown code in the world doesn’t help if your power supply or disk subsystem fails, or your app crashes due to a bug, which in my experience are the three major ways that long-running apps stop running.
