Only an idiot would have parameter validation, and only an idiot would not have it


The great thing about parameter validation is that there are people who say that only idiots would have it, and other people who say that only idiots wouldn't.

Back in the old days, Windows didn't do parameter validation. If you called a function with an invalid window handle, your application crashed. If you called a function with an invalid pointer, your application crashed. If you called a function with an invalid bitmap handle, your application crashed.

There was a lot of crashing going on.

These crashes manifested themselves in the infamous Unrecoverable Application Error dialog, commonly known as the UAE message.

Windows 3.1 added parameter validation to all of the functions in KERNEL, USER, and GDI. If an application passed an invalid window handle, then instead of crashing, it just got an error back.
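The difference can be sketched in C++ (a hypothetical toy, not the actual USER internals): instead of dereferencing whatever the caller hands it, the validated version checks the handle against its table and reports an error.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical handle table: a valid handle is an index into this vector
// whose slot is still occupied. (Illustrative only; real window handles
// were more involved.)
struct Window { std::string title; };
static std::vector<Window*> g_windows;

// Pre-3.1 style: trust the caller, so an invalid handle dereferences
// garbage and the application crashes (the infamous UAE).
// Windows 3.1 style: validate first and fail with an error code instead.
bool GetWindowTitle(std::uint16_t hwnd, std::string& out) {
    if (hwnd >= g_windows.size() || g_windows[hwnd] == nullptr) {
        return false;              // invalid handle: report an error...
    }
    out = g_windows[hwnd]->title;  // ...instead of crashing here
    return true;
}
```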

This change was met with derision. "Oh, I'm so impressed. You finally got around to doing something you should have been doing all along. Not doing parameter validation was a totally idiotic decision."

But nowadays, parameter validation is out of fashion again. If you detect an invalid parameter and return an error code, then all you're doing is masking a latent bug in the application. It should crash and burn with a big red fat'n'ugly blinking exception.

In other words, we should go back to the way things were before we added parameter validation. (Well, except that the big fat ugly exception wasn't red and it didn't blink.)

Who's the idiot now?

Comments (59)
  1. Pierre B. says:

    Igor is clearly right…

    If a memory allocation fails, Windows should blue-screen.

    If a non-existent file is opened, white noise should be played as loud as possible and all the window contents (if any) should flash in inverse video.

    If the user mistypes a command in the shell, the screen should rapidly blink various colors at random, non-stop, until the computer is turned off.

    If you click on a dead URL in the web browser, all fans should be stopped and your CPU and graphics card left to melt.

    You’re either a real man or you don’t touch computers. That’s where the line is drawn. Igor knows.

  2. Nicholas says:

    If the application is going to cause an Access Violation then it should die as fast as possible so as not to mask a latent bug.

    However, the OS should not crash and burn.  I think it is proper that the OS validates input and returns an error code on bad input.

    The difference here is that the app is going down by its own doing.  The OS didn’t do anything wrong, it is just dealing with foreign programs that probably should not be trusted (these days, at least).

  3. Gabe says:

    The answer is actually simple: when running for your QA dept, it should crash immediately; when running for your customer, it should silently keep going.
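    Gabe's split can be approximated with the classic debug/release pattern. A minimal sketch (SetVolume is a made-up function, and failFast stands in for a build flag such as !defined(NDEBUG), made a parameter here only so both behaviors can be exercised):

```cpp
#include <cstdio>
#include <cstdlib>

// Validate the parameter either way; the build type only decides what
// happens on failure.
bool SetVolume(int percent, bool failFast) {
    if (percent < 0 || percent > 100) {    // parameter validation
        if (failFast) {
            std::fprintf(stderr, "invalid volume %d\n", percent);
            std::abort();                  // QA build: crash immediately
        }
        return false;                      // customer build: fail quietly
    }
    return true;                           // parameter was fine
}
```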

  4. Marquess says:

    If the app crashed (for a lack of parameter validation), a lightning bolt should strike the programmer.

    Each and every time.

  5. mvadu says:

    Nice article Ray, but how on earth do you keep track of all these comments you linked!!

    One from 2008, the other from 2007, for an article in 2009 which makes a valid point and links both. Impressed.

  6. Pierre B. says:

    More seriously, I’m surprised people are still doubtful about error checking.

    1. Only the function doing the actual work knows all the failure modes and thus can really validate the parameters.

    2. Those failure modes can change from implementation to implementation, making it nigh impossible for the caller to know what and how to validate.

    3. Some validation is impossible. Trying to create a file with a unique name? You need to open it and see if the open fails. Need to know if you can read a file? Only opening and reading it can actually validate that you can.

    4. It reduces code clutter: validation is only done in one place and it is done consistently. If the caller validates, then they’ll get it wrong half of the time. (Being generous here.)

    5. Try-and-handle-failure is a common coding practice and usually leads to clearer code, at least in some languages.

    6. Failure is often not a local property but a global one. When something fails, the correct failure mode is usually to fail the whole operation. Most languages still don’t get this right. Exceptions are a valiant attempt to make error handling more global, but the true answer might lie in some form of aspect programming where the error-handling policies are designed globally based on whole-program semantics.

    [You are starting to conflate invalid parameters with errors in general. Be careful. -Raymond]
  7. Nish says:

    > Who’s the idiot now?

    All programmers are idiots some of the time, and some programmers are idiots all of the time. You cannot claim to be an experienced programmer unless you provide provable claims to have indulged in various forms of idiocy over the years!

  8. John says:

    Not to start a flame war, but I blame this "everything throws an exception" mentality on the rise of programming paradigms that get further and further away from the hardware.  Anyway, my belief is that (at the Win32 API level) exceptions should only be raised in exceptional circumstances.  To me, passing an invalid window handle to some function doesn’t qualify.  Yes, this could potentially mask serious problems, but the people who don’t check return values are the same people who would catch and swallow all exceptions.

  9. BOFH says:

    Here is the red, blinking exception:

    http://haftbar.de/wp-content/guru-meditation_error.gif

    The red box blinked at 1Hz.

    Captcha 066; Wow, only one digit off.

  10. @Gabe: Suppose the customer’s software deals with money. If some sort of corruption occurs that affects the totals, then it could run incorrectly for hours (or days, if no end-of-day accounting is performed). In a situation like that, it would definitely be better for the program to exit as soon as the error condition is detected, when diagnostics might be more useful.

  11. Mike says:

    Yeah, validation is for p*ssies!!!

    Especially if writing web-apps! After all, in popular use "injection" is almost synonymous with "inoculation", so allowing users to inject anything they want can only make your system healthier, right?

    Oh look! User "Fred; drop user app_user cascade; " logged in again!

  12. Koro says:

    I’m all for crashing and burning when in debug mode, because it makes errors obvious to the programmer.

    However, when the app is in the user’s hands and an error happens, the operation should just fail silently instead of taking the app down in flames.

  13. Duran says:

    I will say this: parameter validation certainly makes triaging by call stacks easier.

  14. Goran says:

    Hey, Raymond, would you consider a star for Gabe?

    @John ("people who don’t check return values are the same people who would catch and swallow all exceptions")

    No, these people are either too dumb to know they need to check the retval, or too lazy. If an exception is thrown at them, they will do nothing if they are dumb (they don’t know they can catch it) and they will do nothing if they are lazy (try/catch around each call is a much bigger pita than an if). So it clearly works better – the dumb and the lazy are punished ;-).

  15. Stuart says:

    … please forgive the boneheaded use of entirely the wrong name in the prior comment. Where it came from may be obvious but I’m not gonna state it because that just makes it even more embarrassing.

  16. bd_ says:

    @Stuart,

    I think with .NET it’s not so much that you have exceptions now, but rather, the language itself guarantees you can’t have invalid pointers.

    The kind of validation mentioned here is the kind of stuff that would complain if you passed some arbitrary number to a function that wants a window handle. In .NET this can’t happen – you can’t turn that into a reference to a window in the first place. The language automatically does this specific kind of validation for you, so there’s no need to add it everywhere manually.

  17. Paul Betts says:

    Who says you can’t have it both ways? Run it normally, it returns an error, but run it under App Verifier, and it blows up.

  18. brantgurga says:

    I think the issue isn’t so much parameter checking or not parameter checking… it’s what you do about it. You want the issue fixed, so fail early and fail fast. That’s why you do parameter checking. You don’t skip parameter checking and then sometimes work by accident and not work other times. It’s more about error detection than whether parameters are validated.

  19. Kyle S says:

    I think it’s important to differentiate between errors that occur by virtue of using invalid input in the normal course of operations vs. the actual error of passing invalid input itself. I always preferred validating parameters and throwing some InvalidArgument exception, rather than relying on the fact that passing an out-of-range value is going to wind up causing an exception/fault down the line.

  20. Olivier says:

    @Mike : injections in websites are essential! They are here to transform a normal and boring website into a social network where everybody can contribute to it :)

  21. Gabe says:

    Paul Parks: Yes, I wrote my comment somewhat tongue-in-cheek. However, there is nothing more annoying than a program crashing due to some irrelevant, inconsequential error. Remember, crashing not only causes the user to lose any unsaved data, but it also leaves any open files in an unknown state. As long as the state is going to be unknown in either case, you may as well let the user decide what to do.

    Of course, for security reasons, you might want to immediately crash at any sign of damage. Nothing makes it easier for malware to invade your program than trying to cover up faults like buffer overflows.

    So let’s say that you’ve just spent the last few months typing your document in your word processor and decide to save it for the first time. But when you tell it where to save you accidentally give it the name of an existing folder, causing an error that the author of the program never anticipated.

    Would you rather that it crash that instant, or just ignore the error and let you think your document is saved? Of course it’s horrible to let the user think their document is saved when it’s not, but at least it gives them another chance. Otherwise your document is gone and there’s nothing you can do about it!

  22. Stuart says:

    @bd_ – partly true but not entirely. There are all kinds of checks that you might need to do that aren’t enforced by simple typesafety.

    The most obvious is null-checks, because .NET doesn’t support non-nullable types. A method that requires its parameters not be null is supposed to check each of them and throw an ArgumentNullException if it is.

    Then there are all kinds of semantic requirements like ‘this method accepts a Stream argument and the stream that’s passed in must not be closed’.

    In my own code I’m certainly MORE than fallible in this regard, but at least there’s a clear best practice that addresses both sides of the concern, in the .NET world. In a world of error codes it’s ‘EITHER do parameter validation OR crash’. In the .NET world you get to have your validation and eat your crash too.

  23. Lucas says:

    In terms of writing code that’s easy to debug, the obvious approach is to always check error codes.

  24. JamesCurran says:

    I always question redundant parameter checking.  Given the following:

    string MakeUpper(string str)
    {
        if (str == null)
            throw new NullReferenceException();
        return str.ToUpper();
    }

    Now, if you were to remove the parameter check and you passed a null, you would still get the exact same error (on the same line number).

  25. JamesCurran says:

    Also, this reminds me of a discussion I had on two different projects.  On both, the project manager said "the application will be running 24×7 — it cannot crash, so we must have a top-level try/catch around the entire thing"

    My response: "No.  Since it cannot be allowed to stop, any problem which might cause it to throw an exception must be found and fixed immediately — hence, no catch…."

  26. Gaspar says:

    @Marquess, Oh god please no.  I make typos and I don’t like being struck by lightning.  =)

    To me it sounds like a NIMBY (Not In My Back Yard) issue.  The people complaining now never had to deal with most of the issues caused by non-validation.

    My personal preference has always been that for a USER the application should never crash, it should exit as gracefully as possible and hopefully save all needed state.

    Where is the benefit to the USER if a program crashes?  From the average USER’s perspective, they just lost their work and it is someone’s fault.

  27. Gaspar says:

    @JamesCurran 1:  That is a perfect example of bad error handling.

    @JamesCurran 2:  I think your solution is what should be implemented most often.  I still think that in many cases you should have a top-level try/catch, but only to allow for a graceful exit instead of a "Your program is done, hit OK." message box.

  28. porter says:

    A standard way of injecting foreign code is to pass invalid data and force a crash or overwrite the stack. If you don’t check arguments or return codes, then don’t be surprised when your application gains publicity for exposing a stupid exploit.

  29. Anonymous says:

    In a world where the callee shares the caller’s address space, it makes sense that a bad pointer would cause a crash.  But I think it’s wrong to compare this with certain parts of the OS. With a syscall, it makes more sense for the kernel to check your buffer and return an error if it’s not valid or points to a kernel-only page (for example syscalls that return EFAULT in the Unix world).  Similar reasoning ought to apply for things like file handles and other handles to kernel objects.

  30. Yuhong Bao says:

    A good example is IsBadxxxPtr, which was invented in Windows 3.1 for parameter validation. Now it is recommended that it should not be called. To be honest, part of it is because MS did a poor implementation when they ported it to Win32, the problems of which are well-known now, but guess what the other reason is?

  31. Wojciech Gebczyk says:

    (NR1 star) "The answer is actually simple: when running for your QA dept, it should crash immediately; when running for your customer, it should silently keep going."

    "It should silently keep going"… yes! It’s brilliant! Excellent!

    The OS should record each application’s last minute of its life, and in case of failure the window content should be discarded and a media player should start playing the recorded content in a loop.

    That would be the REAL and TRUE "silently keep going" ultimate solution. :>

    NR1 – Not Raymond One

  32. Sm says:

    I’d say one should understand where the borders of the subsystem are. On the borders, do parameter validation; inside, get the code right.

  33. steveg says:

    I think parameter validation is very good indeed — it improves your application’s (or OS’s) security. Whether you implement that with error codes or exceptions (e.g. vaguely useful ones, not a generic UAE), who cares; grab your soapbox, head to the park, and join the line next to the Mac vs Windows people.

    @Mark: To nitpick a little: IsBadPtr does work as advertised some of the time.

  34. microbe says:

    It’s not a simple question, as it’s really case by case.

    But the bottom line is, if a wrong parameter could affect the OS’s stability, then it should be validated, otherwise feel free to kill the application.

    I imagine Windows 3.x wasn’t protected enough from user errors (only 3.1 had protected mode, and probably not as isolated as now due to the DOS legacy), so validation was probably more important.

    So, both are right, depending on the situation.

  35. Chris says:

    I’m not sure anyone who doesn’t work on low-level OS frameworks really understands this issue.

    Both of the views Raymond expresses (by proxy) are valid. The tradeoff is performance.

    Validation takes time, but it’s worth doing if the consequences of not doing it are severe enough. If you crashed in Windows 3.1 or other OSes of similar vintage, the entire system went down, and maybe trashed the disk too, which is a pretty severe consequence.

    If you crash on a modern OS, you’re taking down the application only, so it’s less valuable. The danger with validation is that apps begin to routinely pass crap into the APIs, and the developers may not realize it.

    Note that you have no choice but to validate incoming handles received from a different process (or handles transmitted from user to kernel). Anything less is a security problem.

  36. Jolyon Smith says:

    I’m glad someone mentioned the Guru Meditation.

    And yet also I wish they hadn’t so that I might have had the pleasure.  :)

  37. avek says:

    I think it’s hardly fair to compare the UAEs of those days past with exceptions.

    Exceptions today unwind stacks partially, form a type hierarchy with unrestricted subdivision, and can carry arbitrary data in their instances, which travels from the error point to the error-handling point automatically.

    The UAE box was bad because it could do none of these things, not just because it crashed the program. It didn’t let you partially undo work or analyze the failure in great detail. The only solution to a local problem (a memory access error) was through the global state (program termination), and that couldn’t be changed.

    So throwing exceptions on bad input in "modern" languages just isn’t the all-or-nothing it was before. Exceptions are not the same thing as crash&burn; instead they are configurable, in the range from the end of the world to minor nuisance. But still, explicit parameter validation isn’t going anywhere. Without it, it’s usually not possible to obtain meaningful exceptions.

    For instance, in JamesCurran’s example, the parameter check doesn’t really do anything. But if we change it to throw ArgumentNullException instead, the situation changes. Now we know that the caller of the function has given it something untasty; it’s not just some bug within the function itself. So some meaningful recovery can be done in the caller, or in the caller of the caller, or somewhere else above, like obtaining the argument value from some other source and calling the function again.

    The further up the stack from the call, though, the less meaningful an ArgumentNullException becomes to a potential handler. It would be even better if it were some completely custom exception type MissingValueYzwException, not one of the predefined classes. The simplest way to get such a custom exception that I know of is a parameter check.
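    avek's recovery idea can be sketched in C++, with std::invalid_argument standing in for ArgumentNullException (MakeUpperOrDefault and the fallback value are illustrative, not from the original): a typed parameter-check exception tells the caller that *it* supplied bad input, so it can retry with input from another source.

```cpp
#include <algorithm>
#include <cctype>
#include <stdexcept>
#include <string>

// The callee validates and throws a typed exception naming the problem.
std::string MakeUpper(const std::string* str) {
    if (str == nullptr)
        throw std::invalid_argument("str must not be null");
    std::string out = *str;
    std::transform(out.begin(), out.end(), out.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::toupper(c));
                   });
    return out;
}

// The caller can recognize the specific failure and recover meaningfully.
std::string MakeUpperOrDefault(const std::string* str) {
    try {
        return MakeUpper(str);
    } catch (const std::invalid_argument&) {
        static const std::string fallback = "(none)";
        return MakeUpper(&fallback);  // recover: retry with another source
    }
}
```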

  38. Anonymous Coward says:

    Sm, I agree. The old problem wasn’t that validation wasn’t going on as such; the problem was that things weren’t validated properly before entering the kernel. And we still have that problem with us today, although to a much smaller extent. Half the time Windows Update tells me to install a patch, it is because somewhere a system function didn’t validate its parameters correctly, allowing applications to gain higher privileges or crash the whole system (although in a cooperative multitasking environment that was already possible anyway). Still, checking parameters greatly restricts the amount of damage you can do, even accidentally.

    Furthermore, I posit that the current exceptions are actually more like returning an error code than like a random crash, since today exceptions can be typed, can unwind the stack, can be caught so you can do something about it, can be checked so you know if you’re handling them correctly, and cannot be silently ignored like error return codes. On the whole I think life would be a lot more pleasant if the compiler would check if you check your return codes properly, since it’s way too easy to simply ignore the result of a function call. We all promise ourselves at new year not to do that, but you know how it is. You get an idea on how to solve something, you code up a first try, you test it a bit, you forget that you didn’t check the result properly, and presto: a potential bug leaked into production code.

    Simon, Jolyon: I would love it if the Windows crash dump tool were restyled to look like the Guru Meditation.

  39. Stuart says:

    Coming from the .NET world, seems to me the only reason these two views are incompatible at all is the limitation of the technology of error codes (as opposed to exceptions). (If I remember rightly, Eric has strong views on error codes *in general* as opposed to exceptions, so I’m making no comment on the *overall* tradeoffs, just the ones applicable to this particular question)

    The problem with error codes is that the default action in the case of an inept programmer is that the code gets ignored. With exceptions, the default action in the case of an inept programmer is that the application blows up.

    So in a world which supports exceptions, the answer is "of course you should do parameter validation, *so that* you can make the application blow up when the parameters are wrong" – and incidentally pass along a helpful message in debug mode telling the programmer which parameter they passed was incorrect.

  40. Ben Voigt [Visual C++ MVP] says:

    False dichotomy (as Paul Parks has observed).

    Raymond’s post presumes that:

    (1) Use of an invalid parameter causes an immediate spectacular crash.

    (2) Bailing out with an error code causes the program to continue and succeed.

    The disadvantage of #1 is that some problems might have been recoverable but weren’t recovered.  The disadvantage of #2 is that programmers might not become aware of the problem even though it occurs frequently during development.  Hence the disagreement over including parameter validation.

    But the basis for both sides of the argument is totally bogus, because both ignore the substantial class of errors that corrupt state and cause a failure later (and, Murphy’s Law implies, in seemingly unrelated code).

    Definitely the parameters should be validated, and there should be entries made in an event log that cannot be cleared by any user-mode program (let it roll over).  And encourage teams to include "no validation failures in the event log" as a mandatory part of certification (perhaps as a Windows Logo requirement).

    Wait a sec… Windows already does all of that except the encouragement part.  It’s called a "checked" build.  Imagine that!

    In my opinion, public enemy #1 of Windows users is failure to use a checked build for WHQL and Windows Logo testing.

  41. I think the main thing is that the fact of an error is recorded *somewhere* that a dev can look at when he’s debugging. At my work they generally want us to just have extensive logs, which works ok, although it’s not my favorite approach.

    I tend to like validating parameters with assertions. This is especially helpful if you write your own assertion macros that include stack traces, and both the expressions and values being compared. That way something like ASSERT_EQUAL(x,y) can give you a lot of information about why it failed.

    The nice thing about assertions is they dovetail with unit testing. It’s like mixing your tests with the code itself.
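    A minimal version of such a macro might look like this in C++ (integer-only for brevity; a production version would also capture the stack trace mentioned above, which this sketch omits):

```cpp
#include <cstdio>
#include <cstdlib>

// On failure, report both the expression texts (via #x / #y) and their
// runtime values, then abort so the bug is caught where it happened.
#define ASSERT_EQUAL(x, y)                                              \
    do {                                                                \
        auto vx = (x);                                                  \
        auto vy = (y);                                                  \
        if (!(vx == vy)) {                                              \
            std::fprintf(stderr,                                        \
                         "%s:%d: ASSERT_EQUAL(%s, %s) failed: "         \
                         "%lld != %lld\n",                              \
                         __FILE__, __LINE__, #x, #y,                    \
                         (long long)vx, (long long)vy);                 \
            std::abort();                                               \
        }                                                               \
    } while (0)
```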

  42. @Gabe: Your point is valid. That’s why I said "exit" rather than "crash," although I do contend that in some situations the only thing an app should do is crash. It’s better, however, to exit as gracefully as possible, with the caveat that data may be corrupted. There are examples of software that cannot fail (life support, flight control, etc.), but even then the software should only continue to function while causing lights to flash and klaxons to blare. I mainly object to the notion of *silent* error recovery.

  43. Lawrence says:

    I make the distinction between ‘my own app’ and ‘anything else’.

    If I’m writing a function in my app and it’s only my app that calls it, I can choose whether I want parameter validation (typically I will use Assertions – so they die in debug but disappear in release)

    If I’m writing a function that is either called by or calls anything third party, I will validate everything, always. No third party code is going to crash MY APP if I can help it.

  44. porter says:

    > No third party code is going to crash MY APP if I can help it.

    Hang on, I’m just going to call CreateRemoteThread()…..

  45. Dave says:

    "To nitpick a little: IsBadPtr does work as advertised some of the time."

    Despite the flak it gets, I really like IsBadPtr(); I always have these checks enabled in my debug builds. They’re great for catching mistakes made during code changes.

  46. Mark says:

    Yuhong Bao: because IsBadxxxPtr can *never* work.  Implementation details are irrelevant.

    Some things have to be crash-on-access, like invalid pages.  Some things have no dependable way to detect an error, like PostMessage.  Win32’s ABI by definition doesn’t "crash and burn", which is what makes programming it in Assembly as straightforward as VB.

    However, speaking of conflating errors with validation, isn’t sending a message to a closed window an error?  It’s not something you can validate beforehand…

  47. Anon says:

    What I want to know is why the default behaviour in .NET is to ask the user whether to continue in the case of an unhandled exception. WTF??

    Parameters should always be validated, because sometimes an invalid parameter WON’T cause a crash.

    Whether to crash or not should be decided on a case-by-case basis. If I try to delete a file that isn’t there, it is hard to see why the app should halt. If I get a combination of parameters that should never happen (i.e. an undefined state), it is hard to see why the app should be allowed to continue.

  48. mordy says:

    If your function is likely to be used by idiots, then you have to validate incoming parameters, or you too are an idiot…

  49. sergey says:

    Of course both opinions are correct. You should add parameter checks to the kernel functions of Win 3.1, because when a Win 3.1 application failed inside a kernel function, some important memory would be overwritten or something… and the entire OS became unusable. User angry.

    With separate memory for processes and all that overall security in WinNT-based systems, it is of course better to fail and display a flashy dialog box. Only the application will fail. The OS and other applications will continue to work.

  50. RobO (It's not you, it's me) says:

    Taken collectively, these posts simply prove the title of the blog entry.

  51. Mark says:

    Killer{R}:

    This is not utopia, it’s an essay.  Don’t you have a blog?

  52. Alexandre Grigoriev says:

    @porter:

    “Hang on, I’m just going to call

    CreateRemoteThread()…..”

    Add “Deny PROCESS_VM_READ|PROCESS_VM_WRITE|PROCESS_CREATE_THREAD” for OWNER_RIGHTS SID. Works only in Windows 7/Vista SP2+, though…

    [I see your wall and raise you a ladder: WRITE_DAC. -Raymond]
  53. Killer{R} says:

    Any input parameter can take some set of values. Let’s split that set into two subsets, ‘valid’ and ‘invalid’. The ‘valid’ subset definitely contains more than one value; that’s why we have an input parameter rather than a hardcoded constant.

    But the border between the ‘valid’ and ‘invalid’ sets is not clean; it is very fuzzy. The usual parameter ‘validation’ means checking that the passed parameter will not cause the underlying functionality you’re using to terminate your application. For example, if your function accepts a pointer, you will probably check it using IsBad***Ptr so you can ensure that accessing it will not cause an AV, because the underlying functionality – OS memory management – will raise an SEH exception or segmentation fault if the pointer is invalid from its point of view.

    But stop. That means you’re checking that the passed parameter is valid for the logic of the underlying functionality, but this check will not prevent the use of a parameter that is acceptable to the underlying functionality yet not acceptable to YOUR APPLICATION’s logic. Not everything in the ‘valid’ subset you accepted is really ‘valid’ for the ‘business’ logic implemented inside your function or your whole app. For example, it can be a valid pointer, but one that points to memory that should not be modified by the current execution flow. Or you specified some HBRUSH handle to close, but due to some logic error that HBRUSH value is still being used by someone else.

    You can say ‘eh, the user will see some rendering bug and that’s all’, but now let’s turn to the question of what your program is supposed to do.

    If a parameter can be ‘invalid’ from the underlying API’s point of view, then under some circumstances that parameter can take random values that are ‘valid’ to your validator but not valid for the real application business logic.

    Let’s return to the wrong HBRUSH. If you close a ‘valid’ HBRUSH that really should be closed now, you get a rendering bug. If this is just some Winamp visualization plugin, that really doesn’t matter.

    But if you’re writing a graphical editor, it means it can render a wrong image. Is that a good price for the application not having crashed earlier? Or is it a good price for a billing application to overpay some 10^n $$ into a client account, instead of showing a fatal error back when the ‘invalid’ parameter was passed but silently ‘filtered out’ by your validator?

    Of course, you may switch the application to ‘debug’ mode and test it thoroughly with your QA. But the user will not be working on your QA machines and system, and there can remain bugs that you didn’t find.

    So I think the simple answer to the ‘to validate or not to validate?’ question is: which will give the user more problems as the result of an invalid parameter? If an application crash is more ‘expensive’ than a silent logic error, then you may hide your exceptions and keep working until the system is actually powered off. But if a logic error can potentially cause much more ‘damage’ than a simple crash, then you should crash – and get a bug report from the user and fix it.

    Yep. This is utopia, I know ;(

  54. Dan says:

    To go off in another direction, error handling in Erlang always sounded neat. An Erlang application will typically contain lots of lightweight threads (which Erlang calls processes, because they don’t share state). Processes might crash, but there are other processes that watch them. These watchdog processes decide what to do – they might restart the failed process, they might try something different, or they might bring down the whole subsystem. I think by default, linked processes bring each other down. What that has to do with parameter validation, I don’t know, but I think it’s cool.

  55. Spock says:

    @Gabe

    Your comments indicate what I believe to be an all too common, but erroneous, assumption: that the worst thing a program can do is crash. This is not the case. The worst thing a program can do is continue to run with its internal state corrupted. I work in SCADA software; we call this kind of situation the "pour molten steel on someone’s head" scenario. If I have written a function and it is called with an invalid parameter, how can I safely continue? The call indicates a programming error, and the only safe recourse is to abort at this point. Obviously, this issue is not as important in every industry, but I’d still rather Word crashed on me than corrupted my data after running in an invalid state.

  56. Gabe says:

    Spock: Why not let the human decide if the program should crash or not? If I have lots of unsaved changes, I want Word to have a chance to save them before dying. On the other hand, if I have a large document with only minor unsaved edits, I might prefer that it crash rather than corrupt my huge document.

    As for SCADA software, I work in the electric power industry. We write our software to continue after exceptional conditions, retry, or restart if it has to. The very last thing we want to do is crash the system!

    Consider the case of the Bright Field, a 69,000 ton freighter that plowed into a New Orleans mall 13 years ago. While navigating the Mississippi river, an oil pump on the engine failed, causing the engine’s computer to cut power. With the sudden loss of power, the ship couldn’t maneuver, and kept going where it was headed — right towards a moored casino boat, a hotel, and a mall.

    In this case, a computer made the ship literally crash to avoid corrupting the engine due to it running in an invalid state. I’m willing to bet that there are at least several hundred people who wish the computer gave the pilot a dialog that said "Bad engine state — crash or keep running with bad state?"

  57. AdamWu says:

    Raymond, I don’t agree. From what you described, parameter validation isn’t out of fashion at all; it is just that the bar for error checking and handling is being raised.

    Let me put it this way: making a system call is like entering some sort of secured area.

    1. With no parameter validation, it is like letting everyone in, even if they carry a gun or a bomb.
    2. With basic parameter validation, a security screening is performed and persons carrying dangerous items are turned away.

    3. But that is not enough. For the bad guys, we want not only to deny them entrance, but also to trigger "big red alarms", catch them, and put them in jail.

    To achieve the latter, we need parameter validation, and a good one.

    For example, a program may close a handle and later try to use it again. To ensure this misuse is caught, the kernel should not only validate the input handle, but also avoid reusing handle values. Otherwise, if other files are opened in between and receive the recycled handle, the program may get past the handle validation and corrupt data files.
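    AdamWu's point can be made concrete with a generation counter: encode a generation number into each handle, so a stale handle fails validation even after its slot has been reused. This is a toy sketch in Python (the 16-bit slot/generation packing and all names are illustrative, not how any real kernel lays out handles):

    ```python
    # Handle table with generation counters: a handle packs a slot index in
    # the low 16 bits and a generation number in the high bits. Closing a
    # handle bumps the slot's generation, so stale copies of the old handle
    # fail validation even when the slot is reused for a new object.
    class HandleTable:
        def __init__(self, size=16):
            self.objects = [None] * size
            self.generation = [0] * size

        def open(self, obj):
            for slot, existing in enumerate(self.objects):
                if existing is None:
                    self.objects[slot] = obj
                    return (self.generation[slot] << 16) | slot
            raise RuntimeError("handle table full")

        def _validate(self, handle):
            slot, gen = handle & 0xFFFF, handle >> 16
            if (slot >= len(self.objects)
                    or self.objects[slot] is None
                    or self.generation[slot] != gen):
                raise ValueError("invalid or stale handle")
            return slot

        def get(self, handle):
            return self.objects[self._validate(handle)]

        def close(self, handle):
            slot = self._validate(handle)
            self.objects[slot] = None
            self.generation[slot] += 1   # stale copies of this handle now fail

    table = HandleTable()
    h1 = table.open("file A")
    table.close(h1)
    h2 = table.open("file B")      # reuses slot 0, but with a new generation
    try:
        table.get(h1)              # stale handle is rejected, not misdirected
    except ValueError as e:
        print(e)                   # → invalid or stale handle
    ```

    Without the generation check, `table.get(h1)` would silently return "file B", which is exactly the data-corruption hazard the comment warns about.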

  58. Spock says:

    @Gabe

    I deliberately simplified the situation to demonstrate the point. SCADA software needs 100% up-time, so it is necessary to have solutions to the inevitable programming bug. In our case this is achieved by handling exceptions at defined component boundaries (an entire module is unplugged and re-plugged, in essence), and via redundancy. Redundancy is the primary mechanism by which the "cannot continue safely" situation is handled: the entire process is restarted while the standby server(s) take over. This has the added advantage of handling hardware failure as well. In the case of Word, the situation has already been handled by keeping a hidden incremental backup for this very situation. When the application crashes I can simply restart and "recover". In this instance it is much better to crash than to carry on risking data corruption of the entire file. Redundancy would have saved your freighter as well, and is standard practice in flight control systems.

  59. Spock says:

    @Gabe

    Actually, I missed the point of your freighter example. The computer system did not crash, leaving the crew unable to operate the boat; an engineer designed the system to shut down the engine in the event of a pump failure. That there was presumably no "operator override" available was an engineering oversight. I do get the allusion to the fact that shutting down a system (crashing) may cause greater harm than continuing to run in a potentially dangerous state, but as I mentioned in my previous post, redundancy is designed to solve this problem. In the case of the freighter, they would have needed a redundant engine (which may not be practical), but in software systems an extra server is not particularly onerous. Allowing the user the choice to continue has potential benefits if we assume that the software is always "manned" (i.e., not so good for server processes etc.). I think we may have strayed a long way off topic here. My only point was to say that crashing is not always the worst thing, and try{}catch(…){} is your worst enemy.

Comments are closed.