The default error mode (SetErrorMode) is not zero


A customer put the following code at the start of their program:

// If this assertion fires, then somebody else changed the error mode
// and I just overwrote it with my error mode.
ASSERT(SetErrorMode(SEM_FAILCRITICALERRORS) == 0);

The customer wanted to know whether it was a valid assumption that the initial error mode for a process is zero.

No, it is not, and this is called out in the documentation for SetErrorMode:

Remarks

Each process has an associated error mode that indicates to the system how the application is going to respond to serious errors. A child process inherits the error mode of its parent process.

The assumption that the initial error mode is zero is therefore false.

There's another error in the above code: The call to SetErrorMode is placed inside an assertion. This means that in the retail build, where assertions compile away, the call disappears. The debug build has the error mode set to SEM_FAILCRITICALERRORS, but the retail build runs with whatever error mode it inherited. They are changing the semantics in the debug build, and are headed down the slippery slope that leads to them being forced to deploy the debug version of the program into production because that's the only build that works.
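One way to repair both problems might look like the sketch below. It makes the call unconditionally, so debug and retail builds behave the same, and it merges the new flag into the inherited mode instead of assuming that mode was zero. The helper name add_fail_critical_errors is made up for illustration, and the non-Windows branch is only a stand-in so the sketch compiles elsewhere; it is not the real API.

```c
#ifdef _WIN32
#include <windows.h>
#else
/* Stand-ins so the sketch compiles off Windows. NOT the real API. */
typedef unsigned int UINT;
#define SEM_FAILCRITICALERRORS 0x0001u
/* Pretend the parent process handed us a nonzero error mode. */
static UINT g_fake_error_mode = 0x0002u;
static UINT SetErrorMode(UINT mode)
{
    UINT previous = g_fake_error_mode;
    g_fake_error_mode = mode;
    return previous;
}
#endif

/* Hypothetical helper: adds SEM_FAILCRITICALERRORS to the inherited
   error mode and returns the mode that was in effect beforehand.
   The calls sit outside any assertion, so they survive in retail builds. */
UINT add_fail_critical_errors(void)
{
    /* SetErrorMode returns the previous mode, which may be nonzero. */
    UINT previous = SetErrorMode(SEM_FAILCRITICALERRORS);

    /* Put back the inherited flags alongside the one we want. */
    SetErrorMode(previous | SEM_FAILCRITICALERRORS);
    return previous;
}
```

Calling add_fail_critical_errors() once at the top of main or WinMain, before any other threads exist, should be enough. The error mode is process-wide and the two calls are not atomic, so doing this later in a multithreaded program could race with other code that changes the mode.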

Unfortunately, they may have already reached that point, because the customer asked, "Is it possible for the user to set the default error code to something other than zero, in which case this assertion would crash the client?" (Emphasis mine.)

Bonus chatter: Note that you can override error mode inheritance by passing the CREATE_DEFAULT_ERROR_MODE flag to the Create­Process function.

Comments (23)
  1. pete.d says:

    "…being forced to deploy the debug version of the program into production because that's the only build that works.

    Unfortunately, they may have already reached that point…"

    Huh? I think it's the other way around. This appears to be one of those rare moments where two wrongs make a right. At least the way I define "works", which is that "the program will run without crashing".

    Only the debug code has the error-code checking that the user (code author) desired, but that error-code checking is flawed and would crash the program if present. So it's the retail build that has a better chance of doing the right thing.

    (Granted, we don't know what the retail build actually does based on the error code, but presumably it reacts less harshly to a non-zero state than just crashing the program outright).

  2. Cullen says:

    @pete.d The way the customer asked the question makes it clear that they have experienced a 'crash' in production, caused by this assert firing.  Therefore, they must have deployed the debug code to a client.

  3. jk. says:

    Or they have their own ASSERT macro that does the same thing in release as debug

  4. Adam Rosenfield says:

    It would be helpful if the documentation for SEM_NOOPENFILEERRORBOX called out that it only refers to loading DLLs with LoadLibrary().  People would get quite upset if every call to CreateFile() on a non-existent file resulted in a message box instead of it failing with ERROR_FILE_NOT_FOUND.

  5. mikeb says:

    The Subversion project made a policy decision that assertions would be left enabled in release builds. The thinking was that if an assertion failed, the system was in an undefined state so any further processing might result in data loss.  Better to fail fast, even in release builds.

  6. Brian_EE says:

    @mikeb: "The Subversion project made a policy decision that assertions would be left enabled in release builds."

    To me, this would be indicative of poor planning and/or sub-standard unit testing during development. If you feel that the assertions are still needed, then obviously you didn't test your code thoroughly enough under all use-cases.

  7. Evan says:

    @Brian_EE: "To me, this would be indicative of poor planning and/or sub-standard unit testing during development. If you feel that the assertions are still needed, then obviously you didn't test your code thoroughly enough under all use-cases."

    Or it's an acknowledgement that from a realistic point of view, "test[ing] your code thoroughly enough under all use-cases" is literally impossible. (Or, less flippantly, that the Subversion team's standard for what "enough" means is stricter than yours.)

  8. Zan Lynx' says:

    @mikeb: Not everyone runs with ECC RAM and some people overclock their CPUs. So "impossible" program states do happen.

    A few extra verification checks are a good idea for most programs.

  9. Joshua says:

    I've been known to ship with assertions on, even in release compiles. Well done asserts don't slow things down that much.

    For one large product we make, we decided to ship the debug build (linked against release libraries) because the release build fails with a "send to Microsoft" error when the database server keels over, while the debug build shows an error box with the unhandled exception in it and lets the user continue (after fixing the database server).

    @mikeb: Exactly. Especially if the assertions are being used to check the sanity of the backing store. Hitting an assert() on a corrupt backing store is probably better than corrupting it further.

    @Brian_EE: (insert repetition of backing store)

  10. Myria says:

    I wish SetErrorMode had a way to pass heap corruption and other errors to our exception handler, rather than __fastfail.  We have our own error reporter program that we use to send crash logs to us; it classifies them in a manner that's useful to us.  We've found that more and more types of crashes in our program are going to Windows Error Reporter instead of us.  This is due to things like __fastfail and the SetUnhandledExceptionFilter(NULL) call in __report_gsfailure.

    We understand the security reason behind not going through Win32 exception handling, particularly on x86-32, but if we could set one of our programs to handle crashes rather than Windows Error Reporter, it'd be awesome.  We have a SysDev account, but SysDev is a very slow web site that takes a week or more to show crashes for a new release.

    For personal use, I also wish there were a way to enable alignment faults on x86-32/64, as it assists in emulator performance when emulating CPUs that fault on misaligned accesses.  There is an RFLAGS flag for alignment checking in ring 3, but the documentation for SetErrorMode seems to imply that such faults aren't sent to the application level.

    We often have beta versions of our program released to customers with assertions enabled.

  11. floyd says:

    @pete.d You are missing a vital detail here: The "error-code checking" as you call it does in fact *set* the error mode. In a release build they are running with a random error mode, in contrast to the debug build. Hence, only the debug build runs in a well defined environment.

    .f

  12. Joshua says:

    @Myria: I know. The fault tolerant heap bit me. I followed the test what you ship mantra, and I had a double free bug. It didn't break until deployed on a Windows 2003 server. I wish there was a good way to tap the fault tolerant checks on release build so it fails fast. I ended up wrapping malloc and re-implementing the sentinel (it's not hard to do). For me, the overrun barrier is 0x1C 0x0D which is not likely to occur otherwise when processing 7 bit ASCII.

  13. Yuriy Gettya says:

    @Joshua: you can always disable FTH for the process: blogs.msdn.com/…/10260334.aspx

  14. cheong00 says:

    To me, it looks intentional to place the ASSERT() there, to make sure no one is releasing the debug version of the program to production. The main program should also have a call to set it to 0 if debug.

    I think of it this way because he doesn't set the error mode to 0 in this line of code.

    Btw, it's right that he should probably move the call out of the ASSERT and instead just assert on the returned value, so that the mode is set in production code too. But hey, we all have moments when our minds don't work clearly and we mistake one way of writing code for being equivalent to another. :P

  15. Brian_EE says:

    @Evan: "Or it's an acknowledgement that from a realistic point of view, "test[ing] your code thoroughly enough under all use-cases" is literally impossible."

    Come work on equipment that people's lives depend on, and you'll have a different point of view of thorough unit testing of software. Line by line code reviews, verification testing with the debugger where you inject error cases, no dead code, etc etc.

  16. Jon says:

    @Brian

    I think the key phrase is "realistic point of view." In the equipment you're talking about, those tests are critical. In a massive open-source project that doesn't fly planes, fire a missile or control a pacemaker, it would be unrealistic to expect that level of testing, and it would be impossible to implement without losing the vast majority of your developers.

  17. dave says:

    If you're writing in an environment where it's that important, you probably should not use Windows, which is the subject of this blog.

    No slight on Windows; it's just that doing a line-by-line review of your code, and ignoring the several-million-lines of not-your-code, seems to be in contradiction.

  18. Myria says:

    @Brian_EE: Certainly, in a high-risk environment, many rules are different.  The software I work on won't kill anyone if it crashes – it merely annoys the customer.  It is far more important for us to make a compelling product that mostly works rather than a simplistic product that works perfectly.  We know it's a tradeoff, and have calibrated according to our particular situation.

  19. mikeb says:

    @Brian_EE:

    One key testing advantage you get when working on equipment that people's lives depend on is that you get to test on the precise equipment and configurations that the software will run on. That's generally not possible with software that's delivered to be run on whatever machine the user installs it on.

  20. Nicholas says:

    I've never understood why so many programmers have this funny idea that "no bugs in my code, it's perfect!" and managers think "no bugs in our product, it's perfect!".  When I hear about a problem in some part of a system that I've recently worked on, I immediately start thinking about ways that it might have been my fault.  Even if you're fresh out of school, past experience should have given you plenty of examples of times you've written flawed code.

    But then I get called a faithless pessimist when I suggest we verify something on our end before we blame the vendor/customer :(

  21. 640k says:

    If I had a nickel for every time I heard this contradiction:

    "my code is flawless, therefore internal errors in it should not crash the app".

    You have to understand that *your* code probably is the buggiest in the whole ecosystem, although with any type of integration, always "trust but verify".

  22. dave says:

    The reason I leave the assertions in the shipping code is that my code *is* flawless:  assert(something) is my continued claim of that lack of flaws, and the non-occurrence of assertion-failure terminations demonstrates the truth of my claim.

    Semi-joking, of course, but people lose sight of what 'assert' literally means: the programmer claims that a certain condition must necessarily hold, otherwise his code is defective.

  23. j b says:

    @Nicholas,

    Those claiming 'no bugs' are often right – _provided_ that the program is used the way it was intended to be used. (End user level) "testers" tend to be useless to the project after half a year – that is the maximum time required to (unknowingly) learn to use the software as intended, rather than not as intended.

    The best end user level test is the "five year old test": Put a five year old at the keyboard, tell him to play around any way he wants to, and for every way he can make the program crash, and show daddy how he did it, he will get an ice cream cone… One of my fellow workers actually used this technique on a regular basis. (His sons were somewhat older, like ten years, but playing with that piece of software required somewhat higher competence.) Several mornings, he came to work presenting error situations discovered by his sons. And the sons did get their rewards (which were somewhat more than an ice cream cone.)

    Bugs are like things that disappear: The reason why things disappear and cannot be found is that people search in places where those things are NOT. If people would only search in places where the things ARE, then the things wouldn't get lost…
