The compatibility constraints of error codes, episode 2


A customer reported an incompatibility in Windows 7: If A: is a floppy drive and they call LoadLibrary("A:\\foo.dll") and there is no disk in the drive, the LoadLibrary call fails with the error ERROR_NOT_READY. Previous versions of Windows failed with the error ERROR_MOD_NOT_FOUND.

Both error codes are reasonable responses to the situation. "The module couldn't be found because the drive is not ready." Programs should treat a failed LoadLibrary as a failed library load and shouldn't be sensitive to the precise reason for the error. (They can display a more specific error to the user based on the error code, but overall program logic shouldn't depend on the error code.)
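
Here is a minimal sketch of that split, under a few assumptions. The helper names load_failure_message, plugin_available, and try_load_plugin are hypothetical and not part of any Windows API, and the two numeric values are repeated from winerror.h only so that the sketch also compiles off Windows. The error code chooses the text shown to the user, while the decision itself depends only on whether the load succeeded.

```c
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Assumption: values copied from winerror.h so the sketch builds off Windows. */
typedef unsigned long DWORD;
#define ERROR_NOT_READY     21UL
#define ERROR_MOD_NOT_FOUND 126UL
#endif

/* Hypothetical helper: the error code only selects the user-facing text. */
const char *load_failure_message(DWORD err)
{
    switch (err) {
    case ERROR_NOT_READY:     return "The drive is not ready.";
    case ERROR_MOD_NOT_FOUND: return "The module could not be found.";
    default:                  return "The module could not be loaded.";
    }
}

/* Hypothetical helper: program logic branches on success or failure only,
   so a change in the specific error code cannot alter the outcome. */
int plugin_available(int load_succeeded, DWORD err)
{
    if (load_succeeded) {
        return 1;
    }
    fprintf(stderr, "Plugin disabled: %s (error %lu)\n",
            load_failure_message(err), (unsigned long)err);
    return 0;
}

#ifdef _WIN32
/* Call site on Windows: any failed LoadLibraryA is treated the same way. */
int try_load_plugin(const char *path, HMODULE *module)
{
    *module = LoadLibraryA(path);
    return plugin_available(*module != NULL,
                            *module != NULL ? 0 : GetLastError());
}
#endif
```

With this shape, a program behaves the same whether the failure arrives as ERROR_NOT_READY or ERROR_MOD_NOT_FOUND; only the message differs.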

Fortunately, the customer discovered this discrepancy during their pre-release testing and were able to accommodate this change in their program before ever releasing it. A sigh of relief from the application compatibility team.

Episode 1.

Comments (22)
  1. John says:

    <i>program logic shouldn't depend on the error code</i>

    I agree with this in theory, but in practice some errors can be handled more gracefully than others (obviously this depends on the nature of the program).

    [You're right. I didn't phrase that right. Better would be to say that program logic should be robust to error codes. -Raymond]
  2. WndSks says:

    I'm not going to complain about the lack of documented GLE values per function since I know that it is a hard problem and you should generally just bail on !success (Except for LogonUser()) but it seems unlikely to me that some floppy (removable media?) code was added to the loader. I assume not ready is never going to happen for a fixed disk so was the loader changed or some code deeper down in the kernel?

  3. Miles Archer says:

    What's a floppy drive?

    (kidding)

  4. Alex says:

    <i>program logic shouldn't depend on the error code</i>

    Example: an "access denied" error for CreateFile can bring up dialog for UAC elevation and retry the operation.

  5. @ Mikes Archer says:

    > What's a floppy drive?

    The feeling you perceive when you see a pretty not-enough-dressed girl but you are too old to raise an exception.

    (don't know if this will ever pass Raymond moderation though :))

  6. @WinSks

    I think the reason why they don't document the error codes really comes down to preventing compatibility constraints rather than it being a hard problem. If the list of errors is fully documented, then people will depend on that list. Of course, if the list changes, then all of a sudden we can have an error no longer being caught, or an error can slightly change meaning. The documentation usually points out the important errors, but that is about it. To be honest, I prefer it this way.

  7. Joshua says:

    [You're right. I didn't phrase that right. Better would be to say that program logic should be robust to error codes. -Raymond]

    It turns out that create file failing because the file already exists is significantly different from other cases. Otherwise, it's very hard to make distributed transactional systems on top of filesystems.

  8. jmthomas says:

    The Windows 7 people got it right, but as usual, compatibility trumps almost everything.

    Perhaps the original designers justified "not found" vs "not ready" by assuming the OS would give the user a chance to recover from a "door open" error before the program ever got the return code.  Then "not found" would truly be "not found".  

    If so, there developed a right-hand left-hand problem and something fell through the cracks.  When the program started getting the one code to cover both conditions, it had to add the "is the door still open" logic to address a "not found" error.

    (This problem may have been present from the beginning of time when two design groups failed to synchronize, or it may have appeared later when the OS decided to degrade the importance of floppy drives because hard drives had become ubiquitous.)

    Much better to divide the situation into 2 parts so they can be handled more easily.  Trying to discover whether the call failed because the door is open or because the wrong diskette was inserted adds significant effort to the program.  Kudos to the Win7 people for trying to make life easier for applications!

    (As one who wrote operating system software for mainframes, we had axioms "there can never be too much information in an error code" and "group codes for related reasons and sources together to allow quick filtering".  So much of what we learned the hard way never made it into the heads of PC developers.)

  9. 640k says:

    @HiTech: "there can never be too much information in an error code"

    Then you end up with:

    1. Error messages which scare users away

    2. Overly verbose logfiles which fill up the disk

  10. Joshua says:

    @640k: No, it shouldn't. Certain Filesystem API calls are supposed to provide the A and D parts. A developer who cannot construct a simple version of C and I from A deserves to get fired.

  11. Anonymous Coward says:

    This highlights (again) that the failure modes of a function are just as much part of the interface as its parameters and return value. The Java designers got it exactly right.

    [If a function can fail in more than one way, and both failures apply, which one do you report? Does Java specify that if, say, Access Denied and Invalid Parameter both apply (say because you don't have access to the first parameter, and the second parameter is invalid), then one or the other must be raised in preference to the other? (Honest question.) -Raymond]
  12. Malcolm says:

    VMS only ever displays a max of three underlying layers.

    And it was very handy in a way, since you were getting the error code each subsystem returned.

    $ open /read infile foo.bar

    %DCL-E-OPENIN, error opening file FOO.BAR

    -RMS-E-FNF, file not found

    -SYSTEM-W-NOSUCHFILE, no such file

    $

    If you wanted something more like English, you could turn off the Facility, Severity and Identity, and you get the perhaps less intimidating to the end user:

    $ open /read infile foo.bar

    Error opening file FOO.BAR

    File not found

    No such file

    $

    I still work with VMS, but not very regularly these days ;)

  13. 640k says:

    It *should* be hard to layer ACID atop of non-ACID.

  14. dave says:

    >I think the reason why they don't document the error codes

    >really comes down to preventing compatibility constraints

    >rather than it being a hard problem.

    If I write a function that sits on top of any Windows API, and which passes the same error code back to my caller, all I can tell you about what it returns is "errors I explicitly coded, plus anything the underlying OS returns, including any lower layer components that may be invented tomorrow".  And so recursively down the stack.

    Ultimately, if you plug in a new device, that has the potential to change the error returns from my code.

    The alternative, of course, is that I *don't* return the underlying error code directly.  If I just return my own error code and discard what I got from the lower layers, that's hiding potentially useful information. If I return my own error code and also what I got from the lower layers, that gets unmanageable after two or three layers (been there, done that: VAX/VMS).

    %MYPROG-E-NOPE, cannot open file FOO.BAR

    -SOMELIB-E-RMSIO, error from RMS

    -OTHERLIB-E-NOSUCH, no such file

    -RMS-E-FNF, file not found

    -SYSTEM-E-TIMEOUT, timeout

    So, I think "hard problem" is correct.

  15. Midhat says:

    WHY is there a floppy drive and windows 7 on the same computer?????

  16. @Midhat

    My mother's computer has one, and it is running Windows 7.

    If the hardware is older and the OS was upgraded, then it may be less likely than you think that a computer running Windows 7 has one.

    Then again, it doesn't get used for more than just occasionally making funny noises.

  18. Joshua says:

    I've seen what icon Windows Vista uses for 5¼'' floppy. It looks identical to the Win95 version (and probably is).

  19. alegr1 says:

    WHY is there a floppy drive and windows 7 on the same computer?????

    A better question is why "File Save" icons still use the image of a floppy. Microsoft, being famous for running usability tests, should have noticed that young users would have no idea what that means.

  20. Michael Grier [MSFT] says:

    This is yet another smallish difficult problem to solve.  My personal take is the following:

    Design the API so that non-success results are not errors/exceptional.  Then leave errors as clearly being undocumented and not having backwards compatibility constraints.

    This is difficult.  I did it for a family of internal APIs and while it was very successful, it still raises eyebrows.  People don't get why you don't just check for ERROR_FILE_NOT_FOUND or catch the FileNotFound exception.

    Part of the difficulty is that if there are multiple such non-success results, you need to differentiate them using some kind of codes/flags which ends up looking a lot like checking for specific error codes.

    Searching an in-memory collection is a great case where returning NULL is a better result than returning ERROR_FILE_NOT_FOUND or throwing an exception.

    One unmentioned difficulty here is that while CreateFileW() may return ERROR_FILE_NOT_FOUND in the case that the named directory does exist but the instance file does not, it's actually not as trivially guaranteed that that is the only case where ERROR_FILE_NOT_FOUND is returned.  From my knowledge of the source code I'm not aware of any other cases, but you could imagine a filter driver or someone using some clever technique to hijack the API (for good cause mind you!  These kinds of situations often start with good intentions…) but they call LoadLibrary() perhaps, and next thing you know you're getting ERROR_FILE_NOT_FOUND for some reason other than that the file isn't present in the directory.

    Given the movement towards "developer productivity", I am curious if such design issues will ever be addressed in any future computing platform and then whether this provides some kind of quantum limit to the correctness we can achieve if it is not.  It's hard to imagine say 50 years from now deciding to "fix" this issue throughout the gobs and gobs of legacy code we'll have.

  21. Myria says:

    Is it safe to rely upon APIs returning error codes like ERROR_INSUFFICIENT_BUFFER and ERROR_MORE_DATA as a means of detecting the size of allocation you need to pass?

    [Does the documentation say that these error codes can be used for these specific purposes? -Raymond]
  22. Jeroen Frijters says:

    [If a function can fail in more than one way, and both failures apply, which one do you report? Does Java specify that if, say, Access Denied and Invalid Parameter both apply (say because you don't have access to the first parameter, and the second parameter is invalid), then one or the other must be raised in preference to the other? (Honest question.) -Raymond]

    The documentation isn't always great, but they have something called the Technology Compatibility Kit (TCK) that you have to pass to be able to call your implementation Java™, and it does enforce these error ordering issues.

Comments are closed.