Erroneous assumptions

Has anyone noticed that all of the Win32 documentation has something like this for each API:

Return Values

If the function succeeds, the return value is NO_ERROR.

If the function fails, the return value is one of the following error codes.

Value                      Meaning

ERROR_INVALID_PARAMETER    Something about the ERROR_INVALID_PARAMETER error

Other                      A system error code defined in WinError.h.

I can’t count the number of people who have complained about the last line in the table.  Why on Earth can’t Microsoft bother to document the errors returned from this API anyway?  Are they being stupid or something?

Actually the answer’s somewhat simpler.  We’ve been burned by doing this in the past, and we’re not willing to get burned again.

One of the pieces of memorabilia I have on my desk is a copy of the MS-DOS 2.0 reference manual (published by Microsoft in 1984).  On page 1-143, the description of the Create Handle API (the MS-DOS equivalent of open()) indicates that the API has the following return values:

                        Carry set:
                        AX
                                    3 = Path not found
                                    4 = Too many open files
                                    5 = Access denied

That’s it.  Microsoft’s (and IBM’s) documentation specified the complete set of errors returned by all the DOS APIs.  We told all our customers that the ONLY error codes that the INT 21h, function 3Dh API would return are errors 3, 4, and 5.  And you know what?  Our customers believed us, and they wrote their apps with that assumption.

Well, along came DOS 3.1, which added support for networking.  And with that came a whole host of ways for the APIs to fail.  Things like “Network path not found” (the file is on a server and the server’s down).  Or “Sharing Violation” (someone else has the file open and they’re not letting you access the file).

Originally, the DOS developers just returned the new error codes, thinking that most app authors were smart enough to realize that there might be other error codes returned from the APIs in the future.  And we started testing.

And we discovered just how wrong that assumption was.  Apps crashed left and right.  EVERYONE’S apps crashed.  Why?  Because Microsoft and IBM had told them that they would never see any errors other than 3, 4 or 5.  And since RAM was at an absolute premium on these machines, they didn’t waste valuable code space on useless features like error checking for errors that could never ever be generated.  When your app is going to be running on a machine with 64K of RAM, then defensive programming becomes an optional feature.

So Microsoft invented the DOS error mapping table.  It defined a mapping from all the new error codes into the DOS 2.0 set of error codes.  To find the REAL error code, you called the “Get Extended Error” API, which returned the “real” reason for the failure.

This table still exists in the Longhorn source tree (just for grins, I looked it up the other day).  It’s in the NTVDM logic, so it’s not a part of any of the Win32 logic, but the bottom line is that it’s still there.  And it’s likely that we’ll never be able to get rid of it (at a minimum, we’re not going to be able to get rid of it until we get rid of the 16-bit DOS support, which isn’t gonna happen anytime soon).
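To make the idea of the mapping table concrete, here is a minimal sketch in C.  It is not the real NTVDM table: the names report_error and get_extended_error, the two table rows, and the fallback for unknown codes are all assumptions made for illustration.  Only the numeric error values (3, 4, 5, 32 for a sharing violation, 53 for network path not found) are the standard ones.

```c
#include <assert.h>

/* Error codes named in the text; the numeric values are the standard
   DOS/Win32 ones. */
enum {
    ERR_PATH_NOT_FOUND      = 3,
    ERR_TOO_MANY_OPEN_FILES = 4,
    ERR_ACCESS_DENIED       = 5,
    ERR_SHARING_VIOLATION   = 32,
    ERR_NET_PATH_NOT_FOUND  = 53
};

/* One row of the (hypothetical) mapping table. */
struct error_map_entry {
    int real_code;    /* what actually went wrong */
    int legacy_code;  /* what an old application is told */
};

/* ASSUMPTION: these pairings are illustrative, not the actual table. */
static const struct error_map_entry error_map[] = {
    { ERR_SHARING_VIOLATION,  ERR_ACCESS_DENIED  },
    { ERR_NET_PATH_NOT_FOUND, ERR_PATH_NOT_FOUND }
};

/* Remembered so the "real" reason can be queried afterwards. */
static int last_real_error = 0;

/* Record the real error and return a code from the documented 3/4/5 set. */
int report_error(int real_code)
{
    unsigned i;

    assert(real_code > 0);  /* only meaningful for a failure */
    last_real_error = real_code;

    /* Codes the old documentation promised pass through unchanged. */
    if (real_code >= ERR_PATH_NOT_FOUND && real_code <= ERR_ACCESS_DENIED)
        return real_code;

    for (i = 0; i < sizeof error_map / sizeof error_map[0]; i++)
        if (error_map[i].real_code == real_code)
            return error_map[i].legacy_code;

    /* ASSUMPTION: anything unlisted collapses to "Access denied". */
    return ERR_ACCESS_DENIED;
}

/* Stand-in for the "Get Extended Error" call described above. */
int get_extended_error(void)
{
    return last_real_error;
}
```

With this sketch, an old app that gets a failure for a file on a downed server sees error 3, which it already knows how to handle, while a newer app can call get_extended_error() and find out that the real code was 53.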

And ever since then, Microsoft has refused to completely document the error codes from its APIs.  By not documenting the complete set of error codes possible, it moves the onus of handling new error codes from Microsoft to the application author, where it belongs.
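In practice, taking on that onus mostly means never assuming the documented list is complete.  Here is a minimal sketch of the pattern in C, under some assumptions: classify_result, the action enum, and the APP_ prefixed constants are hypothetical names invented for this example (the constants mirror the WinError.h values for NO_ERROR and ERROR_INVALID_PARAMETER so the sketch compiles without any Windows headers).

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors of the WinError.h values, renamed so this compiles anywhere. */
enum {
    APP_NO_ERROR                = 0,
    APP_ERROR_INVALID_PARAMETER = 87
};

/* What the caller should do next (hypothetical, for illustration). */
enum action {
    ACTION_CONTINUE,
    ACTION_FIX_ARGUMENTS,
    ACTION_FAIL_GRACEFULLY
};

/*
 * Decide how to react to an API's return value.  Documented codes get
 * specific handling; every other code, including ones that do not exist
 * yet, lands in the default branch instead of being assumed impossible.
 * On an unexpected code, *unhandled_code receives it for logging;
 * otherwise it is set to 0.
 */
enum action classify_result(unsigned long code, unsigned long *unhandled_code)
{
    assert(unhandled_code != NULL);
    *unhandled_code = 0;

    switch (code) {
    case APP_NO_ERROR:
        return ACTION_CONTINUE;
    case APP_ERROR_INVALID_PARAMETER:
        return ACTION_FIX_ARGUMENTS;
    default:
        /* The "Other" row of the table: fail cleanly, keep the code. */
        *unhandled_code = code;
        return ACTION_FAIL_GRACEFULLY;
    }
}
```

The default branch is the whole point.  An app written this way keeps working when a later release starts returning a code nobody had thought of when the app shipped.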