More undocumented behavior and the people who rely on it: Output buffers


For functions that return data, the contents of the output buffer if the function fails are typically left unspecified. If the function fails, callers should assume nothing about the contents.

But that doesn’t stop them from assuming it anyway.

I was reminded of this topic after reading Michael Kaplan’s story of one customer who wanted the output buffer contents to be defined even on failure. The reason the buffer is left untouched is because many programs assume that the buffer is unchanged on failure, even though there is no documentation supporting this behavior.

Here’s one example of code I’ve seen (reconstructed) that relies on the output buffer being left unchanged:

HKEY hk = hkFallback;
RegOpenKeyEx(..., &hk);
RegQueryValue(hk, ...);
if (hk != hkFallback) RegCloseKey(hk);

This code fragment starts out with a fallback key then tries to open a “better” key, assuming that if the open fails, the contents of the hk variable will be left unchanged and therefore will continue to have the original fallback value. This behavior is not guaranteed by the specification for the RegOpenKeyEx function, but that doesn’t stop people from relying on it anyway.

Here’s another example from actual shipping code. Observe that the CRegistry::Restore method is documented as “If the specified key does not exist, the value of ‘Value’ is unchanged.” (Let’s ignore for now that the documentation uses registry terminology incorrectly; the parameter specified is a value name, not a key name.) If you look at what the code actually does, it loads the buffer with the original value of “Value”, then calls the RegQueryValueEx function twice and ignores the return value both times! The real work happens in the CRegistry::RestoreDWORD function. At the first call, observe that it initializes the type variable, then calls the RegQueryValueEx function and assumes that it does not modify the &type parameter on failure. Next, it calls the RegQueryValueEx function a second time, this time assuming that the output buffer &Value remains unchanged in the event of failure, because that’s what CRegistry::Restore expects.

I don’t mean to pick on that code sample. It was merely a convenient example of the sorts of abuses that Win32 needs to sustain on a regular basis for the sake of compatibility. Because, after all, people buy computers in order to run programs on them.

One significant exception to the “output buffers are undefined on failure” rule is output buffers returned by COM interface methods. COM rules are that output buffers are always initialized, even on failure. This is necessary to ensure that the marshaller doesn’t crash. For example, the last parameter to the IUnknown::QueryInterface method must be set to NULL on failure.

Comments (42)
  1. asdf says:

    I looked at the documentation for these two functions. RegQueryValueEx only says the buffer is in an undefined state if the buffer isn’t large enough. For all other cases I’d assume the buffer/key are good if the return value is ok or untouched if the return value is bad. I’d imagine other programmers would think the same thing since all other sane documentation works that way.

  2. oldnewthing says:

    The "output buffers are guaranteed unmodified on failure" model means that all output parameters have to be double-buffered, which can be quite expensive for large buffers and tedious even if the buffers are small. For example, a function that computes a string can’t build the string directly into the output buffer; it has to build it in a separate buffer, and then only when everything is successful (including determining whether the buffer is big enough) copy the result into the output buffer. Do you write your functions this way?

  3. pmuhC says:

    Be paranoid about the returns from any call if the documentation for the call is not explicit about every eventuality.

  4. asdf says:

    It depends on the function. But exact cases where the buffer is undefined does get documented as I would expect MSDN to do. Take the RegQueryValueEx function, it explicitly says the buffer is undefined when the ERROR_MORE_DATA return code is returned. That to me implies the function either fills the buffer correctly or doesn’t touch it in all other cases.

  5. Reinder says:

    "For example, a function that computes a string can’t build the string directly into the output buffer; it has to build it in a separate buffer, and then only when everything is successful (including determining whether the buffer is big enough) copy the result into the output buffer."

    The ideal order of thing is:

    1. determine whether you can deliver all requested data

    2. If so, start producing it. Because of step 1, this can happen directly into the output buffer.

    "Do you write your functions this way?"

    If robustness of an API has enough priority: yes. Doing this correctly also affects ease of use for the programmer, and sometimes for the user. Overall, the investment in getting a function really robust may pay out.

    As an extreme example, a function to copy a file should make every effort to ensure that it can perform the requested task before copying the first byte.

  6. oldnewthing says:

    "Determine whether you can deliver all requested data". This is the hard part. Often you can’t tell until you actually try, and then you’re stuck having to remember the result of the attempt somewhere. Why not store it in the output buffer since that’s where you were going to put it anyway?

    For example, SHGetFileInfo receives flags specifying what items the caller wishes to retrieve. If the caller asks for two things (e.g., the icon and the display name), do you get the icon first and save it "off to the side" while you get the second – and only if both things work do you put the answer in the output buffer? You can’t determine whether you can get the icon until you actually get it, because there’s always the possibility of an "out of memory" error just as you call ExtractIcon.

    (File copying is a change of topic. I’m talking about output buffers on failure not changes to storage.)

  7. Ben Cooke says:

    I often catch myself getting the registry’s key/value terminology wrong even though I know better. The problem is that I tend to use the word "key" to describe the left-hand part of a (key, value) pair. The value (obviously) is the value.

    Unfortunately the registry is not a map structure. It’s a tree of map structures. For some reason someone decided that each of these map structures should be called a key. So what do we call the left hand part of the (key, value) pair? The best name I’ve come up with so far is "the value name". It would have been far better to call the map structures "folders", especially since the standard registry editor visually represents them that way.

    I do have a theory about why it ended up this way, though. Windows 3.1 (and probably earlier versions; I was still in Amiga land at that point) had a much more stunted registry whose purpose was to store the file extension mappings and presumably also the OLE stuff. From what I remember that old registry only stored one value for each point in the tree, so rather than a tree of map structures it was a heirarchical map structure. Under this scheme the name "key" makes sense. As I understand it, the old singleton value lives on in and beyond as the "(Default)" entry in the registry editor, still used all over the HKEY_CLASSES_ROOT key presumably for reasons of backwards compatibility. I also assume that this is the reason for the rather awkward schema adopted in that area, where three different kinds of thing are all shoved in at the top level and distinguished only by naming convention.

  8. Mike says:

    "As an extreme example, a function to copy a file should make every effort to ensure that it can perform the requested task before copying the first byte."

    The problem with that philosphy is that the answer can change between the time you do your pre-verification and the time you do the actual operation. It’s simpler and better, IMHO, to do only minimum sanity testing and handle errors occuring in the actual operation as you go – which you have to do anyway.

  9. Adrian says:

    raymond wrote: "For example, a function that computes a string can’t build the string directly into the output buffer; it has to build it in a separate buffer, and then only when everything is successful (including determining whether the buffer is big enough) copy the result into the output buffer. Do you write your functions this way?"

    While I agree that the caller shouldn’t expect the buffer to be untouched upon error, it seems reasonable to avoid undefined outcomes. So if building the string fails, it should do something reasonable to the buffer like setting it to a properly terminated empty string in addition to returning the appropriate error code.

    A similar problem exists with GetLastError. Some functions that purport to return additional failure information via GetLastError don’t actually update the error code in all cases. Thus, occasionally, you have to use SetLastError to something your recognize (e.g., 0) before calling the unreliable function.

    Chapters 5 and 6 of Writing Solid Code by Steve Maguire (Microsoft Press) recommend avoiding undefined behaviors.

  10. J says:

    I think 10 or more years ago you could rely on your documentation omitting details because it should be obvious why you wouldn’t want to things that way. These days, however, the programmer audience is more diverse and the machines are more complex and more forgiving (if you do something inefficiently, are you ever going to notice?). No, these days if you have such a diverse audience like the Windows platform you have to spell out everything or you’ll suffer in the long run.

  11. James says:

    Why not improve the documentation?

    For any function that takes an output parameter as an argument, it should clearly specify what happens to it on failure:

    A. Leaves it untouched.

    B. Initializes it to some value.

    C. Unspecified.

    Don’t let people assume.

    Furthermore, better documenting the function’s contract is better for both sides of the fence: anyone modifying that function’s internals are given a clear requirement on how the function needs to behave to the outside world.

  12. foxyshadis says:

    Ben: Thanks, I never knew that. I’d always wondered why the stuff in HKCR seemed so inefficiently built with only one value per key.

    Also, regedit itself calls them value name and value data, so you’re in good company.

  13. oldnewthing says:

    I am but one person. To update all the documentation myself would take years.

    At what point does one finally decide to leave things unsaid? Should every function also say "All pointers must point to valid memory; otherwise the behavior is undefined"? How about "If you pass the wrong number of parameters to this function, the behavior is undefined." Or "Do not use memory after freeing it."

  14. Ben Cooke says:

    Fatal Error: The operation completed successfully.

  15. Ivan says:

    "Should every function also say "All pointers must point to valid memory; otherwise the behavior is undefined"? How about "If you pass the wrong number of parameters to this function, the behavior is undefined." Or "Do not use memory after freeing it.""

    Saying something for every function is easy, you could just write it once and get docs for all functions to include it, not that I’m suggesting you write this sort of stuff of course

  16. Ivan says:

    Ben: I got that one too once

  17. oldnewthing says:

    Ivan: Would a single "Ground rules for documentation" page suffice? Or does every page have to have an explicit link to the "ground rules" page?

  18. Anonymous Coward says:

    In a great many cases (especially anything involving hardware access) race conditionsmean that you can never be absolutely certain that an operation will succeed before you start. You should check if it’s definately impossible (assuming that’s achievable) but you should never rely on operations being guranteed to succeed.

    Assumption is the mother of all subtle access violations waiting for an unlikely series of events to occur when you demo the software to a major potential customer before it occurs for the first time.

  19. J. Edward Sanchez says:

    I think it would be best if every page had an explicit — and prominent — link to the "ground rules" page.

  20. Norman Diamond says:

    Thursday, September 01, 2005 1:57 PM by Mike

    [quoting Reinder]

    >> "As an extreme example, a function to copy

    >> a file should make every effort to ensure

    >> that it can perform the requested task

    >> before copying the first byte."

    >

    > The problem with that philosphy is that the

    > answer can change between the time you do

    > your pre-verification and the time you do

    > the actual operation.

    That’s exactly why Reinder said "every effort" even though it’s not perfect. Suppose you want to copy a directory and all its subdirectories etc. onto an external hard drive for backup, Windows Explorer sits there for a while preparing, Windows Explorer can compute that this is 2.53 GB of files and the destination drive has 2.51 GB of free space, wouldn’t you want to be informed that the only way this is going to succeed is not only for the answer to change between the time of preparation and time of completion, but in fact it depends on some very lucky kinds of changes?

    Yes you do have to continue doing stringent error checking at every stage, exactly as you said, but advance notification has a purpose.

    Thursday, September 01, 2005 2:04 PM by Adrian

    > A similar problem exists with GetLastError.

    > Some functions that purport to return

    > additional failure information via

    > GetLastError don’t actually update the error

    > code in all cases. Thus, occasionally, you

    > have to use SetLastError to something your

    > recognize (e.g., 0) before calling the

    > unreliable function.

    Well yes and no. When LoadImage() fails, sometimes it returns a return value of 0 (failure) and GetLastError() returns 0 (ERROR_SUCCESS). I didn’t check if this could be solved by calling SetLastError to something other than 0 before calling LoadImage, but if it can be, 0 isn’t a useful value to use.

    Thursday, September 01, 2005 3:36 PM by oldnewthing

    > I am but one person. To update all the

    > documentation myself would take years.

    Of course. I understand that’s why you no longer solicit bug reports about MSDN pages. But as your base note here shows, you remember why it’s important for the spec to be reliable. (After all you can’t expect users to rely on an unreliable spec, at least not for long. Besides, think of how much more appcompat work you’ll get to do if they do.)

    So PLEASE persuade your colleagues in the MSDN arena that they should act on bug reports. It isn’t enough to send a reply saying that MSDN content is acquired by MSDN. Get the specs fixed, maybe with a timeframe of less than a year.

    Also PLEASE persuade your colleagues who aren’t in the MSDN arena that, even though they’re not responsible for fixing MSDN, that doesn’t mean they should post publicly that MSDN bugs don’t need fixing. They should learn that MSDN bugs DO need fixing.

  21. Mihai says:

    I usualy try to accomplish the following (in the order listed):

    - Strong guarantee

    - NoThrow guarantee

    - Basic guarantee

    I know that Windows precedes these concepts, but this is something that was intuitively followed by many programmers for a long time.

    I also realize that a big buffer cannot be preserved. But for "cheap" object (like a HANDLE, which is usualy an int) I kind of expect to be preserved. Especially from an OS, which is supposed to be way more reliable than a game, or browser, or whatever.

  22. Mihai says:

    It seems like the discution moves towards MSDN.

    Guys, it seems you never tried to use the Mac OS documentation. Same functionality provided by 4-5 functions, all but one being obsolete. But this is not documented, you need someone who was a long time Mac programmer to tell you which one is the new one.

    You can read some of my Mac adventures here http://www.mihai-nita.net/m200506.shtml

    I don’t say the MSDN doc is perfect, or that I would not like to see it getting better, but man, try doing at least browsing the Mac documentation (freely available online).

  23. dave says:

    here’s a pretty reasonable chunk of documentation for a pretty well-known unix api:

    http://www.openbsd.org/cgi-bin/man.cgi?query=read&apropos=0&sektion=0&manpath=OpenBSD+Current&arch=i386&format=html

    note that it doesn’t make mention of the state of the buffer on error.

    who would ever implement read() in such a way that it buffers received data in a secondary buffer, just in case there might be an error reading from whatever the heck it is you’re reading from? who would expect it? who would want it?

  24. TC says:

    Let’s ignore for now that the documentation

    > uses registry terminology incorrectly; the

    > parameter specified is a value name, not a

    > key name.)

    You have to ask, which team of rocket scientists chose "key", "value", and "data"? How guaranteed is that, to cause confusion? ("The value of the value was … no, erm, the data of the value … um .."

    TC

  25. dave says:

    "The value of the value was"

    the pellet with the poison’s in the vessel with the pestle; the chalice from the palace has the brew that is true!

  26. Jonathan says:

    I call it "value name" and "value value". Just like the Disney characters – Goofy is a "man dog" and Pluto is a "dog dog". But maybe it’s just me.

  27. sandman says:

    raymond wrote: "I am but one person. To update all the documentation myself would take years."

    Absolutely. But it the documentation of the windows API which is I find to microsoft’s weakest point.

    Phaps instead of blogging about problems like this when you find them you could also take time to fix that function. It doesn’t all have to happen all at once , and every little helps.

    I don’t know of any way of folks outside MS provide patches either.

    You probably didn’t submit a patch to the OSS project you picked on – but if you did you could of held up the before and after as an example of how it is supposed to be done.

  28. Carlos says:

    My favourite MSDN cock-up is in the CreateFile documentation. It looks like someone has run a spellchecker: the console-input pseudo-files have changed from CONIN$ and CONOUT$ to COIN$ and COPOUT$.

  29. oldnewthing says:

    "Phaps instead of blogging about problems like this when you find them you could also take time to fix that function."

    I do not have the authority to edit the documentation directly. I can send suggestions to the doc teams (note: plural) and they will consider them. Sometimes they accept my suggestions, sometimes they don’t. It’s their call not mine.

    One suggestion of mine was accepted with "That’s a great idea. Why don’t you go do it send the diffs to us". Of course this was a large-scale change that I suggested, so it’ll take me hundreds of hours to do it. Hundreds of unpaid hours I could be spending on something else, like, say, having fun reading the latest Harry Potter book. What’s my incentive?

    I just heard back from the doc people on the COPOUT$ thing. There was a batch of edits that had a high regression rate that they are still trying to dig out from under and this was one of them. (Another example: Somebody went in and "fixed" Itanium -> Titanium.)

  30. Ben Hutchings says:

    dave wrote: "here’s a pretty reasonable chunk of documentation for a pretty well-known unix api:

    http://www.openbsd.org/cgi-bin/man.cgi?query=read&apropos=0&sektion=0&manpath=OpenBSD+Current&arch=i386&format=html

    note that it doesn’t make mention of the state of the buffer on error."

    That’s because it’s untouched in that case.

    "who would ever implement read() in such a way that it buffers received data in a secondary buffer, just in case there might be an error reading from whatever the heck it is you’re reading from? who would expect it? who would want it?"

    That "secondary buffer" is commonly known as a "disk cache". They are fairly commonly used…

  31. Fun topic.

    I think that the more fundamental mistake here is to think that the (basic) contract varies depending on success and failure.

    This does not require double buffering to make it work correctly.

    The first thing is to understand the data flow direction. If you can’t understand the data flow direction easily, you have a badly designed API (and yes I’ve designed some bad ones myself).

    The OSF/DCE RPC IDL syntax is entirely ignorant of failure vs. success and forms a good starting point for anaysis.

    For example, there are three patterns of data flow: in, in/out, out.

    For "in" data, the value passed in must be (syntactically) legal. By this I mean that, for example, pointers have to either be NULL or point to a valid region of memory with the correct size/extent/type. The syntactic legality follows through any pointer chasing.

    For "in/out" data, the value passed in must be legal as specified in "in", but is also subject to change. The thing I dislike most about "in/out" is that for a nontrivial structure (say an array of pointers to structs where some fields in the structs are read and some are written to) it’s very hard to annotate the "subflow". However empirical evidence suggests that this kind of complex flow is bug-prone and while it may seem elegant on one level (fewer formal paramters), in practice it isn’t a good idea.

    "out" is interesting. Out data flow specifically says that the memory that the pointer refers to is uninitialized and that the callee will initialize it. Unconditionally. Any caller who depends on "out" data to be preserved is fundamentally broken.

    This still leaves the registry example ambiguous since while "[out] char **stringout" would certainly be expected to be initialized thus: "*stringout = NULL;", buffers with sizes are subject to more complex rules. Again IDL has set a decent standard for shallow data structures with size_is and length_is.

    Unfortunately, [out] does not mix with C++ objects; they by definition have to be [in, out] so things get more complex. You’re left with [out] only applying to things like raw memory buffers or simple scalars (including pointers).

  32. TC says:

    Carlos wrote:

    > My favourite MSDN cock-up is in the

    > CreateFile documentation. It looks like

    > someone has run a spellchecker: the

    > console-input pseudo-files have changed

    > from CONIN$ and CONOUT$ to COIN$ and COPOUT$.

    What an absolute bottler! :-))

    Recently, a user of my main aplication (which has been running successfully for several years), reported that he could not enter the asian surname "Teh". "Your software keeps changing it to ‘The’!". I said words to the effect of, what drugs are you on, and where can I get some.

    Oops! MS Access applied MS Office Autocorrect /BY DEFAULT/ to all of the text fields in my application! Hey, there’s a neat idea. Let’s spelling-correct all the Product Codes!

    I’m a great fan of MS software. It has revolutionized several aspects of software development in the 30+ years that I have been in the game. But I look at some decisions, such as, the decision to automatically spelling-correct every text field in a database application (!), and I wonder what level of oversight is applied to these decisions within MS? How could any experienced database software developer, have ever signed-off that one?

  33. Jerry Pisk says:

    Ben,

    read() is used to read data from a lot more types of objects than just physical disks. Consoles, network devices are just two of things you can read with read() that can fail at any time, even half way through a read (so can caches, you cannot expect a disk cache to act as a double buffer, if you read more than one cache page than the first does not have to be available by the time your second comes in). If you lookup POSIX standard references you will see that:

    It seems fairly clear from paragraph 7 that at least for EINTR the contents must be considered undefined, since the OS may return -1 anyway after data has been transferred.

    (http://standards.ieee.org/reading/ieee/interp/1003-1-90_int/pasc-1003.1-76.html)

  34. James says:

    At what point does one finally decide to leave

    > things unsaid? Should every function also say

    > "All pointers must point to valid memory;

    > otherwise the behavior is undefined"? How about

    > "If you pass the wrong number of parameters to

    > this function, the behavior is undefined." Or

    > "Do not use memory after freeing it."

    Of course not. Those things are already implicit from the standard C language specification.

    If you’re writing an API function, it’s your (in the general sense, not you Raymond Chen specifically =) ) duty to specify a contract with its callers.

    I’m okay with the "if it’s not documented, it’s obviously unspecified, duh" approach, but obviously it’s proven to be problematic.

  35. David Walker says:

    Eric says "What’s my incentive?"

    Why, the greater good, of course! Making the world a better place! ;-)

  36. David Walker says:

    As for "At what point does one finally decide to leave things unsaid?", it’s halfway between "All pointers must point to valid memory" and "On failure, the output buffer will contain…"

    You don’t need to say "All pointers must point to valid memory". Obviously we know that. With how great your blogs are generally, I wonder what possessed you to say this–it’s clear that you’re not serious.

    But some indication of what we can expect the output buffer to contain, if anything, on failure might be useful.

  37. Ben Hutchings says:

    Jerry Pisk wrote: "It seems fairly clear from paragraph 7 that at least for EINTR the contents must be considered undefined, since the OS may return -1 anyway after data has been transferred."

    This was changed in the 2001 version of POSIX:

    "If a read() is interrupted by a signal before it reads any data, it shall return -1 with errno set to [EINTR].</p>

    "If a read() is interrupted by a signal after it has successfully read some data, it shall return the number of bytes

    read."

    (from http://www.opengroup.org/onlinepubs/009695399/functions/read.html – free registration required)

  38. TC says:

    > some indication of what we can expect

    > the output buffer to contain, if anything,

    > on failure might be useful.

    * Why * ?

    Can you give an example?

    I just can’t see it. If you ask for ‘X’, and don’t get it, who cares what you got instead? It’s not a food-type situation, where you ask for a burger, but would probably be satisfied with a pie instead …

  39. Norman Diamond says:

    Tuesday, September 06, 2005 7:43 AM by TC

    > I just can’t see it. If you ask for ‘X’, and

    > don’t get it, who cares what you got

    > instead?

    Here’s a reason for caring. It helps in situations where the results are documented and doesn’t help in situations where the results aren’t documented, but it helps show why it could be useful to define and document and implement meaningful behaviours in some cases which aren’t at present.

    Some calls to Windows APIs can get error returns with a status code saying that more data are available.

    If the output buffer included accurate data up to the limit of either the size of the output buffer or the maximum size that Windows is willing to fill in one call, and if MSDN says that the caller can repeat the call to obtain subsequent portions of the data and the caller can concatenate the results together, then the caller can eventually get everything they needed.

  40. TC says:

    David Walker wanted to know what was in a buffer after a call had "failed". I still see no benefit for knowing that. Your example is not a "failure" in that sense.

  41. Norman Diamond says:

    OK, IF MSDN says that the caller can repeat the call to obtain subsequent portions of the data THEN the error return saying that more data are available isn’t exactly a "failure". But if MSDN doesn’t say so then the error return still is a failure. It would be useful for some of these failure cases to be changed into non-failures. The reason it would be useful is that the caller has reasons to ask for the data in the first place.

  42. TC says:

    Norman, that’s still not a good example IMHO.

    If you are saying that a call returns partial data, but this is not documented, then, this is either (a) fortuitous, ie. you can’t expect it to keep happening in future, or (b) a simple error in the documentation.

    I’m after examples of what at least one previous poster seemed to be wanting: information on the content of an output item, after a call has failed /absolutely/.

    An "absolute" failure in this context would be something like "file not found" – ie. an error with no if’s, but’s or maybe’s about it.

Comments are closed.