If you say that you don’t care about something, you shouldn’t be upset that it contains garbage


There are many situations where you pass a structure to a function, and the function fills in the structure with information you request. In some cases, the function always fills in the entire structure (example: GlobalMemoryStatus). In other cases, you tell the function which bits of information you care about, to save the function the effort of computing something you weren't interested in anyway (example: TreeView_GetItem).

In the latter case, if you say that you aren't interested in certain parts of the structure, and then you change your mind and start paying attention to them, don't be surprised if you find that there's nothing interesting there. After all, you said you didn't care.

For example, if you call TreeView_GetItem and set the mask to TVIF_IMAGE | TVIF_PARAM, this means that you want the function to set the iImage and lParam members of the TVITEM structure and that you don't care about the rest. After the call returns, the values of those two members are defined, since you said that's what you wanted, and the remainder of the output fields are undefined. They might contain useful information, they might contain garbage, you're not supposed to care since you said that you didn't.

Why might fields you said you didn't care about still contain information (correct or incorrect)? It might be that the value is so easy to compute that checking whether the value should be set takes more work than actually setting it! In such a case, the function might choose to set the value even if you didn't say that you needed it.

On the other hand, the value might be an artifact of a translation layer: You pass a structure saying, "I'm interested in two out of the four members." The function in turn calls a lower lever function with a different structure, saying, "I'm interested in two out of the five members of this different structure." After the call returns, the middle-man function converts the lower-level structure to the higher-level structure. Sure, it may also "convert" stuff that was never asked for, but you said you weren't interested, so they just get garbage. In other words, the function you're calling might be defined like this:

// The pinfo parameter points to this structure
struct FOOINFO {
 DWORD dwInUse;
 DWORD dwAvailable;
 DWORD dwRequested;
 DWORD dwDenied;
};

// The dwMask parameter can be a combination of these values
#define FOOINFO_INUSE     0x0001
#define FOOINFO_AVAILABLE 0x0002
#define FOOINFO_REQUESTED 0x0004
#define FOOINFO_DENIED    0x0008

BOOL GetFooInfo(FOOINFO *pinfo, DWORD dwMask);

Now, the GetFooInfo function might just be a middle man that talks to another component to do the real work.

// lowlevel.h

struct LOWLEVELSTATS {
 DWORD dwUnitSize;
 DWORD dwUnitsInUse;
 DWORD dwUnitsAvailable;
 DWORD dwUnitsRequested;
 DWORD dwUnitsGranted;
 DWORD dwTotalRequests;
};

// The dwMask parameter can be a combination of these values
#define LLSTATS_UNITSIZE  0x0001
#define LLSTATS_INUSE     0x0002
#define LLSTATS_AVAILABLE 0x0004
#define LLSTATS_REQUESTED 0x0008
#define LLSTATS_GRANTED   0x0020
#define LLSTATS_REQUESTS  0x0040

BOOL GetLowLevelStatistics(LOWLEVELSTATS *pstats, DWORD dwMask);

The resulting GetFooInfo function merely translates the call from the application into a call to the GetLowLevelStatistics function:

BOOL GetFooInfo(FOOINFO *pinfo, DWORD dwMask)
{
 LOWLEVELSTATS stats;
 DWORD dwLowLevelMask = LLINFO_UNITSIZE;

 if (dwMask & FOOINFO_INUSE)
  dwLowLevelMask |= LLSTATS_INUSE;
 if (dwMask & FOOINFO_AVAILABLE)
  dwLowLevelMask |= LLSTATS_AVAILABLE;
 if (dwMask & FOOINFO_REQUESTED)
  dwLowLevelMask |= LLSTATS_REQUESTED;
 if (dwMask & FOOINFO_DENIED)
  dwLowLevelMask |= LLSTATS_REQUESTED | LLSTATS_GRANTED;

 if (!GetLowLevelStats(&info;stats, dwLowLevelMask))
  return FALSE;

 // Convert the LOWLEVELSTATS into a FOOINFO
 pinfo->dwInUse = stats.dwUnitSize * stats.dwUnitsInUse;
 pinfo->dwAvailable = stats.dwUnitSize * stats.dwUnitsAvailable;
 pinfo->dwRequested = stats.dwUnitSize * stats.dwUnitsRequested;
 pinfo->dwDenied = stats.dwUnitSize *
                   (stats.dwUnitsRequested - stats.dwUnitsGranted);

 return TRUE;
}

Notice that if you ask for just FOOINFO_DENIED, you still get the dwRequested as a side effect, since computing the number of requests that were denied entails obtaining the total number of requests. On the other hand, you also get garbage for dwInUse since the call to GetLowLevelStats didn't ask for LLSTATS_INUSE, but the code that converts the LOWLEVELSTATS to a FOOINFO doesn't know that and converts the uninitialized garbage. But since you said that you didn't care about the dwInUse member, you shouldn't be upset that it contains garbage.

You now know enough to answer this person's question.

(Note of course that I'm assuming we are not returning uninitialized garbage across a security boundary.)

Comments (32)
  1. Tom says:

    First, it’s interesting to know that you’re actually trolling newsgroups.

    Second, I am continually amazed at how many people simply can’t read the owner’s manual.

    From ms-help://MS.PSDKSVR2003R2.1033/shellcc/platform/commctls/treeview/structures/tvitem.htm

    "mask – Array of flags that indicate which of the other structure members contain valid data. When this structure is used with the TVM_GETITEM message, the mask member indicates the item attributes to retrieve."

  2. Tom says:

    Edit: I meant to say "trawling" instead of "trolling".  Sorry about the implication.

  3. Dog says:

    Seriously, this is just bad design.

    Either have a function that fills in a struct, or have a function to return each item seperately. Or both. Having this bizarre "please initialise these parts of this struct" function is just silly.

    It’s not even as if you can pass the struct back to the same/another function to fill in the rest (since that function is perfectly at liberty to overwrite the data you got earlier with garbage).

    Bad. Bad. Bad. Please, if anybody is thinking about writing a function like "GetFooInfo" please stop and think about how to do it properly.

  4. Neil says:

    Quoting MSDN, "On output, the state member contains the values of the specified state bits."

    This sounds to me like it will only contain the bits specified, and not any garbage bits.

    I don’t see the merit of offering a subset of bits but not limiting the callee to that subset; in that case the callee might as well return all the bits all the time, which makes the offer meaningless.

  5. njkayaker says:

    It’s sad that there is a need for you to explain what should be basic to any competent programmer.

    "Seriously, this is just bad design."

    Are you sure there no advantage to doing it "wrong" way?

    Anyway, the "bad design" (arguable) isn’t relevent. The documentation is fairly clear as to how it works. If someone has a problem with the way it works, they have many other programming problems you just don’t know about!

  6. nathan_works says:

    No trawling the news groups, that post was from 2005 (!!). Ancient. History.

  7. Wang-Lo says:

    When passing a structure around, each handling process must contract to do one of THREE possible things to each member: (1) fill it with a meaningful result, (2) fill it with garbage, or (3) leave it unchanged.

    The programmer’s confusion arises from interpreting (2) as (3), probably because the documentation author never thought of (3) as a sane possibility, and so didn’t clarify.  Meanwhile, the programmer unconsciously assumes that (2) is just a crazy dumb thing to do in a shared structure.

    -Wang-Lo.

  8. Mark Sowul says:

    "Seriously, this is just bad design…

    Bad. Bad. Bad. Please, if anybody is thinking about writing a function like "GetFooInfo" please stop and think about how to do it properly."

    Here we go again.  Are you sure you’re not Aaargh! in disguise?

  9. Carter says:

    I wonder if that guy is still badly blocked.  Hes been waiting three years Raymond!  I no longer think you have super powers.

  10. mikeb says:

    As Neil indicates, another way of interpreting the documentation might be:

    "If you say that you’re interested only in particular items, the API should not change other bits of information"

    It seems to be that either interpretation is reasonable.  This is where experience comes into play – clearing up the ambiguous areas which exist in pretty much any API – more so in the complex ones.

  11. Adam V says:

    @Nathan_works:

    > that post was from 2005 (!!). Ancient. History.

    Raymond has mentioned http://blogs.msdn.com/oldnewthing/articles/407234.aspx multiple times http://blogs.msdn.com/oldnewthing/archive/2008/05/07/8464281.aspx in the past that he has a large backlog of posts queued up. So why are people surprised that this story is regarding a post made in 2005?

  12. Leo Davidson says:

    @Adam V:

    People are saying "that post was from 2005" because the Usenet post in question — the one Raymond linked near the end with the line "You now know enough to answer this person’s question." has a 2005 date on it.

  13. orcmid says:

    I also could see how the description of the function can be taken as a promise to only return what is being asked for.  (In particular, if I’d written something like that, that is what it would mean unless I was very explicity to emphasize the contrary.)

    On the other hand, the mask field is also used to describe what in the structure is valid and could potentially be updated as part of performance of the request.  Likewise, the stateMask has this dual role of specifying the state bits wanted and masking the valid bits. (There doesn’t seem to be any prospect for it being altered though.)

    This leaves me with the impression that the descriptions of this request and the data structure are pretty crufty.  I think that existence of that old newsgroup post demonstrates that the feature needs to be more tightly specified.  It is interesting that the remainder of the thread apparently never did identify the confusion and went off in different directions.

  14. Dean Harding says:

    @Leo Davidson: I think Adam V understood that. Obviously, in 2005 (or so), Raymond saw that newsgroup post and added it to the end of his queue of posts to be posted. Given that he says he currently has enough posts to last until 2010 or so, it doesn’t seem unreasonable that it’s only showing up now.

    An alternate explanation is: "who cares?" the topic is not magically made irrelevant by the simple passage of time.

    @Dog: This design is common in the Win32 API. Once you get used to it (seriously, it’s not that complicated) it’s quite simple to understand.

    Do you think saying "SELECT a, b, c FROM foo" is bad SQL, when you could just say "SELECT * FROM foo"?

  15. parobu says:

    or maybe we should issue seperate queries for each field:

    select a from foo

    select b from foo

    select c from foo

    Good. Good. Good.

  16. Leo Davidson says:

    I feel there’s a difference between:

    1) A function filling in fields in a structure which you didn’t ask for (which is fine unless you assumed it wouldn’t overwrite something)

    And:

    2) A function filling in bits in a single field when you only asked it for other bits.

    In the 2nd case you might say I only want to know if BIT_N is set or not, then compare the results with BIT_N for equality. Instead there may be other bits set and you have to mask off the returned field in some way.

    I realise that bits in a field are, conceptually, no different to fields in a structure, but people are much more likely to compare a field with a constant for equality than to compare an entire structure for equality.

    I don’t think what the API does is wrong but I think it would be worth the trivial performance hit for the API to mask of the field to only the requested bits, because people *are* going to misunderstand what the function does or misinterpret (or not even read) the documentation, and it’s a needless problem.

    On the other hand, if someone says they don’t care about an entire field and then read from it or compare it, then they only really have themselves to blame, *provided* the documentation is clear that you can’t rely on previously set data to be left in fields you have said you’re not interested in.

  17. Dean Harding says:

    What would be the point of calling the function a second time to "fill in the rest of the struct"? Why wouldn’t you just call it with all the flags you’re going to need already set up front?

  18. Richard Wells says:

    Or call it a second time just for the remaining elements needed.

    I remember that some function calls took so long to return everything that the calls had to be split apart to maintain the illusion of a responsive system. No one cares about how the programmer shaves off a few lines of code if the program stalls for 10 minutes getting results.

  19. Triangle says:

    Mark Sowul: I absolutely **loved** the way you copied Dog’s post, carefully removed the explanation for the bad design claim, and then go on to dismiss it as meaningless. Really classy.

    And I agree with Dog as well; this is bad design because the miniscule performance benefit of the callee not checking the set bits will be far outweighed by the caller having to copy the data multiple times if it wants a struct filled in from different calls, as well as making the callers code unneccesarily more complicated.

  20. Richard says:

    That’s quite an unsafe idiom:

    struct LOWLEVELFOO_TAG {

     double a;

     double b;

     int n;

    } LOWLEVELFOO;

    struct FOO_TAG {

     double r;

     int n;

    } FOO;

    void FillInThisFoo(FOO *pFoo, DWORD bitsThatIWant)

    {

     LOWLEVELFOO llf;

     DWORD llfBits = 0;

     if (bitsThatIWant & WANT_R) llfBits |= LLF_WANT_A | LLF_WANT_B;

     if (bitsThatIWant & WANT_N) llfBits |= LLF_WANT_N;

     FillInThisLowLevelFoo(&llf, llfBits);

     pFoo->n = llf.n;

     pFoo->r = llf.a – llf.b; // OOPS! Here we crash if stack garbage happens to look like a SNaN!

    }

    [I didn’t write an idiom, just a function. Obviously if you have floating point, things get interesting. -Raymond]
  21. Aaron says:

    Re: "bad design" argument from Dog (and I apologize if I’m going off-topic here)…

    Let’s say I want to be able to contact you privately.  Say you’re in the office down the hall, or we met at a party, whatever.  Tell me which of these three options makes the most sense:

    1) I ask for your contact info.  You take a slip of paper and write down your work e-mail, home e-mail, work phone, cell phone, home phone, work fax, home fax, Facebook profile, Myspace profile, work address, home address, weekend cottage address, MSN messenger contact, ICQ number, AOL nickname, Google Talk contact, Twitter feed, blog address, driver’s license number, and the name and address of the bar you usually hang out at after work.  Also you give most of this information for your brother who is your emergency contact.

    2) I walk over to you and ask for your home phone.  You write it down for me, then I go back and file it away.  I come over to you two minutes later and ask for your e-mail address.  I leave and file it away.  I repeat this exact same process to get your home address, cell phone, and MSN contact.

    3) I walk over to you and ask for your home address, phone, cell, e-mail, and MSN contact.  You write it down, I leave, and we’re finished.

    Anyone want to explain to me why option #1 or #2 is somehow better?

  22. Dog says:

    Aaron:

    Of course 3 is better. However, if you gave me a form that had spaces for business email, personal email, home phone, work phone, work cellphone and personal cellphone, along with another piece of paper that said "only fill in business email and work phone", then I went ahead, filled in those fields and then filled the rest in with made-up details, would that be reasonable? (Then of course I would destroy the piece of paper that told me what to fill in.)

    How about if you then handed the form to your secretary and said "give this guy a call"? Would it be a surprise if he/she tried one of the made-up phone numbers? (Or even if you just left the form on your desk for a couple of days and forgot which items were real).

    That’s why this is bad.

    Note that in the world of programming (in most languages at least) not filling in == filling in with garbage.

    If there are one or two items being requested, it’s no big deal to have to do a function call for each of them.

    If there is a commonly-used subset of the data, then provide a struct that has just that subset (maybe with an optional (null if not used) pointer to the rest of the data).

    Don’t make up a one-size-fits-all struct and then only fill in parts of it (although if you make sure to zero/null/indicate somehow the parts you haven’t filled in then it’s probably ok).

  23. noone in particular says:

    Dog:

    Note that in the world of programming (in most languages >at least) not filling in == filling in with garbage.

    Actually, this is C thinking. In the context of a mostly C-style API like Windows this is ok.

    But generally, I would prefer the function would not touch elements of the callers struct it was not explicitly asked to touch.

    Instead of e.g. wiping the whole struct with zeroes or filling them with garbage (it actually has to compute first!)

  24. Bryan says:

    @Dog:

    Your comparison is flawed quite a bit.  It’s more similar to if you said "Get Information about <X>" and she said "What’s the number for <X>?" and you gave her a contact page with a lot of different numbers and e-mail addresses, would you expect that she use the unlabeled e-mail address on there?  No – you’d expect she just use <X>’s number since that’s all she wanted.  (Assuming <X>’s number was labeled and accurate where other items on the card were random, unlabeled, or unverified)

    You know what information you want.  Use that information.

  25. Mark Sowul says:

    @Triangle – when I said "here we go again" and referenced Aargh, I was referring to the imminent war mirroring the one we had a few days ago on the merits of kernel handles ignoring the lower two bits, i.e. someone saying "this design is stupid" while ignoring its historical context, namely slow processors and tight RAM, which implies that the person complaining has never read (or understood) this blog.  I didn’t feel the need to write that all out, as we had just done this to death earlier in the week.

  26. Sitten Spynne says:

    It seems that most of the people arguing that this design is wrong are missing a crucial point Raymond made, and the people firing the analogy guns are missing them as well.

    Let’s say I have a silly form I have people fill out when I need contact information.  there are spaces for "Name", "Phone number" "Age", and "Name and address of every teacher of every subject, starting with grade school and ending with highest level of education".  In general, I only need the name, age, and phone number, but when I want to do a creepy background check I want the teachers.

    Name, age, and phone number take no time for the person to generate.  The final field will require plenty of time to transcribe, not to mention possible digging for old yearbooks and phone calls to friends to get the name of this or that teacher.  If I’m in a hurry and I tell you, "Please fill out the ‘name’ and ‘phone number’ fields", I’m going to be quite annoyed if you take the time to fill out the extensive last field.  Even if you do fill it out, I didn’t want that information so I won’t pay any  attention to it.  You’ve wasted my time without helping me.

    Raymond pointed out that sometimes this is done for performance reasons, and this is why those functions don’t put meaningful values in the parts of structures you don’t ask for.

    I do agree it’s not nice if the function changes the values you said you don’t care about, but since some functions do, is it so hard to remember that if the data in struct A is important, you should pass some other struct B to the function then copy the members you need to A?

  27. Triangle says:

    I wish people would stop trying to use analogies to explain their points of view. It only confuses the issue more. There’s basically just a disagreement to whether or not setting X bits and clearing Y bits means "All I care about is X, Y is unimportant to me" or "Bits X are set and bits Y are cleared, so do X and don’t touch Y".

  28. Alec Soroudi says:

    <em>If you say that you don’t care about something, you shouldn’t be upset that it contains garbage</em>

    Isn’t that the same argument they use to get people to vote? Isn’t there already too much politics in software dev already? :p

  29. Mike Dimmick says:

    A lot of you don’t understand what things cost.

    TreeView_GetItem is a macro. It wraps a call to SendMessage sending the message TVM_GETITEM. SendMessage first has to validate that the window handle is valid, and this ultimately ends up in a call into kernel mode. After this it’s a relatively simple operation if sending a message to a window on the same thread – Windows looks up the thread ID for the window and if it’s the same, looks up the window procedure and calls it. However, if it’s not on the same thread, it has to queue a message on that thread’s message queue (taking locks in the process), signal the other thread that a message was queued, then wait for a response. Most of these actions require a trip into the kernel which costs many cycles.

    Crossing to another process also means copying the structure to some shared memory location before queueing the message. That needs more kernel round-trips. I’m not sure if this works for TVM_GETITEM, although it is in the [0, WM_USER) range so it should have marshalling support according to the documentation.

    Because there’s a finite latency cost involved in getting the message to its destination, and getting the reply it can often be beneficial in terms of overall program runtime to batch up requests that have to cross some boundary. In DCOM this was referred to as making your interface ‘chunky’ instead of ‘chatty’. This is a ‘chunky’ interface, designed that way on the assumption that most of the time, you’ll be interested in more than one property.

    In addition there’s some extra complication involved in whether the TreeView is running an ANSI or Unicode window procedure, and whether the calling application supplied an ANSI or Unicode structure, so some translation between formats might be necessary. Again, doing this once is generally better than multiple round-trips to collect different pieces of information (of course not all of those will be strings).

    In the case of standard functions, each additional function adds extra space to the export address table of the DLL, the import address and import name tables of the application or library which imports it, and of course the extra size of the function itself (which will have at least *some* prologue and epilogue information) and on platforms that use exception tables, extra unwind information in those tables. Extending functions with variable-size parameter structures is often a better idea than adding new entry points.

  30. Dog says:

    Since some people here probably wouldn’t be able to spot a badly designed function even if it were clearly stated in a logically organized and easily searchable documentation database, here are two reasons why this is bad (in no particular order):

    * The compiler can no longer tell if you are using uninitialized data.

    * It violates the design principle that the data that a function returns should always be conceptually the same.

    If you do this, why not just have "HRESULT DoStuff(LPVOID lpvWhatToDo, LPVOID lpvWhereToPutResult)"?

    I can however (sort of) see why this was done. Maybe the struct was already decided, but it was realized that one or more of it’s values were computationally expensive and due to management processes, etc, the struct could not be changed, nor could individual getters be implemented.

    And this is not similar to the SQL example given. It would only be similar if "SELECT a FROM bla" also returned columns b and c filled with random garbage.

  31. Paul says:

    Dog:

    If I gave you the form and asked you to fill in *only* the business email and work phone then it is probably because I *only* want those bits of information. If you choose to draw pictures or enter blatant lies on all the other fields why should I care – I am *only* interested in the information I requested.

    Should I ask you to fill in all the fields (causing you more work) to obtain the two values I need just in case you would doodle in the fields I don’t care about?

    If you fill in the bits I request correctly I really do not care about the remaining bits.

    Seems simple enough to me.

  32. yme says:

    It appears to me, after reading the whole newsgroup thread, that the guy wasn’t primarily complaining about getting stuff he didn’t ask for, but rather about not getting stuff he did ask for.

    Then it turned out that the stuff he asked for didn’t exist, so he shouldn’t have expected to get it even if he asked for it.  (Or something like that.)

Comments are closed.

Skip to main content