Why are structure sizes checked strictly?


You may have noticed that Windows as a general rule checks structure sizes strictly. For example, consider the MENUITEMINFO structure:

typedef struct tagMENUITEMINFO {
  UINT    cbSize; 
  UINT    fMask; 
  UINT    fType; 
  UINT    fState; 
  UINT    wID; 
  HMENU   hSubMenu; 
  HBITMAP hbmpChecked; 
  HBITMAP hbmpUnchecked; 
  ULONG_PTR dwItemData; 
  LPTSTR  dwTypeData; 
  UINT    cch; 
#if(WINVER >= 0x0500)
  HBITMAP hbmpItem; // available only on Windows 2000 and higher
#endif
} MENUITEMINFO, *LPMENUITEMINFO; 

Notice that the size of this structure changes depending on whether WINVER >= 0x0500 (i.e., whether you are targetting Windows 2000 or higher). If you take the Windows 2000 version of this structure and pass it to Windows NT 4, the call will fail since the sizes don't match.

"But the old version of the operating system should accept any size that is greater than or equal to the size it expects. A larger value means that the structure came from a newer version of the program, and it should just ignore the parts it doesn't understand."

We tried that. It didn't work.

Consider the following imaginary sized structure and a function that consumes it. This will be used as the guinea pig for the discussion to follow:

typedef struct tagIMAGINARY {
  UINT cbSize;
  BOOL fDance;
  BOOL fSing;
#if IMAGINARY_VERSION >= 2
  // v2 added new features
  IServiceProvider *psp; // where to get more info
#endif
} IMAGINARY;

// perform the actions you specify
STDAPI DoImaginaryThing(const IMAGINARY *pimg);

// query what things are currently happening
STDAPI GetImaginaryThing(IMAGINARY *pimg);

First, we found lots of programs which simply forgot to initialize the cbSize member altogether.

IMAGINARY img;
img.fDance = TRUE;
img.fSing = FALSE;
DoImaginaryThing(&img);

So they got stack garbage as their size. The stack garbage happened to be a large number, so it passed the "greater than or equal to the expected cbSize" test and the code worked. Then the next version of the header file expanded the structure, using the cbSize to detect whether the caller is using the old or new style. Now, the stack garbage is still greater than or equal to the new cbSize, so version 2 of DoImaginaryThing says, "Oh cool, this is somebody who wants to provide additional information via the IServiceProvider field." Except of course that it's stack garbage, so calling the IServiceProvider::QueryService method crashes.

Now consider this related scenario:

IMAGINARY img;
GetImaginaryThing(&img);

The next version of the header file expanded the structure, and the stack garbage happened to be a large number, so it passed the "greater than or equal to the expected cbSize" test, so it returned not just the fDance and fSing flags, but also returned an psp. Oops, but the caller was compiled with v1, so its structure doesn't have a psp member. The psp gets written past the end of the structure, corrupting whatever came after it in memory. Ah, so now we have one of those dreaded buffer overflow bugs.

Even if you were lucky and the memory that came afterwards was safe to corrupt, you still have a bug: By the rules of COM reference counts, when a function returns an interface pointer, it is the caller's responsibility to release the pointer when no longer needed. But the v1 caller doesn't know about this psp member, so it certainly doesn't know that it needs to be psp->Release()d. So now, in addition to memory corruption (as if that wasn't bad enough), you also have a memory leak.

Wait, I'm not done yet. Now let's see what happens when a program written in the future runs on an older system.

Suppose somebody is writing their program intending it to be run on v2. They set the cbSize to the larger v2 structure size and set the psp member to a service provider that performs security checks before allowing any singing or dancing to take place. (E.g., makes sure everybody paid the entrance fee.) Now somebody takes this program and runs it on v1. The new v2 structure size passes the "greater than or equal to the v1 structure size" test, so v1 will accept the structure and Do the ImaginaryThing. Except that v1 didn't support the psp field, so your service provider never gets called and your security module is bypassed. Now everybody is coming into your club without paying.

Now, you might say, "Well those are just buggy programs. They deserve to lose." If you stand by that logic, then prepare to take the heat when you read magazine articles like "Microsoft intentionally designed <Product X> to be incompatible with <software from a major competitor>. Where is the Justice Department when you need them?"

Comments (40)
  1. Mike Dunn says:

    That must suck when you add checks to the OS that actually _find bugs_ for other people, yet they complain about it. I guess even developers aren’t exempt from the "people don’t read help" syndrome, since reading the docs and seeing that there’s a cbSize member to set would avoid all that trouble in the first place.

    That reminds me of a nagging question, and Raymond you are just the guy to ask since it’s about UI ;) Are there any undocumented checks of list/tree ctrl structs like LVITEM? Over my years of coding and helping others, I’ve found that if someone doesn’t init an LVITEM to all zeros, yet sets all the relevant members properly (including mask to indicate which members are valid), calls will sometimes fail. Init’ing the struct to all zeros fixes it. So it seems like, even though the mask says the OS should ignore some members, it doesn’t ignore them.

  2. What’s up with all those struct members that are documented as ignored or reserved. How does that happen?

  3. Raymond Chen says:

    Mike: The most common error with LVITEM is forgetting to initialize the mandatory iSubItem member. I bet that’s what you’re missing. Any field that has a mask is optional. Fields that don’t have a mask are mandatory.

    Dejan: I guess I don’t understand the question. Ignored/reserved means "We don’t use them for anything now, but a future version might use them, so make sure to pass 0 here so you don’t break in the next vesion."

  4. Anonymous says:

    The only thing that’s bad about all those structs is they all use different naming conventions for the size member. Fortunately all the ones I have come across list it as the first member so I can just do SOMESTRUCT foo = { sizeof foo }; on the first line.

  5. Jeroen-bart Engelen says:

    This feature really frustrated me at first. I create software that uses the RAS and my users have systems that range from Win95 to WinXP. But I do wish to use the Win2k and up features if possible. So I first went and found out how big each structure would be on each platform (only for the ANSI version) and then wrote a function that would return the struct size for the platform I requested. Then I would include the RAS header file with WINVER set to 0x510 (that was XP, right?) and use that structure. Then when I actually called an API that needed a versioned-struct I would first use my function to retrieve the size it would expect and just give it a larger structure. IIRC I only set the cbSize member to the false value. Some APIs also want the size of the struct as a parameter and I believe I do give the real (sizeof) size there.
    Works like a charm.

  6. Anonymous says:

    Jeroen-bart, you’re a brave man coding against the RAS APIs :)
    I wish all APIs included a Version field in their structs or APIs, instead if using a size field to try to discern between different versions…

  7. Jack Mathews says:

    Much like the extended styles not failing on extra bits being set, it’s hard to believe you guys had a >= check in there long enough for programs to break as a result of it. I suppose hindsight is 20/20 though.

    Sizes, for all intents and purposes, are versions. Though it would be nice if in the Windows does they put the different versions of all of the structs in the .h files, rather than the most recent. Or even rather than the ifs for the versions of Windows. Just give us LOGFONT_NT5 and LOGFONT_NT4 then just alias LOGFONT to the newest version.

    I think I’m preferring flag members at the top of the class over sizes though. While it’s more limiting (it only allows things at the end), flags let you use the newest versions of the structs without worrying about anything except not passing newer flags than the Windows version.

  8. Raymond Chen says:

    Flags and sizes – does it really matter?

    With flags:

    s.dwFlags = (use version 2) ? COOL_V2_STUFF : 0;

    versus sizes:

    s.cbSize = (use version 2) ? sizeof(S_V2) : sizeof(S_V1);

    The original code used >= because in the old days, there was an implicit assumption that developers were not stupid, not lazy, and not evil. "Of course everybody will set their sizes properly. That’s why they get paid the big bucks."

  9. Lonnie McCullough says:

    This also begs the question of why certain structs do not use a cbSize member. Take LOGFONT for instance. Now to me the inner workings of the font mapper and fonts in general are a huge mystery and therefore subject to change at any given time (as long as Windows can measure it correctly tho it can remain a mystery to me). So I was using the theme APIs the other day and came across GetThemeFont which expects a LOGFONT (but in actuality it wants a LOGFONTW). It seems having a cbSize member here (because the lfFaceName is embedded in the struct) would have made sense in this case, instead I had to spend half an hour realizing that the theme API is all Unicode and therefore wants a LOGFONTW. This was only after my "this" pointer had been overwritten about 15 times by GetThemeFont because it thought it had a LOGFONTW. Not complaining really, just wondering why something as variable as font rendering / mapping would not have designed in some future proofness (maybe its really the fault of the theme API writers tho). Thanks for an awesome blog Raymond!

  10. Phaeron says:

    Sorry, Raymond. I usually agree with you, but I think Microsoft deserves the big wooden spoon for this one. It is counterintuitive that simply raising _WINNT_WINVER could cause perfectly working code to break on older platforms.

    I once set WINVER to 0x0500 in a module so I could enable resizing on file dialogs in Windows 2000, and then got a bug report a couple of weeks later that the menus were broken on NT4. It turned out that _WINNT_WINVER=0x0500 caused the MENUITEMINFO structure to grow, which broke 95 and NT4. Gee, thanks. The Win32 documentation doesn’t usually say whether a function will accept larger structures or not, or whether redefining WINVER will change the size of a structure. In any case, it’s impractical for programmers to retest every API call in their application to determine if any no longer work on downgrade OS versions.

    If a structure needs to be enlarged for new features, and existing platforms won’t accept larger sizes, then the correct action is to create a new structure, not to modify the older structure. That is, WINVER=0x0500 should be a perfect superset of WINVER=0x0400, and WINVER=0x0501 a perfect superset of WINVER=0x0500. If you want programmers to take advantage of the features of the newest and greatest OS, you must either make it easy to continue support for older ones.

    Incidentally, I’ve heard that PalmOS has problems like this in its SDK as well — if they ship an import for function Foo in SDK version 37, they’ll rename it to FooV37 in v38 and call the incompatible v38 version Foo again. Drives everyone nuts.

  11. Raymond Chen says:

    Lonnie: Thanks for the pointer about GetTheme(Sys)Font. I’ll have that fixed. It should be declared as LOGFONTW*. Similarly, the TEXTMETRIC in GetThemeTextMetrics should be a TEXTMETRICW*.

    Phaeron: I concede that writing for mixed targets is a pain. I can try to keep an eye on the ones I have some influence over, but MENUITEMINFO is alas not one of them. I like to think that the PROPSHEETPAGE_V<x> structures are a little better though. (Those I do have influence over.)

  12. Mike Dunn says:

    Raymond, I just dove into prsht.h and was reminded of another nagging question. What the heck is ISOLATION_AWARE_ENABLED for exactly? I know it has something to do with enabling themes on XP (even though from my experience it seems like all I need to get themes is a manifest with the right ID), but the rest is muddy.

  13. Raymond Chen says:

    Isolation awareness is an entirely different matter. This has to do with activating your manifest at the appropriate times. You need to do this if you are a plug-in in a host program that is not manifest-aware. I am not an expert on this subject so I’ll shut up now before I make a greater fool of myself.

  14. Tom Seddon says:

    Regarding prsht.h, is there any reason why #define is used so much instead of typedef? I’ve noticed this a lot in MS headers, and it’s always struck me as strange. And now here I have a chance to ask :)

  15. Raymond Chen says:

    No deep reason for #define over typedef. I guess one tiny advantage of #define is that you can #undef a #define if you don’t like it. But that’s not a real reason.

  16. Krishna says:

    sorry for being outa topic. Why dont you write something about how a process looks like after the CLR loads and maps managed exec.s / dlls – you could exemplify the whole process of managed execution, with regard to the native aspects (process, threads, fibers… etc.). Something no one has done already. looking forward …. thanks…

  17. Raymond Chen says:

    Um, look at the subtitle: "Not actually a .NET blog".

  18. Krishna says:

    ok, i agree but since you were into hardcore Win32, i thought i could suggest that. sorry…

  19. Frederik says:

    Hi Raymond,

    I like your blog a lot. Could you tell something about that annoying ‘Unsafe Removal of Device’ dialog in Windows 2000 that shows up when you disconnect a USB device? Incidentally, it also pops up when you restore from hibernation, which is even more annoying. It seems to be gone in Windows XP. I’d like to hear about the reasoning behind it. Thanks!

  20. Raymond Chen says:

    I wasn’t involved with that dialog, but here’s what I remember: The device was indeed removed unsafely. If it was a USB storage device, for example, there may have been unflushed I/O buffers. If it were a printer, there may have been an active print job. The USB stack doesn’t know for sure (those are concepts at a higher layer that the stack doesn’t know about) – all it knows is that it had an active channel with the device and now the device is gone, so it gets upset.and yells at you.

    In Windows XP, it still gets upset but it now keeps its mouth shut. You’re now on your honor not to rip out your USB drive before waiting two seconds for all I/O to flush, not to unplug your printer while a job is printing, etc. If you do, then your drive gets corrupted / print job is lost / etc. and you’re on your own.

  21. Raymond Chen says:

    Krishna: I basically discuss raw Win32. If you want to know how the CLR uses threads, you’ll have to talk to the CLR folks. I don’t know how the CLR uses threads, nor do I know how MFC uses threads, how ATL uses threads, how SQL uses threads… I just know about how threads themselves work.

  22. Jack Mathews says:

    "Flags and sizes – does it really matter?"

    Yes, it matters. For one, sizes assumes a monotonically increasing size for the structure. Say for a future version you would like to CHANGE a member, rather than just add one. Can’t do that with a size.

    Also, as someone else mentioned, since structs change behind your back with newer WINVERs, you suddenly have compatibility issues. And not just versions, lets say you have:

    struct TestStruct
    {
    int size;
    void *ptr1;
    };

    … and in code, you initialize it like this …

    TestStruct s = { sizeof( TestStruct ), ptr };

    Now, you guys add a new memeber and bam, you have uninitiailized data. And that’s not developers being stupid, that’s just bad API design. You let code that used to work change behind peoples’ backs without warning. With flags, you don’t get this problem.

  23. Raymond Chen says:

    Actually, in that case you do not have uninitialized data. Unspecified initalizers default to 0. See ARM 8.4.1 or C 3.5.7.

    If you want to lock to a previous version of Windows, set WINVER before including <windows.h>. It has always worked that way.

    Members in structures are never changed. If there’s a member you don’t like any more, you just deprecate it and add a new member that you do like. (Changing the typf of a member really messes up the marshaller!)

    Please point me at an example of how flags would make this easier.

  24. Rob McAfee says:

    Raymond, I disagree with your rationalizations on this one. First, uninitialized data is easily trapped by range checking; for example, allow expansion by a factor of 2. Second, the idea that GetMenuItemInfo or any other API needs to protect future programmers from themselves, well, thanks but no thanks. I am fully capable of reading documentation and determining how an API will behave on older platforms. What I really want to do is just call GetMenuItemInfo with the structure’s size field initialized using sizeof. Any other behavior is frustrating. Finally, if a structure expands in such a way that the legacy API would not be able to behave properly (as in the "everyone gets in for free" example), then a new API is warranted, not failure of the legacy API.

  25. Raymond Chen says:

    As I noted: We tried that and it didn’t work. Even if you allow expansion by a factor of 2, you’ll find some people who are lucky and the uninitialized value happens to be in the "acceptable" range.

    If all you want to do is call GetMenuItemInfo and have it work as far down as NT4, then set _WIN32_WINNT_VERSION=0x0400 before #include’ing <windows.h> and you will get an NT4-compatible header file.

    Yes, it is annoying if you want your program to run on multiple platforms, taking advantage of new features of each platform as it becomes available. But most programs aren’t written that way; they target one OS and rely on the app compat team to keep them going on the newer OSs.

    Be careful what you ask for: Do you really want a new API every time a new feature/field/flag is added? Check out how many different varieties of GetTextExtent exist. Imagine if every API had that many variations.

  26. Raymond Chen says:

    P.S., we tried flags. That didn’t work either. Notice that the TVITEM structure does not have a size; it uses flags to determine what version you are using. And we found apps that left the flags uninitialized. But the flags they set were harmless — until IE4 added new flags that happened to match the uninitialized garbage…

  27. Jordan Russell says:

    Can’t such problems be prevented by checking for invalid flags?

  28. Raymond Chen says:

    And so we’ve come full circle. The code now has to read

    if (version_2_detected)
    s.dwFlags = FEATURE_V1 | FEATURE_V2;
    else
    // can’t set FEATURE_V2 on version_1 or it
    // gets rejected as an invalid flag
    s.dwFlags = FEATURE_V1;

    but as I understood it the reason people preferred flags is that it avoided the need to do version checking…

  29. Jack Mathews says:

    "Actually, in that case you do not have uninitialized data. Unspecified initalizers default to 0. See ARM 8.4.1 or C 3.5.7."

    And zero is always a sensible default? I consider zero uninitialized, because the implementation of the struct may require a non zero value.

    "but as I understood it the reason people preferred flags is that it avoided the need to do version checking…"

    Which had nothing to do with the reasons I just said, which is that with sizes you get an implicit change to code in places that you may not know.

    For one, your "people target one version of Windows" argument is bunk. I had to have all sorts of workarounds for Win98, and we didn’t feel like redistributing comctl32 for awhile, so we would have checks in there. Plus, anyone using comctl 6.0 that wants their programs to run on Win2k has to do quite a bit of version checking.

    Anyway, you’re missing the point of the flags – it’s not so much for version checking as it is for being forward compatible. With flags, anything you add can be uninitialized, because it won’t be touched without a flag there. You don’t have to redeclare the struct yourself or hard code a size. You can get a new version of the platform sdk and bam, all your stuff still works on the versions of Windows it used to.

  30. Raymond Chen says:

    I understand that flags are much nicer if you assume that everybody is smart and pays close attention and never makes mistakes, but unfortunately that’s not the case. In fact most programmers are NOT writing industrial strength "must run on all platforms" code. They’re writing "just get the job done" code. So we have to design things so it’s easy to get it right. Allowing forward compatibility in structures and flags means that people will leave stuff uninitialized and get away with it.

    I understand the frustration, since I too have written my share of programs that need to run on several flavors of the OS. But you and I are in the minority. It’s pay for play. The easy things are easy, and the hard things are hard.

  31. Rob McAfee says:

    At the very least this should be documented in MSDN. Currently the docs state that cbSize must be initialized to sizeof(MENUITEMINFO), and also that the OS requirement is NT4 and above, which just isn’t true.

    GetMenuItemInfo is the only API I’ve run into in practice that does this sort of strict size enforcement. Is this practice commonplace? If so, I’m surprised I haven’t run into it with other API calls.

    I also have a hard time believing that tight range checking doesn’t work in practice… I’ve done a certain amount of API development with versioned software, though admittedly on a smaller scale than Win32, and it’s been my experience that it is a happy compromise. The NT4 implementation could easily have allowed for a small number of additional data members to be added and still have an extremely low risk of false positives, while allowing for forward compatibility with the forseeable future of Windows platforms. The gains outweigh the risks. With admittedly 20-20 hindsight, it turns out that only a 4-byte expansion was needed for compatibility with Win 2000/XP, but even if GetMenuItemInfo had allowed for an 8 or 16 byte expansion in the size-checking logic, a mere handful of programmers would be "lucky" enough to get away with uninit data.

    Finally: what does the NT4 implementation of GetMenuItemInfo care what the size is anyway? It’s not like it’s going to suddenly start writing to hbmpItem because you handed it a size > 0x2C…

  32. Rob McAfee says:

    Just to answer my own question "what does the NT4 implementation of GetMenuItemInfo care what the size is anyway"… I see what you were trying to do; prevent programs compiled for NT4 to advertise a bogus size if they are later executed, without recompilation, on a future version of Windows. Whether that’s reasonable is debatable, but it doesn’t make sense in this case due to the semantics of GetMenuItemInfo: you have to specify the data you want in fMask using flags. I suppose you could say you were worried about people not initializing their flags also, e.g.

    UINT flags; // Uh-oh! Uninitialized data.
    flags |= MIIM_TYPE; // Who knows what other flags are set in there.

    Instead of

    UINT flags = MIIM_TYPE;

    But that is really paranoid. At some point you have to trust people to be professionals.

  33. Raymond Chen says:

    GetMenuItemInfo uses both flags and size so in that sense it is indeed rather odd. But since this function thunks down to kernel mode, you have to know the size of the structure in order to marshal it properly. If some NT4 app accidentally set MIIM_BITMAP (say via stack garbage), Windows 2000 would say "Okay, let’s copy the larger structure" – and if the structure happened to lie right at the end of a page, trying to copy the larger structure would fault.

    Here is why the NT4 version case if you passed a bigger size: somebody takes a program written for Windows 2000 and running it on NT4. The program sets MIIM_BITMAP and calls GetMenuItemInfo and it succeeds (because the size is >= 0x2C) and the the app accesses the hbmpItem member and finds garbage.

    "At some point you have to trust people to be professionals."

    And setting the proper size is how you demonstrate that you know what you’re doing. (Though I’ve seen examples of this assumption not being true either…)

  34. Rob McAfee says:

    "The program sets MIIM_BITMAP and calls GetMenuItemInfo and it succeeds (because the size is >= 0x2C)"…

    Actually, I have no problem with NT4 failing on unexpected flags; it *should* fail. Unknown flags would only be introduced by mistake or by new client code that uses features that are advertised to be unavailable on NT4, not simple recompilations of legacy code, so I’m less touchy about the bar being higher in that case. I would expect such client code to have to do special work at runtime to be compatible with NT4. Basically I would have written the compat check in NT4’s GetMenuItemInfo like so:

    if (info->cbSize < sizeof(MENUITEMINFO) // Obviously bogus
    || info->cbSize > sizeof(MENUITEMINFO)+16 // Room for expansion
    || (info->fMask & ~MIIM_ALLFLAGS) != 0)) // I invented MIIM_ALLFLAGS, but you get the idea: No unknown flags please
    {
    // … SetLastError, then…
    return FALSE;
    }

    Or rather, I would write the check like that *now*, after discussing GetMenuItemInfo more than I would have thought possible! :)

  35. Peter Lund says:

    Another way could be having two size fields, one of them being negated. If they didn’t add up to 0 the programmer was…well, not really a programmer :)

    Weren’t the sizes also used to help automatic thunking between WIN16 and WIN32 10 years ago?

  36. Riad says:

    hi

    explain carefully the different effects that may occur at runtime with the following declaration of a string:

    char*const host="muser.brad.ac.uk"

    what possible differences could occur at runtime if the following alternatives were used:

    const char*host="muser.brad.ac.uk"

    thank you

    if you can email me the answer, i’ll be very greatfull

  37. Raymond Chen says:

    Commenting on this article has been closed.

  38. It’s all there in the tool helper library.

Comments are closed.