Why does SHCOLUMNINFO have unusually tight packing?

Alternate title: News flash: Sometimes things happen by mistake

rbirkby asks why the SHCOLUMNINFO structure has 1-byte packing. "Was the expectation that there would be so many columns in a details view that the saving would be worthwhile?"

Hardly anything that clever or ingenious. It's just the consequence of a mistake.

When the SHCOLUMNINFO structure was added to the header file in the Windows 2000 timeframe, it was added with no specific packing directive. But it turns out that there was a specific packing directive; it just wasn't obvious. Near the top of the shlobj.h header file was the following:

#include <pshpack1.h>   /* Assume byte packing throughout */

(There was of course a matching #include <poppack.h> at the bottom.) This set the default packing for the entire header file to byte packing instead of natural alignment.
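To make the effect concrete, here is a minimal sketch of what byte packing does to a structure shaped like this one. Everything in it is an assumption for illustration: the type names `ColumnId`, `NaturalInfo` and `PackedInfo` are hypothetical stand-ins, the members are declared with fixed-width types so the sketch builds without shlobj.h, and the `#pragma pack` lines only approximate what the two packing headers do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for the real array lengths. */
#define NAME_LEN 80
#define DESC_LEN 128

/* Stand-in for the column identifier: a GUID-sized blob plus a 32-bit id. */
typedef struct {
    uint32_t fmtid[4];
    uint32_t pid;
} ColumnId;

/* Natural alignment: the compiler pads after the 2-byte member. */
typedef struct {
    ColumnId scid;                      /* offset 0, 20 bytes */
    uint16_t vt;                        /* offset 20 */
    uint32_t fmt;                       /* offset 24, after 2 bytes of padding */
    uint32_t cChars;
    uint32_t csFlags;
    uint16_t wszTitle[NAME_LEN];
    uint16_t wszDescription[DESC_LEN];
} NaturalInfo;

#pragma pack(push, 1)   /* roughly the effect of #include <pshpack1.h> */
typedef struct {
    ColumnId scid;                      /* offset 0, 20 bytes */
    uint16_t vt;                        /* offset 20 */
    uint32_t fmt;                       /* offset 22, no padding, misaligned */
    uint32_t cChars;
    uint32_t csFlags;
    uint16_t wszTitle[NAME_LEN];
    uint16_t wszDescription[DESC_LEN];
} PackedInfo;
#pragma pack(pop)       /* roughly the effect of #include <poppack.h> */

/* Print the size of each layout and where the first 4-byte member lands. */
void print_layout(void)
{
    printf("natural: size=%zu, fmt at %zu\n",
           sizeof(NaturalInfo), offsetof(NaturalInfo, fmt));
    printf("packed:  size=%zu, fmt at %zu\n",
           sizeof(PackedInfo), offsetof(PackedInfo, fmt));
}
```

Calling `print_layout()` from a test program shows the whole difference: byte packing drops the two padding bytes after the 2-byte member, which shifts the three 4-byte members that follow onto offsets that are not multiples of four.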

By the time this mistake was identified, it was too late. Windows 2000 had already shipped, byte packing and all. And once the code ships, it's done. You're stuck with it.


Comments (22)
  1. Dan Bugglin says:

    Congratulations, this blog post is already Google result #2 for "SHCOLUMNINFO".

  2. NB says:

    Does this have any unwanted consequences then?

  3. Sunil Joshi says:


    It can lead to members being stored at non-naturally-aligned offsets. This could lead to lower performance on certain architectures (not x86) which do not support unaligned access.

  4. SimonRev says:


    Wouldn't this lead to decreased performance on the x86 and exceptions (crashes) on CPU architectures that do require aligned access?  (I would presume that the SHCOLUMNINFO for the embedded Windows platforms would be properly aligned).

    Now, you would probably never actually notice the decreased performance on the x86, because I doubt anyone does enough manipulation of the SHCOLUMNINFO to ever notice.

  5. Random832 says:

    SimonRev, For members of a structure declared as packed [rather than, say, pointers that just happen to have an odd numbered value that they shouldn't], the compiler will generate the extra code needed to unpack the number (i.e. getting two adjacent words and shifting bits of them together, or maybe just accessing the four bytes separately)

  6. Henning Makholm says:

    Sunil, it bombs performance on x86 too. The processor makes sure (for legacy instructions at least; don't try with SIMD ones) that unaligned accesses give you the right result, but it takes its own sweet time doing it (usually involving costly traps to microcode).

  7. Pierre B. says:

    Actually, unaligned access is only slower when crossing cache-line boundaries, and then only if you would not have accessed the data on each side of the boundary. If you are processing data and accessing all the elements consecutively, then the performance hit will be undetectable and you could even have a gain, thanks to the higher-density packing, when you have a lot of consecutive elements.

    There is no "drop to microcode trap" due to misaligned data access on x86.

    Tests: http://www.alexonlinux.com/aligned-vs-unaligned-memory-access

  8. Sunil Joshi says:

    @Henning Makholm

    I have never actually measured, but my understanding was that on x86 unaligned accesses (except for __m128 and its ilk) did not perform badly enough to worry about. I have read this in several places, most recently in a description of IA-32 EL, the dynamic translator of IA-32 to Itanium code. One of the problems it had to handle was the frequent use of unaligned access in IA-32 code, and it says "the penalty for unaligned access on the IA-32 architecture is very low." Similarly, I know VC does not have an __unaligned pointer type for x86 (unlike IA-64) as it's not worth it; the UNALIGNED macro is defined as nothing on x86.

    Also unaligned SIMD performance is much better on Nehalem. Or so Intel claims.


    If the compiler knows about the unaligned access (as it would if you access the members of this structure using . or ->), it can generate fixups. This is why __unaligned works. In the general case, when the compiler does not know (e.g. if you pass an integer pointer to a separately compiled function), you get exceptions and fireworks etc., unless the OS emulates unaligned access for you.

  9. Nawak says:

    But what structure(s) in shlobj.h did require the packing directive that was previously there? Was there a structure that was used so much that packing made a noticeable change in memory usage?

  10. Anonymous Coward says:

    Given that packing matters so little on the processor platforms on which Windows is most popular, and that the number of these structures allocated will probably be relatively small, I think even calling it a mistake is saying too much. This is a ‘don't care’.

  11. Evan says:


    I'm not sure that's true. I think I've heard somewhere recently that unaligned accesses on x86 actually aren't too bad. I wrote a little program to at least provide a tiny bit of evidence; after presenting it, I discuss possible failings. I'm on Linux at the moment so this would have to be modified a bit for Windows (e.g. the method of exiting):


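    global _start

    _start:
       mov rdi, 2000000000  ; rdi = 2 billion (loop counter)
       mov rsi, rsp         ; rsi = rsp
       sub rsi, 2           ; rsi -= 2          OPTIONAL
       jmp bottom

    top:
       mov rbx, [rsi]       ; rbx = *rsi (the load being timed)
       sub rdi, 1           ; rdi -= 1

    bottom:
       cmp rdi, 0
       jg top               ; if rdi > 0, goto top

       mov rax, 1           ; exit through the 32-bit int 0x80 interface
       int 0x80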

    I compiled it with '~/Downloads/nasm-2.09.04/nasm -f elf64 hello.s' and linked with 'ld -s -o hello-unaligned hello.o'. Then I removed the line labeled "OPTIONAL" and rebuilt, naming it 'hello-aligned'. I then tested with 'time hello-aligned; time hello-unaligned'. I ran it 6 times (there's no real method to that). This is running on a Core 2 Quad Q9400 (2.66 GHz).

    The aligned version completed in 0.753 sec ("total" time) in five out of the six trials; the sixth took 0.754 sec. The unaligned version completed in 0.753 sec in four out of the six trials, and in 0.754 sec in the remaining two trials.

    So the unaligned access to [rsp-2] in hello-unaligned vs the aligned access to [rsp] in hello-aligned doesn't seem to be hurting this particular test case.

    There are a couple of ways in which this test may not be so great. First, obviously the access is to the same memory location in each iteration of the loop. Thus the cache line will be warmed immediately and then there will be no misses. It's quite possible that an unaligned access that has to go to either L2 or main memory will incur an unaligned penalty. Second, the results when I was running a 32-bit version of that program (just replace 'r' with 'e' in all the register names) were a bit different. While most runs were the same, every so often the unaligned version would take far longer.

    Some quick searching indicates that there probably *isn't* a performance penalty to unaligned accesses on x86 unless the accessed region spans a cache line. It's possible that this is what was happening in the long case.

  12. 640k says:

    If it's a ‘don't care’, it's also a ‘don't use’.

  13. Gabe says:

    Am I correct in determining that the member missing padding is VARTYPE vt, which is 2 bytes, thus shifting the following 3 fields (DWORD, UINT, DWORD) out of alignment? After those are a couple Unicode strings which should also have 2-byte alignment, so they shouldn't be affected by it.

    So to me it looks like the padding mistake saves 2 bytes in a 450-byte data structure and causes misaligned accesses to three 4-byte members. I would have to agree with Anonymous that this is a ‘don't care’.

  14. Neil says:

    @Nawak ITEMIDLIST used to live in ShlObj.h but it now lives in ShTypes.h; it needs byte packing because it's a concatenation of SHORT lengths and variable-length BYTE arrays.

  15. James Curran says:

    Does this mean your post backlog is now up to just 10 days shy of 3 years?  

    ["Objection, your honor. Assumes facts not in evidence." -Raymond]
  16. Dave says:

    >once the code ships, it's done. You're stuck with it.

    Unless it's open source code, in which case you can change it and break backwards compatibility, because hey, it's open source.  And then later you can change your mind again and change something else.  And then three releases down you can change it again.  But hey, it's still open source.

  17. Smartass says:

    Apology accepted.

    Still waiting for the Windows ME apology.

  18. Smaug says:

    >once the code ships, it's done. You're stuck with it.

    >Unless it's open source code

    Or unless it's made by Apple. You know, they (try to) fix their mess-ups every few years, because they allow themselves to break backwards compatibility. And probably MS should do that too. Not all the time for minor things, mind you, but once every two system versions would be fine. I am writing code for Windows 7 right now, and if it didn't work on 2k, I wouldn't be bothered. Because you know why? It does not work on 2k to begin with.

    Having a problem nowadays which was made 10 years ago sounds quite a bit harsh to me, considering how fast the software world moves.

    [Okay, so fine, you decide that Windows 7 won't support the Windows 2000 SHCOLUMNINFO structure. But the Windows 2000 SHCOLUMNINFO structure is the same as the Windows XP SHCOLUMNINFO structure, so you can't change it or you break Windows XP compatibility. -Raymond]
  19. WinXP says:

    Smaug: What you don't realize is that the only reason people are using Win7 in the first place is that it runs approximately all Win2k software. If Win7 couldn't run important Win2k software, people just wouldn't use it, and you would have no market for your Win7-only software. WinXP was released nearly 10 years ago and some major corporations have only recently finished migrating FROM Win2k TO WinXP!

    Microsoft doesn't control the hardware so they have no choice over whether to have backwards compatibility. If they want to sell a new version of Windows, it had damn well better run virtually everything that the old version ran, including that mission-critical VB6 app whose source code was lost 10 years ago. Even with the quite good backwards compatibility of Win7, WinXP still has 50% market share! MS has to stay compatible with everything but the worst security problems for decades if they want to keep selling software.

    Companies like Apple that make the only hardware that their software runs on have the luxury of being able to stop making hardware that will run their old software. Since people will eventually need to buy new hardware and new hardware requires a new OS, people have no choice but to "upgrade", whether their old programs run or not. And of course they make their new software stop supporting old hardware, forcing anybody who wants to run new software to have to upgrade their hardware also. Thus, Apple has a sliding window of just a few years where they have to keep compatibility. Of course, it helps that people generally don't write their mission-critical apps for Apple operating systems (perhaps because they'd have to keep rewriting them), so backwards compatibility isn't such a big deal for Apple in the first place.

  20. Engywuck says:

    I see it the same as "WinXP". Heck, we even still use Win98 on some rare computers because the program running on them won't work with XP. (Another program could be convinced to work with 2k by some strange ancient voodoo but stops working for good on XP).

  21. Dave says:

    >What you don't realize is that the only reason people are using Win7 in the first place is that it runs approximately all Win2k software.

    I was going to point out the same thing, the reason why Windows is pretty much the universal environment for computing is because MS bend over backwards to retain backwards-compatibility.  I still have users running NT SP6 machines, and my code will run on them without any problems (and, just for reference, on Win7 as well).  OTOH doing that with an Apple machine, or Linux, or […], not a hope.

  22. Dave says:

    >Companies like Apple that make the only hardware that their software runs on have the luxury of being able to stop making hardware that will run their old software.

    Another thing that Apple can get away with, which almost no other company can, is to tell their users point-blank "X isn't being supported any more, gee it sucks to be you, get something newer", and most users will do so without much complaint.  This makes them more a special case-study in abnormal customer behaviour than an example for others to follow.
