The kooky STRRET structure


If you’ve messed with the shell namespace, you’ve no doubt run across the kooky STRRET structure, which is used by IShellFolder::GetDisplayNameOf to return names of shell items. As you can see from its documentation, a STRRET is sometimes an ANSI string buffer, sometimes a pointer to a UNICODE string, sometimes (and this is the kookiest bit) an offset into a pidl. What is going on here?

The STRRET structure burst onto the scene during the Windows 95 era. Computers during this time were still comparatively slow and memory-constrained. (Windows 95’s minimum hardware requirements were for 4MB of memory and a 386DX processor – which ran at a whopping 25MHz.) It was much faster to allocate memory off the stack (a simple “sub” instruction) than to allocate it from the heap (which might take thousands of instructions!), so the STRRET structure was designed so the common (for Windows 95) scenarios could be satisfied without needing a heap allocation.

The STRRET_OFFSET flag took this to an even greater extreme. Often, you kept the name inside the pidl, and copying it into the STRRET structure would take, gosh, 200 clocks (!). To avoid this wasteful memory copying, STRRET_OFFSET allowed you to return just an offset into the pidl, which the caller could then copy out of directly.
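To make the offset trick concrete, here is a minimal sketch in portable C. It does not use the real shell headers. The `demo_item` layout and both `demo_` functions are hypothetical, invented for illustration; a real item ID starts with a two-byte size and everything after that is private to the folder that created it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical item layout: size, a private "kind" byte, then the ANSI name.
   Real folders define their own private layout after the size field. */
struct demo_item {
    unsigned short cb;        /* total size of this item in bytes */
    unsigned char  kind;      /* meaningful only to the hypothetical folder */
    char           name[13];  /* NUL-terminated 8.3-style name */
};

/* Folder side: instead of copying the name out, report where it already lives. */
static unsigned demo_get_name_offset(void)
{
    return (unsigned)offsetof(struct demo_item, name);
}

/* Caller side: resolve the offset against the start of the item.
   No copy happens; the result points into the item itself. */
static const char *demo_name_from_offset(const void *item, unsigned offset)
{
    return (const char *)item + offset;
}
```

The catch is visible in the last function: the returned pointer is only valid for as long as the item is, so the caller usually ends up copying the string anyway.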

Woo-hoo, you saved a string copy.

Of course, as time passed and computers got faster and memory became more readily available, these micro-optimizations have turned into annoyances. Saving 200 clock cycles on a string copy operation is hardly worth it any more. On a 1GHz processor, a single soft page fault costs you over a million cycles; a hard page fault costs you tens of millions.

You can copy a lot of strings in twenty million cycles.

What’s more, the scenarios that were common in Windows 95 aren’t so common any more, so the cases the optimization was tailored for hardly ever occur. It’s an optimization that has outlived its usefulness.

Fortunately, you don’t have to think about the STRRET structure any more. There are several helper functions, such as StrRetToStr, StrRetToBuf, and StrRetToBSTR, that take the STRRET structure and turn it into something much easier to manipulate.
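For a feel of what such a helper has to do, here is a rough sketch in portable C of collapsing all three forms into one caller-supplied buffer. It is a model only, not the shell's implementation: `demo_strret`, the `DEMO_` constants, and `demo_strret_to_buf` are hypothetical names, the heap string is released with plain `free` (the real structure uses the COM task allocator), and ANSI bytes are widened one-for-one, which is only right for ASCII or Latin-1 text.

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical stand-ins for the three STRRET forms. */
enum demo_strret_type { DEMO_WSTR = 0, DEMO_OFFSET = 1, DEMO_CSTR = 2 };

struct demo_strret {
    enum demo_strret_type type;
    union {
        wchar_t *wide;       /* heap-allocated; the consumer must free it */
        unsigned offset;     /* byte offset of an ANSI name inside the item */
        char     ansi[260];  /* inline ANSI buffer */
    } u;
};

/* Widen a narrow string byte-for-byte (assumption: ASCII/Latin-1 input). */
static int demo_widen(wchar_t *dst, size_t cch, const char *src)
{
    size_t i;
    for (i = 0; src[i] != '\0'; i++) {
        if (i + 1 >= cch) { dst[0] = L'\0'; return -1; }  /* would truncate */
        dst[i] = (wchar_t)(unsigned char)src[i];
    }
    dst[i] = L'\0';
    return 0;
}

/* Copy whichever form was returned into dst. Returns 0 on success, -1 if
   the buffer is too small or the type is unknown. Frees the wide string. */
static int demo_strret_to_buf(struct demo_strret *sr, const void *item,
                              wchar_t *dst, size_t cch)
{
    int rc = -1;
    if (cch == 0) return -1;
    switch (sr->type) {
    case DEMO_WSTR: {
        size_t len = wcslen(sr->u.wide);
        if (len < cch) { wmemcpy(dst, sr->u.wide, len + 1); rc = 0; }
        else dst[0] = L'\0';
        free(sr->u.wide);    /* ownership ends here in either case */
        sr->u.wide = NULL;
        break;
    }
    case DEMO_OFFSET:
        rc = demo_widen(dst, cch, (const char *)item + sr->u.offset);
        break;
    case DEMO_CSTR:
        rc = demo_widen(dst, cch, sr->u.ansi);
        break;
    }
    return rc;
}
```

Once the three cases are hidden behind one call like this, the caller only ever sees a plain string in its own buffer and never has to ask which form the folder chose.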

The kookiness of the STRRET structure has now been encapsulated away. Thank goodness.

Comments (10)
  1. nikita says:

    > It was much faster to allocate memory off the stack (a simple "sub" instruction) than to allocate it from the heap (which might take thousands of instructions!),

    Eh? Surely stack allocation became faster relative to the generic memory allocator during the last ten years. For one thing, all "per-CPU magazines" notwithstanding, the allocator still acquires some kind of lock occasionally, and the resulting coherent bus traffic is very expensive. Taking even an uncontended spin-lock on modern x86 is hundreds of cycles (but still one instruction).

  2. AF says:

    On a 1GHz processor, a single soft page fault costs you over a million cycles

    What? Two context switches, one new PTE, and a 4k memcpy… takes 1e6 cycles?

  3. Michael says:

    Yes, on a 1GHz processor 1 million cycles is a millisecond, which seems way too long for a soft page fault. Raymond, surely you mean a hard page fault?

  4. Raymond Chen says:

    Hard page faults are killer since you are at the mercy of the disk drive. It’s not too unusual for this to be as slow as 10ms.

    Soft page faults are more like 80,000 cycles according to this article http://msdn.microsoft.com/library/en-us/dnvc60/html/optcode.asp

  5. Merle says:

    I love the way the MSDN article "translates" times into "human" terms:

    "Therefore, a typical "soft" page fault incurs a 200-microsecond penalty, which is 80,000 CPU cycles. To put that in human terms, if it took 1 second to read a byte from the primary CPU cache, it would take almost a day to process a page fault."

    Erm, yeah, but your 10ms hard drive access would translate into a fifty day penalty. Kind of like physical-mailing off for a book from Botswana.

    And, yes, this is a huge performance hit, no doubt. I just have a beef with their comparison.

  6. Skywing says:

    Not all page faults incur a disk access. For instance, touching an uncommitted range will cause a page fault, but will not cause a pagefile hit (instead, you’ll eventually see a STATUS_ACCESS_VIOLATION exception in user mode).

  7. josh says:

    I’m confused… how much did a page fault cost on a 25MHz machine then?

  8. Eric Newton says:

    So STRRET is still around but just encapsulated?

    Ah well, at least when it’s encapsulated, the possibility exists to get rid of a dated idea.

  9. Jack Mathews says:

    Skywing: Right, that’s a soft page fault. That’s why there’s the differentiation.

  10. Jack Mathews says:

    josh:

    Well, since disks haven’t sped up THAT much, page faults took fewer clock cycles. So if disks are twice as fast, but processors are 10x faster, then page faults would take 5x fewer clock cycles on the slower machine. Same with RAM. RAM hasn’t kept up with CPU clock speeds, so you can do similar kinds of math for that.
