The evolution of version resources – corrupted 32-bit version resources


Last time we looked at the format of 32-bit version resources, but I ended with the remark that what you saw purported to be the resources of shell32.dll but actually weren't. What's going on here?

The resources I presented last time were what the resources of shell32.dll should have been, but they aren't what the file actually contains.

A common mistake in generating 32-bit resources is to mistreat the cbData field of the structure I called a VERSIONNODE as a count of characters rather than a count of bytes if the type is Unicode text. Even Microsoft's own Resource Compiler has fallen into this trap! For example, consider this VERSIONNODE I presented last time:

0098  4C 00         // cbNode (node ends at 0x0098 + 0x004C = 0x00E4)
009A  2C 00         // cbData
009C  01 00         // wType = 1 (string data)
009E  43 00 6F 00 6D 00 70 00 61 00 6E 00 79 00 4E 00
      61 00 6D 00 65 00 00 00
                    // L"CompanyName" + null terminator
00B6  00 00         // padding to restore alignment
00B8  4D 00 69 00 63 00 72 00 6F 00 73 00 6F 00 66 00
      74 00 20 00 43 00 6F 00 72 00 70 00 6F 00 72 00
      61 00 74 00 69 00 6F 00 6E 00 00 00
                    // L"Microsoft Corporation" + null terminator
00E4                // no padding needed

In real life, the data take the following form:

0098  4C 00         // cbNode (node ends at 0x0098 + 0x004C = 0x00E4)
009A  16 00         // cchData (!)
009C  01 00         // wType = 1 (string data)
...
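
To make the arithmetic concrete, here is a minimal Python sketch that rebuilds this one node and fills in the length field both ways. The helper names (align4, build_string_node) are mine and not part of any Windows API, and the layout assumes only what the dump shows: three 16-bit header fields, a null-terminated key, padding to a 4-byte boundary, then the value.

```python
import struct

def align4(offset):
    """Round offset up to the next multiple of 4."""
    return (offset + 3) & ~3

def build_string_node(node_offset, key, value, buggy=False):
    """Build one leaf node holding Unicode text (illustrative helper).

    node_offset is where the node starts in the resource, so padding can
    be computed against the real alignment. With buggy=True the length
    field holds a character count, imitating the malformed resources.
    """
    key_bytes = (key + "\0").encode("utf-16-le")
    value_bytes = (value + "\0").encode("utf-16-le")
    header_size = 6  # cbNode, cbData, wType: three 16-bit fields
    key_end = node_offset + header_size + len(key_bytes)
    value_offset = align4(key_end)
    padding = b"\0" * (value_offset - key_end)
    cb_node = value_offset + len(value_bytes) - node_offset
    # Correct: count of bytes. Buggy: count of 16-bit characters.
    cb_data = len(value_bytes) // 2 if buggy else len(value_bytes)
    header = struct.pack("<HHH", cb_node, cb_data, 1)  # wType = 1 (string data)
    return header + key_bytes + padding + value_bytes

good = build_string_node(0x0098, "CompanyName", "Microsoft Corporation")
bad = build_string_node(0x0098, "CompanyName", "Microsoft Corporation", buggy=True)
print(good[:6].hex(" "))  # header with a byte count
print(bad[:6].hex(" "))   # header with a character count
```

The two nodes differ only in the second header field: 2C 00 for the byte count and 16 00 for the character count. Everything else, including cbNode, is identical, which is part of why the mistake is easy to miss.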

These malformed version resources manage to get away without crashing too horribly because the standard format of version resources uses string data only in leaf nodes. Therefore, the incorrect cbData affects only the node itself and doesn't cause the child nodes to be parsed incorrectly (since there are no child nodes).

Until somebody tries to read, say, \StringFileInfo\040904B0\CompanyName\oops. After the VerQueryValue function locates the VERSIONNODE corresponding to CompanyName, it tries to locate the first child node and, due to the incorrect cbData, ends up misinterpreting the middle of the string as if it were the start of a child VERSIONNODE. Things only go downhill from there.
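
Here is a rough Python sketch of how that goes wrong. This is not how VerQueryValue is actually implemented; first_child is a hypothetical naive lookup that trusts the length field as a byte count and assumes any child nodes start at the next 4-byte boundary after the value.

```python
import struct

def align4(offset):
    """Round offset up to the next multiple of 4."""
    return (offset + 3) & ~3

NODE_OFFSET = 0x0098  # where the CompanyName node starts in the dump

def make_node(length_field):
    """Rebuild the CompanyName node with the given value in the length field."""
    return (struct.pack("<HHH", 0x004C, length_field, 1)
            + "CompanyName\0".encode("utf-16-le")
            + b"\0\0"  # padding to restore alignment
            + "Microsoft Corporation\0".encode("utf-16-le"))

def first_child(node, node_offset):
    """Hypothetical naive lookup that trusts the length field as bytes.

    Returns None if the value fills the node; otherwise returns the
    absolute offset of the supposed child and its three header fields.
    """
    cb_node, cb_data, _ = struct.unpack_from("<HHH", node, 0)
    pos = 6
    while node[pos:pos + 2] != b"\0\0":  # skip the key one UTF-16 unit at a time
        pos += 2
    value_start = align4(node_offset + pos + 2)
    child_start = align4(value_start + cb_data)
    if child_start >= node_offset + cb_node:
        return None  # value runs to the end of the node: no children
    header = struct.unpack_from("<HHH", node, child_start - node_offset)
    return child_start, header

print(first_child(make_node(0x2C), NODE_OFFSET))  # correct byte count
print(first_child(make_node(0x16), NODE_OFFSET))  # malformed character count
```

With the correct byte count the value ends exactly where the node ends, so there is nothing to descend into. With the character count, this lookup lands at offset 0x00D0, in the middle of "Corporation", and reads the letters "rpo" as cbNode = 0x0072, cbData = 0x0070, wType = 0x006F.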

They're just lucky that nobody actually asks for that.

But wait, there's more. Somebody who calls the VerQueryValueA function expects to have the version string returned as ANSI, so VerQueryValueA needs to know how many characters to convert from Unicode to ANSI. If VerQueryValue trusted the erroneous cbData value, then ANSI callers would get only half the data they were expecting.
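
The "only half" is easy to check with a small Python sketch. The to_ansi helper is hypothetical, and cp1252 is just my stand-in for whatever the ANSI code page happens to be.

```python
# The value as stored in the resource: UTF-16 text plus null terminator.
VALUE = "Microsoft Corporation\0".encode("utf-16-le")  # 0x2C bytes

def to_ansi(value_bytes, byte_count, codepage="cp1252"):
    """Convert the first byte_count bytes of UTF-16 text to an ANSI code page."""
    return value_bytes[:byte_count].decode("utf-16-le").encode(codepage)

# The malformed resource stores 0x16 in the length field.
trusting = to_ansi(VALUE, 0x16)       # field taken at face value, as bytes
corrected = to_ansi(VALUE, 0x16 * 2)  # field treated as a character count

print(trusting)
print(corrected)
```

Taking the field at face value converts 0x16 bytes, which is 11 characters, so the caller sees "Microsoft C". Treating it as a character count recovers the whole string, terminator included.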

As a result of this mess, the VerQueryValue function keeps its eyes open: it anticipates that the version resource it was given to parse may have been generated by one of these buggy resource compilers, and it goes to some extra effort to accommodate those bugs.

Comments (11)
  1. Anonymous says:

    Are you saying that rc.exe has had this bug for 15 years?

  2. Anonymous says:

    Now that’s bad-memory lane, I tripped over this problem several times. In the Win98 era before VerQueryValue was fixed, there used to be a KB article about the problem. I never knew what the core problem was, though. At the time, there must not have been a lot of Unicode resources on files.

  3. Anonymous says:

    No resource compiler today could get away with writing a byte count because VerQueryValueW returns this count directly in its puLen parameter. puLen is — you guessed it — documented as *character* count, and has been for at least a decade.

  4. Anonymous says:

    If you try to call GetFileVersionInfo for a UPX-compressed executable, it will cause a crash in krnl386.exe under Windows 98 (the bug is corrected in Windows XP). So, if you want your application to work under all versions of Windows, you may want to parse the resources yourself without relying on (those buggy) Win32 API functions.

    [Sounds like a bug in UPX to me – it’s generating corrupted binaries. Remember, the Windows 3.1 series assumes that you are doing the right thing. -Raymond]
  5. Anonymous says:

    > Um, I’m not seeing where VerQueryValueW has to copy any data at all.

    Sorry you’re right, it’s the program that has to do the copying.  VerQueryValueW tells the program how many characters to copy.

    Does VerQueryValueW figure out the correct number of characters even when cbData isn’t a byte count?  (OK, I should experiment instead of asking.  So far I’ve only needed this on Windows CE where it works well enough.  VerQueryValueW reports the correct number of characters there (after the .rc file has been hand edited).  I didn’t look at the cbData field in the binary.)

  6. Anonymous says:

    > But wait, there’s more. Somebody who calls the
    > VerQueryValueA function expects to have the
    > version string returned as ANSI, so
    > VerQueryValueA needs to know how many
    > characters to convert from Unicode to ANSI.
    > If VerQueryValue trusted the erroneous cbData
    > value, then ANSI callers would get only half
    > the data they were expecting.

    I think there’s more.

    (1) Somebody who calls the VerQueryValueW function expects to have the version string returned as Unicode, so VerQueryValueW needs to know how many characters to copy.  If VerQueryValue trusted the erroneous cbData value, then Unicode callers would get only half the data they were expecting.[*]

    (2)  Somebody who calls the VerQueryValueA function expects to have the version string returned as ANSI, so VerQueryValueA needs to know how many characters to convert from Unicode to ANSI. If VerQueryValue trusted the erroneous cbData value, then ANSI callers would get some random fraction of the data they were expecting.  When a Unicode character converts to a two-byte ANSI character, the caller might get both bytes.  Though this is just hypothetical because we can’t really test it — VerQueryValue knows not to trust cbData so a test would only find out what VerQueryValue actually does.

    [* If the data include surrogate pairs then the fraction might be random.]

    [Um, I’m not seeing where VerQueryValueW has to copy any data at all. -Raymond]
  7. Anonymous says:

    Why is the length embedded at all? Redundant information.

    [It’s not redundant for binary data or strings with embedded nulls. (I can’t believe I had to write that.) -Raymond]
  8. Anonymous says:

    Are arbitrary binary data allowed in string fields? How can a null terminated string have embedded nulls? A terminating NIL char could have been used to terminate the string instead of a byte count integer.

    [It’s a string field, not a null-terminated string field. You can see embedded NULs in the 16-bit version resources a few days ago. (More evidence that people don’t actually read my entries.) -Raymond]
  9. Anonymous says:

    > You can see embedded NULs in the 16-bit
    > version resources a few days ago.

    The ones I noticed were intended to be terminators.  I didn’t notice any that weren’t intended to be terminators.

    Since ordinary string resources don’t automatically get NUL terminators appended, programmers have to code the terminators themselves[*], and some programmers didn’t notice that version string resources are different.  Some of those programmers produced some versions of Visual C++, so a lot of executables have redundant NUL terminators.  I never complained about this very minor bug, a very slight waste of memory with no other consequences.  Had I been involved, I would have given priority to fixing more serious bugs than to this one.  Though I don’t have any complaint about its having been fixed either.

    Anyway, do you know of cases where NULs were intended to be embedded rather than intended to be terminators?

    Putting binary data in fields that are labelled as non-binary sometimes causes bugs.  For example BSTRs sometimes get converted to ANSI without the programmer noticing because the programmer’s code page is different from the customer’s code page.

    [I’m sure you’ve used strings with embedded nulls. No need to ask me for examples. -Raymond]
  10. Anonymous says:

    > I’m sure you’ve used strings with embedded nulls.

    Yes and no.  I coded stuff using string syntax with embedded nulls in RCDATA resources, i.e. binary resources.  I did not do so in STRING resources.

    In a setting having nothing to do with resources, my discovery that sometimes BSTRs get converted to ANSI did come the hard way, but I luckily discovered it before the product shipped and I’ve never repeated that mistake.  I learned belatedly that the MSDN section that I had read included an invisible restriction (visible in some other pages that I belatedly discovered) so it didn’t apply to the particular code I had written.

  11. Anonymous says:

    Previous blogs in this series: 0: A long journey begins with the zeroeth step One of the first things
