The format of string resources


Unlike the other resource formats, where the resource identifier is the same as the value listed in the *.rc file, string resources are packaged in “bundles”. There is a rather terse description of this in Knowledge Base article Q196774. Today we’re going to expand that terse description into actual code.

The strings listed in the *.rc file are grouped together in bundles of sixteen. So the first bundle contains strings 0 through 15, the second bundle contains strings 16 through 31, and so on. In general, bundle N contains strings (N-1)*16 through (N-1)*16+15.
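The bundle arithmetic is easy to get off by one, so here is a minimal sketch of it in portable C++. The names StringLocation, LocateString, FirstIdInBundle and LastIdInBundle are made up for illustration; only the arithmetic comes from the paragraph above.

```cpp
#include <cassert>

// Hypothetical helper type: where a string ID lives.
struct StringLocation {
    unsigned bundle;  // 1-based bundle number (the ID of the RT_STRING resource)
    unsigned index;   // 0-based slot of the string inside that bundle
};

// Map a string ID to its bundle number and its slot within the bundle.
constexpr StringLocation LocateString(unsigned stringId) {
    return StringLocation{stringId / 16 + 1, stringId % 16};
}

// First string ID stored in bundle N (bundle numbers start at 1).
inline unsigned FirstIdInBundle(unsigned bundle) {
    assert(bundle >= 1);
    return (bundle - 1) * 16;
}

// Last string ID stored in bundle N.
inline unsigned LastIdInBundle(unsigned bundle) {
    return FirstIdInBundle(bundle) + 15;
}
```

For example, LocateString(31) yields bundle 2, slot 15, and bundle 2 covers IDs 16 through 31.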

The strings in each bundle are stored as counted UNICODE strings, not null-terminated strings: each string is preceded by a single WCHAR holding its length. If there are gaps in the numbering, null (zero-length) strings are used. So for example if your string table had only strings 16 and 31, there would be one bundle (number 2), which consists of string 16, fourteen null strings, then string 31.

(Note that this means there is no way to tell the difference between “string 20 is a string that has length zero” and “string 20 doesn’t exist”.)
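To make the layout concrete, here is a rough sketch that builds the in-memory image of such a bundle. It assumes the layout is exactly what is described above (a 16-bit count followed by that many 16-bit characters, sixteen times over). BuildBundle and BuildExampleBundle are hypothetical helpers, and the texts "first" and "last" are placeholders for strings 16 and 31.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: serialize sixteen strings as counted strings.
// An empty string becomes a lone zero count, which is why a missing
// string and a zero-length string look identical in the bundle.
std::vector<std::uint16_t> BuildBundle(
        const std::array<std::u16string, 16>& strings) {
    std::vector<std::uint16_t> bundle;
    for (const std::u16string& s : strings) {
        assert(s.size() <= 0xFFFF);  // the count must fit in one 16-bit word
        bundle.push_back(static_cast<std::uint16_t>(s.size()));
        for (char16_t ch : s) {
            bundle.push_back(static_cast<std::uint16_t>(ch));
        }
    }
    return bundle;
}

// Bundle 2 from the example: only strings 16 and 31 exist.
std::vector<std::uint16_t> BuildExampleBundle() {
    std::array<std::u16string, 16> strings{};
    strings[0] = u"first";   // string 16 (placeholder text)
    strings[15] = u"last";   // string 31 (placeholder text)
    return BuildBundle(strings);
}
```

The resulting image is 25 words long: sixteen counts, plus five characters for the first string and four for the last. The fourteen gaps cost one zero word each.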

The LoadString function is rather limiting in a few ways:

  • You can’t pass a language ID. If your resources are multilingual, you can’t load strings from a nondefault language.

  • You can’t query the length of a resource string.

Let’s write some functions that remove these limitations.

LPCWSTR FindStringResourceEx(HINSTANCE hinst,
 UINT uId, UINT langId)
{
 // Convert the string ID into a bundle number
 LPCWSTR pwsz = NULL;
 HRSRC hrsrc = FindResourceEx(hinst, RT_STRING,
                     MAKEINTRESOURCE(uId / 16 + 1),
                     langId);
 if (hrsrc) {
  HGLOBAL hglob = LoadResource(hinst, hrsrc);
  if (hglob) {
   pwsz = reinterpret_cast<LPCWSTR>
              (LockResource(hglob));
   if (pwsz) {
    // okay now walk the string table
    for (UINT i = 0; i < (uId & 15); i++) {
     pwsz += 1 + (UINT)*pwsz;
    }
    UnlockResource(pwsz);
   }
   FreeResource(hglob);
  }
 }
 return pwsz;
}

After converting the string ID into a bundle number, we find the bundle, load it, and lock it. (That's an awful lot of paperwork just to access a resource. It's a throwback to the Windows 3.1 way of managing resources; more on that in a future entry.)

We then walk through the table skipping over the desired number of strings until we find the one we want. The first WCHAR in each string entry is the length of the string, so adding 1 skips over the count and adding the count skips over the string.

When we finish walking, pwsz is left pointing to the counted string.
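If you want to experiment with the walk without a resource file handy, the same loop can be run over a hand-built array. This is only a sketch: WalkBundle and kSample are hypothetical names, and the array stands in for the memory that LockResource would return.

```cpp
#include <cstdint>

// A hand-built bundle: slot 0 is "hi", slot 2 is "abc",
// and every other slot is a zero-length string.
static const std::uint16_t kSample[] = {
    2, 'h', 'i',              // slot 0
    0,                        // slot 1
    3, 'a', 'b', 'c',         // slot 2
    0, 0, 0, 0, 0, 0, 0,      // slots 3 through 9
    0, 0, 0, 0, 0, 0          // slots 10 through 15
};

// Skip over (stringId & 15) counted strings and return a pointer
// to the count word of the requested one.
const std::uint16_t* WalkBundle(const std::uint16_t* bundle,
                                unsigned stringId) {
    const std::uint16_t* p = bundle;
    for (unsigned i = 0; i < (stringId & 15); i++) {
        p += 1 + *p;  // 1 for the count, *p for the characters
    }
    return p;
}
```

Only the low four bits of the ID matter once the bundle has been found, so WalkBundle(kSample, 2) and WalkBundle(kSample, 34) land on the same slot.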

With this basic function we can create fancier functions.

The function FindStringResource is a simple wrapper that searches for the string in the default thread language.

LPCWSTR FindStringResource(HINSTANCE hinst, UINT uId)
{
 return FindStringResourceEx(hinst, uId,
     MAKELANGID(LANG_NEUTRAL, SUBLANG_NEUTRAL));
}

The function GetStringResourceLengthEx returns the length of the corresponding string, including room for the null terminator (which the resource itself does not store).

UINT GetStringResourceLengthEx(HINSTANCE hinst,
 UINT uId, UINT langId)
{
 LPCWSTR pwsz = FindStringResourceEx
                       (hinst, uId, langId);
 return 1 + (pwsz ? *pwsz : 0);
}

And the function AllocStringFromResourceEx loads the entire string resource into a heap-allocated memory block.

LPWSTR AllocStringFromResourceEx(HINSTANCE hinst,
 UINT uId, UINT langId)
{
 LPCWSTR pwszRes = FindStringResourceEx
                       (hinst, uId, langId);
 if (!pwszRes) pwszRes = L"";
 LPWSTR pwsz = new WCHAR[(UINT)*pwszRes+1];
 if (pwsz) {
   pwsz[(UINT)*pwszRes] = L'\0';
   CopyMemory(pwsz, pwszRes+1,
              *pwszRes * sizeof(WCHAR));
 }
 return pwsz;
}

(Writing the non-Ex functions GetStringResourceLength and AllocStringFromResource is left as an exercise.)

Note that we must explicitly null-terminate the string since the string in the resource is not null-terminated. Note also that the string returned by AllocStringFromResourceEx must be freed with delete[]. For example:

LPWSTR pwsz = AllocStringFromResource(hinst, uId);
if (pwsz) {
  ... use pwsz ...
  delete[] pwsz;
}

Mismatching vector "new[]" and scalar "delete" is an error I'll talk about in a future entry.
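The copy-and-terminate step can also be tried in isolation. The sketch below is a portable stand-in for the body of AllocStringFromResourceEx, not a replacement for it: AllocFromCounted is a hypothetical name, char16_t stands in for WCHAR, and it uses new (std::nothrow) so that the null check on the result actually means something.

```cpp
#include <cstddef>
#include <cstring>
#include <new>

// Copy a counted string (count word followed by characters) into a
// freshly allocated, null-terminated buffer. Free it with delete[].
char16_t* AllocFromCounted(const char16_t* counted) {
    static const char16_t kEmpty[] = { 0 };  // a zero-length counted string
    if (!counted) counted = kEmpty;          // treat "not found" as empty

    const std::size_t length = counted[0];
    char16_t* copy = new (std::nothrow) char16_t[length + 1];
    if (copy) {
        std::memcpy(copy, counted + 1, length * sizeof(char16_t));
        copy[length] = u'\0';  // the resource does not supply a terminator
    }
    return copy;
}
```

The caller's side looks the same as before: check for a null return, use the string, then release it with delete[].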

Exercise: Discuss how the /n flag to rc.exe affects these functions.

Comments (34)
  1. Frederik Slijkerman says:

    It would be nicer to return a std::wstring from AllocStringFromResource(), so you don’t have to use delete[] at all.

  2. Reuben Harris says:

    The VC6 resource compiler doesn’t like strings with more than 256 characters, and will truncate those that are and give you a warning. This can be annoying if you’re doing verbose UI labels, such as a wizard page.

    Is this a restriction of the way Win32 expects resource strings to be stored, or is rc.exe just dim?

    (Great blog by the way… I’ve just read through your archive and have fallen into at least half the traps you warn about!)

  3. Dan Crevier says:

    Doesn’t AllocStringFromResourceEx suffer from the integer overflow problems you’ve been talking about recently?

  4. runtime says:

    Windows CE’s LoadString() resource API has a nice "underdocumented" feature to use resource strings without allocating extra memory or copying strings. According to Douglas Boling’s book "Programming Microsoft Windows CE", if you pass a NULL lpBuffer parameter to LoadString(), the API will return a read-only pointer to the string. Since the resource strings are not null-terminated, the string length is stored in the word preceding the start of the resource string.

    The book also says you can request that resource strings be stored as null-terminated strings if you invoke the resource compiler with the -r command line switch. I don’t know if that feature is Windows CE specific.

  5. Mike Dunn says:

    One of my favorite tricks for loading strings is to use the little-known CString (MFC or WTL) constructor trick:

    CString str ((LPCTSTR) IDS_SOME_STRING);

    Then you can make a macro to do that on the fly:

    #define _S(id) (CString(LPCTSTR(id)))

    and use it inline, such as:

    MessageBox ( _S(IDS_BAD_ERROR), _S(IDS_MSGBOX_TITLE), MB_ICONERROR );

    Of course, the _S definition above is only for release mode. In debug mode, it’s a real function that does a LoadString and asserts if the string can’t be loaded.

  6. The Old New Thing talks about format of string resources (Windows)….

  7. B.Y. says:

    Since the EXE is always mapped into memory, if they designed the format so that strings are always terminated with a zero, can’t you get read-only pointers to the strings without the need to allocate memory?

  8. Joe says:

    Dan: No, it doesn’t. Because the size is coming from a short (0 – 65535), that gets cast to a UINT (0 – 2^32-1), and then has 1 added. There’s no way to overflow the allocator because 2 * 65536 < 2^32 – 1.

  9. runtime says:

    BY, you would think so, but maybe Microsoft wanted to save the "wasted" space of the null-terminator character? I bet 90% of the time, resource strings are used without modification. Microsoft should have optimized this common case with an API that just returned an easy to use, zero-copy, read-only pointer to the null-terminated resource string.

  10. Mike Dunn says:

    runtime – remember that when these APIs were designed and written, they had to work on machines with 4 MB of RAM (the lowest Win95 would run in). That’s four MEGAbytes. Lots of null bytes hanging around can add up if there are a lot of strings in the string table.

    Sure, now we don’t give it a second thought when a .NET app requires a 20MB download and uses 40MB of memory (that’s what SharpReader is at right now on my system). In 1993/94/95, things were a *lot* different.

  11. Raymond Chen says:

    Frederik: There was a discussion of std::wstring in previous blog comments: http://weblogs.asp.net/oldnewthing/archive/2004/01/21/61101.aspx

    I wrote these functions as if they were part of the Platform SDK. This means no language-specific constructs, and certainly no compiler-specific constructs. (std::wstring is not guaranteed to be compatible from one compiler to the next or even from one compiler VERSION to the next. The contract for std::wstring is at the source code level, not the ABI.)

    B.Y.: Try solving the exercise.

    runtime/Mike Dunn: I actually have a discussion of the historical basis for resource formats scheduled for a future entry. It’s even weirder than you think.

  12. asdf says:

    new is a language specific construct. And a non-throwing new is a compiler-specific construct (or a standards compliant compiler with exceptions disabled via a flag).

    Using /n it looks like the string length gets reported 1 WCHAR longer than it actually is.

  13. Mike Dunn says:

    Note: there’s a bug in the for loop in FindStringResourceEx(), the condition should be "i < (uId & 15)"

    Raymond, I noticed that FindStringResourceEx() returns a pointer to the block of memory occupied by the resource, but you call UnlockResource() and FreeResource(), which presumably might free that memory. Is this safe?

    OTOH, the docs on LockResource() say: "The pointer returned by LockResource is valid until the module containing the resource is unloaded." That implies that UnlockResource/FreeResource can’t free the memory because it would break LockResource(). So who’s right?

    As for the exercise, adding /n doesn’t break your functions, it just makes them allocate one extra WCHAR. When the strings are 0-terminated, the lengths are increased as well, so the code to walk the strings and find a particular one still works.

    The code assumes string table entries are not 0-terminated. They become 0-terminated with /n, so the string returned by AllocStringFromResourceEx() has two 0 chars at the end. Mostly harmless.

  14. Raymond Chen says:

    Yeah, I broke my own rule with new[]; I should have used LocalAlloc.

    Good catch on the precedence bug.

    UnlockResource and FreeResource are NOPs on Win32. More information to come in that promised future blog entry.

  15. dru says:

    How is the langId used?

    I couldn’t find a good reference to that

    on my MSDN CD via FindResourceEx?

  16. Raymond Chen says:

    Like the documentation says, it specifies the language of the resource you want to access. You can use the LANGUAGE directive in the *.rc file to provide resources in multiple languages.

  17. Frederik Slijkerman says:

    The best solution might be to introduce a new function ReleaseStringFromResource that would take the pointer from AllocStringFromResourceEx and free it properly, with delete[] or whatever.

    That way, you also reserve the right to change the allocation mechanism without breaking backwards compatibility.

  18. Mike Dimmick says:

    UnlockResource is a total no-op in the current SDK headers – it’s a macro which evaluates the argument, then discards the result.

    LockResource is a slightly more substantial no-op, because it’s implemented as a function. However, the implementation is basically:

    PVOID LockResource(HGLOBAL hGlob)

    {

    return (PVOID) hGlob;

    }

    (dumpbin /disasm is your friend…)

  19. Lonnie McCullough says:

    I have a question about the MAKELANGID macro (actually about langids in general). What is the difference between (LANG_NEUTRAL, SUBLANG_NEUTRAL) and (LANG_NEUTRAL, SUBLANG_DEFAULT)? Will the first map to the second if there are resources present in the user’s default language? Is there an algorithm for falling back from the user’s language to other languages in the resource file? I guess I’m just not sure how all this stuff is really handled and would like to know more (trying to do internationalization the right way if at all possible). Even a pointer to a resource would help greatly in clearing up the confusion in my head over how NEUTRAL,NEUTRAL contrasts with NEUTRAL,DEFAULT. Thanks for the great blog.

    Go Pats!!!

  20. Raymond Chen says:

    Lonnie: I’m going to have to defer on your question. I am not an internationalization expert and I wouldn’t want to give the wrong answer.

  21. David Kemp says:

    Lonnie:

    (LANG_NEUTRAL, SUBLANG_NEUTRAL) = Language Neutral

    (LANG_NEUTRAL, SUBLANG_DEFAULT) = User’s Default Language

    A language neutral string is different from one in a user’s default language.

    You can mark a resource as Language Neutral by using "LANGUAGE LANG_NEUTRAL, SUBLANG_NEUTRAL" in the resource file.

    I’d imagine you’d want to make a distinction between Language Neutral and the Default Language of your application tho. For example, you might want to default to American English if you don’t have resources for the user’s preferred language, but you might want to check the system’s default language first. You’d want to use Language Neutral for resources that are truly language neutral (so, probably never, as I’d doubt such a thing exists)

    [The MSDN reference for MAKELANGID is:

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/nls_97vo.asp]

  22. Raymond Chen says:

    Examples of language-neutral resources would be most icons and bitmaps. These are things that don’t change regardless of the language. (Of course you have to make sure your icon/bitmap doesn’t contain locale-sensitive imagery.)

  23. floyd says:

    Raymond Chen: "Examples of language-neutral resources would be most icons and bitmaps. These are things that don’t change regardless of the language."

    I’d have to disagree here. Although you cannot translate images in the same way as text resources, there are still potential language specific facets. Just think about bitmaps with text on them. With this being an exception that is easily perceived there are also less obvious nuances: a green coloured UI element would signal a successful operation to those living in western civilizations — if your product ships to asian countries you would rather change this colour to red.

    .f

    p.s.: Thanks a lot for sharing your experience. I stumbled across your blog today, almost by accident, and I like it already :)

  24. Raymond Chen says:

    True, bitmaps with text and culturally-dependent images would need to be localized. But I ruled that out in my parenthetical.

    It’s a good idea to avoid locale-sensitive bitmaps because professional translators tend not also to be accomplished graphic artists.

  25. floyd says:

    Don’t get me wrong, I wasn’t going to challenge you in any way. I merely meant to illustrate that it isn’t always as easy as it may appear to decide whether a resource is language-dependent. I would agree with you, that locale-dependent images are generally a bad idea, unless you have tool-support to track those as well as good reasons to go for that approach in the first place.

    I just didn’t want anyone reading this thread take it as a fact that images are generally locale-independent. With that said, I also have a not so obvious string resource that is in fact language-independent: let’s say you are writing an image processing application and need to support CMY color space — if you translate Cyan-Magenta-Yellow into local names, you will run into major trouble when it comes to printing your work.

    Anyway, I’m not an expert in this field either. But with all those bits and pieces I picked up along the way the only thing I can say is this: localization is a beast to master.

    .f

  26. Raymond Chen says:

    Agreed. Designing your code to be localizable is a lot of work and contains many pitfalls.

  27. Raymond Chen says:

    Commenting on this entry has been closed.

  28. To localize a WIN32 application into several language versions,

    besides the localized forms there are also the string tables, which are located in the program resources.

    So if you want to make your program multilingual, all hardcoi

  29. The SZ (a.k.a. Steffen) asked in the suggestion box:

    What is the preferred way to select the "most…

  30. No really, you can’t.

  31. Serdar asked: Hi, Is it possible to call GetLocaleInfo in a different language? What I’m trying to do
