Converting between LCIDs and RFC 1766 language codes


Occasionally, I see someone ask for a function that converts between LCIDs (such as 0x0409 for English-US) and RFC 1766 language identifiers (such as "en-us"). The rule of thumb is, if it's something a web browser would need, and it has to do with locales and languages, you should look in the MLang library. In this case, the IMultiLanguage::GetRfc1766FromLcid method does the trick, and IMultiLanguage::GetLcidFromRfc1766 handles the reverse direction.

For illustration, here's a program that takes US-English and converts it to RFC 1766 format. For fun, we also convert "sv-fi" (Finland-Swedish) to an LCID.

#include <stdio.h>
#include <ole2.h>
#include <oleauto.h>
#include <mlang.h>

int __cdecl main(int argc, char **argv)
{
 HRESULT hr = CoInitialize(NULL);
 if (SUCCEEDED(hr)) {
  IMultiLanguage * pml;
  hr = CoCreateInstance(CLSID_CMultiLanguage, NULL,
                        CLSCTX_ALL,
                        IID_IMultiLanguage, (void**)&pml);
  if (SUCCEEDED(hr)) {

   // Let's convert US-English to an RFC 1766 string
   BSTR bs;
   LCID lcid = MAKELCID(MAKELANGID(LANG_ENGLISH,
                        SUBLANG_ENGLISH_US), SORT_DEFAULT);
   hr = pml->GetRfc1766FromLcid(lcid, &bs);
   if (SUCCEEDED(hr)) {
    printf("%ws\n", bs);
    SysFreeString(bs);
   }

   // And a sample reverse conversion just for good measure
   bs = SysAllocString(L"sv-fi");
   if (bs && SUCCEEDED(pml->GetLcidFromRfc1766(&lcid, bs))) {
    printf("%lx\n", lcid); // LCID is a DWORD (unsigned long)
   }
   SysFreeString(bs);

   pml->Release();
  }
  CoUninitialize();
 }
 return 0;
}

When you run this program, you should get

en-us
81d

"en-us" is the RFC 1766 way of saying "US-English", and 0x081d is MAKELCID(MAKELANGID(LANG_SWEDISH, SUBLANG_SWEDISH_FINLAND), SORT_DEFAULT).
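
As a rough illustration of where the number 0x081d comes from, the arithmetic behind MAKELANGID and MAKELCID can be sketched in portable C++. The constant values below are copied from winnt.h so that the sketch compiles without the Windows SDK. The names (kLangSwedish, MakeLangId, MakeLcid, and so on) are made up for this example and do not appear in any Windows header.

```cpp
#include <cstdint>

// Values as defined in winnt.h, repeated here under hypothetical names.
constexpr std::uint16_t kLangEnglish           = 0x09; // LANG_ENGLISH
constexpr std::uint16_t kLangSwedish           = 0x1d; // LANG_SWEDISH
constexpr std::uint16_t kSublangEnglishUs      = 0x01; // SUBLANG_ENGLISH_US
constexpr std::uint16_t kSublangSwedishFinland = 0x02; // SUBLANG_SWEDISH_FINLAND
constexpr std::uint16_t kSortDefault           = 0x0;  // SORT_DEFAULT

// Same arithmetic as MAKELANGID: the sublanguage goes in the top 6 bits,
// the primary language in the low 10 bits.
constexpr std::uint16_t MakeLangId(std::uint16_t primary, std::uint16_t sub)
{
    return static_cast<std::uint16_t>((sub << 10) | primary);
}

// Same arithmetic as MAKELCID: the sort ID goes in the upper word,
// the language ID in the lower word.
constexpr std::uint32_t MakeLcid(std::uint16_t langId, std::uint16_t sortId)
{
    return (static_cast<std::uint32_t>(sortId) << 16) | langId;
}

// Checked at compile time: English-US is 0x0409, Finland-Swedish is 0x081d.
static_assert(MakeLcid(MakeLangId(kLangEnglish, kSublangEnglishUs),
                       kSortDefault) == 0x0409, "English-US");
static_assert(MakeLcid(MakeLangId(kLangSwedish, kSublangSwedishFinland),
                       kSortDefault) == 0x081d, "Finland-Swedish");
```

Since SORT_DEFAULT is zero, the upper word contributes nothing, and the LCID ends up numerically equal to the language ID.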

If you browse around, you'll find lots of other interesting functions in the MLang library. You may recall that earlier we saw how to use MLang to display strings without those ugly boxes.

Update (January 2008): The globalization folks have told me that they'd prefer that people didn't use MLang. They recommend instead the functions LCIDToLocaleName and LocaleNameToLCID. The functions are built into Windows Vista and are also available downlevel via a redistributable.

Comments (17)
  1. aon says:

    Simply read HKCU/MIME/Database/RFC1766. Problem solved.

  2. Derek says:

    Yes, because it’s always better to go delving into the registry than to use the documented API calls. . . .

  3. Bob says:

    Especially when that’s the wrong path. It should be HKLM/Software/Classes/MIME/Database/Rfc1766.

    But yeah, use the APIs instead, even though it means more blasted COM.

  4. PatriotB says:

    Looks like you could also use Rfc1766ToLcid (http://msdn.microsoft.com/library/default.asp?url=/workshop/misc/mlang/reference/functions/rfc1766tolcid.asp) and LcidToRfc1766 (http://msdn.microsoft.com/library/default.asp?url=/workshop/misc/mlang/reference/functions/rfc1766tolcid.asp), functions exposed directly from mlang.dll. Of course this requires IE 5.5 or newer, whereas IMultiLanguage is available from IE 4.0 onward.

  5. Fin says:

    RFC 1766 is long obsolete (replaced by RFC 3066) – best conversion function to use is LCIDToLocaleName (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/nls_LCIDToLocaleName.asp?frame=true). This is base platform functionality rather than MLang.

  6. A says:

    best conversion function to use is LCIDToLocaleName

    …except it doesn’t exist on any shipping version of Windows.

    Windows NT/2000/XP/Vista: Included in Windows Vista and later.

    Windows 95/98/Me/NT/2000/XP: Unsupported.

  7. michkap says:

    There are other problems with MLang though. I will blog about this tonight….

  8. Rob says:

    How do you determine the character set of each code page? For example, for code page 932?

  9. michkap says:

    Hey Rob, do you mean font charset? That would make a cool blog topic, too. :-)

    If you mean the one you would use in web pages, that’s a bit harder.

  10. Rob says:

    What I want to do is to display all the characters in a particular code page, e.g. code page 932 is Japanese. I want to gather all those characters and display them using something like the following code snippet.

    SomeFunkyFunction(void)
    {
        CString s, s1;
        for (int i = BEGIN; i < END; i++)
        {
            // concatenate the character into string.
            s1.Format(_T("%c"), i);
            s += s1;
        }
        return s;
    }

    // Any recommendations?

  11. michkap says:

    You should put this in the suggestion box on my blog (http://blogs.msdn.com/michkap/482609.aspx)….

  12. KJK::Hyperion says:

    Rob: HKEY_CLASSES_ROOT\MIME\Database\Codepage for the codepage-to-charset conversion, and HKEY_CLASSES_ROOT\MIME\Database\Charset for the reverse. There’s no API for it AFAIK.

  13. KJK::Hyperion says:

    oh, nevermind

  14. Norman Diamond says:

    Thursday, January 05, 2006 5:53 PM by Rob

    > all the characters in a particular code page

    > […]

    > for (int i = BEGIN; i < END; i++)

    > s1.Format(_T("%c"), i);

    I can’t find any spec for what happens when the value of i isn’t a valid character’s codepoint.

    But there’s something else odd about this. In a Unicode compilation, you won’t know what code page(s) contain the character that you’re dealing with in Unicode. In an ANSI compilation, your %c format can only handle a single-byte character. In an ANSI compilation you’ll have two problems. One is the same as above, I can’t find any spec for what happens when the value of i isn’t a valid character’s codepoint (i.e. might just be the lead byte of a two-byte character, or might not be a valid lead byte of anything). The other is that you never format any double-byte characters, so you miss most of the characters in the code page.

  15. Rob says:

    KJK::Hyperion,

    How do you use those data? Any idea? They don’t really display the characters. For the CharSet it only gives (Default) and AliasForCharset. Any recommendation would be appreciated.

  16. Rob says:

    Norman Diamond,

    Yeah, that’s the problem that I’m running into. Deciphering the ranges and getting the correct double-byte values for each individual code page seems like a daunting task. Any recommendations?

  17. michkap says:

    Hi Rob —

    I posted what I think is the easiest approach over in my blog (http://blogs.msdn.com/michkap/510411.aspx). The method covers all of the problems outlined in this thread (and several others, like best fit mappings).

    Enjoy!

