How to display a string without those ugly boxes


You’ve all seen those ugly boxes. When you try to display a string and the font you have doesn’t support all of the characters in it, you get an ugly box for the characters that aren’t available in the font.

Start with our scratch program and add this to the PaintContent function:

void
PaintContent(HWND hwnd, PAINTSTRUCT *pps)
{
    TextOutW(pps->hdc, 0, 0,
            L"ABC\x0410\x0411\x0412\x0E01\x0E02\x0E03", 9);
}

That string contains the first three letters from three different alphabets: “ABC” from the Roman alphabet; “АБВ” from the Cyrillic alphabet; and “กขฃ” from the Thai alphabet.
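If you want to double-check the 9 we passed as the length, here is a small portable sketch. It is not part of the scratch program: it uses `char16_t` and `\u` escapes as a stand-in for the wide string literal, and the helper names `SampleText`, `HasSurrogates` and `SampleLength` are made up for this illustration.

```cpp
#include <cassert>
#include <string>

// The same nine characters as the TextOutW call, written with \u escapes.
std::u16string SampleText()
{
    return u"ABC\u0410\u0411\u0412\u0E01\u0E02\u0E03";
}

// True if any code unit is half of a surrogate pair (U+D800..U+DFFF).
bool HasSurrogates(const std::u16string& s)
{
    for (char16_t c : s) {
        if (c >= 0xD800 && c <= 0xDFFF) return true;
    }
    return false;
}

// Hypothetical helper: the count to pass as the length argument.
int SampleLength()
{
    const std::u16string s = SampleText();
    assert(!HasSurrogates(s));  // so code units and characters coincide here
    return static_cast<int>(s.size());
}
```

All nine characters live in the Basic Multilingual Plane, so each one occupies a single UTF-16 code unit and the length is simply the number of characters.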

If you run this program, you get a bunch of ugly boxes for the non-Roman characters because the SYSTEM font is very limited in its character set support.

But how to pick the right font? What if the string contained Korean or Japanese characters? There is no single font that contains every character defined by Unicode. (Or at least, none that is commonly available.) What do you do?

This is where font linking comes in.

Font linking allows you to take a string and break it into pieces, where each piece can be displayed in an appropriate font.

The IMLangFontLink2 interface provides the methods necessary to do this breaking. GetStrCodePages takes the string apart into chunks, such that all the characters in a chunk can be displayed by the same font, and MapFont creates the font.
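To get a feel for what "taking the string apart into chunks" means, here is a toy model in portable C++. It is only a sketch of the idea, not MLang's actual algorithm: each character is represented by a bitmask of the code pages that contain it, and the flag values and the names `CodePageMask`, `Chunk` and `SplitIntoChunks` are all assumptions invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a set of code pages, one bit per code page.
using CodePageMask = std::uint32_t;

// Hypothetical flag values, chosen only for this illustration.
constexpr CodePageMask kLatin    = 0x1;
constexpr CodePageMask kCyrillic = 0x2;
constexpr CodePageMask kThai     = 0x4;

struct Chunk {
    std::size_t  length;     // number of characters in the run
    CodePageMask codePages;  // code pages shared by every character in it
};

// Split a sequence of per-character masks into maximal runs whose
// characters all share at least one code page.
std::vector<Chunk> SplitIntoChunks(const std::vector<CodePageMask>& chars)
{
    std::vector<Chunk> chunks;
    std::size_t i = 0;
    while (i < chars.size()) {
        CodePageMask common = chars[i];
        std::size_t len = 1;
        // Extend the run while the next character still shares a code page.
        while (i + len < chars.size() && (common & chars[i + len]) != 0) {
            common &= chars[i + len];
            ++len;
        }
        chunks.push_back({len, common});
        i += len;
    }
    assert(i == chars.size());  // every character landed in exactly one chunk
    return chunks;
}
```

Feed it three Latin, three Cyrillic and three Thai masks and you get three chunks of three characters each, which is the shape of answer we will be asking GetStrCodePages for.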

Okay, so let’s write our font-link-enabled version of the TextOut function. We’ll do this in stages, starting with the kernel of the idea.

#include <windows.h>
#include <windowsx.h> // SelectFont macro
#include <mlang.h>

HRESULT TextOutFL(HDC hdc, int x, int y, LPCWSTR psz, int cch)
{
  ...
  while (cch > 0) {
    DWORD dwActualCodePages;
    long cchActual;
    pfl->GetStrCodePages(psz, cch, 0, &dwActualCodePages, &cchActual);
    HFONT hfLinked;
    pfl->MapFont(hdc, dwActualCodePages, 0, &hfLinked);
    HFONT hfOrig = SelectFont(hdc, hfLinked);
    TextOut(hdc, ?, ?, psz, cchActual);
    SelectFont(hdc, hfOrig);
    pfl->ReleaseFont(hfLinked);
    psz += cchActual;
    cch -= cchActual;
  }
  ...
}

After figuring out which code pages the default font supports, we walk through the string asking GetStrCodePages to give us the next chunk of characters. From that chunk, we create a matching font and draw the characters in that font at “the right place”. Repeat until all the characters are done.

The rest is refinement and paperwork.

First of all, what is “the right place”? We want the next chunk to resume where the previous chunk left off. For that, we take advantage of the TA_UPDATECP text alignment style, which says that GDI should draw the text at the current position, and update the current position to the end of the drawn text (therefore, in position for the next chunk).

Therefore, part of the paperwork is to set the DC’s current position and set the text mode to TA_UPDATECP:

  SetTextAlign(hdc, GetTextAlign(hdc) | TA_UPDATECP);
  MoveToEx(hdc, x, y, NULL);

Then we can just pass “0,0” as the coordinates to TextOut, because the coordinates passed to TextOut are ignored if the text alignment mode is TA_UPDATECP; it always draws at the current position.
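If the "current position" business is hard to picture, here is a toy model you can run anywhere. It is a sketch under loud assumptions: `ToyDC`, `ToyTextOut` and `DrawChunks` are invented stand-ins rather than GDI, and every character is pretended to be 8 units wide.

```cpp
#include <cassert>
#include <string>
#include <vector>

constexpr int kCharWidth = 8;  // assumption: every character is 8 units wide

// Toy device context: a current position plus an "update CP" flag that
// stands in for the TA_UPDATECP bit of the text alignment mode.
struct ToyDC {
    int  curX = 0;
    int  curY = 0;
    bool updateCP = false;
};

// Toy TextOut. Returns the x coordinate where the text was "drawn".
int ToyTextOut(ToyDC& dc, int x, int /*y*/, const std::wstring& text)
{
    if (!dc.updateCP) {
        return x;                 // explicit coordinates; position untouched
    }
    const int drawnAt = dc.curX;  // passed-in coordinates are ignored
    dc.curX += kCharWidth * static_cast<int>(text.size());
    return drawnAt;
}

// Draw the chunks back to back starting at (x, y); returns the final x.
int DrawChunks(ToyDC& dc, int x, int y, const std::vector<std::wstring>& chunks)
{
    dc.updateCP = true;           // plays the role of SetTextAlign
    dc.curX = x;                  // plays the role of MoveToEx
    dc.curY = y;
    for (const std::wstring& chunk : chunks) {
        const int expected = dc.curX;
        const int drawnAt = ToyTextOut(dc, 0, 0, chunk);
        assert(drawnAt == expected);  // each chunk resumes where the last ended
        (void)drawnAt;
        (void)expected;               // keep builds with NDEBUG warning-free
    }
    return dc.curX;
}
```

Notice that the loop never computes a coordinate itself. It passes 0,0 every time and lets the current position do the bookkeeping.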

Of course, we can’t just mess with the DC’s settings like this. If the caller did not set TA_UPDATECP, then the caller is not expecting us to be meddling with the current position. Therefore, we have to save the original position and restore it (and the original text alignment mode) afterwards.

  POINT ptOrig;
  DWORD dwAlignOrig = GetTextAlign(hdc);
  SetTextAlign(hdc, dwAlignOrig | TA_UPDATECP);
  MoveToEx(hdc, x, y, &ptOrig);
  while (cch > 0) {
    ...
    TextOut(hdc, 0, 0, psz, cchActual);
    ...
  }
  // if caller did not want CP updated, then restore it
  // and restore the text alignment mode too
  if (!(dwAlignOrig & TA_UPDATECP)) {
    SetTextAlign(hdc, dwAlignOrig);
    MoveToEx(hdc, ptOrig.x, ptOrig.y, NULL);
  }

Next is a refinement: We should take advantage of the third parameter to GetStrCodePages, which specifies the code pages we would prefer to use if a choice is available. Clearly we should prefer to use the code pages supported by the font we want to use, so that if a character can be displayed in that font directly, we don’t map an alternate font.

  HFONT hfOrig = (HFONT)GetCurrentObject(hdc, OBJ_FONT);
  DWORD dwFontCodePages = 0;
  pfl->GetFontCodePages(hdc, hfOrig, &dwFontCodePages);
  ...
  while (cch > 0) {
    pfl->GetStrCodePages(psz, cch, dwFontCodePages, &dwActualCodePages, &cchActual);
    if (dwActualCodePages & dwFontCodePages) {
      // our font can handle it - draw directly using our font
      TextOut(hdc, 0, 0, psz, cchActual);
    } else {
      ... MapFont etc ...
    }
  }
  ...
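
Here is a portable toy model of that decision, in case you want to experiment with it. Treat it as a sketch only: the priority rule it uses (narrow the run to the preferred code pages when the first character allows it) is my assumption about one plausible behavior, not a description of what MLang really does, and `CodePageMask`, `Run`, `PlanRuns` and the flag values are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using CodePageMask = std::uint32_t;  // toy set of code pages, one bit each

// Hypothetical flag values, chosen only for this illustration.
constexpr CodePageMask kLatin    = 0x1;
constexpr CodePageMask kCyrillic = 0x2;
constexpr CodePageMask kThai     = 0x4;

struct Run {
    std::size_t length;           // characters in the run
    bool        useOriginalFont;  // true: draw directly; false: map a font
};

// Plan how to draw a string whose characters have the given masks when the
// selected font covers `fontCodePages`.
std::vector<Run> PlanRuns(const std::vector<CodePageMask>& chars,
                          CodePageMask fontCodePages)
{
    std::vector<Run> runs;
    std::size_t i = 0;
    while (i < chars.size()) {
        CodePageMask common = chars[i];
        // Assumed priority rule: if the first character fits the preferred
        // set, restrict the run to code pages from that set.
        if ((common & fontCodePages) != 0) {
            common &= fontCodePages;
        }
        std::size_t len = 1;
        while (i + len < chars.size() && (common & chars[i + len]) != 0) {
            common &= chars[i + len];
            ++len;
        }
        // Same test as in the loop above: does our font cover this run?
        runs.push_back({len, (common & fontCodePages) != 0});
        i += len;
    }
    assert(i == chars.size());  // every character is covered by one run
    return runs;
}
```

With a font that covers the Latin and Cyrillic flags, the nine-character test string comes out as two runs drawn directly and one Thai run that needs a mapped font.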

Of course, you probably wonder where this magical pfl comes from. It comes from the Multilanguage Object in mlang.

  IMLangFontLink2 *pfl;
  CoCreateInstance(CLSID_CMultiLanguage, NULL,
                   CLSCTX_ALL, IID_IMLangFontLink2, (void**)&pfl);
  ...
  pfl->Release();

And of course, all the errors we’ve been ignoring need to be taken care of. This does create a bit of a problem if we run into an error after we have already made it through a few chunks. What should we do?

I’m going to handle the error by drawing the string in the original font, ugly boxes and all. We can’t erase the characters we already drew, and we can’t just draw half of the string (for our caller won’t know where to resume). So we just draw with the original font and hope for the best. At least it’s no worse than it was before font linking.
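The shape of that fallback is easy to model without any GDI at all. The following is a sketch with invented names (`DrawLog`, `DrawWithFallback`, and a `failAt` parameter that simulates the point at which an error occurs); it records what would be drawn instead of drawing anything.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// What a toy drawing pass did: one entry per draw call, plus whether it
// had to give up on font linking partway through.
struct DrawLog {
    std::vector<std::string> entries;
    bool fellBack = false;
};

// Draw `chunks` one at a time. `failAt` is the index of the chunk at which
// the simulated error occurs, or chunks.size() for no error at all.
DrawLog DrawWithFallback(const std::vector<std::string>& chunks,
                         std::size_t failAt)
{
    assert(failAt <= chunks.size());
    DrawLog log;
    std::size_t i = 0;
    for (; i < chunks.size(); ++i) {
        if (i == failAt) {
            break;  // simulated failure before drawing chunk i
        }
        log.entries.push_back("chunk:" + chunks[i]);
    }
    if (i < chunks.size()) {
        // We already drew something, so finish the job in one go with the
        // original font instead of leaving the string half drawn.
        std::string rest;
        for (std::size_t j = i; j < chunks.size(); ++j) {
            rest += chunks[j];
        }
        log.entries.push_back("rest:" + rest);
        log.fellBack = true;
    }
    return log;
}
```

The `fellBack` flag plays the role of a "succeeded, but not the way you hoped" result, so a caller who cares can tell the difference.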

Put all of these refinements together and you get this final function:

HRESULT TextOutFL(HDC hdc, int x, int y, LPCWSTR psz, int cch)
{
  HRESULT hr;
  IMLangFontLink2 *pfl;
  if (SUCCEEDED(hr = CoCreateInstance(CLSID_CMultiLanguage, NULL,
                      CLSCTX_ALL, IID_IMLangFontLink2, (void**)&pfl))) {
    HFONT hfOrig = (HFONT)GetCurrentObject(hdc, OBJ_FONT);
    POINT ptOrig;
    DWORD dwAlignOrig = GetTextAlign(hdc);
    if (!(dwAlignOrig & TA_UPDATECP)) {
      SetTextAlign(hdc, dwAlignOrig | TA_UPDATECP);
    }
    MoveToEx(hdc, x, y, &ptOrig);
    DWORD dwFontCodePages = 0;
    hr = pfl->GetFontCodePages(hdc, hfOrig, &dwFontCodePages);
    if (SUCCEEDED(hr)) {
      while (cch > 0) {
        DWORD dwActualCodePages;
        long cchActual;
        hr = pfl->GetStrCodePages(psz, cch, dwFontCodePages, &dwActualCodePages, &cchActual);
        if (FAILED(hr)) {
          break;
        }

        if (dwActualCodePages & dwFontCodePages) {
          TextOut(hdc, 0, 0, psz, cchActual);
        } else {
          HFONT hfLinked;
          if (FAILED(hr = pfl->MapFont(hdc, dwActualCodePages, 0, &hfLinked))) {
            break;
          }
          SelectFont(hdc, hfLinked);
          TextOut(hdc, 0, 0, psz, cchActual);
          SelectFont(hdc, hfOrig);
          pfl->ReleaseFont(hfLinked);
        }
        psz += cchActual;
        cch -= cchActual;
      }
      if (FAILED(hr)) {
        //  We started outputting characters so we have to finish.
        //  Do the rest without font linking since we have no choice.
        TextOut(hdc, 0, 0, psz, cch);
        hr = S_FALSE;
      }
    }

    pfl->Release();

    if (!(dwAlignOrig & TA_UPDATECP)) {
      SetTextAlign(hdc, dwAlignOrig);
      MoveToEx(hdc, ptOrig.x, ptOrig.y, NULL);
    }
  }

  return hr;
}

Finally, we can wrap the entire operation inside a helper function that first tries with font linking and if that fails, then just draws the text the old-fashioned way.

void TextOutTryFL(HDC hdc, int x, int y, LPCWSTR psz, int cch)
{
  if (FAILED(TextOutFL(hdc, x, y, psz, cch))) {
    TextOut(hdc, x, y, psz, cch);
  }
}

Okay, now that we have our font-linked TextOut with fallback, we can go ahead and adjust our PaintContent function to use it.

void
PaintContent(HWND hwnd, PAINTSTRUCT *pps)
{
  TextOutTryFL(pps->hdc, 0, 0,
               L"ABC\x0410\x0411\x0412\x0E01\x0E02\x0E03", 9);
}

Observe that the string is now displayed with no black boxes.

One refinement I did not do was to avoid creating the IMLangFontLink2 pointer each time we want to draw text. In a “real program” you would probably create the multilanguage object once per drawing context (per window, perhaps) and re-use it to avoid going through the whole object creation codepath each time you want to draw a string.

[Raymond is currently on vacation; this message was pre-recorded.]

Comments (17)
  1. Does this work for RTL languages as well?

  2. DrPizza says:

    It seems curious to me that this functionality is (apparently) part of IE, and not part of something such as Uniscribe.

  3. You know, DrPizza, I was just thinking that myself.

    I was incredibly surprised that this code didn’t touch Uniscribe at all.

    Mind you, I find Uniscribe a bear to work with anyway, so I’m not entirely surprised.

    Raymond – why not Uniscribe?

  4. Nicholas Allen says:

    Larry, I believe this will fail for things like embedded bidi.

    Uniscribe has built in support for linking and fallback. It probably does something similar to this internally because a lot of it boils down to calls to something like ExtTextOut.

  5. Raymond Chen says:

    I didn’t know about Uniscribe. MLang was added in Internet Explorer 4.0; Uniscribe didn’t show up until Internet Explorer 5.0.

    I welcome anybody to write an equivalent version that uses Uniscribe.

  6. Jordan Russell says:

    I thought the font "Microsoft Sans Serif" (added in Windows 2000) was supposed to solve this problem. It seems to be capable of displaying Roman/Cyrillic/Thai/Korean/Japanese/etc. characters, provided the necessary language support is installed on the system.

  7. Raymond: Fair enough :)

  8. Jonathan says:

    1. I thought Arial Unicode MS does include all Unicode characters (or a pretty good approximation thereof).

    2. How does this handle bidi? How about combining characters?

  9. M1EK says:

    I had to do this on OS/2 in order to get Unicode text display working for Java, which sneers at puny concepts like codepages.

    And I like how you picked Thai, which is the weirdest character set I ever had the displeasure to attempt to handle.

  10. Ben Cooke says:

    Jonathan,

    Arial Unicode MS isn’t part of the stock Windows distribution, and it’s also HUGE (about 25MB!) so it’s not really something most people want to have loaded when it’s not necessary.

    It’s a pity there isn’t a GDI mode or an alternative function which does all this stuff for you. Obviously it can’t be the default, because in some cases it’s important to use one font and one font alone and there are already loads of apps out there, but it would be nice to have the option to have Windows do the right thing to solve this very common problem.

  11. Raymond Chen says:

    If you have Windows 2000 or better, my understanding is that GDI will do font linking automatically if your font is Tahoma, MS Sans Serif, a few select others. (I’m not the expert on this subject, so my answer to follow-up questions will likely be "I don’t know.")

  12. Ben Cooke says:

    As a test I opened Notepad (first simple Unicode-enabled app which sprang to mind) and created an empty file with a name containing Thai, Cyrillic, Arabic and Greek characters to see what would happen in the title bar, which on my system is rendered in Tahoma. I assume title bar captions are rendered through TextOut, although of course they might not be!

    It doesn’t appear to have worked, although the copy of Tahoma on my system does seem to contain lots of Thai, Cyrillic and Arabic characters. It did stumble on a few of the Arabic ones, though.

    I’ll have to test this properly with a simple test app sometime, I guess.

  13. Brian says:

    I’m curious as to the point of TextOutTryFL. Why not just put else { TextOut(hdc, x, y, psz, cch); } inside TextOutFL ?

  14. J. Edward Sanchez says:

    Because the caller might want to know whether the operation succeeded as requested. You could accomplish that using a special return code, but then the caller might also want to take some other action (i.e., warn the user) instead of just falling back to TextOut() in case of a problem.
