Break it up, you two!: The zero width non-joiner


Keytips are those little pop-up keyboard accelerator thingies that appear on the Ribbon when you tap the Alt key:

A tester discovered that when a test tried to read the accessibility name for a Ribbon keytip, "an extra character appears after every keytip character." In the above example, the keytip for "Tab 1" was being read back as

46 00 0C 20 46 00 0C 20
----- ----- ----- -----
  F   ?????   F   ?????

The question marks are U+200C, formally known as ZERO WIDTH NON-JOINER. Michael Kaplan discussed the character (and its evil twin the ZERO WIDTH JOINER) some time ago.
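
If you want to check the dump yourself, here is a minimal Python sketch. It assumes the bytes are UTF-16LE, which is how the 46 00 pair reads as "F". The variable names are mine, not anything from the Ribbon.

```python
import unicodedata

# The eight bytes from the dump above. Treating them as UTF-16LE is an
# assumption based on "F" appearing as 46 00.
raw = bytes.fromhex("46 00 0C 20 46 00 0C 20")
text = raw.decode("utf-16-le")

# Print each code point alongside its formal Unicode name.
for ch in text:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Every second code point comes back as U+200C, which is why the tester saw "an extra character" after each keytip character.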

The ZERO WIDTH NON-JOINER (or ZWNJ to his friends) is a hint to the font engine that the characters on opposite sides of the ZWNJ should not be combined into a ligature. In English, the ZWNJ would prevent two consecutive lowercase "f"s from being converted into a "ff" ligature. Ligatures are fading from use in contemporary printing, probably due to the rise of computers. Back in the old days, you saw all sorts of neat ligatures, like "st".

Breaking up the ligature is important when presenting keyboard accelerators. Imagine if the keyboard accelerator for a key sequence was "A" followed by "E". If this were displayed as "Æ", users would waste their time looking for an "Æ" key on their keyboard. Although English doesn't have many ligatures any more, many other languages still employ them heavily. (You may have noticed that the keytip was a bit overzealous with the ZWNJ, putting one at the end of the string even though there was nothing for the second F to be unjoined from!)
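
To make the keytip's behavior concrete, here is a rough Python sketch. The function names are hypothetical and this is not the Ribbon's actual code: the first function reproduces what the dump shows (a ZWNJ after every character), and the second shows the less zealous alternative of putting one only between characters.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def keytip_label_observed(keys: str) -> str:
    """Append a ZWNJ after every character, trailing one included.

    This matches the byte dump above; it is a guess at the behavior,
    not the real implementation.
    """
    return "".join(ch + ZWNJ for ch in keys)

def keytip_label_between(keys: str) -> str:
    """Place a ZWNJ only between adjacent characters."""
    return ZWNJ.join(keys)

# Show both variants as UTF-16LE bytes for comparison with the dump.
print(keytip_label_observed("FF").encode("utf-16-le").hex(" "))
print(keytip_label_between("FF").encode("utf-16-le").hex(" "))
```

Either way, the font engine never sees the two letters side by side, so it has no opportunity to merge them.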

So if you encounter one of these ZWNJ characters, don't be afraid. He's just there to break things up. And as Michael notes, ZWNJ and ZWJ "are supposed to be ignored in things like the Unicode Collation Algorithm."
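
For a test that just wants the visible keytip text, one option is to drop the joiners before comparing. This is a minimal sketch with a hypothetical helper name, not an implementation of the Unicode Collation Algorithm; it strips only U+200C and U+200D and leaves everything else alone.

```python
# ZERO WIDTH NON-JOINER and ZERO WIDTH JOINER
JOINERS = {"\u200c", "\u200d"}

def strip_joiners(s: str) -> str:
    """Return s with every ZWNJ and ZWJ removed (hypothetical test helper)."""
    return "".join(ch for ch in s if ch not in JOINERS)

# The accessibility name as read back in the dump, versus what the test expected.
name_from_ui = "F\u200cF\u200c"
print(name_from_ui == "FF")                 # a raw comparison fails
print(strip_joiners(name_from_ui) == "FF")  # comparing after stripping succeeds
```

Whether stripping is appropriate depends on what the test is checking: if the point is to verify that the ZWNJ is present, compare the raw string instead.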

Comments (14)
  1. Anonymous says:

    If anyone sees a square with 'FB06' in it, it's this character: http://www.fileformat.info/…/index.htm.

  2. Anonymous says:

    That's just another case of people not realizing Unicode's neatness.

  3. Anonymous says:

    Don't worry, ligatures are back in Word 2013.  Try typing the word "bloodletting."

    In fact, ligatures are applied regardless of font size — which makes for some really awkward-looking text at large point sizes.  I always thought that ligatures are only supposed to be used at typical reading sizes — nothing larger than, say, 14-point.

  4. Ligatures are fading from use in contemporary printing, probably due to the rise of computers.

    If somebody doesn’t use the fi ligature when using Serif fonts, he is an idiot, because of the crappy dot clash.  Copy “fi fi” to your favorite WYSIWYG text editor program (Word{,Pad}/{Libre,Open}Office) with Times New Roman at a small size.  (too big fonts/zoom may break this, and don’t forget that you can zoom paper in much better than an LCD display).  

    {La,}TeX, the most superior typesetting system in existence, does ligatures of certain words automatically.

  5. Anonymous says:

    What does a multi-letter keytip mean? Do I hold down Alt while pressing the first and then the second letter? Do I only hold down Alt for the first letter? Something else?

  6. So that's why lowercase RN (rn) is often shown as m.

  7. Anonymous says:

    @alegr1: No that is because of "keming" (bad kerning).

  8. Anonymous says:

    @alegr1 – At one point the "M" key on my laptop broke; I found myself needing to type rn instead to compensate, at least until I used a registry hack to remap my menu key to m. Eventually I just replaced the keyboard. ;)

  9. Anonymous says:

    The only thing missing from Unicode is a virtual machine that can execute programs inside strings. We should reserve a bitplane for the opcodes before it's too late!

  10. I conclude the tester was using something like strcmp instead of CompareString?

  11. Anonymous says:

    @Gabe: "What does a multi-letter keytip mean? Do I hold down Alt while pressing the first and then the second letter? Do I only hold down Alt for the first letter? Something else?"

    In this context, they're used to display the characters that corresponded to the old menu shortcuts. E.g. in the picture above (aside: wow there's an actual picture in Raymond's post!), "FB" is bold because in the old UI, if you pressed Alt, then F for the Format menu, then B for Bold, you'd get bold. So you could chord it as Alt, F, B; as Alt-F, B, or probably as Alt-F (continue holding Alt) Alt-B. Though I'm not sure about the last and can't conveniently try.

    [I *think* this didn't post the first time; sorry for the spam if it did.]

  12. Anonymous says:

    I'm disappointed the picture is not Raymond CSS magic.

  13. Anonymous says:

    Does the ZWNJ also serve as a text-to-speech hint that the word shouldn't be pronounced "FA" but rather "F" "A"?

  14. Yes, rn->m was a joke. Not a joke is how long it took Microsoft to fix the kerning (from keming). I guess, nobody was promised a bonus for that, that's why.
