What is Title Case?

Disclaimer: I’m not an English teacher (that’s my mom), so my description of title casing in English probably has exceptions and variations.

Title casing has an interesting history in computer programming.  Programmers like to use CamelCase to make variable names more readable, and, particularly among developers who are native speakers of some languages, there’s an idea that title casing is interesting, as in TextInfo.ToTitleCase() in .NET and, in Windows 7, LCMapString() with LCMAP_TITLECASE.  Most title casing algorithms are linguistically bad, even in English.  For other languages it’s worse.

ToTitleCase() takes a very simple approach to title casing.  Maybe in the future it’ll be smarter, but for now it just uppercases the first letter in a group of letters, and tries to pay attention to non-letters and word breaks.  It also tries to keep acronyms all upper-case.
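
As a rough illustration of that approach, here is a minimal Python sketch.  It is not the .NET implementation; the function name naive_title_case and the regular expression used to find groups of letters are assumptions made for this example.

```python
import re

# Hypothetical helper: a sketch of the simple approach described above,
# not the actual ToTitleCase() implementation.
_LETTER_RUN = re.compile(r"[^\W\d_]+")  # a run of Unicode letters


def naive_title_case(text):
    def fix_word(match):
        word = match.group(0)
        # Treat an all-upper-case word of two or more letters as an acronym.
        if len(word) > 1 and word.isupper():
            return word
        # Otherwise upper-case the first letter and lower-case the rest.
        return word[0].upper() + word[1:].lower()

    return _LETTER_RUN.sub(fix_word, text)


print(naive_title_case("what is title case?"))
print(naive_title_case("the NASA budget"))
```

In this sketch every group of letters is capitalized, including “is”, and “NASA” is left alone only because it is already entirely upper case.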

Even in English this is a simplistic approach.  The title of this post is “What is Title Case?”  The word “is” is supposed to be lower case, but ToTitleCase() would capitalize it.  Additionally, unexpected word breaks or punctuation could trick the algorithm.  Even the acronym test isn’t complete, since it just expects all upper case, and sometimes acronyms keep the lower-case letters of the full title.  It also messes up names like DiSilva or McConnell.  Contractions can also be messed up.
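
Python’s built-in str.title() follows a similarly simple rule (without any acronym check), so it can stand in to demonstrate several of these failures.  It is a different implementation from ToTitleCase(), so treat the results as analogous rather than identical.

```python
# str.title() is Python's built-in, used here only as a stand-in for a
# simple title-casing algorithm; it is not ToTitleCase().
samples = [
    "what is title case?",  # "is" gets capitalized
    "McConnell",            # the inner capital is lost
    "DiSilva",              # same problem
    "they're here",         # the contraction is split at the apostrophe
]

results = {text: text.title() for text in samples}

for text, titled in results.items():
    print(f"{text!r} -> {titled!r}")
```

The apostrophe case is the one Python’s own documentation warns about: the letter after the apostrophe is treated as the start of a new word.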

Outside of English, ToTitleCase() rapidly gets silly.  In English we capitalize everything except articles, short prepositions and some other short words.  In German it’s just like a normal sentence, with only nouns getting capitalized, so behavior that is slightly over-eager in English becomes very over-eager.  Other languages can also have letters before the main word, e.g. l’État, so the ToTitleCase() rules can mess up those words as well.
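
The English rule described above can be sketched as a variation that skips a list of small words.  The word list and the name english_title_case are hypothetical and far from complete; style guides differ on which words stay lower case (“is” is included only because of this post’s title), and the sketch only splits on spaces.

```python
# Hypothetical, incomplete list of words that stay lower case in the
# middle of an English title; style guides disagree on the details.
SMALL_WORDS = {"a", "an", "the", "and", "but", "or", "of", "in", "on", "to", "is"}


def english_title_case(title):
    words = title.split(" ")
    result = []
    for index, word in enumerate(words):
        is_edge = index == 0 or index == len(words) - 1
        if word.lower() in SMALL_WORDS and not is_edge:
            result.append(word.lower())
        else:
            # Capitalize only the first character and leave the rest
            # alone, so names like McConnell survive.
            result.append(word[:1].upper() + word[1:])
    return " ".join(result)


print(english_title_case("what is title case?"))
print(english_title_case("the history of McConnell"))
```

Even this only encodes one English convention.  Applied to a German title it would still capitalize far too much, and it knows nothing about prefixes such as the l’ in l’État.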

And then there are scripts/languages that don’t even have an upper/lower case distinction, so ToTitleCase() is pointless there.
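
A quick way to see this is that case mapping leaves such text unchanged.  The sketch below uses Python’s built-in string methods as a stand-in for any case-mapping API; the sample strings are only illustrations.

```python
# Japanese and Arabic have no upper/lower case distinction, so every
# case mapping returns the input unchanged.
samples = ["日本語", "العربية"]

unchanged = {
    text: text.title() == text.upper() == text.lower() == text
    for text in samples
}

for text, same in unchanged.items():
    print(text, "unchanged by case mapping:", same)
```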

Anyway, use care when using ToTitleCase().  It might work in some cases, but don’t expect it to work linguistically, particularly globally, particularly in non-English cases.  Also maybe we’ll get smarter and figure out a more correct way to do it in the future.


Comments (4)

  1. David Kean says:

    If it’s so bad (and Michael Kaplan agrees with you) – why was it added to Windows 7?

  2. Shawn Steele says:

    I almost added a paragraph to answer that 🙂

    I actually added it to Win7.  (And I even recycled the bits of the UPPER and LOWER flags; combining them gives TITLE.  Isn’t that sneaky?)

    Mostly it was added for .Net SDK parity.

    Although titlecase does have quite a few problems, applications still find it convenient and better than some alternatives.  In limited cases, where the linguistic problems are understood, it might be a bit helpful.  

    It’s also possible that in the very-long-term we’ll figure out how to do something smart and linguistically appropriate.  (so don’t complain if LCMAP_TITLECASE gets smarter some day).

  3. Ilya says:

    Actually, there is an interesting reason for title case.

    The Serbian language is written using both Cyrillic letters and Latin letters.  The problem is that there are several Cyrillic letters that correspond to two Latin letters: Љ corresponds to LJ, Њ to NJ, Џ to DŽ.  Roundtripping from Cyrillic to Latin and back to Cyrillic doesn't work correctly.  In order to remedy this, the Unicode committee added the ligatures Ǉ (U+01C7), Ǌ (U+01CA) and Ǆ (U+01C4).  Converting them to lowercase makes them ǉ, ǌ, ǆ.  In order to capitalize a proper name beginning with them correctly, you need to convert them to title case: ǈ, ǋ, ǅ.  However, as far as I know, the people from the former Yugoslavia do not care about these ligatures.  Ljubica (a common name) brings about a million hits in Google, and I could not find any with the ligature.

  4. Shawn Steele says:

    Yes, the ligatures are annoying special cases; however, the "Title Casing" rules for several languages differ.  "Is" in "What is Title Case?" shouldn't be capitalized, but ToTitleCase() will do that.