Text with no language is just not quite there

This is kind of a continuation of Working with text and bytes, by the way – you might find those interesting as well, although this is getting more abstract as we go.

Let’s consider this example: let’s say I’m a rendering engine, and I want to display text. Typically this entails figuring out what text there is to display, how I want to paint each glyph (each rendering unit, if you will), and how I’m going to lay those out on the screen. Unfortunately, there are cases in which I can’t do the job right unless I also know the language associated with the text.

HTML addresses language using the lang attribute. HTTP has the Content-Language header, although it isn’t quite the same – it describes the language of the intended audience, but the body may contain more than one language, and HTTP says nothing about what those languages are or which portions of text are covered by which one. That isn’t surprising, given that HTTP could very well be transferring binary files.
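To make the lang attribute a bit more concrete, here is a rough Python sketch of how a consumer might work out which language applies to each run of text. The names LangTracker and language_runs are made up for illustration, and it assumes reasonably well-formed markup where every non-void element is explicitly closed; a real HTML engine handles far more cases than this.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class LangTracker(HTMLParser):
    """Hypothetical helper: records (language, text) pairs while parsing."""

    def __init__(self):
        super().__init__()
        self.stack = [None]  # effective language of each open element
        self.runs = []       # collected (language, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        lang = dict(attrs).get("lang")
        # No lang attribute on this element: inherit from the parent.
        self.stack.append(lang if lang is not None else self.stack[-1])

    def handle_endtag(self, tag):
        if tag not in VOID and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((self.stack[-1], text))


def language_runs(html):
    """Return the text runs of an HTML fragment with their languages."""
    tracker = LangTracker()
    tracker.feed(html)
    tracker.close()
    return tracker.runs
```

Feeding it something like `<p lang="en">Hello <span lang="fr">bonjour</span> again</p>` gives each piece of text its own language, which is exactly the per-portion information that Content-Language cannot carry.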

XML, once again, already takes this into account with the xml:lang attribute – yay for text processing!
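As a small illustration of xml:lang, here is a sketch using Python’s standard ElementTree module. The function name effective_languages is my own invention; the only thing taken from the spec is that the xml prefix is bound to the http://www.w3.org/XML/1998/namespace namespace, and that an element without xml:lang takes the value from its nearest ancestor that has one.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(element, inherited=None):
    """Yield (tag, language) for an element and all of its descendants."""
    # An element's own xml:lang wins; otherwise use the inherited value.
    lang = element.get(XML_LANG, inherited)
    yield element.tag, lang
    for child in element:
        yield from effective_languages(child, lang)


doc = ET.fromstring(
    '<doc xml:lang="en"><p>Hello</p><p xml:lang="es">Hola</p></doc>'
)
languages = list(effective_languages(doc))
```

Here the first p has no attribute of its own and picks up "en" from doc, while the second one overrides it with "es".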

You want more practical examples? Those will come in a bit…


Comments (1)

  1. I’ve written in the past about XML and languages, and why you might be interested in being aware of the