Rambling about RFC 4690 and IDN

RFC 4690 ( http://www.ietf.org/rfc/rfc4690.txt ) is a reasonably new RFC that raises a bunch of questions about IDNs and Unicode, such as how to deal with confusable characters.  Some of those are also discussed in Unicode TR36, “Unicode Security Considerations” ( http://www.unicode.org/reports/tr36/ ).

One thing that confuses me about the discussion of IDN’s weaknesses is that most of these issues wouldn’t be a problem if the registrars didn’t register the problematic names.  From the client application’s perspective, it doesn’t matter much whether a particular IDN is legal or not, so long as the client does the appropriate mapping.  If it’s not a legal IDN, then web browsers and other apps won’t get anywhere with it, because DNS won’t return any records.  So from the client perspective I’m not sure what all the fuss is about.
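To make “the appropriate mapping” concrete, here is a minimal sketch using Python’s built-in "idna" codec, which implements the older IDNA 2003 rules (nameprep over Unicode 3.2, then Punycode).  The helper name to_ascii and the host name are just illustrations, not anything from the RFC.

```python
def to_ascii(hostname: str) -> str:
    """Map a Unicode host name to the ASCII form a client sends to DNS.

    Uses Python's built-in "idna" codec (IDNA 2003): each non-ASCII label
    is case-folded and normalized by nameprep, then Punycode-encoded with
    the "xn--" prefix.
    """
    return hostname.encode("idna").decode("ascii")


# Both spellings map to the same ASCII name, so the client never has to
# decide whether the name is "legal"; it just maps it and asks DNS.
print(to_ascii("b\u00fccher.example"))   # xn--bcher-kva.example
print(to_ascii("B\u00dcCHER.example"))   # xn--bcher-kva.example
```

Whether that ASCII name actually resolves is then entirely up to what the registrar allowed into the zone.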

Obviously the registrars need to be able to disallow bad names, and guidelines such as TR36 help.  Registrars also have the opportunity to be pickier than the standards.  Many only allow certain scripts or combinations relevant for their TLD.

For similar reasons the migration of IDN to Unicode 5 or newer doesn’t bother me.  Nobody’s going to allow unassigned code points in their domain names.  If a Unicode 5 code point appears in a future IDN, and DNS resolves it, it follows that it is a valid code point, even if some client app only understands Unicode 3.2.  That doesn’t help with case mapping, since an older client has no mapping data for characters it doesn’t know about, but a pure Unicode 5 name should be easily understandable and resolvable even by downlevel clients.
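As a rough illustration of the downlevel case, Python’s standard library still carries the Unicode 3.2 tables that IDNA 2003 was built on.  The sketch below picks U+1B05 (a Balinese letter, a script added in Unicode 5.0) as an example; the helper name is made up for this post.  A 3.2-only client sees that code point as unassigned, yet the Punycode step is purely algorithmic and round-trips it anyway.

```python
import stringprep
import unicodedata


def is_unassigned_in_unicode_3_2(ch: str) -> bool:
    """True if ch is unassigned in Unicode 3.2 (stringprep table A.1)."""
    return stringprep.in_table_a1(ch)


new_char = "\u1b05"  # BALINESE LETTER AKARA, assigned in Unicode 5.0

# A Unicode 3.2 client has no data for this character...
print(is_unassigned_in_unicode_3_2(new_char))         # True
print(unicodedata.ucd_3_2_0.category(new_char))       # Cn (unassigned)

# ...but Punycode does not need character properties, so the label can
# still be converted to and from its ASCII form.
encoded = new_char.encode("punycode")
print(encoded.decode("punycode") == new_char)         # True
```

What the old client cannot do is case-fold or normalize such a character, which is the case-mapping caveat above.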

Anyway, it seems to me that there’s more fuss about these issues than there needs to be.  Probably if the IETF IDN folks and the Unicode folks worked with each other to resolve these issues then IDN would get updated faster.  Right now it seems like each group is reacting on its own.

Comments (2)

  1. Within my limited security knowledge these concerns could be real. Although unrelated to IDN, XMPP JIDs have a security problem because of their international nature:


    You need only read the introduction section. This would, for example, allow me to create the domain standardbank.co.za, but using Cherokee characters as per the above XEP. Phishing would instantly become a million percent more effective.

    Hopefully a good CA would pick up on what I was trying to do: but you never know – you never know what chimps some companies have behind their desks.

  2. My feeling is that most of the spoofing is irrelevant.  If someone wants to spoof standardbank.co.za, they just register standardbank.info or something.  Or standardbank.secure.co.za.

    So I think that homographs are interesting, and they are certainly one more way to cause trouble; however, they are also fairly easy to test for (mixed-script tests, etc.), whereas plain-ASCII attacks can be harder to find…
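    For what it’s worth, a mixed-script test can be sketched in a few lines.  Python’s standard library doesn’t expose the Unicode Script property, so this version approximates the script by the first word of each character’s name (an assumption that works for common scripts but is not a real Script lookup); the function names are made up for the example.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Approximate the scripts used in a label.

    Heuristic only: takes the first word of the Unicode character name
    ("LATIN", "CYRILLIC", ...) as a stand-in for the Script property.
    Digits and hyphens are skipped because they are shared across scripts.
    """
    scripts = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    """True if the label appears to mix characters from several scripts."""
    return len(scripts_in_label(label)) > 1


print(is_mixed_script("standardbank"))              # False
# Same label with U+0430 CYRILLIC SMALL LETTER A swapped in for both "a"s
# in "standard":
print(is_mixed_script("st\u0430nd\u0430rdbank"))    # True
```

    A label written entirely in one look-alike script would pass this check, so it only catches the mixed case.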
