There’s a reasonably new RFC, RFC 4690 (http://www.ietf.org/rfc/rfc4690.txt), that raises a bunch of questions about IDN and Unicode, such as confusable characters. Some of those are also discussed in Unicode TR36, “Unicode Security Considerations” (http://www.unicode.org/reports/tr36/).
One thing that confuses me about the discussion of IDN’s weaknesses is that most of these issues wouldn’t be a problem if the registrars didn’t register the problematic names. From the client application’s perspective, it doesn’t matter much whether a particular IDN name is legal or not, so long as the client applies the appropriate mapping. If it’s not a legal IDN name, then web browsers and other apps won’t get anywhere with it, because DNS won’t return any records. So from the client perspective I’m not sure what all the fuss is about.
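To make the client side concrete, here is a minimal Python sketch of that mapping. It assumes Python’s built-in "idna" codec, which follows the IDNA 2003 (RFC 3490) rules; to_dns_name and resolves are hypothetical helper names, and bücher.example is only an illustrative name.

```python
import socket


def to_dns_name(name: str) -> str:
    """Map a Unicode domain name to the ASCII form that is sent to DNS."""
    # The built-in "idna" codec applies nameprep and Punycode to each label.
    return name.encode("idna").decode("ascii")


def resolves(name: str) -> bool:
    """Return True if DNS has address records for the mapped name."""
    try:
        socket.getaddrinfo(to_dns_name(name), None)
        return True
    except (socket.gaierror, UnicodeError):
        # Either the name could not be mapped, or DNS has no records for it.
        return False


print(to_dns_name("bücher.example"))  # xn--bcher-kva.example
```

The client never decides whether the name is "legal" beyond the mapping itself. If nobody registered the mapped name, resolves simply comes back False.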
Obviously the registrars need to be able to disallow bad names, and guidelines such as TR36 help. Registrars also have the opportunity to be pickier than the standards. Many only allow certain scripts or combinations relevant to their TLD.
For similar reasons the migration of IDN to Unicode 5 or newer doesn’t bother me. Nobody’s going to allow unassigned code points in their domain name. If a Unicode 5 code point appears in a future IDN name, and DNS resolves it, it would follow that it is a valid code point, even if some client app only understood Unicode 3.2. That doesn’t help with case mapping, since a downlevel client can’t fold characters it doesn’t know about, but a name that is already in its mapped Unicode 5 form should be easily understandable and resolvable even by downlevel clients.
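As a rough illustration of what "downlevel" means here, the sketch below uses Python’s stringprep module, whose in_table_a1 function checks a character against RFC 3454 table A.1 (the code points unassigned in Unicode 3.2). The helper name unassigned_in_unicode_3_2 is hypothetical, and U+0242 is just one example of a character added after Unicode 3.2 (in Unicode 5.0, as far as I know).

```python
import stringprep


def unassigned_in_unicode_3_2(label: str) -> list[str]:
    """Return the characters of label that Unicode 3.2 leaves unassigned."""
    # in_table_a1 is True for code points listed in RFC 3454 table A.1.
    return [ch for ch in label if stringprep.in_table_a1(ch)]


# U+0242 LATIN SMALL LETTER GLOTTAL STOP was added after Unicode 3.2.
print(unassigned_in_unicode_3_2("a\u0242b"))
print(unassigned_in_unicode_3_2("bücher"))  # []
```

A Unicode 3.2 client can see that it doesn’t know such a character, but it can still pass the name along for lookup. RFC 3490 provides an AllowUnassigned flag for this case, which queries may set.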
Anyway, it seems to me that there’s more fuss about these issues than there needs to be. If the IETF IDN folks and the Unicode folks worked with each other to resolve these issues, IDN would probably get updated faster. Right now it seems like each group is reacting on its own.