Oversimplification of EAI/IMA (International eMail Addresses)

A couple of months ago I blogged about Email Address Internationalization/Internationalized Email Addresses (EAI/IMA), and I felt like blogging about it again.

China's been very interested in non-ASCII email addresses for some time, and is working hard to adopt the EAI standard.  I've heard a target of November 2009 for that standard.  https://www.china.org.cn/china/sci_tech/2008-09/27/content_16544162.htm briefly addresses EAI.

Oversimplification of EAI

The basic concept of EAI is "just" to use UTF-8 for email.  Most software can comply just by allowing Unicode in its email addresses.  Using UTF-8 is reasonably straightforward, and most of the details are about compatibility with existing mail standards.  The IETF working group has a page at https://www.ietf.org/dyn/wg/charter/eai-charter.html.

Local Part of the Email Address

The local part of an email address is the user account part. Servers often treat it as case-insensitive, but it can also be case-sensitive. Similarly, EAI allows servers to define whatever mappings of the local part are appropriate for their organization. Some may choose to do case mapping similar to existing case-insensitive servers. A different mapping, like the Turkish behavior for i and I, is possible. Another option would be to perform normalization like NFC or NFKC on the name. Width mapping and aliases are also possible. Just like now, clients would simply use the names they are given and let the recipient's mail server figure it out.
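To make the mapping idea concrete, here's a rough sketch of what one server might do with the local part. The function name and the particular choice of NFKC plus case folding are my own assumptions for illustration; EAI leaves this policy entirely to the receiving server.

```python
import unicodedata


def canonical_local_part(local: str) -> str:
    """Hypothetical server-side mapping of a local part to a lookup key."""
    # NFKC composes characters and folds compatibility forms,
    # including full-width letters, into their ordinary forms.
    folded = unicodedata.normalize("NFKC", local)
    # casefold() is a locale-independent case mapping. A server that
    # wanted Turkish i/I behavior would need its own rules here.
    return folded.casefold()


# Three spellings that this particular policy treats as the same account.
print(canonical_local_part("JOS\u00c9"))      # precomposed, upper case
print(canonical_local_part("Jose\u0301"))     # e + combining acute accent
print(canonical_local_part("\uff2a\uff4f\uff53\u00e9"))  # full-width J, o, s
```

A client never runs anything like this. It just sends the address as given, and only the recipient's server applies its own mapping.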

Domain Part

EAI allows Unicode (UTF-8) for the entire address, so special mapping isn't necessary. Of course, if the domain doesn't have a valid registration (e.g., it isn't a valid IDN), then it won't work, but that's not really an email protocol issue. EAI uses UTF-8 instead of "punycode" for domain names. Punycode only comes into play when "downgrading."
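Here's a small sketch of the difference between the two forms of a domain. The domain and function names are made up for the example, and Python's built-in "idna" codec implements the older IDNA 2003 rules, so treat this as an illustration rather than what a real mail server would run.

```python
def utf8_domain(domain: str) -> bytes:
    """The form an EAI-aware hop would carry: plain UTF-8 bytes."""
    return domain.encode("utf-8")


def ascii_domain(domain: str) -> str:
    """The punycode (IDNA) form that only shows up when downgrading."""
    # The "idna" codec converts each non-ASCII label to its xn-- form.
    return domain.encode("idna").decode("ascii")


domain = "b\u00fccher.example"   # hypothetical internationalized domain
print(utf8_domain(domain))
print(ascii_domain(domain))      # xn--bcher-kva.example
```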

Negotiation

Mostly, "just using UTF-8" is pretty simple, but for backward compatibility, EAI-aware servers and clients will need to negotiate their protocols. For SMTP, the UTF8SMTP extension does this. An EAI-aware server advertises the UTF8SMTP extension, and the two sides can then agree to communicate in UTF-8. If the server doesn't advertise that extension, then the client has to use a different mechanism. The other protocols have similar handshaking.
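On the client side, the SMTP check boils down to looking for the UTF8SMTP keyword in the server's EHLO reply. The sketch below only parses a reply that has already been received. The function name and the sample reply are invented for illustration, and a real client would take the reply from its SMTP library instead.

```python
def advertises_utf8smtp(ehlo_reply: str) -> bool:
    """Return True if a multi-line EHLO reply lists the UTF8SMTP keyword."""
    lines = [line for line in ehlo_reply.splitlines() if line.startswith("250")]
    # The first 250 line is the server greeting; extension keywords follow it.
    for line in lines[1:]:
        # Skip the "250-" or "250 " prefix, then take the keyword.
        words = line[4:].split()
        if words and words[0].upper() == "UTF8SMTP":
            return True
    return False


# Hypothetical reply from an EAI-aware server.
sample_reply = (
    "250-mail.example.org greets you\r\n"
    "250-8BITMIME\r\n"
    "250-UTF8SMTP\r\n"
    "250 SIZE 10240000\r\n"
)
print(advertises_utf8smtp(sample_reply))
```

If this returns False, the client is talking to a legacy server and has to fall back to a different mechanism.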

Downgrading

Not all email clients and servers are going to become Unicode-aware instantly, so there is a downgrading concept for compatibility. Downgrading is the area with the most churn in the experimental standards, but the basic concept remains the same.

If you have an EAI-aware server and you try to talk to an unaware system, you'll need to fall back to the legacy protocols and encoding mechanisms. Effectively, this means that EAI accounts will need an ASCII alias so that if an EAI mail fails, it can be resent using the ASCII alias and MIME encodings.

To a legacy recipient, such a mail would appear as any other legacy email, and replies would go to the sender's ASCII alias. The receiving server would need to recognize that the ASCII and Unicode EAI aliases were for the same account and route the mail appropriately.
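Here's a minimal sketch of that fallback, under some loud assumptions: the alias table, the mailbox names, and all three function names are hypothetical, and the experimental downgrade drafts specify far more detail than this. It only shows the three pieces described above: picking the ASCII alias for a legacy peer, MIME-encoding a non-ASCII header, and mapping both aliases back to one account on the receiving side.

```python
from email.header import Header

# Hypothetical alias table: each EAI address has a registered ASCII alias.
ASCII_ALIASES = {"jos\u00e9@example.org": "jose@example.org"}

# Hypothetical receiving-side table: both aliases route to one mailbox.
MAILBOXES = {
    "jos\u00e9@example.org": "mailbox-1001",
    "jose@example.org": "mailbox-1001",
}


def choose_sender(eai_address: str, peer_supports_utf8: bool) -> str:
    """Use the EAI address when the peer is EAI-aware, else the ASCII alias."""
    if peer_supports_utf8:
        return eai_address
    if eai_address not in ASCII_ALIASES:
        raise ValueError("no ASCII alias registered; cannot downgrade")
    return ASCII_ALIASES[eai_address]


def downgrade_header(text: str) -> str:
    """Encode a non-ASCII header value as a legacy MIME encoded-word."""
    return Header(text, "utf-8").encode()


def resolve_mailbox(address: str) -> str:
    """Receiving server: map either alias to the same account."""
    return MAILBOXES[address]


print(choose_sender("jos\u00e9@example.org", peer_supports_utf8=False))
print(downgrade_header("Gr\u00fc\u00dfe"))
print(resolve_mailbox("jose@example.org") == resolve_mailbox("jos\u00e9@example.org"))
```

Nothing in the downgraded output marks it as having started life as an EAI message, which is why a legacy recipient sees it as ordinary mail.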

There was some discussion of providing additional data that would allow a downgraded mail to be reconstructed, but most of those techniques seem to break at least some legacy clients and have additional problems. My feeling is also that if a client knows how to reconstruct a downgraded mail, it already knows EAI, so the mail would likely never have been downgraded in the first place, and the additional complexity is unnecessary. I think it's likely that the initial standards will only specify minimal downgrading and not the ability to reconstruct a downgraded message.

Status

Of course the IETF RFCs are still experimental and China hasn't published their standards yet, but my oversimplification probably won't change much in the final version.