Hi, I am Vishu Gupta, a developer on the IE team. For the past year, I have been working primarily on CURI and Internationalized Domain Names (IDN) support. Browser support for navigating to URLs written in users’ native languages is critical for making the Internet truly international. IDN relies upon a standardized mechanism known as “Punycode” for encoding Unicode domain names using only the ASCII characters that are permitted by DNS.
After XPSP2 was released, I was asked to study and evaluate what it would take to implement IDN support in Internet Explorer. We determined that the work items involved in implementing IDN support in IE were:
- Converting the Unicode domain names to Punycode before sending them over the wire.
- Maintaining consistency within IE for handling domain names which enter IE in Punycode, and treating them as equivalent to their Unicode counterparts.
- Handling compatibility for existing scenarios.
- Providing security against homograph-spoofing attacks without giving a bad user-experience for IDN URLs.
Conversion to Punycode
This is accomplished by using the APIs provided by the recently released “Microsoft Internationalized Domain Names (IDN) Mitigation APIs 1.0”; these APIs will ship with Windows Vista and IE7 and are also available for download here. You can learn more about these APIs by reading the MSDN documentation.
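To illustrate what the conversion produces, here is a minimal Python sketch. It uses Python's built-in "idna" codec (an implementation of the RFC 3490 ToASCII operation) rather than the Microsoft IDN APIs, so it only demonstrates the encoding itself, not how IE calls into Windows. The helper name to_punycode is hypothetical.

```python
def to_punycode(hostname: str) -> str:
    """Return the ASCII (Punycode) form of a Unicode host name."""
    # The "idna" codec runs Nameprep and Punycode on each non-ASCII label
    # and prefixes the result with "xn--"; ASCII labels pass through as-is.
    return hostname.encode("idna").decode("ascii")


print(to_punycode("ŧēśŧ.example.com"))  # xn--hea8l8ac.example.com
print(to_punycode("www.example.com"))   # www.example.com
```

Only the non-ASCII label changes; the rest of the host name is sent over the wire exactly as typed.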
Maintaining consistency within IE
Many websites work around the limitation that IE6 does not support IDN by linking to the Punycoded URL. To improve user experience with those websites and to ensure that IE behaves consistently for equivalent Punycode and Unicode domain names, IE7 handles the URL as Nameprep Unicode internally (as suggested by RFC 3490). IE converts Unicode domain names to Punycode just before the domain name is resolved or sent to the proxy. This ensures, for example, that if the user added ŧēśŧ.example.com to the Restricted Sites zone, http://xn--hea8l8ac.example.com is also treated as a restricted site.
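The zone example above can be sketched as follows. This is an illustration in Python, not IE's implementation: the canonical_host helper and the restricted_sites set are hypothetical, and Python's built-in "idna" codec stands in for IE's internal Nameprep handling.

```python
def canonical_host(hostname: str) -> str:
    """Map the Unicode and Punycode spellings of a host to one Unicode form."""
    # Encoding applies Nameprep and Punycode to non-ASCII labels;
    # decoding turns every "xn--" label back into Unicode.
    return hostname.encode("idna").decode("idna")


# Hypothetical zone list, stored in canonical (Unicode) form.
restricted_sites = {canonical_host("ŧēśŧ.example.com")}


def is_restricted(hostname: str) -> bool:
    """Check zone membership on the canonical form, not the typed form."""
    return canonical_host(hostname) in restricted_sites


print(is_restricted("xn--hea8l8ac.example.com"))  # True
print(is_restricted("ŧēśŧ.example.com"))          # True
print(is_restricted("test.example.com"))          # False
```

Because every lookup goes through the same canonical form, it does not matter which spelling the user or the website supplied.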
Using Punycode for name resolution is the default behavior for IE7. A new “International” section in the Internet Control Panel permits disabling IDN when sending the domain name either to the proxy or to the DNS resolver. Disabling both options will revert IE7 to IE6 behavior when handling Unicode domain names.
Blocking IDN spoofing
Lookalike attacks (sometimes called “homograph” attacks) are possible within the ASCII character set (the usual examples are www.example.com vs. www.examp1e.com). But, with IDN, the character repertoire expands from a few dozen characters to many thousands of characters from all of the world’s languages, thereby increasing the attack surface for spoofing attacks immensely.
There is little doubt that showing the Punycode form leaves no ground for spoofing using the full range of Unicode characters; however, showing Punycode isn’t very user-friendly. The design of our anti-spoofing mitigation for IDN aims to:
- Reduce attack surface
- Treat Unicode domain names fairly
- Offer a good user-experience for users worldwide
- Offer simple, logical options to enable the user to fine-tune the IDN-experience
Given these considerations, IE7 imposes restrictions on the scripts allowed to be displayed inside the address bar. These restrictions are based on the user’s configured browser language settings. Using APIs from idndl.dll (part of the IDN Mitigation APIs described above), IE detects which scripts (character sets) are used by the current domain name. If the domain name contains characters outside of the user’s chosen languages, it is displayed in Punycode form to help prevent spoofing.
A domain name is displayed in Punycode if any of the following are true:
- The domain name contains characters which are not a part of any language (e.g. www.▯.com)
- Any one of its labels* contains a mix of scripts that do not appear together within a single language. For instance, Greek characters cannot mix with Cyrillic within a single label.
- Any of its labels* contains characters that appear only in languages other than the user’s list of chosen languages. Note that ASCII-only labels are always permitted for compatibility with existing sites.
(* A label is a segment of a domain name, delimited by dots. www.microsoft.com contains three labels, “www”, “microsoft” and “com”.)
If none of the above conditions apply, the domain name is displayed in Unicode. Note that different languages are allowed to appear in different labels, so long as all of the languages are in the list chosen by the user. This is to support domain names like name.example.com where “example” and “name” are composed of different languages.
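The display rules above can be approximated in a few lines of Python. This is a rough sketch under stated assumptions, not IE's algorithm: the LANGUAGE_SCRIPTS table is a made-up stand-in for IE's real language-to-script mapping, the first word of each Unicode character name is used as a crude proxy for its script, and the function names are hypothetical.

```python
import unicodedata

# Hypothetical language-to-script table (an assumption for illustration).
LANGUAGE_SCRIPTS = {
    "en": {"LATIN"},
    "el": {"GREEK"},
    "ru": {"CYRILLIC"},
}


def label_scripts(label: str):
    """Return the set of scripts in a label, or None if a character is unnamed."""
    scripts = set()
    for ch in label:
        if ch.isascii() and (ch.isdigit() or ch == "-"):
            continue  # digits and hyphens are shared by all scripts
        try:
            # Crude proxy: first word of the name, e.g. "LATIN", "GREEK".
            scripts.add(unicodedata.name(ch).split()[0])
        except ValueError:
            return None  # no Unicode name: treat as not part of any language
    return scripts


def display_form(hostname: str, user_languages: list[str]) -> str:
    """Return the Unicode form if every label passes, else the Punycode form."""
    allowed = [LANGUAGE_SCRIPTS[lang] for lang in user_languages]
    for label in hostname.split("."):
        if label.isascii():
            continue  # ASCII-only labels are always permitted
        scripts = label_scripts(label)
        # The whole label must fit within a single one of the user's languages.
        if scripts is None or not any(scripts <= s for s in allowed):
            return hostname.encode("idna").decode("ascii")
    return hostname


print(display_form("ŧēśŧ.example.com", ["en"]))  # ŧēśŧ.example.com
print(display_form("ŧēśŧ.example.com", ["el"]))  # xn--hea8l8ac.example.com
# "b\u0430nk" hides a Cyrillic "а" inside a Latin label, so it fails the
# single-language check even when both English and Russian are allowed.
print(display_form("b\u0430nk.example.com", ["en", "ru"]))
```

Each label is checked on its own, which is why different labels may use different languages as long as each one fits a language the user has chosen. Host names that the codec itself rejects (for example, ones containing private-use characters) raise UnicodeError in this sketch instead of being displayed.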
We do not describe “other language” URLs as “suspicious” because such URLs are completely harmless when displayed in Punycode form. Whenever IE7 has prevented an IDN domain name from displaying in Unicode, an Information Bar notifies the user that the domain name contains characters IE is not configured to display. It is easy to add additional languages to the Allow List using the IDN Information Bar. By default, the user’s list of languages will usually only contain the currently-configured Windows language.
Attack Surface Reduction
Our language-aware mitigation does two things:
- It disallows non-standard combinations of scripts from being displayed inside a label. This takes care of attacks like http://bạnk.example.com. That domain name will always be displayed as http://xn--bnk-sgz.example.com, because two scripts (Cyrillic and Latin) are mixed inside a label. This reduces the attack-surface to “single-language attacks”.
- It further reduces the attack surface for single-language attacks to only those users who have chosen to permit the target language.
Users who allow Greek in their language settings are as susceptible to Greek-only spoofs as the population using English is susceptible to pure-ASCII based spoofs. That’s where IE7’s Phishing Filter kicks in for both Unicode and ASCII URLs. If the user has opted into the Phishing Filter, a real-time check is performed during navigation to see if the target domain name is a reported phishing site. If so, navigation is blocked. For additional defense-in-depth, the Phishing Filter’s web service can apply additional heuristics to determine if the domain name is visually ambiguous. If so, the Phishing Filter will warn the user via the indicator in the IE address bar.
Whenever the user views a site addressed by an Internationalized Domain Name, an indicator appears in the IE address bar to notify the user that IDN is in use. The user can click on the IDN indicator to view more information about the current domain name.
Users who do not wish to see Unicode addresses may set an Internet Control Panel option to “Always show encoded addresses”.
Call to Action
Internet Explorer 7 Beta 2 will include IDN support in nearly-final form and we would greatly appreciate feedback on the design. If you see a scenario not working properly (for example, if adding native language URLs to favorites is broken), please let us know.
– Vishu Gupta
Update: Changed one example to not point to an actual domain name!