NCL includes classes for networking-related technology such as the base type System.Uri. URIs are used extensively to identify network resources, especially on the Internet. We’ve found that different customers frequently ask us the same questions about System.Uri and its capabilities, so we would like to share some of those questions with you here. For space reasons, we will address some of the harder questions in an additional post about System.Uri Customization.
1. System.Uri vs. Strings
Q. If a URI is just a simple string of characters, why do you need the System.Uri type at all?
A. A URI is a string of characters, but it is not necessarily simple. For example, take the following URI:
There are a number of public standards that outline the syntax for URIs. System.Uri’s role is to validate that the given string follows the appropriate syntax, and to provide easy access to its subcomponents.
Microsoft took this seriously enough to add a series of Code Analysis rules in Visual Studio that discourage developers from storing URIs as strings. See: http://msdn.microsoft.com/en-us/library/ms182174.aspx
2. Standards and History
Q. What public standards does System.Uri support? Has this changed in newer versions of the .NET framework?
A. When the System.Uri class was first introduced in .NET 1.0, it implemented the public standards described in RFCs 1736, 1738, and 2396, along with other functionality necessary for a base type in the .NET Framework. Since then, these standards have been updated or modified by new standards introduced in RFCs 3490, 3986, 3987, and others. In .NET 3.5, System.Uri was expanded to add support for RFCs 3490 and 3987, as described in question 10 below. To maintain backwards compatibility, very few changes have been made to migrate System.Uri from RFC 2396 to RFC 3986.
3. URI Components
Q. Which System.Uri properties give me which components of my URI?
A. Here are some of the common properties, along with the (canonicalized) data they represent from the previous sample URI. See RFCs 2396 and 3986 for more details on the component definitions. Note: IRI and IDN are enabled for these samples (see question 10).
Authority*: www.ル.com (+ ‘:’ + Port, for non-default ports)
* The RFCs indicate that Authority can optionally include UserInfo. To enhance security and limit the exposure of user credentials, System.Uri does not include UserInfo in the Authority.
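As a rough illustration of how the properties map to components, here is a minimal sketch. The sample URI, the class name UriComponentsDemo, and the helper Describe are assumptions made for this example; it is not the sample URI used above, and it uses an ASCII host so the output does not depend on IRI/IDN settings.

```csharp
using System;

public static class UriComponentsDemo
{
    // Lists commonly used components of an absolute URI, one "Name: value" pair per line.
    public static string Describe(Uri uri)
    {
        return string.Join(Environment.NewLine, new string[]
        {
            "Scheme: " + uri.Scheme,
            "UserInfo: " + uri.UserInfo,
            "Host: " + uri.Host,
            "Port: " + uri.Port,
            "Authority: " + uri.Authority,
            "AbsolutePath: " + uri.AbsolutePath,
            "Query: " + uri.Query,
            "Fragment: " + uri.Fragment
        });
    }

    public static void Main()
    {
        // Hypothetical sample URI (an assumption for this sketch).
        Uri uri = new Uri("http://user:secret@www.contoso.com:8080/docs/index.htm?q=1#top");
        Console.WriteLine(Describe(uri));
    }
}
```

For this input, Authority comes back as www.contoso.com:8080 without the user:secret part, which matches the footnote above about UserInfo.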
4. Canonicalization
Q. Why don’t I get back exactly what I put in?
A. According to the RFCs, some URI components may be accepted in not-quite-standard formats as long as they are converted to standard formats. This process is called canonicalization or normalization. For example, the scheme and host are converted to lower case, default port values are dropped, and dot segments in the path are compressed. Additionally, System.Uri expands IPv6 addresses to their longest form. All of these steps are required to facilitate equivalency checking when accessing resources and verifying security access.
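A short sketch of the normalization steps described above (lower-casing, default port removal, dot-segment compression). The host name and the class and method names are assumptions chosen for illustration.

```csharp
using System;

public static class CanonicalizationDemo
{
    // Parses the text as an absolute URI and returns its canonical string form.
    public static string Canonicalize(string text)
    {
        return new Uri(text, UriKind.Absolute).AbsoluteUri;
    }

    public static void Main()
    {
        // Upper-case scheme and host, explicit default port, and dot segments in the path.
        Console.WriteLine(Canonicalize("HTTP://WWW.Contoso.COM:80/a/b/../c/./d.htm"));
        // prints http://www.contoso.com/a/c/d.htm

        // Two spellings of the same resource compare as equal after canonicalization.
        Uri first = new Uri("HTTP://www.contoso.com:80/");
        Uri second = new Uri("http://www.contoso.com/");
        Console.WriteLine(first == second); // prints True
    }
}
```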
5. Uri Manipulation
Q. Why can’t I change any of System.Uri’s properties?
A. System.Uri is an immutable base type, just like System.String. You couldn’t trust System.Uri as a base type if anybody could change its contents underneath you. To conveniently remove, replace, or assemble components of a URI, use the System.UriBuilder class or System.UriTemplate. Also see the relative URI APIs: Uri.MakeRelativeUri and the System.Uri constructors that combine a base URI with a relative one.
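The following sketch shows one way to derive a modified URI with UriBuilder, and how the relative URI APIs fit together. The URIs and the helper name WithSchemeAndPort are assumptions for this example.

```csharp
using System;

public static class UriManipulationDemo
{
    // Returns a new Uri with a different scheme and port; the original is left untouched.
    public static Uri WithSchemeAndPort(Uri original, string scheme, int port)
    {
        UriBuilder builder = new UriBuilder(original);
        builder.Scheme = scheme;
        builder.Port = port; // Set explicitly so the old port is not carried over.
        return builder.Uri;
    }

    public static void Main()
    {
        Uri original = new Uri("http://www.contoso.com/docs/index.htm?q=1");
        Uri secured = WithSchemeAndPort(original, "https", 8443);
        Console.WriteLine(secured.AbsoluteUri);
        // prints https://www.contoso.com:8443/docs/index.htm?q=1

        // Combine a base URI with a relative reference, then go back the other way.
        Uri baseUri = new Uri("http://www.contoso.com/docs/");
        Uri combined = new Uri(baseUri, "guide/intro.htm");
        Console.WriteLine(combined.AbsoluteUri);
        // prints http://www.contoso.com/docs/guide/intro.htm
        Console.WriteLine(baseUri.MakeRelativeUri(combined));
        // prints guide/intro.htm
    }
}
```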
6. Supported Schemes
Q. What schemes does System.Uri support?
A. System.Uri has built-in parsing rules for Http, Https, File, Ftp, Gopher, MailTo, NetPipe, NetTcp, News, Nntp, and Uuid, as well as generic parsing rules that are applied to unrecognized schemes. Here are some samples for each of these schemes.
c:\path\file.txt (implicit DOS file)
\\host\share\path\file.txt (implicit UNC file)
7. UriFormatExceptions & Validating User Input
Q. Do I have to try/catch UriFormatExceptions every time I create a new System.Uri?
A. Not always. In .NET 2.0 we added the System.Uri.TryCreate methods so you can check for problems with your input data without the hassle and performance cost of try/catch blocks and exception handling.
Uri result = null;
if (!Uri.TryCreate(userUriString, UriKind.Absolute, out result))
{
    // Fail gracefully, ask the user to try again.
}
The downside here is that when a URI fails to parse, these methods do not return an error message explaining what was wrong with the input. If you need to display the error message, you should still use the try/catch pattern.
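One possible way to combine both patterns is sketched below: TryCreate handles the common case, and the constructor is called only for rejected input in order to obtain a message. The class and method names are assumptions for this example.

```csharp
using System;

public static class UriValidationDemo
{
    // Returns null when the text parses as an absolute URI,
    // otherwise the error message reported by the Uri constructor.
    public static string GetParseError(string text)
    {
        Uri parsed;
        if (Uri.TryCreate(text, UriKind.Absolute, out parsed))
        {
            return null;
        }

        try
        {
            new Uri(text, UriKind.Absolute);
            return null; // Not expected: TryCreate already rejected this input.
        }
        catch (UriFormatException ex)
        {
            return ex.Message;
        }
        catch (ArgumentNullException ex)
        {
            return ex.Message; // The constructor rejects null with a different exception type.
        }
    }

    public static void Main()
    {
        Console.WriteLine(GetParseError("http://www.contoso.com/") ?? "OK");
        Console.WriteLine(GetParseError("not a uri") ?? "OK");
    }
}
```

The exception cost is paid only on the failure path, so valid input is still checked without any exception handling overhead.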
8. MailTo Multiple E-mail Addresses
Q. Why doesn’t System.Uri’s MailTo scheme support multiple e-mail addresses?
A. System.Uri supports RFC 1738, which allows only a single e-mail address in the Authority of a MailTo URI.
An alternative way to represent multiple e-mail addresses with one System.Uri is by using query parameters as discussed in RFC 2368. The following example works well with Microsoft Outlook:
The only caveat with this format is that Outlook populates the To field with only the primary e-mail address (email@example.com); any To field data in the query (firstname.lastname@example.org) is ignored. The behavior of other e-mail applications may vary.
Alternatively, if you need to parse out multiple e-mail addresses from a data string, you can use the System.Net.Mail.MailAddressCollection class.
If you have access to a mail server, you may also consider adding all of the addresses to a mailing list. Then you need only a single address (mailto:MyGroup@constoso.com), and the list’s membership can be updated without changing this address.
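For the MailAddressCollection option mentioned above, a minimal sketch might look like this. The addresses and the helper name Parse are assumptions; MailAddressCollection.Add accepts a comma-separated list of addresses.

```csharp
using System;
using System.Net.Mail;

public static class MailAddressDemo
{
    // Splits a comma-separated list of e-mail addresses into individual MailAddress objects.
    public static MailAddressCollection Parse(string commaSeparated)
    {
        MailAddressCollection addresses = new MailAddressCollection();
        addresses.Add(commaSeparated);
        return addresses;
    }

    public static void Main()
    {
        // Hypothetical addresses used only for illustration.
        foreach (MailAddress address in Parse("alice@contoso.com, bob@contoso.com"))
        {
            Console.WriteLine(address.Address);
        }
    }
}
```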
9. Implicit DOS/UNC File Paths
Q. Why can’t I put # in my URI path? It’s a valid file name character!
A. In a standard URI, the # symbol identifies the start of the Fragment portion. Because this may conflict with local file paths, System.Uri treats # differently for implicit DOS and UNC file paths.
As shown in the table, explicit file URIs are allowed to have fragments, so the # symbol is not considered part of the file path for the LocalPath or AbsolutePath properties. Implicit file URIs cannot have fragments, so the # symbol remains part of the path.
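To observe the difference yourself, a small sketch such as the following prints LocalPath and Fragment for an explicit and an implicit file URI. The paths and names are assumptions; no expected output is listed because the exact LocalPath formatting can depend on the platform and runtime version.

```csharp
using System;

public static class FilePathFragmentDemo
{
    // Reports how System.Uri split the input between the local path and the fragment.
    public static string Describe(string text)
    {
        Uri uri = new Uri(text);
        return "LocalPath: " + uri.LocalPath + ", Fragment: " + uri.Fragment;
    }

    public static void Main()
    {
        // Explicit file URI: the text after # is treated as a fragment.
        Console.WriteLine(Describe("file:///c:/path/file#1.txt"));

        // Implicit DOS path: the # stays part of the path.
        Console.WriteLine(Describe(@"c:\path\file#1.txt"));
    }
}
```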
10. Unicode, IRI, and IDN
Q. Why does System.Uri turn my Unicode into xn--fek or %E3%83%AB?
A. Many computer programs and protocols do not yet understand non-Latin/ASCII character sets, so System.Uri has implemented RFC-defined logic for encoding and decoding these non-Latin Unicode characters. This support was significantly expanded in .NET 3.5 to include the IDN and IRI RFC standards described below. For backwards compatibility reasons, these new behaviors can only be enabled through configuration file settings. See the System.Uri documentation for details.
RFC 3490 outlines Internationalized Domain Names (IDNs). This standard covers the encoding of non-Latin Unicode characters in the Host portion of a URI. This encoding is designed for backwards compatibility with existing DNS standards and implementations. The Uri.DnsSafeHost property was extended to show this new format.
RFC 3987 introduces Internationalized Resource Identifiers (IRIs). This standard discusses how to encode non-Latin Unicode characters outside of the Host portion of a URI. Use the Uri.AbsoluteUri property rather than Uri.ToString() to get this encoding.
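The two encodings from the question can be reproduced outside of System.Uri’s parsing, which may help when working out which one you are looking at. This sketch uses Uri.EscapeDataString for percent-encoding and System.Globalization.IdnMapping for the IDN (Punycode) form; the class and method names are assumptions. The properties printed in Main depend on the runtime version and on the IRI/IDN configuration, so no output is listed for them.

```csharp
using System;
using System.Globalization;

public static class UnicodeEncodingDemo
{
    // Percent-encodes the UTF-8 bytes of the text, as used outside of the Host portion.
    public static string PercentEncode(string text)
    {
        return Uri.EscapeDataString(text);
    }

    // Converts a Unicode host name to its ASCII-compatible IDN form.
    public static string ToIdnHost(string host)
    {
        return new IdnMapping().GetAscii(host);
    }

    public static void Main()
    {
        Console.WriteLine(PercentEncode("ル"));      // prints %E3%83%AB
        Console.WriteLine(ToIdnHost("www.ル.com"));  // prints www.xn--fek.com

        // What these return depends on the runtime and the IRI/IDN settings.
        Uri uri = new Uri("http://www.ル.com/ル");
        Console.WriteLine(uri.DnsSafeHost);
        Console.WriteLine(uri.AbsoluteUri);
        Console.WriteLine(uri.ToString());
    }
}
```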
As you can see, System.Uri is a powerful tool that helps you handle a large variety of complex URI syntaxes. In a following post we will address how you can expand System.Uri’s support for additional syntaxes through customization.