File URIs in Windows

Invalid file URIs are among the most common illegal URIs that we were forced to accommodate in IE7. As I mentioned in a previous blog post, there is much confusion over how to handle file URIs. The standard for the file scheme doesn’t give specific instructions on how to convert a file system path for a specific operating system into a file URI. While the standard defines the syntax of the file scheme, it leaves the conversion from file system path to file URI up to the implementers. In this post, I describe the conversion we use in IE, and I list best practices to use when constructing or manipulating file URIs.

Proper Syntax

For the UNC Windows file path
     \\laptop\My Documents\FileSchemeURIs.doc

The corresponding valid file URI in Windows is the following:
     file://laptop/My%20Documents/FileSchemeURIs.doc

For the local Windows file path
     C:\Documents and Settings\davris\FileSchemeURIs.doc

The corresponding valid file URI in Windows is:
     file:///C:/Documents%20and%20Settings/davris/FileSchemeURIs.doc

The important factors here are the use of percent-encoding and the number of slashes following the ‘file:’ scheme name.

In order to avoid ambiguity, and for your Windows file paths to be interpreted correctly, characters that are important to URI parsing that are also allowed in Windows file paths must be percent-encoded. This includes ‘#’ and ‘%’. Characters that aren’t allowed in URIs but are allowed in Windows file paths should also be percent-encoded. This includes ‘ ‘, ‘{‘, ‘}’, ‘`’, ‘^’ and all control characters. Note, for instance, that the spaces in the example URIs above have been percent-encoded to ‘%20’. See the latest URI standard for the full list of characters that aren’t allowed in URIs.

The number of slashes following the ‘file:’ is dictated by the same rules as other well-known schemes like http and ftp. The text following two slashes is the hostname. In the case of the UNC Windows file path, the hostname appears immediately following the ‘//’. In the case of a local Windows file path, there is no hostname, and thus another slash and the path immediately follow.
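The two rules above (percent-encoding and slash count) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the conversion IE performs: the function name `windows_path_to_file_uri` and the exact set of characters left unescaped are hypothetical choices, and non-US-ASCII characters are passed through unchanged.

```python
from urllib.parse import quote

# ASCII characters left unescaped in the path. quote() always keeps the URI
# "unreserved" set; the extra characters listed here are an assumption.
_SAFE = "/:!$&'()*+,;=@"


def _escape(text: str) -> str:
    """Percent-encode unsafe ASCII characters; pass non-ASCII through unchanged."""
    return "".join(ch if ord(ch) > 0x7F else quote(ch, safe=_SAFE) for ch in text)


def windows_path_to_file_uri(path: str) -> str:
    """Convert an absolute Windows path (local or UNC) to a file URI."""
    forward = path.replace("\\", "/")
    if forward.startswith("//"):
        # UNC path: the text after the two slashes is the hostname.
        host, _, rest = forward[2:].partition("/")
        return "file://" + host + "/" + _escape(rest)
    # Local path: no hostname, so a third slash and the path follow directly.
    return "file:///" + _escape(forward)


print(windows_path_to_file_uri(r"\\laptop\My Documents\FileSchemeURIs.doc"))
print(windows_path_to_file_uri(r"C:\Documents and Settings\davris\FileSchemeURIs.doc"))
```

On Windows itself, the shell functions named in the conclusion of this post are the better choice; the sketch only makes the mapping visible.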

The username, password, and port components of a file URI in Windows are not used. In IE, including any of these components means you won’t be able to navigate to the URI. In contrast, the query and fragment components may be used. The query component will not be used when locating the resource, but the application that displays the content from the file URI may use the query component. For example, if an html document contains script, the script may read the query component of its URI when accessed via the file scheme. Similarly, the fragment will be used like a fragment in any other URI scheme.
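To make the component rules concrete, here is a small Python sketch that splits a file URI with the standard library and rejects the components described above as unusable. The helper name `check_file_uri` and the example URIs are hypothetical; the parsing is Python's generic URI parsing, not IE's.

```python
from urllib.parse import urlsplit


def check_file_uri(uri: str) -> dict:
    """Split a file URI and reject username, password, and port components."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError("not a file URI")
    if (parts.username is not None
            or parts.password is not None
            or parts.port is not None):
        raise ValueError("username, password, and port are not usable in a file URI")
    return {
        "host": parts.hostname or "",   # empty for local paths
        "path": parts.path,
        "query": parts.query,           # not used to locate the file
        "fragment": parts.fragment,
    }


print(check_file_uri("file:///C:/docs/main.html?REQUEST=RADIO#top"))
print(check_file_uri("file://laptop/share/main.html"))
```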

Improper Syntax Examples

The following are some examples of poorly formed file URIs with which we’ve dealt. (Paths have been modified to hide the identity of the culprits. :-) These “bad” URIs will continue to work in IE7; however, you should steer clear of them for the reasons stated and since there’s no guarantee of support in the future.

Incorrect: file://D:\Program Files\Viewer\startup.htm
Correct: file:///D:/Program%20Files/Viewer/startup.htm

A large set of invalid file URIs come from the common but incorrect notion that it’s acceptable to place a Windows file path after the text ‘file://’ and call it a file URI. This is bad because Windows file paths, as mentioned earlier, may contain characters that aren’t allowed in URIs or that are important to the parsing of URIs. For instance, if a ‘#’ is in a Windows file path and that Windows file path is simply appended to the text ‘file://’ then we can’t know if the ‘#’ is supposed to be part of the path or if it’s supposed to delimit the fragment as it would in an actual URI. Similarly, if the path contains a ‘%’ then we can’t determine whether the ‘%’ identifies a percent-encoded octet, or if it is just a plain percent character in the Windows file path. Zeke Odins-Lucas wrote an informative and entertaining blog post on this topic.
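The ambiguity is easy to reproduce with any generic URI parser. The following Python sketch uses made-up file names and the standard library's parser (an assumption standing in for a URI consumer in general) to show a ‘#’ being taken as a fragment delimiter and a ‘%’ being taken as the start of an encoded octet.

```python
from urllib.parse import unquote, urlsplit

# A Windows file named "report#1.txt" pasted straight after "file:///".
naive = urlsplit("file:///C:/docs/report#1.txt")
# The same file with '#' percent-encoded as %23.
encoded = urlsplit("file:///C:/docs/report%231.txt")

print(naive.path, naive.fragment)   # the '#' was read as a fragment delimiter
print(unquote(encoded.path))        # the '#' survives as part of the path

# '%' has the same problem: is "%41" an encoded octet or literal text?
print(unquote("C:/docs/100%41.txt"))    # read as the octet for 'A'
print(unquote("C:/docs/100%2541.txt"))  # %25 keeps the literal '%'
```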

Incorrect: C:\Program Files\Music\Web Sys\main.html?REQUEST=RADIO
Correct: file:///C:/Program%20Files/Music/Web%20Sys/main.html?REQUEST=RADIO

In many places inside IE, we allow a Windows file path as input when the input is actually specified as a URI. For example, the function CreateURLMonikerEx takes a string URI, but a Windows file path may be provided instead. Despite this, it is important to realize that a Windows file path is not a URI and a URI is not a Windows file path. You should not, as is done in this example, place a ‘?’ character after a Windows file path and provide a query component. The Windows file path has no such construct. If you wish to reference a file and provide a query then you must use a file URI.

Incorrect: file:////applib/products/a%2Db/abc%5F9/4148.920a/media/start.swf
Correct: file://applib/products/a-b/abc_9/4148.920a/media/start.swf

The author of this URI was heading in the correct direction. They converted the backslashes in their Windows file path to forward slashes and they percent-encoded characters they thought should be encoded. Although they meant well, there are a couple of problems. First, ‘applib’ is meant to be the host, but is preceded by two extra slashes. If interpreted as an actual URI, then applib isn’t the host but rather part of the path. If interpreted as a legacy file URI (as described by Zeke in his previously mentioned blog post) then those percent-encoded octets will be interpreted literally. Additionally, the characters ‘-‘ and ‘_’ are percent-encoded in this example, but shouldn’t be, as stated by the URI standard.
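As an illustration of the last point, the sketch below undoes unnecessary percent-encoding of unreserved characters such as ‘-’ and ‘_’ while leaving every other escape alone. The helper name `decode_unreserved` is hypothetical, and the sketch does not repair the extra slashes before the host.

```python
import re

# RFC 3986 "unreserved" characters: these never need percent-encoding.
_UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def decode_unreserved(uri: str) -> str:
    """Decode %XX escapes that stand for unreserved characters; keep the rest."""
    def _replace(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        return char if char in _UNRESERVED else match.group(0)
    return re.sub(r"%([0-9A-Fa-f]{2})", _replace, uri)


print(decode_unreserved("file://applib/products/a%2Db/abc%5F9/4148.920a/media/start.swf"))
print(decode_unreserved("file:///C:/Program%20Files/a%23b.htm"))  # left unchanged
```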

Non US-ASCII Characters

Characters outside of US-ASCII may appear in Windows file paths and accordingly they’re allowed in file IRIs. (URIs are defined as US-ASCII only and so when including non-US-ASCII characters in a string, what you’ve actually created is called an IRI: Internationalized Resource Identifier.) Don’t use percent-encoded octets to represent non US-ASCII characters because, in file URIs, percent-encoded octets are interpreted as a byte in the user’s current codepage. The meaning of a URI containing percent-encoded octets for bytes outside of US-ASCII will change depending on the locale in which the document is viewed. Instead, to represent a non-US-ASCII character you should use that character directly in the encoding of the document in which you are writing the IRI. For instance:

Incorrect: file:///C:/example%E3%84%93.txt
Correct: file:///C:/exampleㄓ.txt
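The codepage dependence can be demonstrated outside of IE. In this Python sketch the octets from the incorrect example are decoded three ways; Python's `cp1252` and `cp1251` codecs are used here as stand-ins for a Western European and a Cyrillic Windows system codepage, which is an assumption of the illustration.

```python
from urllib.parse import unquote_to_bytes

# The percent-encoded octets from the "Incorrect" example.
octets = unquote_to_bytes("example%E3%84%93.txt")

# The same three bytes name a different file under each codepage.
for codec in ("utf-8", "cp1252", "cp1251"):
    print(codec, octets.decode(codec))
```

Only the UTF-8 reading gives back the intended character, and that is not the reading described above for file URIs.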


In the latest URI standard, IPv6 literals are a part of the URI host syntax. In Windows, file URIs are dereferenced by converting them to their corresponding Windows file path and then using Windows file APIs to access the Windows file path. Since there’s no way to include an IPv6 address in a Windows file path, there’s no corresponding file URI and so there’s no way to incorporate an IPv6 address in file URIs in Windows. You can still use a hostname that resolves to an IPv6 address in the file URI, just not the IPv6 literal itself.
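If you generate file URIs programmatically, a quick check can flag hosts that are IPv6 literals before the URI is handed to Windows. This is a minimal sketch with a hypothetical helper name, `has_ipv6_literal_host`, and made-up example URIs.

```python
import ipaddress
from urllib.parse import urlsplit


def has_ipv6_literal_host(uri: str) -> bool:
    """Return True if the URI's host is an IPv6 literal such as [2001:db8::1]."""
    host = urlsplit(uri).hostname  # brackets are already stripped here
    if not host:
        return False
    try:
        ipaddress.IPv6Address(host)
    except ValueError:
        return False
    return True


print(has_ipv6_literal_host("file://[2001:db8::1]/share/a.txt"))
print(has_ipv6_literal_host("file://laptop/share/a.txt"))
```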

In Conclusion…

To reiterate the points above, please construct and use well-formed file URIs. If you’re writing code that generates or interprets file URIs, use the functions PathCreateFromUrl and UrlCreateFromPath to convert between Windows file paths and file URIs. These functions will work correctly with well-formed file URIs and legacy file URIs. Even if your file URI syntax looks reasonable and works in one case, that doesn’t mean it will work correctly in corner cases like paths that contain the ‘#’ or ‘%’ characters.

If you know of other interesting misuses of file URIs or have other related comments please let us know!

Dave Risney
Software Design Engineer

edit: added incorrect/correct wording, link update
edit: Corrected URI/IRI language in the Non US-ASCII Characters section


Comments (45)

  2. Mike Brown says:

    Thanks for this post! It’s good to finally see some clarification of your expectations re: the relationship between the components of a file URI and the components of UNC and file system paths. A more formal publication of that mapping, taking into account the different ways non-ASCII path characters might be exposed, would be much appreciated.

    The role of the authority component has historically been a point of contention. Here you’ve clarified that it can correlate to the UNC computer name, but not a drivespec. I personally support this position, but there was an argument that the drivespec should be allowed to be in the authority component (where the ":", having a reserved purpose there, would need to be percent-encoded or replaced with something else like "|"). The idea is that people want the URI path component’s "root" to correspond to the root directory of a particular drive, so a URI reference like href="/Temp/foo" would be just as portable when the documents are on a file system as they are when exposed through an HTTP server. As it stands, requiring the drivespec to be in the path component means that the path component root is at the "My Computer" level, not the drive level. Makes sense to me, but some folks might be irritated by it.

    There was also, historically (Netscape behavior-driven I think), a ubiquitous and unnecessary substitution of the drivespec’s ":" with "|", regardless of where it appeared in a file URI. I’m glad that’s mostly dying out, so I don’t encourage supporting it, but it’s something that might be acknowledged in future writeups.

    Along those lines, I’d like to point out the nuance that ":", when used in the path component of a URI, is semantically equivalent to "%3A", since ":" doesn’t have a reserved purpose in that particular component — at least not according to the generic syntax or the published definition of the file scheme; but since the scheme defers to implementers, you’re essentially one of the scheme’s many designers, and you’re free to say that ":" does have a reserved purpose there. Until then, "file:///C%3A/some/document", while it falls into the you-shouldn’t-overdo-the-percent-encoding guideline, should be treated the same as "file:///C:/some/document". It’s not treated the same in IE6. Has that changed in IE7?

  4. Phillip Stewart says:

    Very informative.  Thanks for trying to get compliant with the URI spec.

  5. Andrew Sherman says:

    So funny that you say all this without mentioning netscape or unix!

  6. John Baird says:

    Regarding IPv6 literals, IE7 under both XP SP2 and Vista accepts them in the form http://IPv6_literal. I realize that is a URL, and you are talking about URIs, but can’t they be included in a URI using the same syntax?

  7. DavRis [MSFT] says:

    @Mike Brown:  Thanks for your expressive comment.  I’ll respond to your comment in parts.

    With respect to ‘|’ as a drive delimiter:  I’m glad you’re against using the ‘|’.  I agree with your points.  That said, IE will continue to support this as it has done in the past in order to maintain compat. with older applications that depend on this.

    With respect to ‘%3A’ as a drive delimiter:  IE7 doesn’t treat this as a drive delimiter.  As you note we as implementers of this scheme get leeway and so we don’t need to equate ‘:’ with ‘%3A’ in this case.  But on the other hand there’s no reason not to from a design perspective and I like your points.  I’ll put this down to consider for future changes.

  8. DavRis [MSFT] says:

    @Andrew Sherman: I’m not sure what to say regarding file URIs and Netscape or Unix.  The file URI spec leaves resolution of file URIs up to the implementer, meaning that Netscape and other browsers on Unix may have very different concepts of file URIs.

    This post was meant to highlight best practices for file URIs in Windows w/ IE6 or IE7.  Do you have any specific questions regarding Unix and Netscape?  

  9. DavRis [MSFT] says:

    @John Baird:  Actually, what I wrote about IPv6 literals applies only to file URIs in IE7.  It doesn’t apply to http, ftp, or any other URIs in IE7.  As you correctly note http URIs with IPv6 literals work in Vista and XPSP2 w/ IE7.  It’s just file URIs that don’t work with IPv6 literals.

    With regard to URI vs URL, check out RFC 3986 section 1.1.3, which describes the relationship between URI, URL, and URN.

  10. DavRis [MSFT] says:

    @Mike Brown: Wrt your comment about a more formal publication.  This document is an informal attempt to highlight best practices for file URIs in IE.  This means, as you noted, I didn’t get into the details of the various bad/deprecated practices for file URIs in IE like use of ‘|’ or the legacy file URI syntax I mention.  I’d be happy to work on a more formal publication.  Do you have any publication channel in mind?

  11. DavRis [MSFT] says:

    @Mike Brown: With respect to the drivespec in the authority:  I find myself agreeing with you yet again.  That’s an interesting point, looking at drivespec in the authority with an eye towards what functionality it enables.  However, defining file URIs in this manner in IE isn’t feasible because of support for the legacy file syntax.  The legacy file syntax is described by Zeke in his blog.

    The syntax of good natured and well meaning file URIs w/ drivespecs in the authority would conflict with the syntax of the legacy file URIs.  We can’t remove support for legacy file URIs in order to maintain application compat. with a significant number of applications.  If the day comes when we can remove the legacy syntax discussion on drivespec in the authority can be reopened.

  12. DavRis [MSFT] says:

    I got the following comments from Alexei regarding my post which I’ll answer here:

    "Suggestion: The portion about non-ASCII file names seems to be a bit unclear. It would be nice if you could give some examples of proper conversion.  I’m thinking about something like:


    Following file URL c:\документ.htm

    Should be represented as: ???

    And should not ever be presented like: ???

    Also you mentioned document encoding in that section. How does it relate to standalone file: URL? Does it also mean that file: URL should be  invalid URI for such cases (as non-ASCII characters can’t be directly present in URI)?

    Thank you,


    As an example of what I was talking about in the ‘Non US-ASCII Characters’ the following Windows file path:


    Should be represented as the following file URI with no percent-encoding applied to the non US-ASCII characters:


    It should not be written using percent-encoded UTF-8 like the following:


    Even though the IRI spec says the previous two example URIs are equivalent, the IE implementation for file URI resolution was written well before the IRI spec and doesn’t follow it.  Changing the file URI resolution implementation to better support the IRI spec in the future and still maintain backwards compat will be a challenge.

    Percent-encoded octets will be interpreted using the current system codepage.  So if you have the Cyrillic Windows (1251) codepage as the default for your system then the following percent-encoded URI will resolve to the original file:


    However, since the resolution of this URI will change based on the system codepage of the machine on which the document is viewed one shouldn’t encode a file URI in this fashion either.

    The recommended way then is:


    with the non US-ASCII characters directly in the URI.  Alexei is correct in that technically, this means it is no longer a URI since a URI is only defined over US-ASCII.  However, IE and most Windows URI parsing APIs (the Unicode versions anyway) allow non US-ASCII characters.  So going this route means it’s technically not a URI; however, it also means that your file ‘URI’ will resolve the same no matter the codepage of the system on which the document is viewed.

    That is to say, if you’d like your file URIs w/ non US-ASCII characters to work, then represent the non US-ASCII characters directly rather than percent-encoding them.

  13. anonymous says:

    A little bit off topic here, but I want to highlight again a problem which is causing a lot of complaints from our users after the IE7 rollout: headers disappear when printing from Outlook with IE7 installed.

    This seems like a problem which a lot of other people also face.

    I hope IE team can release a fix for this problem soon.

  14. Chris says:

    Great post, thanks guys.

    I am, however, seeing a strange effect in IE7 with a link to a local file. Can anyone comment? I know the link is wrong, but shouldn’t IE still work with it? Other browsers are OK.

    The link is shown near the end of my post here (I won’t reprint it directly in case the comment system mucks it up. Someone add a Preview mode soon!). See the part marked "Update 12 April 2006" above the first comment.

    In other browsers, the link is fixed and works. But in IE7, the drive letter is doubled! Hence the link doesn’t work. It is OK in IE6.

  15. steve_web says:

    Speaking of supporting standards… parsing of URIs etc.

    I’m glad to hear that IDN support is in IE7 now, but one of the things that has bugged me for a long time, is the lack of UNICODE support!

    Oh sure, IE6/7 do support UNICODE, but not ALL the characters… and as it always turns out, the ones that anyone has interest in, are the ones IE doesn’t support.

    For example, let’s take some of the shapes, in the #9600 range (9600-9700)

    If I try to view these, in various browsers (on the same machine, using a simple font like Arial) here’s what I get:

    Now, I’m going to go out on a limb here, and suggest that the 10 that don’t show in Firefox/Opera, just aren’t defined in the Arial font/windows, but as for IE?! What’s up?!

    Unicode Range (9600-9700)

     Mozilla Firefox:

       90 display, 10 don’t (9622-9631)


       90 display, 10 don’t (9622-9631)


       22 display, 78 don’t!


       22 display, 78 don’t!

    So, for this range, IE’s unicode support is (22/90) = 24%!

    Unicode Range: 9300-9400?

     Firefox: 88%

     IE7: 0%

    Unicode Range: 8700-8800?

     Firefox: 100%

     IE7: 13%

    Unicode Range: 8800-8900?

     Firefox: 100%

     IE7: 4%

    Please tell us that in future releases of IE, that fixing support for existing features/standards will be a high priority, if not much higher than adding new features.

  16. marc says:

    To honour file:// links in web (non-local) pages is a bad design decision. Firefox, for security purposes (with the default configuration), does not follow this type of link.

  17. Bert says:

    @ Andrew Sherman

    ‘So funny that you say all this without

    ‘mentioning netscape or unix!

    Why?  This is an IE7 blog with an entry about the handling of URI’s.  Why does there need to be any mention of other browsers, OS’, etc?


  18. Tomas says:

    We are having some trouble with file URLs in Internet Explorer 7 that can hopefully be answered by someone here. We have a software with an online documentation that lets you click on a Help button in each dialog box in the GUI and then go into the appropriate section in the HTML documentation.

    This is done by calling Internet Explorer with a file URL with an anchor to get into the appropriate place in the given help file. This means that we do something like

    iexplore file:///c:/product/doc/drawing.htm#circles

    The problem is that Internet Explorer 7 strips everything after the # and the drawing.htm file is opened at the beginning and not at the correct position.

    The above syntax has worked in Internet Explorer 6 and older and also works in the other major browsers. I can also add that our application is written in Java so we need to be able to feed a file URL to Internet Explorer in a way similar to the one indicated above.

    Can someone please tell us what we should do to accomplish the above in Internet Explorer 7?

  19. Pradeep says:

  20. says:

    Oh, and if it hasn’t been said a 1,000 times already, please make Node (from the DOM), a first class JavaScript citizen in IE7. Being able to prototype on this, *WOULD* allow developers to *SOLVE* almost all DOM related bugs in IE7+!

  21. EricLaw [MSFT] says:

    @Pradeep: Right-click on a command bar icon, choose Customize Command Bar, and check "Show only icons".

    @Tomas: Yes, I’m sorry to say that IE7 introduced a regression in this behavior.  The bug is as you noted, when passed a file URI on the command line containing a fragment, the fragment is dropped during navigation.

  22. Tomas says:

    @EricLaw: Thanks for the answer about dropping the fragment for a file URL. Can we expect a fix for this soon, and can you provide some workaround that can be used in the meantime?

  23. Firebug available for MSIE!

    Well, not entirely, but with a quick hack, you can have a Firebug Console in MSIE!

    JS Debugging in MSIE… wow, where’s my Snowboard!

  25. Zhe says:

    Is this a permission or a URI problem?

    Local files are inaccessible by the native XMLHttpRequest introduced in IE7.

    Here’s a testcase. Just set up a 1.html with whatever in it, then run it. It fails on IE7 but works fine on other browsers such as Gecko-based Firefox, WebKit-based Safari.

    Anyway, this works fine too by using the ActiveX XMLHttpRequest.


    var url = "1.html";  // 1.html is a local file

    http = new XMLHttpRequest();
    http.open("GET", url, false);  // false = synchronous request
    http.send(null);

    alert((http.status==200 || http.status==0) ? http.responseText : "HttpGet Error Status: " + http.status);


  26. Aedrin says:

    "Please tell us that in future releases of IE, that fixing support for existing features/standards will be a high priority, if not much higher than adding new features."

    If you are going to do a test, do it fairly and use the whole character range. Picking out a specific range you know that IE doesn’t handle well is a little bit unfair.

    You sound like a politician.


    I have several searchURL keys defined under

    HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\SearchUrl

    Any keyword that uses file:/// is not recognized in the Address bar, and IE performs the default Google search instead. This does not happen with http URLs. This was working fine in IE6.

    E.g. for file URL is


    Instead of running the file:/// URL when I type mysearch 1 in the address bar, it tries to run


  31. This is totally off-topic, I’m afraid, but I haven’t been able to locate anywhere else to report this; a pointer would be appreciated if any.

    As a sysadmin, I normally use Group Policy to configure the IE proxy settings automatically, to avoid complication when dealing with users. Unfortunately, in IE7, this greys out the proxy settings for LAN in the IE options, such that they can’t be altered by the users.

    This would have been fine in the old days, where off-site users would connect to the Internet via dial-up connections. However, as wireless networks of all kinds are considered LAN connections, this means that the users are unable to access the web using IE when connected via wireless, as they are configured to use the proxy and cannot change this.

    Obviously, this is something of a problem.

    Are there any plans to modify this in future releases of IE, 7 or later?

  33. Jay says:

    Interesting, I never realised you could link to UNC paths using URI’s.

  34. steve_web says:


    Re: "If you are going to do a test, do it fairly and use the whole character range. Picking out a specific range you know that IE doesn’t handle well is a little bit unfair.

    You sound like a politician."

    Sure thing, no problem… pick a range, any range you want… the 3 or 4 I picked out were at random… you’re welcome to post stats on any other range, but I highly doubt you will find stats in the reverse.

  35. Aedrin says:


    As soon as someone picks a range, they are unfair results, biased through one or more ways. The only fair way is to compare the -entire- range.

  36. SteveW says:

    What about IE7 and the URL:


    This worked in IE6 but does not in IE7.

  37. The IE team have written their interpretation of the file:// URI specs. With most operating systems the file: URI is simple, due to the common root used by most non-Microsoft operating systems. For example on Linux /home/dave/index.html would be file:/..

  38. steve_web says:


    "As soon as someone picks a range, they are unfair results, biased through one or more ways. The only fair way is to compare the -entire- range."

    Well, I’ll be honest, I don’t have time to sift through 10,000+ different characters, but if anyone is, and wants to try, please do, I would be interested in the stats.

    Otherwise, until you can find *any* range, that returns more correct characters in IE, versus Firefox/Opera, my statement about the significant lack of Unicode support in IE still stands.

    For giggles, I’ll ask you to pick (a/a few) random 100 character range between 1000, and 10,000 and we can compare results between the 3 browsers*.

    *yes, there are more than 3 browsers, but I need to test this on a single windows box to be fair… and the 3 browsers I have installed, are Firefox, Opera, and IE7.

  39. Andrew says:


    So.. the ability to see unicode ‘shapes’ (that do not appear to be related to a language) – in the URL of IE equates to a "significant lack of Unicode support"?  Hmm.

  40. Andrew says:


    To go a step further – I would argue that IE is correct in omitting this range.  

    These are untypeable characters and therefore their appearance in a URL is questionable.

    I expect that the ‘latest URI standard’ that Dave Risney linked to in his post highlights this – there’s a new task for you!

  41. Tim Altman says:

    If file://D:\Program Files\Viewer\startup.htm is invalid, why does IE 7 still save MHTML files using that format for the root document’s Content-Location?

  42. steve_web says:


    NO, was not talking about the URL at all.

    We’re talking about in page rendering of unicode chars.

    So, If I want to use characters, to draw images, or use special 1/2, 1/3, 1/4 type characters.

    There are many arrows, blocks and other useful shapes that would be very handy to be able to use, but for reasons yet unexplained, support for vast ranges of unicode is missing.

  43. David Hedley says:

    Not really related to the URI post, but can someone in the IE team _please_ fix IE to stop using square brackets when saving downloaded files to the temporary directory before invoking an external application.

    Square brackets are an invalid character for Excel and using them breaks downloaded Excel spreadsheets which contain pivot tables.

