Normalized paths are tricky [Brian Grunkemeyer]

10/12/2004

A customer recently asked how to tell whether two paths refer to the same file, and he wanted to do this using a canonical path name. This turns out to be a non-trivial problem. Instead of talking about canonical path names, which are somewhat expensive to get, we talk about normalized paths, meaning they look like they're in a reasonable format and all the most obvious ways of creating an alias to a file name have been defeated. Path.GetFullPath does path normalization this way, and in fact we use this as the basis for the security checks done by FileIOPermission.

Here are some comments I wrote about FileIOPermission and path issues in the .NET Framework Standard Library Annotated Reference, Volume 1. These comments were primarily targeted at people implementing their own copy of our class libraries, but are still good for background info.

FileIOPermission is a great idea. However, the implementation of this security check (and in our case, all the code that uses this security check) requires great care. This permission is based on string comparisons for determining whether a file is in one of the sets of allowed or denied files. This has a few surprising properties:

1. Since access is granted to a file based on its name and which directory it lives in, aliases for that file's data aren't covered by the security permission. Examples include mapping a directory as another drive (e.g., using the DOS subst command to make X: refer to a directory on C:) or more obscure features such as hard links, UNIX mount points, or NTFS reparse points (which may include symbolic links in a future version).
2. Beyond aliases, some common file systems have very loose rules for how to access files. For example, FAT and NTFS are case-insensitive, and allow you to add a period to the end of a file name. Almost all file systems will allow you to go up a directory and then back down to a different directory (e.g., "C:\foo\..\tmp\bar.txt" refers to the file "C:\tmp\bar.txt").
3. String comparison in the file system often works in a slightly surprising way, because the file system's case-insensitive comparison may not match the one your code uses.
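The loose naming rules in the second item can be sketched in a few lines. This is a rough illustration in Python rather than .NET, and it is not how Path.GetFullPath is implemented: the helper names `normalize_for_compare` and `same_name` are hypothetical, `ntpath.normpath` stands in for the ".." handling, stripping trailing periods is a simplified version of the Windows rule, and `str.upper()` is only an approximation of a file system's casing table.

```python
import ntpath  # Windows path rules, usable on any platform


def normalize_for_compare(path: str) -> str:
    """Lexically normalize a Windows-style path for string comparison.

    Hypothetical helper: it never touches the file system, so aliases
    such as subst drives or hard links are NOT resolved.
    """
    # Collapse "." and ".." segments and unify separators.
    collapsed = ntpath.normpath(path)

    # Drop trailing periods from the final component ("bar.txt." -> "bar.txt").
    # Simplification: leave the path alone if the component is all periods.
    head, tail = ntpath.split(collapsed)
    stripped = tail.rstrip(".")
    if stripped:
        collapsed = ntpath.join(head, stripped)

    # Approximate case-insensitivity; the real casing table may differ.
    return collapsed.upper()


def same_name(a: str, b: str) -> bool:
    """True if two paths normalize to the same string."""
    return normalize_for_compare(a) == normalize_for_compare(b)


if __name__ == "__main__":
    print(same_name(r"C:\foo\..\tmp\bar.txt", r"c:\TMP\bar.txt."))
```

A check like this defeats only the obvious spellings of a name. It says nothing about whether two different normalized names reach the same data.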

We consider the first issue to be by design. If a user, such as the system administrator, has sufficient permissions to share out a portion of a drive and this can be used to circumvent permission checks, this may be intentional and useful. A great example is that you might want to deny code access to all of C:\; however, a particular directory on that drive has been shared out as a world-writable directory for logging, shared documents, or just a public drop folder. In this case, the administrator created the alias, possibly with the express purpose of creating a separate conceptual permission space.

The second issue is very tricky. Ideally, you would obtain file names in a canonicalized form before doing the string comparisons. On Windows, getting a canonical name for a file (ignoring the aliasing issue above) requires that you open the file and get a handle to it. This can be expensive, especially for remote file systems. Instead, we are relying on path normalization to cover us. Any implementation of Path.GetFullPath should be reviewed and tested extremely closely against real file system behavior.
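To make the trade-off concrete, here is a small Python sketch that contrasts a purely lexical comparison with one that asks the file system for the file's identity. It is an illustration only, not what FileIOPermission does: `demo_alias` is a hypothetical name, `os.path.samefile` stands in for the "open the file and ask" approach, and the sketch assumes the temporary directory lives on a file system that supports hard links.

```python
import os
import tempfile


def demo_alias():
    """Create a file plus a hard link to it, then compare the two names.

    Returns (lexically_equal, same_identity).
    """
    with tempfile.TemporaryDirectory() as root:
        original = os.path.join(root, "data.txt")
        alias = os.path.join(root, "alias.txt")

        with open(original, "w") as f:
            f.write("hello")

        # Assumption: the temp directory's file system supports hard links.
        os.link(original, alias)

        # Cheap: string comparison of normalized names, no file access.
        lexically_equal = (
            os.path.normcase(os.path.abspath(original))
            == os.path.normcase(os.path.abspath(alias))
        )

        # Expensive: the OS must look at both files to compare identity.
        same_identity = os.path.samefile(original, alias)

        return lexically_equal, same_identity


if __name__ == "__main__":
    print(demo_alias())
```

The identity check sees through the alias but has to touch the file system for every comparison, which is the cost that path normalization avoids.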

The third issue is a bit more subtle. For a file system like NTFS, the file names can contain almost any Unicode character, and the file names are case-insensitive. However, the casing table used by a CLI implementation may differ from the casing table used by the file system. For example, NTFS writes the current OS's casing table to the file system when the file system is formatted, and all future OS versions then use that casing table for all case-insensitive comparisons. If an OS's casing table changes in a future release, to account for new Unicode characters or to correct mistakes, then it may be nearly impossible to do case-insensitive comparisons in exactly the same manner as the file system.
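A quick way to see why the casing table matters is to compare the same two names under two different case-insensitive rules. The Python sketch below uses `str.upper()` and `str.lower()` as two stand-in "casing tables"; neither is claimed to match what NTFS or any CLI implementation actually uses, and the function names are hypothetical.

```python
def equal_by_upper(a: str, b: str) -> bool:
    """Case-insensitive comparison using an uppercase mapping."""
    return a.upper() == b.upper()


def equal_by_lower(a: str, b: str) -> bool:
    """Case-insensitive comparison using a lowercase mapping."""
    return a.lower() == b.lower()


# Plain ASCII names: both rules agree.
ASCII_PAIR = ("file.txt", "FILE.TXT")

# U+0131 (dotless i) uppercases to "I", but "I" lowercases to "i",
# so the two rules disagree about whether these names match.
DOTLESS_PAIR = ("\u0131ndex.txt", "index.txt")

if __name__ == "__main__":
    for a, b in (ASCII_PAIR, DOTLESS_PAIR):
        print(repr(a), repr(b), equal_by_upper(a, b), equal_by_lower(a, b))
```

If a permission check uses one rule and the file system uses the other, a name the check treats as different could still resolve to the same file, or the reverse.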

You might think this is a gaping security hole. In practice, changes to an OS's casing table are somewhat rare and generally limited to obscure Unicode characters that are not commonly in use. But a thorough CLI implementation should at least be aware of this problem and investigate whether it could cause problems for the file systems most commonly used with that CLI.
