Why do operating system files still adhere to the old 8.3 naming convention?


Commenter Brian Reiter asks a duplicate of a question that was already submitted to the Suggestion Box: Darren asks why operating system† files still (for the most part) adhere to the old 8.3 naming convention.

There are a few reasons I can think of. I'm not saying that these are the reasons; I'm just brainstorming.

First, of course, the name of a DLL cannot change once it has been chosen, because that would break programs which linked to that DLL by its old name. Windows 95 did not require the system volume and user profile volume to support long file names, although that was certainly the case by default. Companies which used roaming profiles or redirected folders may have had a heavy investment in servers which did not support long file names. Therefore, all system files on Windows 95 had to conform to the 8.3 naming convention.

I believe that Windows NT permitted the system volume to be a short-file-names-only FAT partition as late as Windows 2000. Therefore, any DLL that existed in the Windows 2000 era had to conform to the 8.3 naming convention.

Starting in Windows XP, long file names became mandatory, and a few system files such as shellstyle.dll waded tentatively into the long file name world. (The .NET Framework folks jumped in with both feet with their managed DLLs, but notice that their unmanaged DLLs like mscoree.dll still conform to 8.3.) But the waters in this world can be treacherous for operating system components.

First of all, you have to worry about the automatically-generated short name. Suppose the operating system setup program is copying the shellstyle.dll file, but there is already a file called shellstuff.dll. The short name for shellstuff.dll will probably be SHELLS~1.DLL, and therefore the short name for shellstyle.dll will likely be SHELLS~2.DLL. Now, this may not be a big deal, except that some programs like to hard-code a file's short name. (There are a lot of programs that assume that the Program Files directory is C:\PROGRA~1, for example.)

Furthermore, you can create confusion if the same DLL is loaded by both its short and long names, since the loader treats them as distinct:

#include <stdio.h>
#include <windows.h>

int __cdecl main(int argc, char **argv)
{
    /* Load the same DLL twice: once by its short alias, once by its long name. */
    printf("%p\n", (void *)LoadLibrary("SHELLS~1.DLL"));
    printf("%p\n", (void *)LoadLibrary("SHELLSTYLE.DLL"));
    return 0;
}

If you run this program, you will get something like this:

6F2C0000
00340000

Even though the two paths refer to the same DLL, the loader treats them as different, and you end up with two copies of the same DLL loaded into memory. Now things get confusing, since you now have two sets of global variables, and if two components both use SHELLSTYLE.DLL but one used the short name and the other the long name, things get exciting when those two components try to talk about what they think is the same thing.

It's like that time when I was a child and our family took a trip to Disneyland. Our parents put my brother and me on the gondola ride, and upon arrival at the other end, we were to go to the Autopia ride which was right next door. The plan was that our parents would meet us at the exit to Autopia. When my brother and I exited Autopia, we expected our parents to be waiting there for us, but they were nowhere to be seen. Sticking to the plan, we waited patiently for our parents to arrive. We sat there for what seemed like two hours (but which was probably much less), until eventually we decided that my brother would stay put and I would go looking around, at which point it didn't take long for me to find my father, who was walking around looking for us.

What went wrong? Well, the problem was that the map of Disneyland showed Autopia, but what the map didn't say was that there were two Autopia rides (and therefore two Autopia exits) right next to each other. My brother and I were waiting by one exit, and our parents were waiting by the other. Each of us thought the other party was simply late.

Similarly, if a DLL goes by multiple names, you can end up with two copies of it loaded into the process, with different components talking about different copies, unaware that they are talking about different things.

And one final reason I can think of for sticking with 8.3 file names for operating system DLLs is simply, "Well, that's the way we've always done it. All the problems with 8.3 names are well-understood and under control. If we switched to long file names, we'd end up discovering a whole new set of problems. Why mess with something that works if it isn't broken?"

Better the devil you know.

Exercise: Why is it okay for the .NET Framework to use long file names for their managed DLLs?

Nitpicker's Corner

†s/operating system/Windows operating system/. Apparently nothing is obvious from context any more.

Comments (44)
  1. Anonymous says:

    Because the .Net framework doesn’t use the filename as anything other than the name of the assembly file?  That is, the name and version information about the assembly are stored inside the file and those values are used by the assembly loader rather than anything involving the file name.

  2. Anonymous says:

    Exactly. .Net implements its own loader, so since assemblies are identified by strong name the system is not confused by filename aliasing.

  3. Anonymous says:

    Another extremely good reason for DLLs to conform to 8.3 notation is simply that developers tend to have to type the names of the DLLs they’re working on quite often.  Shorter names are easier to type, so developers rarely want to make filename longer than 8 characters.

  4. Anonymous says:

    Those comments are close. The .Net loader subsystem is Fusion, which uses manifests to load assemblies. Fusion is a Windows subsystem, and it can use the same technique to load unmanaged (win32) assemblies. The key is the manifest file (or resource), which explicitly tells the loader which file to expect and load based on a combination of factors, including the file name, version and a signed hash of the file contents. This metadata/file content based approach enforces unique distinction of the assembly or module regardless of file name, but the file name is still part of it. If you change the file name of the assembly, Fusion just won’t load it.

  5. Anonymous says:

    As recently as Win2K, I had to use a post-install action to rename a file with a long filename.

  6. Anonymous says:

    This is the kind of muck that just needs to die already.  

  7. Anonymous says:

    Reminds me of a blog article by Larry Osterman: "What is AUDIODG.EXE?" (http://blogs.msdn.com/larryosterman/archive/2007/01/31/what-is-audiodg-exe.aspx)

    Seems like the installer technology (e.g. INF) can add some other reasons to the list as well.

  8. Anonymous says:

    "(There are a lot of programs that assume that the Program Files directory is C:\PROGRA~1, for example.)"

    Unfortunately, using 8.3 aliases such as PROGRA~1 and MICROS~1 in the registry causes problems for backup programs which work on a file-by-file basis rather than imaging an entire filesystem.

    e.g. Suppose I have the directories:

    "Microsoft Office" (MICROS~1)

    "Microsoft Visual Studio" (MICROS~2)

    If I back up and restore my hard drive, there’s no guarantee that the new short names, which are ALWAYS auto-generated, will match the original one.

    Obviously this will cause problems for anyone who referenced the generated short names.

    Not a flame, just an observation (although I’m sure most people are aware of this.)

  9. Anonymous says:

    That DLL alias is kinda weird.  I assumed that the loader would de-alias DLLs when doing lookups just to avoid the problem mentioned.  Does Windows do anything special to de-alias hard-links to DLLs on an NTFS partition?

  10. Anonymous says:

    Ain’t it also because of the installation CD that the files are 8.3? The CDs use ISO 9660 Level 1 to ease copying.

    BTW, Windows XP can be installed on a FAT32 filesystem too, so I guess you meant that Windows 2000 could also work on FAT16? (And that filesystem was made before Windows 95 so does not support LFN, and FAT32 does?)

  11. Anonymous says:

    My understanding is that at least on XP (and probably before that), the loader should correctly resolve short/long names and other aliases, so this shouldn’t result in the same dll being loaded more than once.

  12. Anonymous says:

    The thing I find most funny is how much Vista is reverting back to shorter names.  I understand this is for practical reasons of only having MAX_PATH characters in a path in Win32, but C:\Documents and Settings is now C:\Users, and "Application Data" is now just AppData.  "Local Settings" is now just Local (and LocalLow).

    I really wish there were an easy solution to transition out of the 255 character path limitation.  I rather liked the verbose names (except when I had to type them :) )

  13. Rosyna says:

    Jack, I believe this was due to limits in MAX_PATH. Too many people were hitting it. There’s a series on it here http://blogs.msdn.com/bclteam/archive/2007/03/26/long-paths-in-net-part-2-of-3-long-path-workarounds-kim-hamilton.aspx

  14. Anonymous says:

    @Jack Mathews

    "I really wish there were an easy solution to transition out of the 255 character path limitation."

    Especially since it’s just a constant in WinDef.h and the underlying file system, NTFS, supports 32k characters for file names. Further, you can get around the MSCRT limitation (not sure how this plays with all Win32 APIs, though) by prefixing paths with "\\?\"… Just try that in .Net though: System.IO.Path will stop you cold.

    (Even more) tangentially, interested readers can do their part on that last front. Here’s the Connect issue to vote to get rid of that limitation in .Net (although it is closed, you can still vote for it):

    https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=240812

  15. Xepol says:

    Of course, a small tweak to the loader could make it easily recognize that short and long filenames were equivalent, forever removing any concern and solving the problem once and for all.

    And any app that decided to load MyCust~1.dll instead of MyCustom.dll and got MyCustardSkin.dll would have ONLY itself to blame for making stupid, unpredictable, asinine assumptions that are just as likely to go wrong in an 8.3 world anyways.

    Change the loader to NOT load any dll with a short name where the short name and long name do not match and the rest of the arguments are just spurious.

  16. Anonymous says:

    Incidentally, the FS has to generate short names for lfn files.  If you don’t have a long name, though, you only need one $FILENAME metadata entry.  This may or may not actually make a difference, but I speculate that it makes the directory indices smaller and more efficient for large directories of system files like System32.

  17. Anonymous says:

    Short filename generation is actually optional; I usually turn it off ASAP for my NTFS filesystems.

    Unfortunately, I then discover just how many third-party installers for modern, up-to-date programs are completely retarded and require short filenames.

    Will mentioned file-by-file backup programs: on XP/2003+ there’s actually an API for that, SetFileShortName(), so backup programs that are aware of the issue can handle it.

  18. nobodyman says:

    … I wasn’t trying to claim that 8.3 filenames were the sole cause of Vista’s delays.   I meant that last sentence  to be more "big-picture" than it wound up.

  19. Ryan Bemrose says:

    The situation with loading by long and short names may be more complicated than Raymond suggests.  When working with databases on Win2k, I discovered exactly what he outlines here, that LoadLibrary() treats "msjetoledb40.dll" and "msjeto~1.dll" as distinct files, but FreeLibrary() treats them as the same.

    #include <windows.h>

    int __cdecl main(int argc, char **argv)
    {
        HMODULE shortName = LoadLibrary("SHELLS~1.DLL");
        HMODULE longName = LoadLibrary("SHELLSTYLE.DLL");
        FreeLibrary(longName);  // frees short version
        FreeLibrary(shortName); // Double-free error
        return 0;
    }

    I think I recall verifying this also happened on WinXP, though I can’t be sure.  I never checked on Vista.

  20. Andy, actually the 8.3 naming of the new audio files was a historical accident – we deployed the new engine at a time when the system was transitioning between the Windows XP INF based installer technology and the Vista manifest based installer technology.  During the period when we were creating the files, the system had to work in a weird hybrid mode and it didn’t allow non-8.3 names at that particular moment.  Later on it didn’t matter, but it was too much work to change the names (the lead developer of the build team described our suggestion of changing the name as "vanity", and he was right).

  21. Anonymous says:

    Speaking of multiple copies of the same DLL loading…

    I’ve had my own DLL Hell with .NET.  I have my binaries built in a tree structure like this:

    Program.exe

    PluginInterface.dll

    Plugins\Plugin1.dll

    Plugins\Plugin2.dll

    etc.

    The Program depends on PluginInterface, which contains interface IPlugin.  Plugin# also depends on PluginInterface and creates classes that derive from IPlugin.  Thus Program can load the Plugin# dlls, look for classes that inherit from IPlugin, and cast them to IPlugin and use them.  Wala, a plugin system.

    Now, because they all depend on PluginInterface, a copy of that DLL is put in when I build Program, and an additional copy in Plugins whenever I build a Plugin (and I just now remembered how to suppress that, heh).

    When Program.exe ran, .NET would load PluginInterface.dll automatically since Program depended on it.  Then my code would run and load each Plugin#.dll.

    When each Plugin#.dll loaded, .NET would see it depended on PluginInterface… and then look for a DLL to load.  Oddly enough, instead of using the current loaded assembly, it opts to use the PluginInterface.dll from the current directory!  (I guess it finds it a closer match.)

    Now Program checks Plugin#.dll for classes that implement the interface in Program’s PluginInterface.dll.

    Except there are none… in any of the plugin DLLs.  Oops.  The Plugin’s classes derive from the Plugins\PluginInterface.dll’s interfaces, which is a different assembly from PluginInterface.dll.

    Visual Studio’s Debugger was useless in diagnosing what was wrong, since all the class names were EXACTLY THE SAME.  I suppose if I had dug deep enough I would have figured out the classes were from different files, but I happened to figure it out before then in a spurt of inspiration… I deleted Plugins\PluginInterface.dll.  Then Program worked fine.

    All the plugins use the already-loaded PluginInterface assembly if they can’t find one to load in the current directory and everything works fine.

  22. Anonymous says:

    Larry: There’s a fveupdate.exe in systemroot without a short file name too. :)

  23. Anonymous says:

    Thanks Raymond – it’s exactly this kind of post that keeps us coming back to your blog.  :)

  24. nobodyman says:

    Twenty years ago, I’m sure that 8.3 was more than adequate to uniquely identify every file on an MSDOS system disk, and even be descriptive.

    Fast forward to now.  There are currently 2,294 files in my %windir%\system32 directory.  Plenty of room to uniquely identify them all, but I challenge you to tell me what “browselc.dll” does and how it differs from, say, “browsewm.dll”.  Calling it a "name" is a bit of an overstatement now.

    I agree that doing something different for its own sake is foolish.  I’d even allow that it’s worth it to make sacrifices in the name of backwards compatibility.  But all of these “Evil that you know” problems add up to the point to where you *must* break something in order to move forward.   Otherwise you reach a point where new features die a death of a thousand cuts.

    Windows Vista was delayed for years, most of its advertised features scrapped.  Isn’t it a fair criticism to say that fear of “The devil that you don’t know” and this slavish pursuit of backwards compatibility is at least partly to blame?  

    Do you agree (at least at some level)?  Do you feel that this issue was at the heart of Vista’s development woes?

  25. Anonymous says:

    And then there was LFNBACK.exe.

    If you wanted to back up Win95 using a DOS or Win31 backup system, you could strip out the long file names into a data file. On restore, you could get them back.

    Win95 would still start (just) if you had just LFNBACK’ed the system drive.

  26. Anonymous says:

    You can edit setupreg.hiv & hivesys.inf on the XP setup CD to set ntfsdisable8dot3namecreation to 1 and after setup XP would be completely devoid of any 8.3 names. You’d be surprised how many programs won’t run correctly in this environment, and I’m not talking about legacy apps. Apps that used InstallShield won’t uninstall from Add/Remove, for example. Or Firefox.

    I agree with nobodyman. With Vista, no one is obligated to fix their program to run under non-admin because Vista will automatically redirect files that write to HKEY_LOCAL_MACHINE and Program Files. And as if people or businesses are going to upgrade to Vista in order to run a program that was last updated for DOS 5.

  27. Anonymous says:

    Dan: Wala?  Is that the phonetic spelling of Voila (accent over the a)?

  28. Anonymous says:

    I know one reason why a small set of system files had to be 8.3.

    The NT Loader needs to launch the kernel with enough drivers to access the disk. The list would include the HAL, storage drivers, file system drivers, etc.

    Think about that last one – how does the NT Loader locate the filesystem drivers without using the filesystem drivers? Well, NT’s loader solved this catch 22 by using miniature versions of the filesystem drivers built directly into it. An 8.3 issue arose here because one of those miniature filesystem libraries, the one for FAT, never had long filename support added. As a consequence, the kernel, HAL, and core drivers had to fit into 8.3.

    (Note: this might not hold for Vista, which has an entirely new loader architecture).

  30. Anonymous says:

    METADATA: CLR uses metadata to load an assembly, it doesn’t care about the file name. Simple

  31. Anonymous says:

    The explanation by Adrian Oney sounds good.

    During the initialization phase of loading the operating system you can’t depend on having the long filename layer available.  The core files should have 8.3.

    Vista could be different.

    .NET kicks in long after the core operating system is loaded.  By that time the long filename handler should be ready.

  32. Anonymous says:

    Microsoft should start phasing out short filenames, and then the application developers will follow, even though they will be slow and probably reluctant.

    You can only maintain backward-compatibility to a certain point. After that everything turns around and starts to hurt you.

  33. Anonymous says:

    I don’t understand what everybody has against random system files having short filenames. If there was anything to gain from having long filenames, systems that have always supported them would already use them everywhere. After all, Unix has had a maximum of 14 character (or longer) filenames for over 30 years, yet finding a filename longer than 12 characters (the length of an 8.3 filename) on a Unix box is just as hard as on a Windows box.

  34. Anonymous says:

    Thanks for answering my question, Raymond. In the intervening time I also had stumbled upon a partial answer in Larry Osterman’s blog. Some part of the fundamental Windows installer technology before Vista was limited to 8.3 file names. I’m guessing this is the part used by the text-mode bootstrap installer.

    http://blogs.msdn.com/larryosterman/archive/2007/01/31/what-is-audiodg-exe.aspx

    "For a number of reasons that are no longer relevant (they’re related to the INF based installer technology that was used before Vista), we thought that we needed to limit our binary names to 8.3 (it’s a long story – in reality we didn’t, but we thought we did).  So the nice long names we had chosen (AudioEngine.Dll, AudioKSEndpoint.Dll, and DeviceGraph.Exe) had to be truncated to 8.3."

    It’s fascinating how old decisions and obsolete technologies continue to shape reality. Consider that everything in our culture is shaped by our collective history. The amazing thing is that we somehow expect computer systems to be exempt. Maybe this is because they change so rapidly and seem so fungible.

  35. Anonymous says:

    The questioner did not ask why "operating systems" adhere to 8.3, (s)he asked why *Windows* still uses 8.3 so much!!

    "Another extremely good reason for DLLs to conform to 8.3 notation is simply that developers tend to have to type the names of the DLLs they’re working on quite often.  Shorter names are easier to type, so developers rarely want to make filename longer than 8 characters."

    That is a *horrible* reason!!

  36. Anonymous says:

    Brian: In addition, the text mode installer displayed the name of each file as it was copied; the space, in the bottom-right corner of the screen, is only a dozen or so characters wide, so anything much over 8.3 wouldn’t fit neatly.

    (For a long time after most of "us" would have stopped using FAT file systems, OEMs were using FAT-only disk duplication systems, so those had to be supported more recently than you might think as well; as a result, even systems which shipped with an NTFS Windows installation had to start life as plain old 8.3 FAT16 images.)

  37. Anonymous says:

    I have another related question.

    Why does the Win32 console still default to using non-unicode raster fonts and obsolete OEM code pages, forcing devs to go jump through all sorts of hoops in order to get text to display consistently in both environments? This is the case even with MS’s new Powershell.

    I can’t really see the backwards compatibility angle, since NTVDM could be extended to handle legacy fonts and code pages for DOS apps.
