Why does MS-DOS use 8.3 filenames instead of, say, 11.2 or 16.16?


When I discussed years ago why operating system files tend to follow the old 8.3 file name convention, I neglected to mention why the old MS-DOS filename convention was 8.3 and not, say, 11.2 or 16.16.

It's a holdover from CP/M.

As I noted when I discussed the old MS-DOS wildcard matching rules, MS-DOS worked hard at being compatible with CP/M. And CP/M used 8.3 filenames.

Why did CP/M use 8.3 filenames? I don't know. There's nothing obvious in the CP/M directory format that explains why those two reserved bytes couldn't have been used to extend the file name to 10.3. But maybe they figured that eight was a convenient number.
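
To make the byte counting concrete, here is a minimal sketch of a 32-byte directory entry, assuming the commonly documented CP/M 2.2 layout (status byte, 8-byte name, 3-byte type, extent, two reserved bytes, record count, 16 allocation bytes). The field names and the `parse_entry` helper are illustrative, not taken from the CP/M sources, and the 10.3 variant is purely hypothetical.

```python
import struct

# Assumed layout of one CP/M 2.2 directory entry (field names are illustrative):
#   status(1) name(8) type(3) extent(1) s1(1) s2(1) record_count(1) alloc(16)
DIR_ENTRY = struct.Struct("<B8s3sBBBB16s")

# Hypothetical 10.3 variant: the two reserved bytes are given to the name instead.
HYPOTHETICAL_10_3 = struct.Struct("<B10s3sBB16s")


def parse_entry(raw: bytes) -> dict:
    """Split a 32-byte entry into its fields under the layout assumed above."""
    status, name, ext, extent, s1, s2, records, alloc = DIR_ENTRY.unpack(raw)
    # The high bit of each name/type byte may carry an attribute flag, so mask it off.
    clean = lambda field: bytes(b & 0x7F for b in field).decode("ascii").rstrip()
    return {
        "status": status,
        "name": clean(name),
        "ext": clean(ext),
        "extent": extent,
        "s1": s1,
        "s2": s2,
        "records": records,
        "alloc": alloc,
    }


sample = DIR_ENTRY.pack(0, b"README  ", b"TXT", 0, 0, 0, 1, bytes(16))
entry = parse_entry(sample)
print(entry["name"], entry["ext"], DIR_ENTRY.size, HYPOTHETICAL_10_3.size)
```

Both layouts come out at 32 bytes, so the size of the entry alone would not have ruled out 10.3.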

Comments (45)
  1. Anonymous says:

    Some CP/M compatibility annoyances still infested Visual Basic 6, clearly in order to stay legacy-compatible all the way back to BASICA.

    Specifically it was the file I/O routines Input # and Line Input #, which took EOF (ASCII 26) characters as verbatim end-of-files, regardless of what the actual file length was.

    CP/M pretty much required respect for EOF characters, so BASICA did, then GW-BASIC, then QuickBasic, then Visual Basic…
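
The behaviour described in this comment can be sketched in a few lines. This is only an illustration of the convention (stop at the first ASCII 26 byte, whatever the real file length), not a reconstruction of how any BASIC implemented it; the function name is made up for the example.

```python
CTRL_Z = b"\x1a"  # ASCII 26, the CP/M-style end-of-file marker


def read_text_until_ctrl_z(data: bytes) -> bytes:
    """Return everything before the first Ctrl-Z, or all of data if there is none."""
    head, _sep, _tail = data.partition(CTRL_Z)
    return head


# Bytes after the marker are ignored even though they are part of the file.
print(read_text_until_ctrl_z(b"line one\r\n\x1aleftover bytes"))
```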

  2. Anonymous says:

    I suspect the limitation came from Mesopotamian craftsmen; the clay tablets they crafted could comfortably capture a total of eight cuneiform pictograms.

    The extensions were an Egyptian innovation made possible by papyrus scrolls.

  3. Anonymous says:

    Reserved bytes in a file system structure are just common sense. If the file name itself had been 10 bytes long, there would still be reserved bytes and you’d have the same question.

    Dave Cutler might have some insight on this, since CP/M was created to vaguely resemble Digital’s user environments. I don’t know if the 8-character filenames were part of that, though.

  4. Anonymous says:

    CP/M was written on an Intel Intellec MDS machine, the alternative at the time was ISIS from Intel, that had a filename limitation of 6.3, so CP/M had already expanded the name part by two bytes.

  5. Anonymous says:

    DEC’s RT-11 and RSX-11 had 6.3 names.

  6. Anonymous says:

    CP/M was written by someone who’d seen DEC’s TOPS-10 system (I think Gary Kildare even said he used it as a model).

    TOPS-10 used 6.3 filenames, aka one and a half words (with fixed-length names restricted to the SIXBIT character set).

    Why 8.3 though, I have no idea.

  7. Anonymous says:

    If they had copied Unix, we’d have 14-character filenames.

  8. Anonymous says:

    > how many file extensions are longer than three characters?

    .html and .jpeg for starters.

    Or are you asking how many file extensions on a system supporting 3 characters have more than 3 characters?

  9. Anonymous says:

    to dave: not "Kildare", Gary Kildall was the creator of CP/M:

    http://en.wikipedia.org/wiki/Gary_Kildall

  10. Anonymous says:

    On the one hand, ‘intelligible’ is in the eye of the beholder. At the time of MS-DOS, that beholder had to put up with a lot more suffering from his computer than that 3-character extension limit.

    On the other hand, there were .doc = WordPerfect document and .doc = Microsoft Word document. Making these two intelligible in 3 ASCII characters is a challenge (‘pdc’ and ‘wdc’?)

  11. Anonymous says:

    WordPerfect for DOS did not really have extensions.  An entire generation of word-processor users learned to treat 8.3 filenames as 11-character filenames.  In the Windows world, WordPerfect has adopted .WPD.

    These days, you can still find legal briefs with an 8.3 filename printed at the bottom of every page.  Lawyers are famously old-fashioned.  Not only do they love their WordPerfect, but they also stick to their 8.3 filenames.

    Does anybody keep track of word-processor market share in the legal industry anymore?  Last I heard, WordPerfect was below 50%.

  12. Anonymous says:

    What I find puzzling is why putting the extension last in text representations became the de facto standard across OSes and file systems.

    Think about it.

    exe.foobar

    txt.readme

    doc.paper

  13. Anonymous says:

    Tom: the last time I worked with any lawyers, they were using a collaborative document creation system that was accessed via VT220 terminal emulators connected over serial lines to a DEC VAX.  This was 2004, so they may have switched to a PC app by now.

    Scott: "Incidentally, Palm OS uses 4 letter creator IDs." — Which it inherited from MacOS, which it is largely derived from (I understand PalmOS was bootstrapped from compilers running under MacOS, hence many structures are similar, including the layout of memory segments etc).  In both cases, the idea is that rather than treating the name as a string, it can be handled as an int.  Same idea as the fourccs used to identify codecs in .AVI files.
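
As a small illustration of the "handle it as an int" idea in this comment, four ASCII characters fit exactly into one 32-bit value. The helper names are made up for the example, and the big-endian byte order is an assumption; a given file format or OS may store the bytes in the other order.

```python
def fourcc_to_int(code: str) -> int:
    """Pack a four-character code into a single 32-bit integer (big-endian assumed)."""
    raw = code.encode("ascii")
    if len(raw) != 4:
        raise ValueError("a four-character code must be exactly 4 ASCII characters")
    return int.from_bytes(raw, "big")


def int_to_fourcc(value: int) -> str:
    """Inverse of fourcc_to_int; only meaningful when all four bytes are ASCII."""
    return value.to_bytes(4, "big").decode("ascii")


# Comparing two codes becomes one integer comparison instead of a string compare.
print(hex(fourcc_to_int("TEXT")), int_to_fourcc(fourcc_to_int("TEXT")))
```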

  14. Anonymous says:

    Joseph Koss: the idea is so that related files are sorted together by an alphabetical sort.  The convention seems to have originated in Unix, where the developers’ primary concern was to keep the .c, .s and .o files relating to a particular program module together in the directory listing.

  15. Anonymous says:

    porter: it would have been kind of hard to be used to the classic Mac in 1980, when Tim Paterson started writing the OS that would eventually become MS-DOS.  Apple DOS supported limited file typing; files were either text, binary, or BASIC programs.  File extensions weren’t used, so it was up to the user to decide what program to open anything with that wasn’t either text or a BASIC program.

  16. Anonymous says:

    Why did CP/M use 11 character filenames?  It doesn’t seem to be recorded anywhere, but the evidence available makes it look like the reason is something like this:

    CP/M file systems have a 32 byte directory entry for each file.  The directory entry’s format [ http://www.seasip.demon.co.uk/Cpm/format14.html ] has two reserved bytes, so at first glance it would seem 10.3 could have been supported, but also note that the directory entry format is nearly identical to the first 32 bytes of the FCB format [ http://www.seasip.demon.co.uk/Cpm/fcb.html ], and that at least one of those reserved bytes is used internally by the OS in the FCB.  I wouldn’t be surprised to find that actually both were, at least in some early version.  And very early (unreleased) versions may well have had an FCB layout that was identical to the on-disk directory entries, rather than a byte longer as the first released version was.

  17. Anonymous says:

    The question, of course, is why the eight-character limit *before* the dot. Having three characters for the extension makes sense, even now (how many file extensions are longer than three characters?). I guess that the 8-character limit was a way of "rounding" the 6-character limit of older systems. Remember that, back then, all filesystem structures and many memory structures had a size that was a power of two, because doing otherwise would waste precious bytes at the end of the disk sector or memory page. The exception was Apple’s ProDOS, which IIRC could fit seven directory entries in a standard 512-byte sector, with something like 13 or 15 spare bytes.

  18. DWalker59 says:

    Many years ago, my Mom said "Who would need more than 8 characters in a file name?  Putting blanks in a file name is dumb too."

    Now she happily uses long filenames to name all of her files.  Once you go long, you never go back.

  19. Anonymous says:

    Yes, .htm and .jpg are great examples…

    I think the point was, how many *need* more than three characters to be intelligible.  Particularly in the days of DOS…

  20. Anonymous says:

    8.3 is nothing special, but I think extended filenames in the FAT format deserve an article.  I coded this in 8051 assembly once; talk about rounding up all the scattered pieces!!

  21. Anonymous says:

    @Joseph Koss: "What I find puzzling is why putting the extension last in text representations became the de facto standard across OSes and file systems."

    As Jules mentioned, I believe one reason is for sorting purposes. And I guess "transform.c" and "transform.h" feels more natural than "c.transform" etc.

    Some notable exceptions: some may remember the Amiga tracker modules, with filenames in the form "MOD.name". Although AmigaOS did seem to follow the suffix convention for filenames that included the type – such as ".info", ".library", ".device".

  22. Anonymous says:

    >> I think the point was, how many *need* more than three characters to be intelligible.

    If you were used to the classic Mac, you would ask "Why do you need file extensions at all?"

  23. Anonymous says:

    "It doesn’t seem to be recorded anywhere, but the evidence available makes it look like the reason is something like this:"

    You didn’t explore the reason why it’s limited to 8.3, but rather you just showed how they designed the system around that limit. The FCB and 32-byte directory entry have those numbers because of the 8.3 limit, not the other way around.  They could have just as well doubled the size of each to support longer names.

  24. Anonymous says:

    > I think the point was, how many *need* more than three characters to be intelligible.

    Intelligible is in the eyes of the beholder.  And if they had chosen 8.2, we still would have managed ok.

    Incidentally, Palm OS uses 4 letter creator IDs.

  25. Anonymous says:

    @Scott: PalmOS, Classic MacOS (and to a limited extent, MacOS X) don’t have 4 letter creator/type IDs. Instead, creator/type is a 32-bit unsigned integer. It’s just we chose to limit the range of values, but they can handle binary just fine.

    And .htm is a bastardization of .html; the Web, after all, was created on machines that didn’t have such limitations. It was just a popular OS that forced the issue. I think I’ve seen the extension-type configurations of several popular httpds say "Support lame OS extensions" and list .htm -> text/html as a mapping. In the early web, if you forgot to do the rename, things may not have worked.

  26. Anonymous says:

    I suspect the 6 character length goes back to OS/360 (which is older than TOPS-10), if not before, but it would be interesting why the next size up was 8.

  27. Anonymous says:

    Jules: That makes sense to me. I heard that before DOS 3, most disk I/O operations were done with FCB-related interrupts.

    J: I think doubling the size of the filesystem structures would have been a hard decision to make, considering that even 8-inch floppy disks (980 KB capacity) were the "latest high-capacity storage media" at that time (not counting those slow tapes).

  28. Anonymous says:

    For me the DOS 8.3 format seemed cramped from the start, coming from my C64 that managed 16 characters…

  29. Anonymous says:

    Imagine a world without hard disks. Without DVDs or CDs. Floppy disks ruled the land. The big black 5 1/4" floppy disks. Single sided. 160K each.

    Now, in this land there was a flat directory structure. No folders, sub-directories, links, etc.

    Imagine working in this land and being able to store a couple of dozen files on a disk. Do you want to give up some of that storage space for longer directory names?

    (and Unix. Yeah right. Gary and Tim didn’t know about Unix. )

  30. Anonymous says:

    "I suspect the 6 character length goes back to OS/360 …"

    Without going into all the gory details, OS/360 filenames were up to 44 characters (Don’t ask, I don’t know).  They were made up of various bits and bobs separated by separators! (That bit IS unusual).  Bits and bobs were up to 8 bytes.  eg WINDOWS.VISTA.SOURCE.H(WINAPI) might be a location for winapi.h.

  31. Anonymous says:

    I think Jules is on the right track (sorry).

    11 bytes for the filename (8.3) along with the other sector location and flag information make the CP/M entry a nice tidy 32 bytes meaning 4 entries fit cleanly on a 128-byte sector.

  32. Anonymous says:

    > The big black 5 1/4" floppy disks. <<

    You’re not thinking back far enough – back when CP/M was King, the common floppy disk was 8".  I can’t recall what the capacity of those things were.

  33. Anonymous says:

    AmigaOS had *.library, mod.*

    OS X has *.prefPane, *.backupDB, *.component, *.plugin, *.textClipping and plenty of other "super-long" filename extensions. It does help readability, especially for exotic program-specific files (*.pages, *.graffle)

  34. Anonymous says:

    "and Unix. Yeah right. Gary and Tim didn’t know about Unix."

    Even if they did, the idea of running it on an S-100 system would have been a demented fantasy.

  35. Anonymous says:

    @Duke.NY:

    The S-100 bus was enhanced to run 24 address lines and at least one LSI-11-based computer design ran on this bus. The LSI-11 was a chipset from Western Digital that emulated a PDP-11 CPU.

    http://www.retrotechnology.com/herbs_stuff/s100bus.html

    It probably would not have been difficult to port Unix to such a system.

  36. Anonymous says:

    Some versions of CP/M use the two bytes labelled "reserved" in the docs Raymond linked to. s1 was used to identify how many bytes in the last extent were valid (so you could get an exact filesize in bytes, rather than multiples of 128 bytes) and s2 was used as the high byte of the extent field, to support longer files.

    Whether those uses were planned for early versions but not implemented (hence reserving the bytes) or whether they were just bodged into the existing reserved bytes is a matter for history, I guess.
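
One way to read the description above, as a rough sketch: s1 refines the length of the final 128-byte record, and s2 supplies extra high bits for the extent number. The exact encodings differed between CP/M versions and clones, so the conventions below (s1 of 0 meaning "the whole last record is valid", and a 5-bit low extent field) are assumptions for illustration, and the function names are hypothetical.

```python
RECORD_SIZE = 128  # CP/M measures file length in 128-byte records


def exact_size(record_count: int, valid_bytes_in_last: int) -> int:
    """File size in bytes, assuming s1 counts the valid bytes of the last record.

    Assumption: a value of 0 means the last record is completely used.
    """
    if record_count == 0:
        return 0
    if valid_bytes_in_last == 0:
        return record_count * RECORD_SIZE
    return (record_count - 1) * RECORD_SIZE + valid_bytes_in_last


def extent_number(ex: int, s2: int) -> int:
    """Combine the extent byte with s2 as its high part (5 low bits assumed)."""
    return (s2 << 5) | (ex & 0x1F)


print(exact_size(3, 10), exact_size(3, 0), extent_number(3, 1))
```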

  37. Anonymous says:

    The "Early CP/M source" archive at < http://www.cpm.z80.de/source.html > defines the two extra bytes as ‘unused for now’. And a quick glance at the PL/M code suggests they are unused, both on disk and in memory. That is, it manages to do all its file I/O with a copy of the on-disk directory entry plus a single byte to hold state.

  38. Anonymous says:

    > The big black 5 1/4" floppy disks. <<

    >You’re not thinking back far enough – back when CP/M was King, the common floppy disk was 8".  I can’t recall what the capacity of those things were.<<

    You know, I realized that while driving to work this morning. I never used 8" floppies that much.

  39. Anonymous says:

    Joseph made me try it out and I stumbled upon some pretty humorous behaviour… when VB encounters a ^Z character it actually sets the file pointer (the one Seek returns or sets) beyond where it would be after a normal EOF condition. If it hits the ^Z directly, it raises an error and points 1 too far; if it read N other characters before the ^Z, it succeeds and sets the file pointer N + 1 too far, i.e. at Lof(file) + N + 2. QuickBasic, by contrast, sets the file pointer at the ^Z character. GW-BASIC unfortunately doesn’t have Seek (only Loc, which doesn’t return very useful information for sequential files), so all I know is that it refuses to read past the ^Z.

  40. Anonymous says:

    Dec-10 machines had 36-bit words and ran the influential TOPS-10 monitor. TOPS-10 filenames were stored in 6 6-bit characters in such a word, with the extension taking another half word. The directory entry put 18 bits of other information into the other half word. These machines could be ordered with an addition to the instruction set that facilitated access to 5 7-bit characters in a word, but since it was an option, the 6-bit character set was the least common denominator.

    They had one of those at the college I attended. Later, when I used CP/M on a Z-80 I thought, "Wow, so many things the same as TOPS-10 and *longer* filenames! Cute! Eight-bit registers are a hassle though."
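
The packing this comment describes can be sketched as follows: SIXBIT maps ASCII 0x20 through 0x5F to the values 0 through 63, so six characters fill a 36-bit word and a three-character extension fills an 18-bit half word. The function names are illustrative, and names are padded with spaces here as an assumption.

```python
def to_sixbit(ch: str) -> int:
    """Map one character to its SIXBIT code (ASCII 0x20..0x5F become 0..63)."""
    code = ord(ch)
    if not 0x20 <= code <= 0x5F:
        raise ValueError(f"{ch!r} is not in the SIXBIT character set")
    return code - 0x20


def pack_sixbit(text: str, width: int = 6) -> int:
    """Pack up to `width` characters, space padded, into width * 6 bits."""
    if len(text) > width:
        raise ValueError("text is too long for the field")
    word = 0
    for ch in text.ljust(width):
        word = (word << 6) | to_sixbit(ch)
    return word


def unpack_sixbit(word: int, width: int = 6) -> str:
    """Inverse of pack_sixbit, with trailing padding removed."""
    chars = []
    for shift in range(6 * (width - 1), -1, -6):
        chars.append(chr(((word >> shift) & 0x3F) + 0x20))
    return "".join(chars).rstrip()


name_word = pack_sixbit("README")    # fits in one 36-bit word
ext_half = pack_sixbit("TXT", 3)     # fits in one 18-bit half word
print(name_word.bit_length() <= 36, ext_half.bit_length() <= 18)
```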

  41. Anonymous says:

    "Having three characters for the extension make sense, even know (how many file extensions are longer than three characters?)."

    Part of the reason why many programs still "get away" with 3-character file extensions is because their authors can’t be bothered to see if an extension is already in use by someone else (in other words, they are not getting away with it at all).

    For example, Poser uses .CR2 extension for ‘character files’ and Canon uses .CR2 for RAW files.  What are the chances that someone would have more than one graphics program on their computer (especially if they are into creating graphics?) – not very high according to some it seems.

    If more people broke away from the 3-character convention, life would be a bit easier for users.

  42. Anonymous says:

    6.3 on DEC machines makes perfect sense to me.

    I remember that on the PDP-11 there was Radix 50: an alternative coding to ASCII that packed 3 characters (uppercase only) into two bytes. So a 6.3 filename needed just 6 bytes.
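
The arithmetic behind this works because the character set has 40 symbols (50 in octal), and 40 × 40 × 40 = 64000 fits in a 16-bit word. Below is a minimal sketch; the character table follows the usual PDP-11 ordering, but the symbol at position 29 varied between systems, so the "%" there is an assumption, as are the function names and the little-endian word order.

```python
import struct

# Space, A-Z, "$", ".", one system-dependent symbol (assumed "%"), then 0-9.
RAD50_CHARS = " ABCDEFGHIJKLMNOPQRSTUVWXYZ$.%0123456789"


def rad50_word(triple: str) -> int:
    """Encode up to three characters (space padded) into one 16-bit value."""
    if len(triple) > 3:
        raise ValueError("at most three characters per word")
    value = 0
    for ch in triple.ljust(3):
        value = value * 40 + RAD50_CHARS.index(ch)  # raises ValueError if unsupported
    return value


def rad50_filename(name: str, ext: str) -> bytes:
    """Encode a 6.3 filename as three 16-bit words, i.e. six bytes."""
    if len(name) > 6 or len(ext) > 3:
        raise ValueError("name is limited to 6 characters, extension to 3")
    text = name.ljust(6) + ext.ljust(3)
    words = [rad50_word(text[i:i + 3]) for i in (0, 3, 6)]
    return struct.pack("<3H", *words)


print(rad50_word("ABC"), len(rad50_filename("README", "TXT")))
```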

  43. Anonymous says:

    8.3 filenames are used because a company patented long file names.

  44. Anonymous says:

    @ATZ man:

    Only the first LSI-11 model was based on WD chips; later ones were DEC’s own designs. There were the T-11 (16-bit address), the F-11, used in the LSI-11/23 (18- or 22-bit physical address), and the J-11 with a 22-bit physical address and FPU, used in the LSI-11/73.

Comments are closed.