What’s the difference between the COM and EXE extensions?


Commenter Koro asks why you can rename a COM file to EXE without any apparent ill effects. (James Mastros asked a similar question, though there are additional issues in James' question which I will take up at a later date.)

Initially, the only programs that existed were COM files. The format of a COM file is... um, none. There is no format. A COM file is just a memory image. This "format" was inherited from CP/M. To load a COM file, the program loader merely sucked the file into memory unchanged and then jumped to the first byte. No fixups, no checksum, nothing. Just load and go.
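To make "just load and go" concrete, here is a minimal Python model of a COM loader. It treats one 64KB segment as a bytearray and places the image at offset 0x100, as a commenter points out below (the first 256 bytes hold the Program Segment Prefix). The names load_com, SEGMENT_SIZE and COM_ORIGIN are illustrative, not taken from any real loader, and the model ignores the stack space a real DOS would also need.

```python
SEGMENT_SIZE = 0x10000  # one 8086 segment: 64KB
COM_ORIGIN = 0x100      # the image starts right after the 256-byte PSP


def load_com(image: bytes) -> tuple:
    """Copy a COM image into a fresh segment unchanged; return (segment, entry IP)."""
    if len(image) > SEGMENT_SIZE - COM_ORIGIN:
        raise ValueError("COM image does not fit in one 64KB segment")
    segment = bytearray(SEGMENT_SIZE)
    # No header to parse, no fixups, no checksum: just copy the bytes.
    segment[COM_ORIGIN:COM_ORIGIN + len(image)] = image
    # Execution begins at the first byte of the image.
    return segment, COM_ORIGIN
```

For example, load_com(b"\xcd\x20") yields a segment whose bytes at 0x100 are CD 20 (INT 20h) and an entry point of 0x100.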

The COM file format had many problems, among which was that programs could not be bigger than about 64KB. To address these limitations, the EXE file format was introduced. The header of an EXE file begins with the magic letters "MZ" and continues with other information that the program loader uses to load the program into memory and prepare it for execution.
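A sketch of reading the fixed part of that header with Python's struct module. The field names follow the conventional IMAGE_DOS_HEADER naming from the Windows headers (e_magic, e_cp, and so on); parse_mz_header is an illustrative helper, and it stops at the 28-byte fixed portion without following the relocation table.

```python
import struct
from collections import namedtuple

MZHeader = namedtuple("MZHeader", [
    "e_magic",     # signature, b"MZ"
    "e_cblp",      # bytes used on the last 512-byte page
    "e_cp",        # number of 512-byte pages in the file
    "e_crlc",      # number of relocation entries
    "e_cparhdr",   # header size in 16-byte paragraphs
    "e_minalloc",  # extra paragraphs the program needs
    "e_maxalloc",  # extra paragraphs the program would like
    "e_ss",        # initial SS, relative to the load segment
    "e_sp",        # initial SP
    "e_csum",      # checksum
    "e_ip",        # initial IP
    "e_cs",        # initial CS, relative to the load segment
    "e_lfarlc",    # file offset of the relocation table
    "e_ovno",      # overlay number
])
MZ_FORMAT = "<2s13H"  # little-endian: 2-byte signature + 13 words = 28 bytes


def parse_mz_header(data: bytes) -> MZHeader:
    if len(data) < struct.calcsize(MZ_FORMAT):
        raise ValueError("too short for an MZ header")
    header = MZHeader(*struct.unpack_from(MZ_FORMAT, data))
    if header.e_magic != b"MZ":
        raise ValueError("missing MZ signature")
    return header
```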

And there things lay, with COM files being "raw memory images" and EXE files being "structured", and the distinction was rigidly maintained. If you renamed an EXE file to COM, the operating system would try to execute the header as if it were machine code (which didn't get you very far), and conversely if you renamed a COM file to EXE, the program loader would reject it because the magic MZ header was missing.

So when did the program loader change to ignore the extension entirely and just use the presence or absence of an MZ header to determine what type of program it is? Compatibility, of course.

Over time, programs like FORMAT.COM, EDIT.COM, and even COMMAND.COM grew larger than about 64KB. Under the original rules, that meant that the extension had to be changed to EXE, but doing so introduced a compatibility problem. After all, since the files had been COM files up until then, programs or batch files that wanted to, say, spawn a command interpreter, would try to execute COMMAND.COM. If the command interpreter were renamed to COMMAND.EXE, these programs which hard-coded the program name would stop working since there was no COMMAND.COM any more.

Making the program loader more flexible meant that these "well-known programs" could retain their COM extension while no longer being constrained by the "It all must fit into 64KB" limitation of COM files.
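The relaxed rule fits in one line. In this sketch, program_type is a hypothetical helper: it accepts a filename only to make the point that the name, and therefore the extension, is never consulted. Refinements that commenters mention below (the ZM signature, a minimum file size) are deliberately left out.

```python
def program_type(filename: str, image: bytes) -> str:
    """Decide by content, not by name: an MZ signature means EXE, else COM."""
    return "EXE" if image[:2] == b"MZ" else "COM"
```

So a COMMAND.COM that begins with MZ is loaded as an EXE, and a renamed memory image is still run as a COM file.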

But wait, what if a COM program just happened to begin with the letters MZ? Fortunately, that never happened, because the machine code for "MZ" disassembles as follows:

0100 4D            DEC     BP
0101 5A            POP     DX

The first instruction decrements a register whose initial value is undefined, and the second instruction underflows the stack. No sane program would begin with two undefined operations.
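The disassembly can be reproduced with a tiny table. On the 8086, opcodes 0x40 to 0x5F are one-byte INC, DEC, PUSH and POP of a 16-bit register, with the operation in bits 3 and 4 and the register in the low three bits. decode_one_byte is an illustrative helper that covers only that range.

```python
REGS16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
OPS = ["INC", "DEC", "PUSH", "POP"]  # 0x40-0x47, 0x48-0x4F, 0x50-0x57, 0x58-0x5F


def decode_one_byte(opcode: int) -> str:
    """Decode an 8086 one-byte INC/DEC/PUSH/POP reg16 opcode (0x40-0x5F)."""
    if not 0x40 <= opcode <= 0x5F:
        raise ValueError(f"opcode {opcode:#04x} is outside this tiny table")
    return f"{OPS[(opcode - 0x40) >> 3]} {REGS16[opcode & 7]}"


# "MZ" placed at the COM origin, one instruction per byte.
listing = [f"{0x100 + i:04X} {b:02X}  {decode_one_byte(b)}" for i, b in enumerate(b"MZ")]
```

The resulting listing is the same pair of lines shown above: DEC BP at 0100 and POP DX at 0101.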

Comments (42)
  1. Anonymous says:

    “To load a COM file, the program loader merely sucked the file into memory unchanged and then jumped to the first byte. No fixups, no checksum, nothing. Just load and go.”

    You left out the part that the “first byte” gets loaded at offset 0x100 relative to the value of the segment registers though. And the “no fixups” part meant that the image had to be self-relocating.

    [There are plenty of details I left out since they were not relevant to the topic. -Raymond]
  2. J. Edward Sanchez says:

    I wonder if Mark Zbikowski ever thought to verify that DEC BP and POP DX were indeed undefined operations at the beginning of a program, just in case Microsoft ever decided to be sneaky and start renaming EXE files to COM files. If not, then that’s a pretty happy coincidence.

    In retrospect, I can’t help but think that something like “É0Σ═!” (90 30 E4 CD 21) would have been a better EXE marker. That disassembles to a NOP followed by XOR AH, AH and INT 21h (a call to DOS to terminate the program).

    Optionally: Allow a sequence of bytes to be inserted in between the NOP and the termination call. This would give EXE files the flexibility to contain a stub COM file that could print something like “This is an EXE program.” before terminating.

    Now where’s that time machine?

    [Um, you do realize that your “optionally” means that every COM program would get misdetected as an EXE? -Raymond]
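The proposed marker can be checked byte by byte. In this sketch the instruction table is hand-assembled: 30 E4 is XOR r/m8, r8 with a ModRM byte selecting AH for both operands, and INT 21h with AH = 0 is the DOS terminate call. The same five bytes, read as code page 437 text, give the string quoted in the comment.

```python
MARKER = bytes([0x90, 0x30, 0xE4, 0xCD, 0x21])

DISASSEMBLY = [
    (b"\x90", "NOP"),
    (b"\x30\xe4", "XOR AH, AH"),  # opcode 30 /r; ModRM E4 = mod 11, reg AH, r/m AH
    (b"\xcd\x21", "INT 21h"),     # with AH = 0: DOS function 00h, terminate program
]

# Sanity check: the instruction bytes concatenate back to the marker.
assert b"".join(code for code, _ in DISASSEMBLY) == MARKER

# The same bytes viewed as code page 437 text.
AS_TEXT = MARKER.decode("cp437")
```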
  3. Anonymous says:

    @wades:

    I’d say that COM images aren’t self-relocating at all. Self-relocating means (IMO) that you can load them at an address other than 0x100, but that really doesn’t work with a COM image.

  4. Anonymous says:

    IIRC, there is a 0000H on the stack when a COM program starts, and an INT 20H at PSP:0000H.  This is so that the program can exit just by doing a RETN.  So the POP DX would not really underflow the stack.
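A minimal model of the exit path this commenter describes, assuming SP starts at FFFEh inside the program's own 64KB segment (as another commenter recalls) with a zero word on top. make_com_segment and near_ret are hypothetical helpers: a near RET pops that 0000h into IP, and offset 0 is where the PSP's INT 20h (CD 20) sits.

```python
def make_com_segment() -> bytearray:
    """Hypothetical freshly loaded COM segment: just the PSP opcode and stack word."""
    memory = bytearray(0x10000)
    memory[0x0000:0x0002] = b"\xcd\x20"   # PSP:0000 holds INT 20h
    memory[0xFFFE:0x10000] = b"\x00\x00"  # word 0000h at SS:SP, with SP = FFFEh
    return memory


def near_ret(memory: bytearray, sp: int) -> tuple:
    """Emulate a near RET: pop a little-endian word into IP; return (ip, sp)."""
    ip = int.from_bytes(memory[sp:sp + 2], "little")
    return ip, (sp + 2) & 0xFFFF  # SP wraps within the 64KB segment


memory = make_com_segment()
ip, sp = near_ret(memory, 0xFFFE)
# ip is now 0, so the next instruction executed is the INT 20h at PSP:0000.
```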

  5. Anonymous says:

    But why wasn’t the loader modified ONLY for the "well known programs"?

  6. Anonymous says:

    But if there really is no prolog and you just jump and execute, how are you guaranteed that there’s a 0000h on the top of the stack?

  7. Anonymous says:

    Raymond: Your question "So when did the program loader change" is answered "Compatibility", which leads me to think that by "when", you meant "why".

  8. Anonymous says:

    ""To load a COM file, the program loader merely sucked the file into memory unchanged and then jumped to the first byte. No fixups, no checksum, nothing. Just load and go."

    You left out the part that the "first byte" gets loaded at offset 0x100 relative to the value of the segment registers though. And the "no fixups" part meant that the image had to be self-relocating."

    Interestingly, CP/M and its successors (including MSX-DOS and MS-DOS) all loaded their programs at offset 0x100. It is perhaps the only thing that can be called "standard" among .COM files, even when different processor architectures are involved.

    The Z80 processor that was pretty common at the time could only address 64KB of RAM, so there were no segment registers to worry about. I think it was no coincidence that 8086 segments were made exactly that size.

  9. Anonymous says:

    Eber: Which ‘well known’ programs? OK, maybe in the beginning this was only needed by COMMAND.COM and EDIT.COM – but that list grew. Better to come up with a generic solution, allowing ANY .COM executable to exceed 64K by being in .EXE format, rather than keep updating a list of file-specific hacks!

    Also, Visual C++’s compiler uses a related trick at some point to provide both a GUI and command line version of itself under the same name (excluding file extension) – by having both .EXE and .COM versions, with the command line trying to run the .COM version first, unlike the GUI. If the .COM/.EXE hack were filename specific, this wouldn’t be possible – at least without the Visual Studio team getting the OS loader updated specially for them, which would probably irritate a lot of people as well as being bad engineering in principle.

  10. Anonymous says:

    > But if there really is no prolog and you just jump and execute, how are you guaranteed that there’s a 0000h on the top of the stack?

    Now, that’s quite a nitpick.  Raymond didn’t actually say that the loader performed absolutely no preparation for the COM program – he just said that nothing was done to the program image.

  11. Anonymous says:

    Useless trivia for the day:  Either "MZ" or "ZM" was a valid EXE header signature – at least in DOS.  I’m not sure about Windows.

  12. Anonymous says:

    "I’d say that COM images aren’t self-relocating at all. Self-relocating means (IMO) that you can load them at an address other than 0x100, but that really doesn’t work with a COM image."

    Sure you can. With segment addresses overlapping the near 16-bit offset you could load it at any 16-byte aligned address in memory.
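The arithmetic behind this reply: in 8086 real mode a physical address is segment * 16 + offset, so a COM image that only ever uses 16-bit offsets runs unchanged wherever DOS chooses to put its segment. physical_address and com_entry_physical are illustrative names; the 20-bit wraparound reflects the original 8086.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real mode: segment * 16 + offset, wrapped to the 20-bit address space."""
    return ((segment << 4) + offset) & 0xFFFFF


def com_entry_physical(load_paragraph: int) -> int:
    """Where CS:0100 lands if the PSP starts at paragraph `load_paragraph`."""
    return physical_address(load_paragraph, 0x100)
```

Loading at paragraph 1234h puts the entry point at physical 12440h; loading at 2000h puts it at 20100h. No byte of the image changes, only CS.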

  13. Anonymous says:

    > Sure you can. With segment addresses overlapping the near 16-bit offset you could load it at any 16-byte aligned address in memory.

    Nitpicking. The CS:IP would be xxxx:0100 anyway.

    > In retrospect, I can’t help but think that something like "É0Σ═!" (90 30 E4 CD 21) would have been a better EXE marker. That disassembles to a NOP followed by XOR AH, AH and INT 21h (a call to DOS to terminate the program).

    In retrospect, that would be overkill.  It’s not like everyone was going to rename EXEs to COMs every day.

  14. Anonymous says:

    Interestingly, CP/M and its successors (including MSX-DOS and MS-DOS) all loaded their programs at offset 0x100. It is perhaps the only thing that can be called "standard" among .COM files, even when different processor architectures are involved.

    I believe the historic reason is that the memory from 0x00 to 0xFF is used for the stack, which in turn originated in certain old CPU architecture (Z80 for example I think) where the stack pointer is only 8-bit.  Anyway, MS-DOS was derived from CP/M so naturally it followed the same convention as CP/M for COM images.

    Also, Visual C++’s compiler uses a related trick at some point to provide both a GUI and command line version of itself under the same name (excluding file extension) – by having both .EXE and .COM versions, with the command line trying to run the .COM version first, unlike the GUI.

    That sounds inaccurate to me.  16-bit Windows executables use the NE format, which builds on top of the MS-DOS EXE format: there’s an MS-DOS EXE "stub" (which is really just arbitrary code) that gets run if you run the program under MS-DOS, and the NE-specific data follows after the stub.  It makes much more sense for VC++ to make use of that, rather than .COM/.EXE, to support the dual-UI feature.

  15. Anonymous says:

    @reader: The stack started at 0xFFFE and grew downwards.

    0x00 to 0xFF was the Program Segment Prefix, which included such things as the command line arguments and two File Control Blocks, at least the first of which was helpfully filled in for you (IIRC) if the first argument looked like a filename.

    Another backwards-compatibility tidbit was that address 0x0005 contained a jump to an interrupt routine so that CP/M-like programs which did "CALL 0005" rather than "INT 21H" would also work.

    See also http://en.wikipedia.org/wiki/Program_Segment_Prefix .
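A sketch of the command-tail part of the PSP, assuming the classic layout: INT 20h at offset 00h, a length byte at 80h, and the tail text from 81h terminated by a carriage return (0Dh). build_psp is a hypothetical helper; it leaves the CALL 5 entry at 05h and the two FCBs at 5Ch and 6Ch zeroed rather than modeling them.

```python
PSP_SIZE = 0x100
TAIL_LEN_OFFSET = 0x80  # length of the command tail (not counting the CR)
TAIL_OFFSET = 0x81      # command tail text, terminated by 0Dh


def build_psp(command_tail: str) -> bytearray:
    """Build a 256-byte PSP image holding INT 20h and the command tail."""
    tail = command_tail.encode("ascii")
    if len(tail) > 126:  # 81h..FFh holds at most 126 characters plus the CR
        raise ValueError("command tail too long for the PSP")
    psp = bytearray(PSP_SIZE)
    psp[0x00:0x02] = b"\xcd\x20"  # INT 20h, so a jump to offset 0 exits
    psp[TAIL_LEN_OFFSET] = len(tail)
    psp[TAIL_OFFSET:TAIL_OFFSET + len(tail)] = tail
    psp[TAIL_OFFSET + len(tail)] = 0x0D
    return psp
```

By convention the tail keeps the leading space that separated it from the program name, so running "PROG FOO.TXT" would store " FOO.TXT" with length 8.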

  16. Anonymous says:

    CP/M didn’t start out on the Zilog (Z80) CPU.  It ran on the 8080 (the 8-bit precursor to the Intel 8086).  It just so happened (coincidentally, you know, not planned or anything… ahem) that the Z80 was a superset of the 8080 and could run 8080 programs just fine, thanks.  Plus it had a couple of other registers and a few extra instructions.  But CP/M ran on it, and that was the important part.  Under CP/M, the OS (such as you could call it one) owned the memory below 0x100.  It had BIOS call tables, the command line / default disk buffer, and a bunch of other undocumented things that people depended on never to move or change again.  (Raymond’s compatibility problems go back at least that far.)  I don’t recall exactly what SP was set to, but if you saved it you could return directly to the "command interpreter," rather than doing a reset that required the interpreter to be reloaded.  The stack pointer was most definitely 16 bits.  There were some older chips that had 8-bit stack pointers, though.  Some of those are still around being used as microcontrollers (in toasters and exercise bikes, for example).  When you want a bunch of them, getting them for a nickel is a good thing.

    Sorry, I’m rambling again.  Age will do that.

  17. Anonymous says:

    Thought I would mention: I found the book "Virus Research & Defense" published by Symantec Press (I forget the author’s name) to have quite an informative history of how code files evolved, including lots of details on Windows’ PE format.  It of course focuses on how they were abused over time, but it is still quite relevant.

  18. Anonymous says:

    I would be curious to know when exactly COMMAND.COM was renamed to CMD.EXE and how that affected compatibility. It seems like a much bigger, breaking change than renaming FORMAT.COM to FORMAT.EXE.

  19. Anonymous says:

    And, just because of that, I noticed that COMMAND.COM is still around. Huh, never realized that…

  20. Anonymous says:

    "I believe the historic reason is that the memory from 0x00 to 0xFF is used for the stack, which in turn originated in certain old CPU architecture (Z80 for example I think) where the stack pointer is only 8-bit."

    Hm, the Z80 had a 16-bit SP.  The 6502 had an 8-bit SP, but its stack was at 0x100 – 0x1FF, just above zero page.

    But 0x000 – 0x0FF tended to have things hardwired in it, like RST vectors.

    "And the ‘no fixups’ part meant that the image had to be self-relocating."

    Well, they could be loaded in any segment, and I recall a lot of push cs/pop ds… or es…  I don’t remember exactly which registers…  But I might be thinking of boot records, where it’s just smaller than loading the address you already know you’re at.

    But anyway, that’s a bit different from being really position independent or self-relocating.  With only one segment, addressing is flat and fixed so there’s nothing to patch internally.

  21. Anonymous says:

    "That sounds inaccurate to me.  16-bit Windows executables use the NE format, which builds on top of the MS-DOS EXE format: there’s an MS-DOS EXE "stub" (which is really just arbitrary code) that gets run if you run the program under MS-DOS, and the NE-specific data follows after the stub.  It makes much more sense for VC++ to make use of that, rather than .COM/.EXE, to support the dual-UI feature."

    James was correct.  There were other, older DOS/Win16 Microsoft tools that used the MZ stub and the NE executable to provide dual-mode behavior, but what devenv did was a different sort of hack.  There were both devenv.com and devenv.exe, and both were in fact PEs with a standard stub.  Because CMD.EXE finds .COM files first (given the default PATHEXT), typing ‘devenv’ at a command prompt got you devenv.com, the console-subsystem PE executable, while the Start menu and other shortcuts pointed to devenv.exe, the Windows-subsystem PE executable.

    It was a hack.

  22. Anonymous says:

    Or, they could have provided small .com stubs to launch the .exe files.

    Nah, that would have been sane and kept things simple instead of doing something overly complicated and prone to strange side effects.

    And hey, as long as it all fits on a floppy, right?

  23. Anonymous says:

    > Or, they could have provided small .com stubs to launch the .exe files.

    Nah, that would have been sane and kept things simple instead of doing something overly complicated and prone to strange side effects.

    How is that simpler than simply having the loader look for the MZ signature?  What strange side effects does what Raymond described have?

    Having a small .com stub shell out to the real .exe is probably the first solution that would have come to my mind, but it has the downside that now you have to make sure 2 executables are available (and then you’d have Raymond’s article explaining "Why do some standard executables have both a .com file and a .exe file, such as format.com and format.exe?").  Personally, I think having the loader not care about the extension is much cleaner and preferable.

  24. Anonymous says:

    By the way, is cmd.exe also in "maintenance mode", i.e. legacy stuff? Why isn’t it being improved?

  25. J. Edward Sanchez says:

    [Um, you do realize that your “optionally” means that every COM program would get misdetected as an EXE? -Raymond]

    No, only every COM program that starts with a NOP.

    [Ah, right, sorry; I missed that part. It does make parsing the header significantly more difficult, however, since locating the header becomes O(n). -Raymond]
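Raymond's O(n) point is easy to see in code. This sketch models the hypothetical scheme from the comment: a NOP, an arbitrary-length stub, then XOR AH, AH / INT 21h, with the EXE header assumed to follow immediately. find_header_offset is an invented name, and a real format would also have to cope with a stub that happens to contain those four bytes as data.

```python
from typing import Optional

NOP = b"\x90"
TERMINATE = b"\x30\xe4\xcd\x21"  # XOR AH, AH ; INT 21h (DOS terminate)


def find_header_offset(image: bytes) -> Optional[int]:
    """Hypothetical: offset where the EXE header would start, or None for plain COM."""
    if not image.startswith(NOP):
        return None
    # The stub has no fixed length, so the loader must scan for the
    # terminating sequence: the cost grows linearly with the stub size.
    end = image.find(TERMINATE, len(NOP))
    if end < 0:
        return None
    return end + len(TERMINATE)
```

Compare the MZ scheme, where the loader only ever inspects the first two bytes.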
  26. Anonymous says:

    command.com seems to have more compatibility stuff than cmd.exe, which wouldn’t need it since cmd.exe wasn’t around in DOS/9x/ME.

    At least, I recall I prefer cmd.exe over command.com and I think that was the reason why.

    I always knew about the "MZ" header but didn’t realize it was some guy’s initials.

    Here’s some fun: http://www.eicar.org/anti_virus_test_file.htm

    The ASCII string which is actually a binary COM file.  You can paste it into notepad and save it as a .COM file and run it to see if your anti-virus catches it (it’s a harmless "Hello World!" style program used as a test for anti-virus products).  I always thought it was kinda neat how it didn’t have any control codes or >= 0x80 characters.

  27. Anonymous says:

    anon: There’s your problem.  Starting a COM file with a NOP is perfectly acceptable since COM files have no syntax.  The only way your idea would work is if you started the file with something that would be impossible to use at the beginning of the COM file because it wouldn’t work… like "MZ".  Of course if you use that or any variant your idea no longer works since it’s based on the idea that the file starts with acceptable COM code!

  28. Anonymous says:

    And hey, as long as it all fits on a floppy, right?

    To which end it appears that something along the lines of PKSFX was used to compress the executable.

    As for using the real-mode stub for a PE executable, I’ve only seen it done once, I think it was for some old version of Excel that shipped with its own copy of Windows (since most people didn’t have Windows then) and the job of the stub was simply to execute "win excel".

    I guess fitting on a floppy was the main reason why Windows 95 xcopy.exe launched xcopy32.exe instead of being dual-mode.

  29. Anonymous says:

    The ZM signature was valid in DOS but not in Windows.

  30. Anonymous says:

    > Also, Visual C++’s compiler uses a related trick at some point to provide both a GUI and command line version of itself under the same name (excluding file extension) – by having both .EXE and .COM versions, with the command line trying to run the .COM version first, unlike the GUI.

    That sounds inaccurate to me.  

    And yet it is true. If you use a command shell to execute "devenv /build mysolution.sln" then you’ll get devenv.com and it will be a text-mode build. That’s because .COM comes before .EXE in the PATHEXT environment variable.

    One person’s backwards compatibility hack becomes another’s feature.

    PS> ($env:PATHEXT)
    .COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC

    PS> get-command devenv | fl

    Name            : devenv.com
    CommandType     : Application
    Definition      : C:\Program Files (x86)\Microsoft Visual Studio 8\Common7\IDE\devenv.com
    Extension       : .com
    Path            : C:\Program Files (x86)\Microsoft Visual Studio 8\Common7\IDE\devenv.com
    FileVersionInfo : File:             C:\Program Files (x86)\Microsoft Visual Studio 8\Common7\IDE\devenv.com
                      InternalName:     DEVENV.COM
                      OriginalFilename: DEVENV.COM
                      FileVersion:      8.0.50727.42 built by: RTM
                      FileDescription:  Microsoft Visual Studio Command Line
                      Product:          Microsoft® Visual Studio® 2005
                      ProductVersion:   8.0.50727.42
                      Debug:            False
                      Patched:          False
                      PreRelease:       True
                      PrivateBuild:     True
                      SpecialBuild:     False
                      Language:         English (United States)

    Name            : devenv.exe
    CommandType     : Application
    Definition      : C:\Program Files (x86)\Microsoft Visual Studio 8\Common7\IDE\devenv.exe
    Extension       : .exe
    Path            : C:\Program Files (x86)\Microsoft Visual Studio 8\Common7\IDE\devenv.exe
    FileVersionInfo : File:             C:\Program Files (x86)\Microsoft Visual Studio 8\Common7\IDE\devenv.exe
                      InternalName:     devenv.exe
                      OriginalFilename: devenv.exe
                      FileVersion:      8.0.50727.867 built by: vsvista
                      FileDescription:  Microsoft Visual Studio 2005
                      Product:          Microsoft® Visual Studio® 2005
                      ProductVersion:   8.0.50727.867
                      Debug:            False
                      Patched:          False
                      PreRelease:       True
                      PrivateBuild:     True
                      SpecialBuild:     False
                      Language:         English (United States)
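The lookup order shown in that listing can be modeled in a few lines. This is a simplified sketch, not cmd.exe's actual algorithm: resolve_command is a hypothetical name, it only walks the directories it is given (the real shell also checks the current directory first and handles names that already carry an extension), and it uses a shortened PATHEXT.

```python
import ntpath
import os


def resolve_command(name, search_dirs, pathext=".COM;.EXE;.BAT;.CMD", exists=os.path.isfile):
    """For each directory in order, try each PATHEXT extension in order and
    return the first candidate file that exists, or None."""
    extensions = [ext for ext in pathext.split(";") if ext]
    for directory in search_dirs:
        for ext in extensions:
            candidate = ntpath.join(directory, name + ext)  # Windows-style path
            if exists(candidate):
                return candidate
    return None
```

With both devenv.com and devenv.exe in the same directory, the .COM entry wins simply because it comes first in PATHEXT.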

  31. Anonymous says:

    @reader: The stack started at 0xFFFE and grew downwards.

    0x00 to 0xFF was the Program Segment Prefix,

    Oops, you’re right, I remembered incorrectly.  My bad.

  32. Anonymous says:

    I always liked the MZ+LE trick where you could write a VxD whose DOS stub was actually the DOS version of the program. The idea was that if you ran it in DOS, only the MZ part was used. But that part hooks INT 2Fh and uses the hook to load a small VxD with just enough logic to keep it working after Windows has virtualized everything.

    http://support.microsoft.com/kb/74516

  33. Anonymous says:

    I think xcopy32 was around just to give 16-bit xcopy long filename support.

  34. Anonymous says:

    cmd.exe is good enough, right?  I don’t think it requires much improvement.  We can build simple tools that drive cmd.exe such as my project winrosh to make it more fun.  OTOH, xcopy, copy and move commands can be improved (to make copying/moving files more capable than using GUI operations), but not replaced with a single robocopy command.

  35. Mike Dimmick says:

    The .com + .exe trick for Visual Studio was introduced at least as early as VC6 (msdev.com, msdev.exe) and adapted for eMbedded Visual C++ (evc.com/.exe) then adopted also for the ‘unified IDE’ of VS.NET 2002. VS 2008 still ships devenv.com and devenv.exe. The .com file is a small stub which loads the .exe, passing it a handle to the console that the .com was loaded in, so that Visual Studio’s build system can send output to that console. The ability to attach a program not already associated with a console to an existing console, using the AttachConsole function, was only added in Windows XP. The devenv.com program itself is a renamed console-subsystem Windows executable (PE file).

    @Mats Gefvert: CMD.EXE is a console-mode subsystem command interpreter. It isn’t required for running console-mode programs. x64 systems have a 32-bit version in %SystemRoot%\SysWOW64 and a 64-bit build in %SystemRoot%\System32.

    COMMAND.COM is the 16-bit DOS interpreter, and it *is* loaded for a DOS environment as DOS programs expected it to be there. x64 systems do not contain COMMAND.COM as they have no Virtual DOS Machine environment (ntvdm), as the required processor submode was removed by AMD. (It’s still there if you boot the processor in 32-bit protected mode, but a 64-bit OS cannot access it.) If you type COMMAND into the run box rather than CMD, you get a less functional, slower command interpreter on 32-bit, and an error on 64-bit. Use CMD.

  36. Anonymous says:

    @Mike Dimmick

    "x64 systems do not contain COMMAND.COM as they have no Virtual DOS Machine environment (ntvdm), as the required processor submode was removed by AMD. (It’s still there if you boot the processor in 32-bit protected mode, but a 64-bit OS cannot access it.)"

    True enough. But did you know that the HAL in Windows XP x64 actually emulates 16-bit BIOS code in software so that video drivers that still need to can call it? I guess by the time Vista 64 shipped, the video card vendors had had enough time to find another way to get whatever information they wanted, because the emulator is no longer present there.

  37. Anonymous says:

    When 8-bit CP/M needed an expanded COM file format, the magic number used was 0xC9, which in 8080 machine code is RET. Later another extension was added by third-party developers, and that used 0xC7 (RST 0; it would be like starting a DOS COM file with 0xCD 0x20, INT 20h).

    I think there’s another criterion for a file being treated as EXE rather than COM; it’s got to be big enough to contain the EXE header. So this: 4d 5a ba 0c 01 b1 09 e8 fb fe cd 20 48 65 6c 6c 6f 24 is run as a COM file (at least on XP), despite starting MZ.
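The commenter's 18-byte sample can be checked mechanically. The sketch assumes the size threshold is the 28-byte fixed MZ header (the commenter only says "big enough to contain the EXE header", so the exact cutoff is an assumption) and optionally accepts ZM, which another commenter says DOS allowed. It also decodes the sample's CALL: E8 FB FE at offset 7 targets offset 0005h, the CP/M-compatible entry point in the PSP, with CL = 9 (print a $-terminated string) and DX = 010Ch pointing at "Hello$". looks_like_exe and near_call_target are hypothetical names.

```python
MZ_FIXED_HEADER_SIZE = 28  # assumed threshold: the fixed part of an MZ header

SAMPLE = bytes.fromhex("4d5aba0c01b109e8fbfecd2048656c6c6f24")


def looks_like_exe(image: bytes, accept_zm: bool = False) -> bool:
    """Signature check plus a minimum-size check, per the comments."""
    signatures = (b"MZ", b"ZM") if accept_zm else (b"MZ",)
    return image[:2] in signatures and len(image) >= MZ_FIXED_HEADER_SIZE


def near_call_target(image: bytes, call_offset: int, origin: int = 0x100) -> int:
    """Target of an E8 rel16 near CALL found at image[call_offset]."""
    if image[call_offset] != 0xE8:
        raise ValueError("not an E8 near CALL")
    rel = int.from_bytes(image[call_offset + 1:call_offset + 3], "little", signed=True)
    next_ip = origin + call_offset + 3  # rel16 is relative to the next instruction
    return (next_ip + rel) & 0xFFFF
```

Under these assumptions the sample is too short to be an EXE, so it runs as a COM file despite starting with MZ.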

  38. Anonymous says:

    "But did you know that the HAL in Windows XP x64 actually emulates 16-bit BIOS code in software so that video drivers that still need to can call it? I guess by the time Vista 64 shipped, the video card vendors had had enough time to find another way to get whatever information they wanted, because the emulator is no longer present there."

    Indeed the call for doing this was removed in WDDM.

  39. Anonymous says:

    Just a little nitpicking: .COM files are not limited to 64 KB. They can grow larger and address all of the space, at least on MS-DOS 3.2 and after. The only problem is that the writer of the .COM file has to handle all the segment arithmetic on his own, as the DOS loader did not perform any adjustments (as it does with .EXE files).

    Of course, switching to the MZ-EXE was a good move in the first place.

  40. Anonymous says:

    DEVENV.exe vs DEVENV.com

    this is true.

    DevEnv.com is the one that correctly handles piping/buffering the output so you can pipe it into something else.

    So when you build at the command line, it uses devenv.com and you see the build output as it progresses. If you used DevEnv.exe, you only got the result at the end.

  41. Anonymous says:

    I already know that several files used during the infection are not visible in the file system. This means (most likely) that they were deleted… What I’m looking for… The files I’m looking for: C:\Documents and Settings\Admini

Comments are closed.