Why do text files end in Ctrl+Z?


Actually, text files don't need to end in Ctrl+Z, but the convention persists in certain circles. (Though, fortunately, those circles are awfully small nowadays.)

This story requires us to go back to CP/M, the operating system that MS-DOS envisioned itself as a successor to. (Since the 8086 envisioned itself as the successor to the 8080, it was natural that the operating system for the 8086 would view itself as the successor to the primary operating system on the 8080.)

In CP/M, files were stored in "sectors" of 128 bytes each. If your file was 64 bytes long, it still occupied a full sector. The kicker was that the operating system tracked the size of the file as a number of sectors. So if your file was not an exact multiple of 128 bytes in size, you needed some way to specify where the "real" end-of-file was.

That's where Ctrl+Z came in.

By convention, the unused bytes at the end of the last sector were padded with Ctrl+Z characters. According to this convention, if you had a program that read from a file, it should stop when it reads a Ctrl+Z, since that meant that it was now reading the padding.
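To make the convention concrete, here is a minimal Python sketch. The helper names `pad_to_sectors` and `read_text` are hypothetical, not CP/M APIs. The writer pads text with Ctrl+Z up to a whole number of 128-byte sectors, and the reader stops at the first Ctrl+Z it sees:

```python
# Minimal sketch of the CP/M padding convention; helper names are hypothetical.
SECTOR_SIZE = 128   # CP/M tracked file size as a count of 128-byte sectors
CTRL_Z = 0x1A       # Ctrl+Z (decimal 26), used as padding and end-of-text marker

def pad_to_sectors(text: bytes) -> bytes:
    """Pad text with Ctrl+Z up to the next multiple of SECTOR_SIZE."""
    remainder = len(text) % SECTOR_SIZE
    if remainder:
        text += bytes([CTRL_Z]) * (SECTOR_SIZE - remainder)
    return text

def read_text(stored: bytes) -> bytes:
    """Return the stored bytes up to, but not including, the first Ctrl+Z."""
    end = stored.find(CTRL_Z)
    return stored if end == -1 else stored[:end]

stored = pad_to_sectors(b"A" * 64)
print(len(stored))                     # 128: a 64-byte file still fills a sector
print(read_text(stored) == b"A" * 64)  # True: the reader stops at the padding
```

One consequence of this design: any data that legitimately contains a 0x1A byte gets cut off at that point, which is why binary files can't follow the text convention.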

To retain compatibility with CP/M, MS-DOS carried forward the Ctrl+Z convention. That way, when you transferred your files from your old CP/M machine to your new PC, they wouldn't have garbage at the end.

Ctrl+Z hasn't been needed for years; MS-DOS records file sizes in bytes rather than sectors. But the convention lingers in the "COPY" command, for example.

Comments (34)
  1. Anonymous says:

    Ah, ancient software legacies… when you mention the COPY command and Ctrl+Z, do you mean ye olde "COPY CON input.txt" for writing keyboard input to a file?

    This reminds me of something I discovered a couple of years back – did you know that the first dozen or so VB runtime error numbers are directly inherited all the way back from the layout of the error code table in the original Altair BASIC 4K? That held right up to VB6, but that particular wheel was totally reinvented for .NET of course …

  2. Anonymous says:

    If you combine files with "COPY A+B C", the COPY command by default stops at the first Ctrl+Z. You need the /B switch to force copying past Ctrl+Z.
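    A rough Python sketch of the difference, modeling only what this comment describes: in text mode each source is read up to its first Ctrl+Z, while /B copies every byte. The `concat` helper is hypothetical, and whether COPY also writes a trailing Ctrl+Z to the destination is deliberately not modeled.

```python
from typing import Iterable

CTRL_Z = b"\x1a"

def concat(sources: Iterable[bytes], binary: bool = False) -> bytes:
    """Hypothetical model of COPY A+B C (text mode) versus COPY /B A+B C (binary mode)."""
    out = bytearray()
    for data in sources:
        if not binary:
            # Text mode: treat everything from the first Ctrl+Z onward as padding.
            data = data.split(CTRL_Z, 1)[0]
        out += data
    return bytes(out)

a = b"first part\r\n\x1a\x1a\x1a"
b = b"second part\r\n"
print(concat([a, b]))               # b'first part\r\nsecond part\r\n'
print(concat([a, b], binary=True))  # the three Ctrl+Z bytes from a survive in the middle
```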

  3. Anonymous says:

    Well I live & learn … :-)

    I love this history stuff! More please!

  4. Anonymous says:

    …but COPY CON is a prime example of why once you’ve introduced a feature you can almost never get rid of it, because someone, somewhere will be relying on it. Removing Ctrl+Z support would really hack me off because I sometimes use COPY CON. (And doubtless there are others out there still relying on it for their own reasons.)

  5. Anonymous says:

    I use COPY regularly both to concatenate text files, and to send files to network printers I’ve mapped to LPT3, LPT4, etc.

  6. Anonymous says:

    Apart from edlin, is there any other way to create text files in the shell than using ‘copy con’?

    I’d be very happy if they removed both of those options and shipped vi or something similar instead. From the PDC’03 it seemed like Jim Allchin would be happy for that change as well :)

  7. Anonymous says:

    I use COPY CON on a pretty regular basis. When I find a snippet of code (e.g., Perl) that I want to try out or test, I can just create a quick file with COPY CON without opening any editors, since I always have a DOS box open. The annoying thing about Ctrl-Z is that Linux uses Ctrl-D to end a file (cat > myFile), and I usually end up with a Ctrl-D, Ctrl-Z in my files.

    Thanks for the history lesson, Raymond!

  8. Anonymous says:

    Andreas Häber, try the "edit" command. It’s a vaguely pico-ish console text editor that’s been around for years and years. It used to be the built-in editor for QBasic.

  9. Anonymous says:

    I used ‘edit.com’ when I coded QBasic, and I try to forget it :) Really... IMO every editor Microsoft ships with Windows sucks big time. And that’s a shame, given how nice the rest of the platform is.

    Seems like Microsoft stopped working on utilities such as calc, notepad, etc. after Windows 95. Looking at ‘edit’ it’s copyrighted in 1995…

    Reading about the nice new Microsoft Shell(MSH)[1] I’m wondering where users are gonna edit their nice cmdlets.. sure it’s gonna be a bad OOBE ;)

    Ship vi(m) with it instead. From reading [2], there’s already acknowledgment of some other third-party code. The Vim license[3] seems very nice and would fit :) [But emacs, nano etc. would be good as well. I just know that Jim Allchin likes vim, and the license seems nice.] Or… since Microsoft is working on a new shell, will there also be a nice editor too? :))

    [1] http://weblogs.asp.net/jnadal/archive/2003/10/29/34413.aspx

    [2] http://support.microsoft.com/default.aspx?scid=kb;EN-US;q306819#10

    [3] http://vimdoc.sourceforge.net/htmldoc/uganda.html#copyright

  10. Anonymous says:

    How does COPY determine whether to operate in text mode? Are COPY CON and concatenation copies special-cased?

  11. Anonymous says:

    Andreas, I doubt very much that your average Windows "power user" wants to remember esc+i to insert, esc+yy to copy a line, and so on. The first time they accidentally go into command mode with Caps Lock on, they’ll put the computer through the wall and find a new career.

    I agree that edit.com sucks (among other things, IIRC it always replaces tabs with spaces, but GNU make insists on tabs at the start of lines…), but for Windows users, it’s a much more realistic option than vi.

  12. Anonymous says:

    It’s true that edit.com will not preserve tabs. Also I find that in Windows XP, its display has an annoyingly slow refresh rate.

    Interestingly, edit.com can open text files with Unix line endings and then save them with Windows line endings. Notepad can’t do that. I recommend the U2D program (http://gnuwin32.sourceforge.net/packages/cygutils.htm) if you have to do this a lot, but edit.com will work in a pinch.
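    The core of the conversion is a byte substitution. A minimal Python sketch (the name `unix_to_dos` is hypothetical; this is not U2D's code) that turns bare LFs into CR+LF while leaving existing CR+LF pairs alone:

```python
def unix_to_dos(data: bytes) -> bytes:
    """Convert LF line endings to CR+LF, leaving existing CR+LF pairs intact."""
    # Collapse CR+LF to LF first, so a file with mixed endings doesn't end up with CR CR LF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")

mixed = b"line one\nline two\r\nline three\n"
print(unix_to_dos(mixed))  # b'line one\r\nline two\r\nline three\r\n'
```

    Working on bytes rather than decoded text sidesteps any question of the file's character encoding; lone CRs (old Mac-style endings) pass through untouched.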

  13. Anonymous says:

    Another alternative to copy con filename:

    echo @echo off> filename

    echo setlocal>> filename

    etc.

  14. Anonymous says:

    But then you are not going to have any ‘>’ or ‘%’ or such in your file.

  15. Anonymous says:

    C:\>echo a^>b, 50^% skim milk > x.txt

    C:\>type x.txt

    a>b, 50% skim milk
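    A minimal sketch of caret escaping, assuming only the characters & | < > ^ need a caret in front of them for echo to print them literally. It deliberately leaves out %, whose handling differs between the interactive prompt and batch files (where a literal % is written %%), and parentheses, which need escaping only inside parenthesized blocks. `caret_escape` is a hypothetical helper.

```python
CMD_SPECIAL = set("&|<>^")  # assumed set of metacharacters to escape for echo

def caret_escape(text: str) -> str:
    """Prefix each cmd.exe metacharacter in text with ^ so echo prints it literally."""
    return "".join("^" + ch if ch in CMD_SPECIAL else ch for ch in text)

print("echo " + caret_escape("a>b & c|d"))  # echo a^>b ^& c^|d
```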

  16. Anonymous says:

    ^ as an escape character?! Dear God… Well, what the heck: Nowadays MS is using curly braces for capture-grouping in regexes. Some days, you know, it takes a lot of will not to start ranting like a Slashbot about Microsoft.

  17. Anonymous says:

    Yeah, but imagine the mess if they’d used \ as escape.

    cd \program files\microsoft office\templates

    copy \\server\share\flower patterns\happy.dot .

  18. Anonymous says:

    > ^ as an escape character?! Dear God… Well, what the heck: Nowadays MS is using curly braces for capture-grouping in regexes. Some days, you know, it takes a lot of will not to start ranting like a Slashbot about Microsoft.

    Well, you know, it was the early 80s… everyone was doing cocaine… these things slip through. :-)

  19. Anonymous says:

    Base note:

    > it was natural that the operating system for the 8086 would view itself as the successor to the primary operating system on the 8080

    Yes and no. CP/M-86 viewed itself as the successor to CP/M for exactly that reason. MS-DOS 1.0 viewed itself as the successor to CP/M for an entirely different reason.

    MS-DOS boasted of its 8-bit API. MS-DOS gloated that CP/M-86 users would have to rewrite applications to use a 16-bit API, while MS-DOS was the most backwards compatible OS with CP/M.

    3/16/2004 7:52 AM Reuben Harris:

    > when you mention the COPY command and Ctrl+Z, do you mean ye olde "COPY CON input.txt" for writing keyboard input to a file?

    That’s completely different. Typing a Ctrl-Z yields an end-of-file on keyboard input, but the program doesn’t have to write a Ctrl-Z character to a file. For comparison, a Unix text file doesn’t usually contain a Ctrl-D character but Ctrl-D yields EOF on keyboard input. For comparison, a VMS text file doesn’t usually contain a Ctrl-Z but Ctrl-Z yields EOF on keyboard input. For anti-comparison, pressing the Return key usually does result in the application seeing an actual CR (or LF in the case of Unix) character.

  20. Anonymous says:

    Speaking of CR and LF, Raymond, do you think you can comment on why we have these different standards to represent a new line? I believe it’s LF on Unix, CR on Macs, and CR+LF on DOS/Windows.
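    To make the three conventions concrete, here is a small Python sketch (`newline_style` is a hypothetical helper) that guesses which one a piece of text uses by counting CR+LF pairs, bare LFs, and bare CRs:

```python
def newline_style(text: str) -> str:
    """Guess a text's newline convention: 'CRLF', 'LF', 'CR', 'mixed', or 'none'."""
    crlf = text.count("\r\n")
    lf = text.count("\n") - crlf   # LFs not preceded by a CR
    cr = text.count("\r") - crlf   # CRs not followed by an LF
    kinds = [name for name, n in (("CRLF", crlf), ("LF", lf), ("CR", cr)) if n]
    if not kinds:
        return "none"
    return kinds[0] if len(kinds) == 1 else "mixed"

print(newline_style("dos\r\ntext\r\n"))  # CRLF
print(newline_style("unix\ntext\n"))     # LF
print(newline_style("mac\rtext\r"))      # CR
```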

  21. Anonymous says:

    CRLF is a scheduled topic for the 18th. I hope you can wait until then.

  22. Anonymous says:

    In reference to "Seems like Microsoft stopped working on utilities such as calc, notepad, etc."

    Calc now has multiprecision decimal arithmetic under the covers. Financial types kept complaining about results from IEEE floating point.

    In the post-Win95 timeframe, Notepad added: Unicode file support, a status line, go to line number, faster find, replace text, the standard print dialog, non-fixed-width fonts, saving files in various Unicode formats, and large file support. But it is not trying to compete with WordPad or Word.

    We attempted to expose code-pages on the save dialog and gave up when we couldn’t do it smoothly without confusing most people.

    In the same context, the various games were continually improved until the day they were pulled from the product.

    Any other utilities you want a history lesson on?

  23. Anonymous says:

    Jim Davis: You’re right, of course, about all the backslashes. Backslash for escape would’ve been madness, and ‘^’ at least has a history of meaning "control key" (e.g. ^M == ‘\r’, IIRC).
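    The "control key" reading of ^ follows a simple rule: holding Ctrl keeps only the low five bits of the letter's ASCII code. A quick Python check (`ctrl` is a hypothetical helper):

```python
def ctrl(letter: str) -> int:
    """Character code produced by Ctrl+<letter>: the ASCII code masked to its low five bits."""
    return ord(letter.upper()) & 0x1F

print(ctrl("Z"))                # 26 (0x1A), the CP/M and MS-DOS end-of-file marker
print(ctrl("M") == ord("\r"))   # True: ^M is carriage return
print(ctrl("J") == ord("\n"))   # True: ^J is line feed
print(ctrl("D"))                # 4 (EOT), the default end-of-file key on Unix terminals
```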

    Do any of the MS historians around here know why MS used backslash for a path separator and slash for the command line option prefix?

  24. Anonymous says:

    Because slash was already taken for command line options.

  25. Anonymous says:

    IIRC, before DOS had support for directories, someone used / as the command-line switch character; this meant that / couldn’t be used as a path separator when directories were introduced without breaking a few apps, so they settled on \, and it’s been confusing me ever since… Another case of MS software deviating from the norm for the sake of backwards compatibility!

  26. Anonymous says:

    Which brings us to the question why the slash was chosen for command line options instead of the hyphen as we know it from other OSes.

  27. Anonymous says:

    AFAIK, DOS uses slash for its command-line switches because CP/M did. And CP/M did because CP/M’s syntax is borrowed from TOPS-10, an operating system for Digital’s PDP-10 computers. OpenVMS, which would be TOPS-10’s descendant on VAX and Alpha machines, still uses slash for its command-line switches.

  28. Anonymous says:

    At the time DOS was written, CP/M didn’t really have a standard option character. Some programs used $ (eg: STAT FILENAME $RO to make a file read-only) and some used square brackets (eg: PIP CON:=FILE.TXT[NU] to display file with line numbers in uppercase). Later versions went with square brackets plus commas (eg: DIR [EXCLUDE,NOSORT] *.TXT to list files other than *.TXT).

    Microsoft’s development tools for CP/M, on the other hand, all use the slash.

  29. Anonymous says:

    And, to complicate things further, MS-DOS commands until version 5.0 or so had a customizable switch character and there was a command (or was it a config.sys keyword?) called SWITCHAR to allow for such customization.

    4DOS/4NT, an alternative command processor for DOS and Windows NT respectively, allows you to use a forward slash ‘/’ in both meanings, as a path separator and as a switch character. If it comes after a space, it is a switch. If it doesn’t, it is a path separator.

    And then there are command-line utilities in Windows NT, like ping, tracert and netstat, that refuse to adopt the Windows conventions and insist on using a dash as the switch character, as in their native OS (BSD, if I’m not mistaken?)
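    The rule as stated here is easy to sketch. The Python helper below is hypothetical and applies only that one rule; 4DOS/4NT's real parser presumably also deals with quoting and other cases not modeled here.

```python
from typing import List

def classify_slashes(command_line: str) -> List[str]:
    """Return 'switch' or 'path' for each '/' in command_line, left to right."""
    kinds = []
    for i, ch in enumerate(command_line):
        if ch == "/":
            # A slash that follows a space starts a switch; any other slash separates path components.
            kinds.append("switch" if i > 0 and command_line[i - 1] == " " else "path")
    return kinds

print(classify_slashes("dir /w c:/windows/system32"))  # ['switch', 'path', 'path']
```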

  30. Anonymous says:

    Chris Walker:

    Sorry for overstating the state of the accessories in Windows; I was obviously wrong.

    I guess most standard users wouldn’t need a fancier text editor than Notepad, since they probably write their letters in Word or something similar anyway.

    However, I’d like to have an editor with a few more features (some options for tabs-to-spaces, and _especially_ handling big files). For example, editing a big database dump in Notepad isn’t possible unless you have at least as much RAM as the file is large, because (I assume) it loads the whole file into memory. When you come to a machine with only Windows installed and have to edit such a beast of a file, poor Notepad gets the blame.

    I guess I hadn’t noticed a lot of these features before, especially the status bar. I often turn on ‘Word wrap’ to read files without CRs in them (it helps, though it isn’t perfect :|). Thanks for letting me know; maybe I’ll end up writing cmdlets in Notepad after all :)

  31. Anonymous says:

    4NT rocks – well worth the $70 (http://www.jpsoft.com/)

    I use & for my command separators (did I copy that from ‘nix or from VB? I can’t remember…)

  32. Anonymous says:

    If you use 4NT, here is my 4start.btm, it’s pretty useful:

    title Command Prompt

    alias ls = dir /4 /k /m /v

    alias cwd = cd

    alias pwd = cd

    alias rm = del

    alias mv = rename

    alias cp = copy

    alias vi = notepad

    prompt `%@exec[@echos %_cwd & color 11 on 0 & echos ^^> & color 7 on 0]`

    cls

  33. Anonymous says:

    I want to send an SMS to my cell phone via the Win API with Delphi. The problem is that to send the file, I need to terminate it with Ctrl+Z (Control-Z).

    What is the symbol for Ctrl+Z?

    Where can I find it?

    I want to output this symbol to terminate the file.

    Please reply to: pmbabela@yahoo.com
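    Ctrl+Z is simply character code 26 (hex 1A); in Delphi source it can be written #26 or Chr(26). The sketch below assumes, though the comment doesn't say so, that the phone is driven over a serial link with the standard GSM text-mode AT commands (AT+CMGF, AT+CMGS), where Ctrl+Z ends the message body. It only builds the bytes; the serial I/O and waiting for the modem's responses are not shown, and `sms_commands` and the phone number are hypothetical.

```python
from typing import List

CTRL_Z = b"\x1a"  # Ctrl+Z is character code 26 (hex 1A)

def sms_commands(number: str, message: str) -> List[bytes]:
    """Bytes to write, in order, to send one SMS in GSM text mode (serial I/O not shown)."""
    return [
        b"AT+CMGF=1\r",                                  # switch the modem to text mode
        b'AT+CMGS="' + number.encode("ascii") + b'"\r',  # modem answers with a "> " prompt
        message.encode("ascii") + CTRL_Z,                # Ctrl+Z terminates the message body
    ]

for chunk in sms_commands("+15550100", "Hello"):  # hypothetical number
    print(chunk)
```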
