Why does Internet Explorer put tab stops at 8-character intervals instead of 4, like all right-thinking people?


When you embed a TAB character (U+0009) in a <PRE> block (or more precisely, an element whose white-space CSS property is computed to be pre or pre-wrap), Internet Explorer will move the current position to the next multiple of eight characters. Many people prefer four. (Some insist that only four is the correct value and anybody who disagrees with them is simply wrong.)

Why eight?

Because that's what the standard says.

> All tabs (U+0009) are rendered as a horizontal shift that lines up the start edge of the next glyph with the next tab stop. Tab stops occur at points that are multiples of 8 times the width of a space (U+0020) rendered in the block's font from the block's starting content edge.
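In a monospaced font, the rule the standard gives reduces to simple arithmetic on column numbers. Here is a minimal Python sketch, assuming every character is exactly one space wide and columns are counted from zero at the content edge. The names `next_tab_stop` and `render_tabs` are mine for illustration and appear nowhere in the standard.

```python
TAB_INTERVAL = 8  # the interval hard-coded in the standard's rule


def next_tab_stop(column: int, interval: int = TAB_INTERVAL) -> int:
    """Return the first tab stop strictly beyond the given zero-based column."""
    return (column // interval + 1) * interval


def render_tabs(line: str, interval: int = TAB_INTERVAL) -> str:
    """Replace each TAB in a single line with spaces up to the next tab stop."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            stop = next_tab_stop(column, interval)
            out.append(" " * (stop - column))
            column = stop
        else:
            # Assumption: every other character advances exactly one column.
            out.append(ch)
            column += 1
    return "".join(out)
```

For single-line input this should agree with Python's built-in `str.expandtabs(8)`. A tab always advances at least one column, so a tab that falls exactly on a stop jumps a full eight.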

Note that the standard does not provide an extensibility point to customize the position of tab stops. The number eight is hard-coded into the standard. If you don't like that, then don't use tabs. (There appears to be a draft proposal to add a tab-size property to control this, but nothing standard yet, at least not at the time I originally wrote this article.)
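If you take the "don't use tabs" route, one option is to convert tabs to spaces at your preferred width before the text ever reaches the <PRE> block. The sketch below uses Python's built-in `str.expandtabs` and `html.escape`. The function name `detab_for_pre` and the default width of 4 are my own choices, not anything the standard or the browser provides.

```python
import html


def detab_for_pre(source: str, tab_width: int = 4) -> str:
    """Expand tabs to spaces at tab_width, then escape the text for HTML."""
    # str.expandtabs restarts its column count after each newline,
    # so multi-line text can be passed in as a single string.
    expanded = source.expandtabs(tab_width)
    # Escape <, >, & and quotes so the result is safe to place inside <PRE>.
    return html.escape(expanded)
```

Since the output contains no TAB characters, the browser's eight-column rule never comes into play.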

Okay, but why did the standard pick eight as the tab stop interval? I don't know (never having attended any CSS standardization meetings), but I suspect they were observing existing practice. For reasons unknown even to Wikipedia, teletypewriter tab stops were historically placed at eight-column intervals.

Comments (50)
  1. Damien says:

    > Some insist that only four is the correct value and anybody who disagrees with them is simply wrong.

    But of course, they're in error themselves, and two is the only true value :-)

  2. > Some insist that only four is the correct value and anybody who disagrees with them is simply wrong.

    Four is good for source code and markup, but I find ten (yes, ten!) best when writing SQL. Then again, I always turn on the IDE's 'replace tabs with spaces' feature.

  3. AB says:

    Tab size 4 is obviously too small if you are using the TAB character for its original purpose (i.e. to arrange some numbers into columns).

    Tab size 8 is good if you are using a TTY with a common line size (e.g. 72, 80, 64, or 40 characters per line) – the tab stops split the line evenly.

  4. 12BitSlab says:

    Back in the olden days (Oh! No!  There he goes again!), unit record equipment (i.e., keypunch machines) had a tab stop so the operator could move quickly across a card.  Each tab stop came at intervals of 8.  I heard, but can't verify, that the reason was that there were 80 data columns on a card and tab stops of 8 worked well since it allowed the operator to move ahead by 1/10th of a card with a single keypress.

  5. anyfoo says:

    Related: does anybody know why 80 seems to be the magic number for terminal widths?

  6. dnew says:

    Punched cards were 80 holes wide, anyfoo.

  7. Rick C says:

    Anyfoo, this just pushes the question back one layer, but I bet it's because punch cards were most commonly 80 columns.  Wikipedia also suggests this: "A legacy of the 80 column punched card format is that a display of 80 characters per row was a common choice in the design of character-based terminals."

  8. Scott H. says:

    anyfoo: Allegedly, it's due to IBM punch cards having 80 columns. The punch card size was supposedly based on the size of punch cards used in the 1890 census, which was based on currency size in 1890 so they could reuse the currency holders to hold punch cards. Why was currency that size? Who knows. I have no backup for any of that, but it makes for a good story.

  9. Paul R says:

    Just a guess, but a US standard 8.5 inch wide sheet of paper with standard typewriter Pica (10 cpi) can do 85 characters per line with no margin.

    Leaving a (very small) quarter inch for each margin allows up to 80 characters per typewritten line.

    Supporting 80 characters on a punch-card (and later teletype and video terminals) could support the longest reasonable typewritten line.

  10. Maurits says:

    Looks like the 80 character IBM punch card comes from 1928, "almost double the previous capacity".

    www-03.ibm.com/…/year_1928.html

  11. >The punch card size was supposedly based on the size of punch cards used in the 1890 census, which was based on currency size in 1890 so they could reuse the currency holders to hold punch cards. Why was currency that size? Who knows. I have no backup for any of that, but it makes for a good story.

    I guess the dependency chain goes back to the width of the rear end of Roman horses.

  12. Mark VY says:

    @alegr1: I wish I could find a copy of that story!  I read it once, and now can't find a copy.

  13. Alexey says:

    The standard itself was, no doubt, typed up in either emacs or Vim.

  14. 12BitSlab says:

    One should note that there are NOT 80 columns on an 80 column punch card.  There are 80 DATA Columns and 1 column for a verify punch thus giving a total of 81 columns.

  15. Yuri says:

    @Mats Svensson

    You mean the iPhone has an app for the real number?

  16. chentiangemalc says:

    I suspect 8 was chosen because tabs were originally designed for indenting paragraphs, not indenting code

  17. Myria says:

    I always love it when people whine about something Microsoft does in one of its programs, and it turns out it's because Microsoft is following a standard.

    I think 4 is better, but this is HTML and W3C rules the world on this matter.

  18. Random User 109374 says:

    @chentiangemalc

    Or, at least as much, tabs were originally designed to align TABle columns. (As AB observed further up.)

  19. Joshua says:

    Because if you make it customizable, people will customize it weirdly, breaking the intention of <PRE> as "I just embedded a pre-formatted document". It just so happens in the long end of things that source code is the last thing using tabs and we use it for block indentation and we have a protocol that can be used to make it a personal preference.

  20. Mats Svensson says:

    You're all wrong.

    The real number is actually:

    *fart*

  21. Wilson says:

    > I guess the dependency chain goes back to the width of the rear end of Roman horses.

    Now I'm highly intrigued.

  22. Wilson says:

    Oh, here we go.

    The Space Shuttle and the Horse's Rear End

    http://www.astrodigital.org/…/stshorse.html

  23. Gabe says:

    The tabulation feature goes back over a century to manual typewriters. On those machines you would set stops on a rack for the various table columns, and pressing the Tabulate key would move the carriage to the next stop.

    Their use for paragraph indentation was secondary, or they would have called the key 'Indent'.

    Typewriters had adjustable stops, the electronic printers that followed them had adjustable stops, and ISO 6429 even defines a way for terminals to programmatically set tab stops (see HTS in en.wikipedia.org/…/ISO_6429), so it's not clear where the 8-space default came from.

    The usual font size was 10cpi, so 5 spaces was a half-inch (and the standard for paragraph indentation). So why wasn't the de facto standard 5 spaces then? The closest I've found to an explanation is that 8 is a power of 2 and easier to implement in the transistor-based devices of the 70s.

  24. Georg Rottensteiner says:

    Strictly from the code view, as we all agree, tab characters are actually devil spawn and should be replaced by two spaces each.

    inbeforetheflamewarkthxbye

  25. ThomasX says:

    I wish Microsoft would always follow standards to the letter.

  26. Wizarth says:

    At my workplace, half the developers wanted 4 character tabs and half of them wanted 2. Management compromised and now our coding practice doc says we have to use 3 character tabs. *sigh*

  27. Neil says:

    Unix actually has a program called (unsurprisingly) tabs which (by default) resets the programmable tab stops to intervals of 8 characters.

  28. SMW says:

    @Wilson: Snopes disagrees with the horse's ass legend: http://www.snopes.com/…/gauge.asp

  29. Mr Cranky says:

    @12BitSlab: I used 80-column cards for several years when learning FORTRAN, assembler, and PL/I back in the 70s… there are exactly 80 columns (and 12 rows) on the "Hollerith" card used in that era on IBM mainframes.  I actually had to start with an ancient, art-deco 026 keypunch that punched BCD (from which EBCDIC was Extended).  One had to learn several over-punches to get several characters punched correctly.

  30. Ken Hagan says:

    To even talk about "how many characters" a tab is equivalent to is to put yourself in a state of sin. To insist that the number be an integer is an abomination.

    A tab takes you to the next column, pure and simple. How many characters you can fit into a column depends on the font and (for variable pitch fonts) your characters. You should believe with all your soul that your esteemed colleague at the next desk uses a different font and therefore there simply is no answer to the question "how many characters".

  31. Joker_vD says:

    On the related topic, will we see a post on "Why does Microsoft Windows represent line breaks as CR LF instead of LF, like all right-thinking people?" some day?

  32. @Joker_vD – probably a legacy of typewriters, which needed both a physical carriage return and line feed. Viewed that way, Windows is correct, everyone else is wrong :)

  33. Rick C says:

    @Ken Hagan, your comment doesn't take into account history.  When this stuff came about, everyone used monospaced fonts, so you absolutely could specify how many characters could fit in a tab stop.

  34. laonianren says:

    Wizarth wrote "Management compromised and now our coding practice doc says we have to use 3 character tabs".

    Well, duh.  If you get management involved in a dispute between developers you'll get a stupid resolution, even if they have to invent one specially.  They do this deliberately to encourage you to settle your own arguments.

    P.S. If you create a skeleton win32 app in Visual C++ it uses a mix of three and four character indentation, represented with a mix of tabs and spaces.

  35. 12BitSlab says:

    @ Mr. Cranky – a keypunch machine could be in 1 of two modes – punch or verify.  In "punch" mode, the first 80 columns were open to be punched and the 81st column could not be punched.  In "verify" mode, a card that was already punched was punched again, except that no punches were recorded in columns 1-80.  The machine "verified" that the two efforts were the same.  After punching the 80th column, if all punches matched, the machine automatically put a punch in column 81 indicating that the data had been verified.  Programmers – as we used to be called – almost never used verify mode.  That was the realm of data entry operators.

  36. Rick C says:

    @laonianren, that's what ^A^K^F is for.

  37. Maurits says:

    @RaceProUK this still leaves open the question of whether CR LF or LF CR are equivalent, and if not, which should be preferred.

  38. ender says:

    > Management compromised and now our coding practice doc says we have to use 3 character tabs. *sigh*

    Typical compromise – choose an option that neither side likes.

  39. @RaceProUK: Of course DOS/Windows is correct. Think about it everyone, <LF> should do just that, feed the line and move the cursor down one vertical row, staying in the same horizontal column (this is how old terminals worked). <CR> should move you back to the beginning of the line. Both together should bring you to the beginning of the next line.

    @Maurits: Shouldn't matter, but the convention is <CR><LF>

    Off topic, but since I mentioned old terminals, back in my college days I used to create a whole file consisting of nothing but <CTRL>-G characters and dump that to the line printer in the back of the computer lab. It was fun to watch the lab monitor run up to the back of the room to try to find out why the printer was beeping uncontrollably.

  40. Rodney says:

    4 is a multiple of 8: 0.5 * 8 = 4

    The standard does not say __integer__ multiple…

  41. Maurits says:

    I wonder if the convention is <CR><LF> because the physical operation of moving the carriage all the way to the left takes slightly longer than the physical operation of rotating the paper drum by one line. Perhaps CR was implemented asynchronously in some contexts.

  42. @Maurits,

    One obvious example: barring adequate buffering and/or flow control, carriage return and other carriage control operations are asynchronous with respect to a teletype's input. See, e.g., the UNIX 6th edition manual,

       man.cat-v.org/…/stty

    where delays for CR, LF, FF, HT, VT, and BS tailored to the performance characteristics of various terminal devices are described. Amusingly, vestigial support for these "delay types" persists in modern UNIX specs, see, e.g.,

       pubs.opengroup.org/…/termios.h.html

  43. Cheong says:

    8-space-tabs are good for tabulating numbers in tables, but too wide for indent purpose.

  44. John Doe says:

    @Maurits, it's mechanical. The lever in typewriters requires less force to roll the platen vertically than to push the platen. This way, you could make "ladder" phrases. This was not very common, but it was even less common to just push the platen without changing line.

    Hence, you're right about the effort, but that is intentional.

    However, CR LF is actually in the reverse order, perhaps in an attempt to prevent the last line feed from ejecting a new (albeit empty) sheet of paper in the early printers. With LF CR, you'd have an extra hardware register, adding complexity to detect that the characters were followed. Even in more advanced hardware, the software would become a bit more complicated just because.

  45. John Doe says:

    @RaceProUK, Mac is right (LF CR), Windows so-so (CR LF, but the same effect), Unix thought about saving one byte per line (LF only).

    But in fact, for pure computer text, it's not common to have a pure "line feed"-only and it doesn't make sense anymore to have two characters. Yet, we have to live with ancient artifacts in file streams being binary vs text mainly because of this, the other reason being character encoding. But encoding came after, and only the recent languages/libraries had them in mind since inception.

  46. Random User 1049572 says:

    @John Doe

    Strange. Maybe it was just the specific set of applications I was exposed to, but I thought Mac was just CR. At least, "classic" Mac.

  47. Maurits says:

    Mac was just CR up through OS 9. When they switched to a Unix-like OS (OS X) they adopted the UNIX convention of just LF.

    en.wikipedia.org/…/Newline

  48. Medinoc says:

    > Management compromised and now our coding practice doc says we have to use 3 character tabs. *sigh*

    At least it discourages that horrendous "braces at half tab" indentation.

  49. John Doe says:

    You're both right, my bad.
