The x86 architecture is the weirdo: Structured exception handling

If your reference architecture is x86, then you will think that everything it does is normal and the rest of the world is weird. Except it’s the other way around: The x86 architecture is the weirdo.

I was reminded of this when commenter 640k complained, on the subject of what I think is table-based structured exception handling, “It would be interesting to know why this ‘invention’ was introduced in 64-bit Windows when no other version of Windows requires it.” (The original text was “when no other OS requires it”, but I’m assuming that this was in the context of Windows-based OS, since unix doesn’t have structured exception handling in the first place.)

This has a very narrow definition of “no other OS”, because it really means “No other non-x86-based version of Windows.” In this world, the color of the sky is x86.

In fact, x86 is the only architecture for which Windows uses stack-based exception chaining. All other architectures use table-based exception unwinding: the prologue and epilogue of each function must follow a particular format so that the actions performed therein can be unwound during exception handling. Even at the very introduction of Win32, it was only the x86 that used stack-based unwinding. The Alpha AXP, MIPS, and PowerPC all used table-based exception unwinding. And as new architectures were added to Windows, they all used table-based exception unwinding as well. Itanium? Table-based. Alpha AXP 64-bit? Table-based. ARM? Table-based.

The use of table-based exception handling was not “introduced” with x64. It was introduced back in 1992, and has in fact been the exception unwinding mechanism for all architectures.

Well, almost all. Not the x86, because the x86 is the weirdo.
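The table-based mechanism can be sketched in miniature. This is a simplified model, not the actual Windows structures: the real entries are RUNTIME_FUNCTION records emitted into the .pdata section, and the unwind data encodes prologue actions rather than a name. The point is only that the dispatcher finds the frame's unwind information by searching a static table keyed on the faulting program counter, so a function that never raises pays nothing at runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of table-based unwinding: each function contributes an
 * entry giving its code range and a pointer to its unwind information.
 * (Real Windows uses RUNTIME_FUNCTION records in .pdata; fields here are
 * illustrative stand-ins.) */
typedef struct {
    uint32_t begin;          /* first instruction of the function */
    uint32_t end;            /* one past the last instruction */
    const char *unwind_info; /* stand-in for the real unwind codes */
} FunctionEntry;

/* Entries sorted by begin address, as the linker would emit them. */
static const FunctionEntry table[] = {
    { 0x1000, 0x1080, "main" },
    { 0x1080, 0x1200, "parse" },
    { 0x1200, 0x1260, "helper" },
};

/* Binary search for the entry covering the faulting program counter. */
static const FunctionEntry *lookup(uint32_t pc) {
    size_t lo = 0, hi = sizeof(table) / sizeof(table[0]);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (pc < table[mid].begin)      hi = mid;
        else if (pc >= table[mid].end)  lo = mid + 1;
        else                            return &table[mid];
    }
    return NULL; /* pc not covered by any entry */
}
```

Contrast this with x86 stack-based chaining, where every SEH-aware function pays to push and pop a registration record even on the happy path.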

Comments (22)
  1. I'm not smart enough to know the pros and cons of each; but it would seem that by virtue of greater and more recent implementations, the table-based method is better.  Would you agree?

    [Table-based dispatch reduces the cost of SEH to zero when no exceptions are raised, since you don't have to do anything to erect a frame or tear it down. -Raymond]
  2. SimonRev says:

    I feel like you left us hanging — WHY does x86 use stack based exceptions instead of table based ones?  I briefly looked over the links in the article, but nothing obvious stood out.

    [I don't know either. Would have been less confusing if it used tables, too. -Raymond]
  3. David Walker says:

    Sorry to be pedantic, but I started with mainframe operating systems.  

    Your linked article from 2004 mentions "modern architectures", says that x86 is weird in using variable-length instructions, and says "the others use fixed-size instructions".  That's looking at the world through microprocessor-colored glasses!  :-)

    IBM's mainframe operating system, the classic being System/360, morphed into 32-bit System/370 and System/390.  I claim that the current version, called Z/Architecture, is a 64-bit "modern architecture".  These machine languages have always had 2, 4, and 6-byte instructions.

  4. Joshua says:

    The x86 may be the weirdo here, but it's the only one that got it right. The table based approach cannot encode all cases, including "you cannot unwind past this function".

    [You encode that by saying "This frame has a handler" and have the handler return "stop". -Raymond]
  5. dave says:

    @David Walker:

    The instruction set is broadly compatible with S/360, is it not?  Therefore I don't think it counts as a "modern architecture".

    Me, I'm quite font of the VAX (the CISCiest, with instruction length from 1 to over a dozen bytes), but I wouldn't be calling it "modern" any more.

    Fixed-length instructions do seem to be the way to build fast ISAs nowadays.

  6. szmodz says:


    "The prologue and epilogue of each function must follow a particular format so that the actions performed therein can be unwound during exception handling."

    That's probably your reason. Windows/DOS didn't have SEH right from the start, and the table-based approach isn't compatible with SEH-unaware binaries (there is no way to unwind the frame of a SEH-unaware function). This becomes a problem when linking with libraries. Stack-based chaining, on the other hand, doesn't require all functions in the call stack to be aware of exception handling for unwinding to work.

    For other platforms, there were no existing binaries, so tables could be used.
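szmodz's point can be illustrated with a toy model of the stack-based chain: functions compiled without SEH support simply never push a registration record, and dispatch still works because it only walks the records that were pushed. This is a deliberately simplified sketch; the real x86 chain is rooted at fs:[0] and the handlers have a much richer signature.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of x86 stack-based SEH chaining: each SEH-aware function pushes
 * a registration record onto a linked list.  SEH-unaware functions never
 * touch the list, which is why mixed binaries still unwind correctly. */
typedef struct Registration {
    struct Registration *next; /* previous record in the chain */
    int (*handler)(int code);  /* returns 1 if it handles the code */
} Registration;

static Registration *chain_head = NULL; /* stand-in for fs:[0] */

static void push_frame(Registration *r, int (*handler)(int)) {
    r->handler = handler;
    r->next = chain_head;
    chain_head = r;
}

static void pop_frame(void) { chain_head = chain_head->next; }

/* Dispatch walks the chain; unaware frames contribute nothing. */
static int raise_exception(int code) {
    for (Registration *r = chain_head; r != NULL; r = r->next)
        if (r->handler(code))
            return 1; /* handled */
    return 0;         /* unhandled */
}

/* Example handler: only interested in code 5. */
static int handles_code_5(int code) { return code == 5; }
```

The cost of this flexibility is visible in push_frame and pop_frame: every SEH-aware function executes them on every call, exception or not, which is exactly the overhead the table-based scheme eliminates.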

  7. Henning Makholm says:

    To supplement @szmodz: I think this dates back to the historical origin of the x86 platform, where the OS basically left you alone unless you explicitly asked it to do anything. The PC platform was also characterized by a remarkable hands-off approach to software from IBM's side — in particular there wasn't a default compiler suite provided by the hardware vendor that everyone would use, even as an ABI reference. So the platform had an unprecedented diversity of different compilers on the market. Back then it wasn't widely appreciated that things such as procedure prolog/epilog and stack frame layout really ought to be standardized between compilers; this only came after the experience with the free-for-all compiler market for the IBM PC showed that there was a real problem to solve.

    Add to that the fact that the 8086 is a register-poor architecture (either in absolute terms, or compared to its contemporaries such as the 68000), which makes it particularly desirable for compiler writers to squeeze out extra performance by tweaking the calling conventions — and we end up in a situation where the x86 platform is historically obliged to support a wide variety of disparate calling conventions, whereas a self-respecting ISA designer these days will make sure to lay down the law about such matters even before his architecture exists in silicon.

  8. Myria says:

    The only reason that this is true is that x86-32 was the first architecture with the feature (the original implementation came from Borland), and the failings of the original design prompted the redo for other architectures.  However, this might also be because the existence of too many calling conventions made a consistent table-based format difficult.

    That said, I really wish more programmers would understand the table model.  There have been many times where I've had to fix someone's x86-64 assembly code because they don't realize that you have to follow a very strict calling convention and function layout, and that you have to create these function tables.  Otherwise, when that code crashes, and it will, your process will disappear rather than give a crash log.

    I've also had to deal with graphics drivers from a company starting with an N that do runtime x86-64 code generation in processes they don't own without calling RtlAddFunctionTable, RtlInstallFunctionTableCallback, or AddVectoredExceptionHandler, the three legal ways to support runtime code generation on table-based architectures.  It was fun figuring out why we couldn't get crash logs when the user-mode graphics driver component crashed our program.
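The registration Myria describes can be modeled portably. The sketch below mirrors only the spirit of RtlAddFunctionTable, not its real signature (the actual API takes an array of RUNTIME_FUNCTION records plus a base address); all names and structures here are simplified stand-ins. The idea is that when the unwinder's static tables don't cover the faulting pc, it falls back to tables registered at runtime, and if none match, the unwind stops there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of dynamic function-table registration for JIT-generated code, in
 * the spirit of RtlAddFunctionTable: code ranges registered at runtime are
 * consulted when no static table covers the faulting pc.  Simplified. */
typedef struct { uint32_t begin, end; } Range;

#define MAX_DYNAMIC 8
static Range dynamic_tables[MAX_DYNAMIC];
static size_t dynamic_count = 0;

/* Register the code range the JIT just emitted.  Returns 0 on failure. */
static int add_function_table(uint32_t begin, uint32_t end) {
    if (dynamic_count == MAX_DYNAMIC) return 0;
    dynamic_tables[dynamic_count].begin = begin;
    dynamic_tables[dynamic_count].end = end;
    dynamic_count++;
    return 1;
}

/* Can the unwinder account for this pc?  If not, unwinding stops here,
 * and any crash-reporting machinery above this frame never runs. */
static int pc_is_known(uint32_t pc) {
    for (size_t i = 0; i < dynamic_count; i++)
        if (pc >= dynamic_tables[i].begin && pc < dynamic_tables[i].end)
            return 1;
    return 0;
}
```

A JIT that skips the registration step is exactly the graphics-driver failure mode described above: the generated code runs fine until it crashes, at which point the unwinder finds an unknown pc and the crash log evaporates.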

  9. Myria says:

    [Table-based dispatch reduces the cost of SEH to zero when no exceptions are raised, since you don't have to do anything to erect a frame or tear it down. -Raymond]

    This is not completely true.  Clearly, it has a memory and disk space cost.  There are also subtle runtime costs, though they are far less than in the x86-32 model.  For example, the strict calling convention can impede certain types of optimization.  Also, runtime code generation is more complex and slower because of the need to lay out structures and functions correctly.  RtlInstallFunctionTableCallback can alleviate this somewhat by delaying function table generation until exception time, but the cost is not eliminated completely.

    Overall, I agree that the table-based model is superior, since memory, particularly on a 64-bit architecture, is cheap while CPUs aren't getting much faster per thread.

    [True, there are secondary costs (like lost optimizations). The memory cost is paid only when an exception is raised (because otherwise the tables are never paged in). -Raymond]
  10. Joshua says:

    For example, the strict calling convention can impede certain types of optimization

    I've had more trouble reading the documentation than I previously thought, but I can't imagine what the tables look like for a guaranteed tail call (there is no epilog).

    Custom calling convention is right out, no matter how much it would benefit as the table can't encode it. (Custom calling conventions that beat the stock one are always selective register calling conventions).

  11. Matt says:

    @Raymond: "The memory cost is paid only when an exception is raised (because otherwise the tables are never paged in)"

    But you still take a virtual memory reserve hit, even if you're not taking a commit hit, because you need to have reserved somewhere in the address space that you can page the tables into. Granted this isn't an issue for IA64 or x64, but it is an issue on ARM for instance, where 2GB isn't actually all that much space.

    [True, but it's the same restriction that existed on all the non-x86 32-bit platforms. -Raymond]
  12. 640k says:

    Thanks for the elaboration.

    x86-64 could have used table-based exceptions. There's nothing in the x86 hardware that requires stack-based SEH at all. The "weirdo" is the software, not the hardware/ISA.

  13. voo says:

    @David Walker: No idea about S/360 (too young ;)), but up to 6 bytes for one instruction doesn't sound too bad compared to x86. I remember implementing an x86 C compiler, and the Intel manuals were, um, "fun". The longest possible instruction in x86 is 15 bytes, which I'm pretty sure has to win some horrible, horrible prize. That said, I also fondly remember writing an assembler for MIPS, and god, how much nicer that architecture is in general (you can actually implement a simple pipelined MIPS CPU for an FPGA without going completely crazy as a student!)

  14. JustSomeGuy says:

    @David Walker: the hardware is zArchitecture, the (primary) OS is z/OS. It's not the only OS that runs on zArch, there's also zVM and native Linux. @dave, I'm not surprised the VAX used a lot of bytes. When you introduce an instruction to do polynomial expansion at the low level, that's probably going too far :-)

  15. Neil says:

    I actually found a web page claiming that 15 bytes was the longest 8086 instruction, which is clearly nonsense. The best I could come up with was 2 (prefixes) + 1 (opcode) + 1 (mode) + 2 (displacement) + 2 (data) = 8 bytes (e.g. lock add es:[di+1234], 5678).

    As for 80386 instructions, you need to be running 32-bit code in a 16-bit code segment so that you can throw in address and operand size prefixes, plus segment prefix and a lock prefix for good luck. Total: 4 (prefixes) + 1 (opcode) + 1 (mode) + 1 (scale) + 4 (displacement) + 4 (data). So yes, technically 15 bytes.

  16. saveddijon says:


    15 bytes is indeed the limit. Although you can come up with sequences that are longer, the hardware vendors impose the 15-byte limit to constrain their gate-level/RTL implementations.

    BTW, you can double-up prefixes. Intel docs show how you can make NOP instructions encoded in anywhere from 1-15 bytes by prefixing NOP or other do-nothing instructions with as many prefixes as you want (keeping within 15 bytes).
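For reference, the recommended multi-byte NOP encodings from Intel's documentation can be tabulated. The sequences below are the 0F 1F /0 "long NOP" family, lengths 1 through 9; longer padding is built by prepending further redundant 66 prefixes, keeping within the 15-byte limit, as saveddijon notes.

```c
#include <assert.h>
#include <stddef.h>

/* Recommended multi-byte NOP encodings (Intel 0F 1F /0 long NOP, padded
 * with ModRM/SIB/displacement bytes and a 66 operand-size prefix). */
static const unsigned char nops[][9] = {
    { 0x90 },                                                  /* 1 byte  */
    { 0x66, 0x90 },                                            /* 2 bytes */
    { 0x0F, 0x1F, 0x00 },                                      /* 3 bytes */
    { 0x0F, 0x1F, 0x40, 0x00 },                                /* 4 bytes */
    { 0x0F, 0x1F, 0x44, 0x00, 0x00 },                          /* 5 bytes */
    { 0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00 },                    /* 6 bytes */
    { 0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00 },              /* 7 bytes */
    { 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 },        /* 8 bytes */
    { 0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 },  /* 9 bytes */
};
static const size_t nop_len[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
```

Compilers and linkers emit these sequences to align branch targets; a 15-byte fill is typically just one of these with extra 66 prefixes in front.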

  17. David Walker says:

    @Dave: I was reading "modern architecture" to mean one that is currently in widespread use.  Z/Architecture certainly has a lot of new instructions that 360 and 370 assembler code could not use.  But I get your point!

  18. Killer{R} says:

    /*Table-based dispatch reduces the cost of SEH to zero when no exceptions are raised, since you don't have do anything to erect a frame or tear it down. -Raymond*/

    But it increases the cost of raising an exception, because instead of walking a linked list it's now necessary to look up tables, disassemble and simulate code – neither trivial nor fast. Well, MS addressed this by skipping the kernel-mode transition when an exception is raised in an x64 process that is not being debugged. It's interesting why x86 kept the old always-call-NtRaiseException way. To keep performance 'balanced', or some (theoretical) compatibility problems?

    [Could you tone down the bile a bit? Yes, the cost of raising an exception increases, but if your program spends all its time raising exceptions, I would argue that the design flaw is not with the exception dispatcher performance. -Raymond]
  19. Anonymous Coward says:

    @When no other OS requires it: On Linux you get to choose between sjlj (setjmp-longjmp, which saves all registers for every frame) and Dwarf2 (which uses tables).

    Now, I know that Dwarf2 is more efficient when you have to deal with exceptions, but most code (that I've written) neither throws nor consumes exceptions and the Dwarf2 tables can get really, really big (if exceptions must be allowed to pass through).

    Of course, most of my Linux work was on libraries and libraries should neither throw exceptions nor call code that may. There are too many differences between how different languages deal with them; if your API contains exceptions, you're doing it wrong unless your lib is monolingual.

  20. Joseph Koss says:

    re: 15 byte instructions…

    The thing to keep in mind is that it's OK to have long instructions if their length adds value. One of the reasons for the long instructions is the powerful x86 addressing modes. The alternative is using more than one instruction to accomplish the same thing, and if the CISC vs RISC wars have taught us anything, it's that simpler instructions have both pros and cons.

    The current performance winner of these wars is clearly x86-64, indicating that CISC needing more complex decode logic to achieve performance is better than RISC needing more decode bandwidth feeding the decode logic to achieve performance. When performance isn't a consideration, but instead something like operations per watt is, then ARM's latest RISC offerings are clearly winning.

  21. Joshua says:

    @Joseph Koss: At least Linux coding makes it easy to avoid ever throwing exceptions.

  22. Bob says:

    @JustSomeGuy: Remember the VAX instruction set was designed in collaboration with the VMS engineers. The idea was to make the OS more efficient by having frequently used OS instruction sequences implemented as single "machine" instructions. For example, OSes spend a lot of time inserting and removing things from queues. The VAX instruction set included insert-queue and remove-queue instructions, thus making them atomic and non-interruptable, and eliminating the need for the VMS engineers to write the code to do so. Now why polynomial expansion was included in the VAX instruction set, I'm clueless.

Comments are closed.