How did protected-mode 16-bit Windows fix up jumps to functions that got discarded?

Commenter Neil presumes that Windows 286 and later simply fixed up the movable entry table with jmp selector:offset instructions once and for all.

It could have, but it went one step further.

Recall that the point of the movable entry table is to provide a fixed location that always refers to a specific function, no matter where that function happens to be. This was necessary because real mode has no memory manager.

But protected mode does have a memory manager. Why not let the memory manager do the work? That is, after all, its job.

In protected-mode 16-bit Windows, the movable entry table was ignored. When one piece of code needed to reference another piece of code, it simply jumped to or called it by its selector:offset.

    push    ax              ; pass a parameter on the stack
    call    0987:6543       ; far call by selector:offset

(Exercise: Why didn't I use call 1234:5678 as the sample address?)

The selector was patched directly into the code as part of fixups. (We saw this several years ago in another context.)
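Here is a sketch of what "patched directly into the code" means for the `call 0987:6543` example. In 16-bit code, a direct far call is encoded as opcode `9A`, then the offset word, then the selector word, with both words little-endian. The helper names below are hypothetical. The fixup is simplified to "write this selector at this byte position"; real executable-format relocation records carry more information than that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { FAR_CALL_LEN = 5 };   /* 9A + offset word + selector word */

/* Hypothetical helper: emit a 16-bit direct far call (CALL ptr16:16). */
static void emit_far_call(uint8_t out[FAR_CALL_LEN], uint16_t sel, uint16_t off)
{
    out[0] = 0x9A;                    /* opcode for CALL ptr16:16 */
    out[1] = (uint8_t)(off & 0xFF);   /* offset, low byte first */
    out[2] = (uint8_t)(off >> 8);
    out[3] = (uint8_t)(sel & 0xFF);   /* selector follows the offset */
    out[4] = (uint8_t)(sel >> 8);
}

/* Hypothetical fixup: overwrite the 16-bit selector field at `where`
   once the loader knows which selector the target segment received. */
static void apply_selector_fixup(uint8_t *code, size_t code_len,
                                 size_t where, uint16_t sel)
{
    assert(where + 2 <= code_len);    /* field must lie inside the code */
    code[where]     = (uint8_t)(sel & 0xFF);
    code[where + 1] = (uint8_t)(sel >> 8);
}

/* Read the selector field back out of an encoded far call. */
static uint16_t far_call_selector(const uint8_t in[FAR_CALL_LEN])
{
    return (uint16_t)(in[3] | (in[4] << 8));
}
```

Suppose the call is emitted with a placeholder selector of 0 and then fixed up with `0x0987` at byte 3. The buffer then holds `9A 43 65 87 09`, which is the `call 0987:6543` from the example.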

When a segment is relocated in memory, there is no stack walking to patch up return addresses to point to thunks, and no editing of the movable entry points to point to the new location. All that happens is that the base address in the descriptor table entry for the selector is updated to point to the new linear address of the segment. And when a segment is discarded, the descriptor table entry is marked not present, so that any future reference to it raises a segment not present exception, which the kernel handles by reloading the segment.
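The bookkeeping described above can be modeled in a few lines. This is a toy simulation and does not reproduce the 286's actual descriptor layout or fault mechanism. Each descriptor keeps only a base address and a present bit. The table is indexed by the selector's upper 13 bits, and the table-indicator and privilege bits are ignored. `reload_segment` is a hypothetical stand-in for the kernel bringing a discarded segment back in. The point is that moving or discarding a segment touches only the descriptor, never the code that holds the selector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy descriptor: not the real 286 format. */
typedef struct {
    uint32_t base;     /* linear address where the segment currently lives */
    bool     present;  /* cleared when the segment is discarded */
} Descriptor;

enum { DESCRIPTOR_COUNT = 8192 };   /* a 13-bit index selects one of 8192 */
static Descriptor table[DESCRIPTOR_COUNT];
static int not_present_faults;      /* simulated faults handled so far */

/* Upper 13 bits of the selector; TI and RPL bits are ignored here. */
static unsigned selector_index(uint16_t sel) { return sel >> 3; }

/* Hypothetical stand-in for the kernel reading a discarded segment back
   in: it just picks a new base derived from the descriptor index. */
static uint32_t reload_segment(unsigned index)
{
    return 0x100000u + index * 0x100u;
}

/* Moving a segment: update the base; code holding the selector is untouched. */
static void move_segment(uint16_t sel, uint32_t new_base)
{
    table[selector_index(sel)].base = new_base;
}

/* Discarding a segment: mark its descriptor not present. */
static void discard_segment(uint16_t sel)
{
    table[selector_index(sel)].present = false;
}

/* Resolve selector:offset to a linear address, simulating the
   not-present fault and the kernel's reload inline. */
static uint32_t resolve(uint16_t sel, uint16_t offset)
{
    Descriptor *d = &table[selector_index(sel)];
    if (!d->present) {                       /* the CPU would fault here */
        not_present_faults++;
        d->base = reload_segment(selector_index(sel));
        d->present = true;                   /* the retried access succeeds */
    }
    return d->base + offset;
}
```

After `move_segment(0x0987, ...)`, `resolve(0x0987, 0x6543)` lands in the new location without any change on the caller's side. After `discard_segment(0x0987)`, the next `resolve` takes exactly one simulated fault and reloads the segment. Later calls find it present again.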

Things are a lot easier when you have a memory manager around. A lot of the head-exploding engineering in real-mode Windows went into simulating a memory manager on a CPU that didn't have one!

Comments (16)
  1. Peter Buffalo says:

    If I recall correctly, segment 1234 would have been a ring 0 segment, not a user segment.

  2. Kevin says:

    >Exercise: Why didn't I use call 1234:5678 as the sample address?


  3. Yuri Khan says:

    @Kevin: 5678 is in fact a stricter-aligned offset than 6543.

    But in those times, nobody cared much about alignment, except maybe for the stack pointer.

  4. Kevin says:

    @Yuri: Yes, that's the joke.

  5. Yuhong Bao says:

    @Yuri Khan: Even now, no one cares about alignment of x86 instructions for obvious reasons.

  6. poizan42 says:

    @Yuhong Bao: memory operands of most SSE instructions must be 16-byte aligned, so every function using SSE must have extra instructions in its prologue to align the stack if its alignment isn't known. That's the reason for the 16-byte stack-alignment requirement in both 64-bit calling conventions.

    Btw. does anyone know the reason why Microsoft chose not to use the System V AMD64 calling convention?

  7. Cesar says:

    @Yuhong Bao: gcc does. See -falign-functions, -falign-jumps, -falign-loops, and -falign-labels, all enabled by default on -O2 and above.

  8. GWO says:

    @Cesar:"gcc does. See -falign-functions, -falign-jumps, -falign-loops, and -falign-labels, all enabled by default on -O2 and above."

    Too right it does.  About 5 years ago I spent literally days trying to debug unpredictable crashes from a DLL function loaded by VB6, where the DLL was compiled with MinGW-GCC.  Turned out the compiler was assuming a certain stack alignment for arguments that would be processed by SSE instructions, and VB6 was using a different calling convention/assumption.  Worst Heisenbug I've ever tracked down — -falign-functions was the solution.

  9. Joshua says:

    Also, note that this was not true of 16-bit CPUs, because the L1 instruction cache was too small for this to be meaningful (it can be thought of as caching the next instruction only).

  10. JamesJohnston says:

    "Even now, no one cares about alignment of x86 instructions for obvious reasons."

    Indeed; outside of esoteric uses like SSE aligned instructions, you'll never encounter a fault.

    Accessing unaligned variables in C/C++ is undefined; e.g.

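      char myIntBytes[4];             /* char array: no int alignment guaranteed */
      int* myInt = (int*)myIntBytes;  /* may yield a misaligned int pointer */
      *myInt = 5;                     /* undefined behavior */

    On x86 this will silently work, though likely with a performance penalty depending on the microarchitecture. On older ARM processors, it won't work at all.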

  11. poizan42 says:

    "On older ARM processors, it won't work at all."

    Some OSes will emulate unaligned accesses when they get the fault, though that comes with a *massive* performance penalty.

  12. Cesar says:

    @poizan42, @JamesJohnston, @GWO: you are talking about alignment of *data*. What I and @Yuhong Bao are talking about is alignment of *code*.

    On CPUs with power-of-two fixed-length instructions like ARM, instructions usually must be aligned. On CPUs with variable-length instructions like x86, instructions usually do not need to be aligned, even when they're longer than one byte. However, there can be a performance penalty when a *jump target* (or call target, which is almost the same thing) is not well aligned. Suppose, for instance, that your jump is mispredicted, lands in the last byte of the last cache line of a page, and the instruction at that address is five bytes long. The instruction fetcher has to do extra work, and can't hide the latency as well as it could if you had jumped to the beginning of a cache line. Thus gcc (and probably other compilers) can pad the code with a few NOPs so a jump target has a better alignment.

  13. Mark says:

    > (Exercise: Why didn't I use call 1234:5678 as the sample address?)

    Because the 286, which only had a 24-bit address bus, would wrap it to 0x234:5678. I'm guessing the 286 kernel didn't map segments that high.

    An amusing aside given the recent discussion about kernel boundaries – in real mode on a 286, addresses above 0xf000:ffff referred to memory beginning at 1MB, while on previous processors they referred to 0 (giving a shortcut to the IDT). I wonder if there were any "no man's land" rules due to this?

  14. Mark says:

    Oh, I've just found on Wikipedia that this is why the A20 line was pulled low by default in real mode. Figures.

  15. Mark says:

    Oh, disregard my first comment altogether. 0x1234 is already a selector on 286 protected mode, and therefore ring 0 as Peter Buffalo said. 0x987 on the other hand is ring 3, which I think would imply Windows 3.1?…/25.TXT

    Both are in user memory.

    [The 80286 also had rings (known as "privilege levels"). -Raymond]

  16. voo says:

    @JamesJohnston no, that's wrong: C/C++ has no problem with unaligned accesses (they're out of its scope). The problem you have there is that you violate strict aliasing.
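JamesJohnston's three-line snippet raises both the alignment concern and the strict-aliasing concern. A commonly recommended portable alternative is to copy the bytes with `memcpy` instead of dereferencing a cast pointer. It is shown here as a sketch with hypothetical helper names `store_int` and `load_int`. `memcpy` places no alignment requirement on its arguments, and it does not access the buffer through an `int` lvalue, so it sidesteps both issues. Compilers commonly reduce such a `memcpy` to a single load or store on targets like x86 that tolerate unaligned access.

```c
#include <assert.h>
#include <string.h>

/* Copy the bytes of `value` into `bytes`; no alignment requirement on `bytes`. */
static void store_int(unsigned char *bytes, int value)
{
    memcpy(bytes, &value, sizeof value);
}

/* Copy sizeof(int) bytes out of `bytes` into a properly aligned local int. */
static int load_int(const unsigned char *bytes)
{
    int value;
    memcpy(&value, bytes, sizeof value);
    return value;
}
```

Storing 5 with `store_int(buf + 1, 5)` and then calling `load_int(buf + 1)` returns 5, even though `buf + 1` is unlikely to be suitably aligned for `int`.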
