How did real-mode Windows implement its LRU algorithm without hardware assistance?


I noted some time ago that real-mode Windows had to do all its memory management without any hardware assistance. And yet, along the way, they managed to implement an LRU-based discard algorithm. Gabe is really interested in how that was done.

As we saw a few months ago, inter-segment calls were redirected through a little stub which either jumped directly to the target (if it was in memory) or loaded the target (possibly discarding other memory to make room) before jumping to it. And we saw that the executable format had INT 3Fh instructions baked into it so that the Entry Table could be loaded directly into memory for execution.

As it happens, Windows didn't take advantage of that feature, because it wanted to do more.

When it came time to load the Entry Table, the loader did a little rewriting, converting each

    db  flags
    INT 3Fh
    db  entry_segment
    dw  entry_offset

sequence into

    db  flags
    sar byte ptr cs:[xxx], 1
    INT 3Fh
    db  entry_segment
    dw  entry_offset

where the xxx refers to a table of bytes in the Entry Table preallocated for this purpose, initialized to 1's.

What is "this purpose"?

Whenever anybody needed the address of an inter-segment function, instead of return the address of the int 3Fh, the kernel returned the address of the sar instruction. The sar instruction stands for shift arithmetic right, For a byte value, this means to shift the bits right one place, but keep the high-order bit the same.

a b c d e f g h
a a b c d e f g

Okay, so what was the effect of sticking this little sar instruction at the start of every inter-segment call? Since the values in the table were initialized to 1, a right arithmetic shift changed the 1 to 0. Therefore, each time an inter-segment call was performed, the corresponding byte in the table was set to zero.

Hooray, a software-implemented Accessed bit!

Every 250 milliseconds, Windows scanned and reset the Access bits, using the data to maintain an LRU-list of all the segments in the system. That way, when it was time to discard some memory, it could discard the least recently used ones first.

Today, a timer that runs continuously at 250ms would incur the wrath of the power management team. But back in the days of real-mode Windows, there was no power management. Like Chuck Norris, PCs ran at only one power level: Awesome.

I continue to be amazed at how much Windows 1.0 accomplished with so little.

[Raymond is currently away; this message was pre-recorded.]

Comments (29)
  1. Antonio Rodríguez says:

    The original Macintosh System (ten years before it was called Mac OS) also did all memory management in software without an MMU (but IIRC, it got a little help from the 68000, which was more advanced than the 8086). Anyway, I agree it's awesome what the OSes of that time got to do with so little memory (384 Kb of for Windows 1.0, 128 Kb of RAM and 64 Kb of ROM for Macintosh System 1.0). Keep that in mind the next time a moron tells you need 8 GB of RAM to write a letter in Word! :-)

  2. Adam Rosenfield says:

    When you said "initialized to 1's", I assumed you meant that each entry was all 1's (0xFF), for which a SAR is a no-op.  Took me a bit to figure out you really meant each byte is initialized to the number 1.  Neat trick, though.

  3. David Walker says:

    @Adam Rosenfeld:  I had the same confusion.  Bytes that are "initialized to 1's" sound like they are full of FFs, especially when you are talking about bit manipulation in the code.  But I get it now.

  4. Martin Wilson says:

    I too fell pray to the same confusion about the byte being "initialized to 1's". Very neat trick though.

  5. Brian_EE says:

    @ Adam, David, Martin: Not to be irritating to Raymond, but I'll be the first to point out that "apostrophe s" does not make the 1 plural. A better wording would have been "… each initialized to the value one."

    [Back in the old days, the rule for pluralizing numbers written as digits was to use an apostrophe-s. Nowadays, the guidance is to use a plain s, but old habits are hard to break. -Raymond]
  6. Nick says:

    "Like Chuck Norris, PCs ran at only one power level: Awesome."

    That's also the level that Raymond writes at.

  7. prine says:

    Why use SAR though? Something to do with that instruction being faster / taking less code bytes than say, MOV byte ptr [xxx], 0 ?

  8. Brian says:

    Was the format of the executable frozen before the sar mechanism was implemented?

    [I thought I answered this in the linked article. -Raymond]
  9. Mark says:

    It also means you could do something more sophisticated by initialising to a larger number, or by playing with the high bit to mean "always resident" or track different history.  Sure, an accessed bit is all you need, but would be interesting to know if anyone did more than that.

  10. Mark says:

    prine: yes, one byte less.

  11. Adam Rosenfield says:

    @Brian: I'm not sure how that's relevant here.  If "1's" here were possessive instead of plural, it wouldn't make any sense (initialized to the what belonging to 1?).

  12. Brian says:

    [I thought I answered this in the linked article. -Raymond]

    Afraid I'm not seeing it. I see INT 3F is in the file so it can be copied into memory and used directly. I see a statement that says the details are different in unimportant ways (which I can see SAR being one). I don't see anything addressing why INT 3F is the file but SAR (with appropriate padding) is not. If you know you will need both, it doesn't make sense to me to put in one and not the other.

    I'm speculating that INT 3F was added to the file format for the stated reason and then tools and executables were built. Later, it was decided to do LRU and a SAR instruction was required. Rather than change the format and cause a ripple effect on tools and existing executables, it was decided to just rewrite the entry as it was loaded.

    [The MS-DOS overlay linker invented the INT 3F format. -Raymond]
  13. mikeb says:

    To those who are worried about the apostrophes, here's what the Chicago Manual of Style FAQ has to say about it (http://www.chicagomanualofstyle.org/…/Plurals20.html):


    Chicago style omits the apostrophe, but the thing about style is, there is no single great arbiter who makes rules that everyone follows. Different houses use different styles. Following a particular style allows a person to be consistent within a given document, but it really doesn’t matter which style you choose.


    So everyone's right! Participation medals all around!

  14. Skyborne says:

    FWIW, my first parse was also "… a table of bytes [each] initialized to 1s" but the subsequent paragraphs cleared the confusion.  sar on a value of FF is a NOP, so each byte must be initialized to a value of 1.

    Considering the scope of Windows 1.0, any benefit from using a value other than 1 would probably not be worth the time and OS code to do it.

  15. David Walker says:

    Didn't mean to start a discussion about grammar…  The article is interesting, once I figured out the technique.  Thanks, Raymond.

  16. chentiangemalc says:

    Well they ran at awesome until you hit the TURBO button, then you got extra awesome!

  17. Mark says:

    4 times per second incurs wrath?  How often is safe?  Once a minute is surely ok, right?

  18. Nick says:

    @Mark:

    I'm pretty sure that the goal is to remove as much timed polling as possible so that the CPU can remain in a lower power state.  But if you do need to use a timer to poll something, I think the newer coalescing timers are the preferred method (SetWaitableTimer(Ex)).

  19. Doug says:

    @Mark

    The only way to make a battery last a long time is to turn off as many chips as possible. The OS tries very hard to keep all components in the system off as often as possible. Waking up a sleeping CPU isn't too bad — just a few microseconds. But waking up a stopped clock takes a few hundred microseconds, and waking up a sleeping motherboard takes a few milliseconds. So each time the system has to transition from sleeping to waking takes several milliseconds, during which time all of the chips on the motherboard are using the battery. If you take several milliseconds to wake up, execute 10 instructions, and then go back to sleep, you used 20 milliseconds of power to do 10 nanoseconds of work. In addition, certain deeper sleep states take even longer to wake up, so the OS won't go into those sleep states unless the system has been completely idle for say 300 milliseconds.

    Bottom line is that something that wakes up 4 times a second makes the motherboard use 80 milliseconds of full power per second, and also completely prevents it from going into the deepest sleep, so it continually uses say 10% of full power always, for a total of 18% power usage when it should have been something like 1% power usage.

  20. configurator says:

    This might be a silly question, but couldn't the SAR instruction be done by Windows inside the INT 3Fh call?

    [That would defeat the direct jump optimization. When the segment is in memory, there is no INT 3F, which is exactly the time you want to track usage! (When the segment is not in memory, the usage is clearly zero.) -Raymond]
  21. configurator says:

    Of course. I forgot the INT 3Fh was actually being removed.

  22. @Adam, David, Brian and Martin et al …. "bytes initialized to 1s (or 1's)" is quite clear to me.  It comes down to whether the reader (incorrectly) injects some words into the sentence that aren't actually there, namely "… [all the bits of] the bytes", or replaced the word "initiatialized to" with "with all bits set to".

    Grammar or punctuation isn't the problem, simple reading comprehension/false assumptions is the issue.  ;)

  23. Mark Y. says:

    @Doug: Thanks, that was rather enlightening!  Although, come to think of it, your explanation implies that in this case there seems to be a loophole.  If the computer is in sleep mode, we can turn off fancy memory management routines, since no one is using memory.  If it's not in sleep mode, we can afford to run 4 times a second, since we're awake anyway.

  24. ErikF says:

    @Mark: Of course, back in the bad old days when Win16 was around, polling wasn't considered as bad because the computer never went to sleep anyways; the CPU ran in two speeds: 100% and "off" (accomplished by hitting the Big Red Switch).

  25. Danny Moules says:

    "Keep that in mind the next time a moron tells you need 8 GB of RAM to write a letter in Word! "

    But you do need n GB of RAM to write a letter in Word, if you use one for a non-obsolete OS. Don't shoot the messenger.

  26. Nick says:

    the CPU ran in two speeds: 100% and "off" (accomplished by hitting the Big Red Switch).

    Don't forget the Turbo Switch! :)

  27. Brian_EE says:

    Speaking of turbo switch, I had a 33Mhz 486DX and a case that had a two-digit seven-segment display. I figured out how to rewire the display so it read 33 normally and 99 with the turbo switch pressed. All my buddies kept asking me how I got the CPU to run at 100 MHz and were jealous of me. Little did they know…

  28. Gabe says:

    If you think 250ms-timers are bad on a single machine, imagine what it's like on a VM host running dozens of virtual machines!

  29. user says:

    Tell us more about Windows 1 and what you were doing back then.

Comments are closed.

Skip to main content