Oh, that's probably why I'm in the Quake credits


Back in 2012, I couldn't remember why I was in the Quake credits. But then the comment from Kristaps pinpointed the most likely reason: The Sys_Page­In function.

void Sys_PageIn (void *ptr, int size)
{
    byte    *x;
    int     j, m, n;

// touch all the memory to make sure it's there. The 16-page skip is to
// keep Win 95 from thinking we're trying to page ourselves in (we are
// doing that, of course, but there's no reason we shouldn't)
    x = (byte *)ptr;

    for (n=0 ; n<4 ; n++)
    {
        for (m=0 ; m<(size - 16 * 0x1000) ; m += 4)
        {
            sys_checksum += *(int *)&x[m];
            sys_checksum += *(int *)&x[m + 16 * 0x1000];
        }
    }
}

What this code does is access the memory block specified by the ptr and size parameters in an unusual pattern: It reads byte zero, then the byte at an offset of 16 pages, then byte one, then a byte at an offset of 16 pages plus one, and so on, alternating between a byte and its counterpart 16 pages ahead.

This specific access pattern in Windows 95 defeated the "sequential memory scan" detection algorithm.

Recall that computers in the Windows 95 era had 4MB of RAM. Suppose you were working in a document for a long time. Finally, you're done, and you close the window or minimize it. Boom, now your desktop is visible and the wallpaper bitmap needs to be paged in. If your screen is 1024 × 768 at 16 bits per pixel, that comes out to 1.5MB of memory. Paging in 1.5MB of memory means for the bitmap means kicking out 1.5MB of memory being used for other stuff, and that's a lot of memory for a machine that has only 4MB to work with (especially since a lot of that 4MB belongs to stuff that isn't eligible for being paged out). The phenomenon we saw was that repainting your desktop would flush out most of your memory.

And then the next thing you do is probably launch a new application, which will cover the wallpaper, so the wallpaper memory isn't going to be needed any more. So we basically purged all the memory in your system in order to handle a huge block of memory that got accessed only once.

The trick that Windows 95 used was to watch your pattern of page faults, and if it saw that you were doing sequential memory access, it started marking the memory 16 pages behind the current access as not recently accessed. In the case of a straight sequential scan, this means that the entire buffer cycles through a 64KB window of memory, regardless of the buffer size. With this trick, a 4MB buffer ends up consuming only 64KB of memory, as opposed to using all the memory in your system.

The Sys_Page­In function specifically defeates the sequential-scan detector by intentionally going back 16 pages and accessing the page again. This causes it to be marked recently used, counteracting the not recently used that the sequential-scan detector had done. Result: The memory pages are all marked recently used and are no longer prime candidates for being paged out.

Comments (25)
  1. EMB says:

    And in Windows 98 SE, Raymond implemented an algorithm that specifically defeates the function that specifically defeates the sequential-scan detector.

    Just kidding...

  2. Niklas says:

    Why would Quake want to use that algorithm? I guess it would somehow help them achieve a smoother frame rate?

  3. Sintendo says:

    Am I missing something or is that an unused variable (int j)?

  4. McBucket says:

    @Niklas: Sure. It guarantees that a particular chunk of memory is marked as recently used, so it's less likely to be paged out.Presumably, they kept tight control on what parts of their world needed to be in memory at any one time, and didn't want to incur  extra cost of paging in memory multiple times, just because Windows XP had a bad use case (for which it decided to apply something of a hack to the paging rules).

  5. Jim Buck says:

    *Everything* in Quake, and games in general (but especially so in Quake at the time it was released), is about trying to get a smoother framerate.

  6. Ken in NH says:

    @Sintendo

    I suspect j was meant to hold (size - 16 * 0x1000).

  7. Ancient_Hacker says:

    Oh my, that explains, 20 years later, why some of my attempts to touch my critical heap areas way back then did not work very well.   20 years too late.  

  8. McBucket says:

    Makes me wonder what the rules for detecting sequential memory access were. If Windows was detecting only forward accesses, then maybe the Sys_PageIn() function could have walked through the pages backwards.

  9. cheong00 says:

    @EMB: I think it could be earlier, as Virtual*() APIs are just available in NT based systems. The trick was needed by games written for Win3.X and Win9X families to improve performance.

  10. Scarlet Manuka says:

    Surely the combination of low-end memory capacity and high-end video would have been an unusual one (though I understand it still had to be supported)? 800x600x16 was a good video card back then; I wouldn't have thought that people who were on a tight enough budget to only get 4MB of RAM would be splurging on 1024x768 graphics.

  11. Matt says:

    Scarlet, you have a very poor memory of the time. 1024x768 was common, 1280x1024 not unusual and 24 bit was common too (at least on the desktop - games may have been deliberately configured to run at lower resolutions and depths.)

    My second PC - a Pentium 133 with 8MB - was bought about the same time as Quake's release, and even my first - a 486 with 4MB - was capable of running resolutions and colour depths you think were exotic.

  12. Alex Cohn says:

    Wasn't this an example of using a global solution to resolve a local problem? Not Quake, I mean - that Sys_PageIn() was a local solution to a global problem.

    The local problem of desktop refresh could have been solved by smart bitmap renderer that could have much smaller memory footprint than the whole bitmap. But instead, the solution was built into the system core, and caused troubles for Ancient_Hacker and many many others, who could not understand what was going on, and could not ask Raymond to find a hack around.

  13. Boris says:

    I think I need an (HTML) diagram of this, showing the in-memory locations of the desktop, the window before the desktop reappeared and the new window which covered it again, and exactly which area is covered by the 64KB. Although the desktop wasn't accessed recently (being covered by the original application), it also wasn't unloaded from memory so it didn't need to be loaded again in the interim before the second application was launched?

  14. Neil says:

    @Boris: The key here is that painting the desktop mostly corresponds to copying memory from the bitmap to video memory. You don't actually need all of the bitmap in memory at once to achieve this, so the sequential page scan detector said "It looks like you're doing a memory copy, and you'll only be needing the memory you paged in once, so I'll just mark the RAM as ready to page out again". This means that as the memory copy pages in the bitmap, it recycles the same RAM that it used 64KB ago, rather than loading the whole bitmap into RAM at once.

  15. ZLB says:

    Well that's a blast from the past!

    I remember porting WinQuake to Windows CE 2.11 and commented that function out because it made it take a while to startup.

    Also remember something about not being able to malloc more than about 11Mb of ram, so used either HeapAlloc or VirtualAlloc which did work.

  16. Karellen says:

    @AlexCohn - I think it's more that the wallpaper-displaying code is just one example of a very common class of problem, where "some program" would do a single sequential read of a very large chunk of memory, using it (or only part of it) once, and not need the memory again for a long time. But, if you're using a least-recently-used algorithm for you memory pages, you've just evicted everything else that was very recently being used in a random-access manner - and will be again very shortly, which you now have to page back in. Therefore, this is a solution to that entire class of problems, system-wide, for all programs that do this.

    In fact, I remember one of John Carmack's .plan updates many years ago discussing this very problem with regards to the amount of texture memory on the graphics cards of the time. When rendering a scene with a LRU cache, if the size of the textures needed for a scene was just slightly larger than the texture memory available, performance would suddenly suffer *really* badly. IIRC, the idea is that if you have 32Mb of memory, and 33 1Mb textures, the first 32 textures fill up your cache slots 0-31, and then texture 33 fills up the oldest used slot, slot 0. When painting the next frame, texture 0 has to be reloaded from main RAM and goes in the oldest used slot, slot 1, whereupon tx 1 immediately needs to be reloaded - into slot 2, etc... His solution was to switch to a MRU cache, which reuses slot 31 for tx 32 when painting the first frame. Then, when painting the second frame, txs 0-30 are already in the graphics card's texture memory, and only tx 31 and 32 need to be reloaded from main RAM, each into slot 31, which ends up acting as a "scratch" buffer.

    Of course, doesn't work well over the entire lifetime of a program. He probably had some kind of counter or timer so that if a texture wasn't used for long enough then it could get evicted from the graphics card's memory.

  17. sense says:

    @McBucket: I guess it probably had to detect backward accesses, since bitmaps where commonly stored in a bottom-up manner. I could be wrong.

  18. @Matt: I don't know what kind of circles you were in, but most people I knew used mostly 640x480 or 800x600 back in the Windows 95 days.  We were at the whims of our parents, of course, who couldn't afford fancy monitors or powerful graphics cards, and I know I saw references to the higher resolutions back then, though I didn't actually get access to them until later, when we got Windows 98 SE.

  19. Yukkuri says:

    @MNGoldenEagle Circles that are older than you, obviously.

  20. bzakharin says:

    Our first computer, bought in 1993, had 4 MB RAM and a 33 MHz 486 processor. It was capable of doing 1024x768, though I don't recall the BPP. We did keep it at 800x600 because the 14" monitor made 1024x768 hard to read. By the time Windows 95 came out, we did upgrade it to 83 MHz Pentium and 8 MB RAM, but not the video card, which was powerful enough for everything we did. I don't recall playing Quake on it, but Doom ran fine. In DOS at lower resolutions, obviously. That setup lasted as our primary computer for 3 more years after that, though at some point we doubled the memory again and bought an additional hard drive.

  21. Azarien says:

    Okay, now I know I can purge all that weird paging and memory locking hacks from my own Quake sourceport :-)

  22. McBucket says:

    @sense: Perhaps, but even if the backwards scan is detected as sequential memory access, it may not have mattered, considering what Raymond describes as the 'fix':

    "The trick that Windows 95 used was to watch your pattern of page faults, and if it saw that you were doing sequential memory access, it started marking the memory 16 pages behind the current access as not recently accessed."

    Depending on what "behind" means, it may just be marking pages as not recently used that are actually "ahead" of the way you're accessing them anyways.

    Even so, the upside down bitmap issue might not be detected as sequential either. Yes, the scan lines are stored upside down, but the pixels are not reversed in the scan line. Read a scan line that spans a page would have a forward access, but then reading the next scan line would result in a backwards access. I guess a lot goes into how sophisticated the sequential memory access detection was.

    One other thing that's on my mind on this: why is accessing every byte necessary? Can't you ensure that an entire page is in memory by just accessing a single byte in the page? Unless, from the example above, the variable sys_checksum is meaningful for more than just making sure that accesses are actually used, and so don't get optimized out.

  23. Azarien says:

    @McBucket: sys_checksum is never used, but it's volatile so it won't get optimized out.

  24. McBucket says:

    @Azarien: Oh, duh -- it's in the original code, which I glanced at a couple of days ago, but ignored thereafter. Thanks.

  25. DF says:

    I'm missing the point of the n loop. And what if ptr is a 4k+2 byte ptr (eg 16*0x1000+2)?

Comments are closed.

Skip to main content