A question about avoiding page faults the first time newly-allocated memory is accessed

A customer had a question about memory allocation.

When allocating memory with malloc and new, the memory is not loaded into physical memory immediately. Instead, the memory is placed in RAM only when the application writes to the memory address. The process of moving pages into physical memory incurs page faults.

In order to avoid page faults, we use VirtualLock to lock all allocated memory into physical memory immediately after allocating it. According to MSDN, "Locking pages into memory may degrade the performance of the system by reducing the available RAM and forcing the system to swap out other critical pages to the paging file."

Assume that we have plenty of free RAM on the machine (say, 64GB). Is there any option to configure Windows to load the memory into RAM at allocation time and to swap it out to disk only when it runs out of memory?

The customer says that they have "plenty of free RAM", which I interpret to mean "so much RAM that there will never be any paging." But then the customer says, "when it runs out of memory", which contradicts the previous statement that they have "plenty of RAM."

So let's just ignore the "plenty of RAM" part of the statement. It is confusing and doesn't add to the discussion.

Assuming there is free memory available, the initial page fault will grab a page of free memory and zero it out. This is no slower than grabbing a page of free memory and zeroing it out at allocation time. All you did was change the time the work takes place; the total amount of work remains the same.
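A rough way to see where the cost lands is to time a first write pass over freshly allocated memory against a second pass over the same memory. This is a minimal sketch that assumes a POSIX system (it uses clock_gettime, which is not a Windows API) and a page size of at least 4096 bytes; the helper names are hypothetical. The first pass pays for the soft faults and zero-filling no matter when you choose to run it; the second pass finds the pages already resident. On a typical system the first pass is the slower one, but the actual numbers depend on the machine, the allocator, and settings such as transparent huge pages.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Assumption: pages are at least 4096 bytes. If they are larger, the loop
   just writes to some pages more than once, which is harmless. */
#define ASSUMED_PAGE_SIZE 4096u

/* Hypothetical helper: seconds spent writing one byte per page of buf. */
static double time_touch_pass(volatile char *buf, size_t size)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (size_t off = 0; off < size; off += ASSUMED_PAGE_SIZE)
        buf[off] = 1; /* volatile write, so the compiler cannot drop it */
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Hypothetical helper: times the first-touch pass and a second pass over
   the same fresh allocation. Returns 0 on success, -1 if malloc fails. */
static int compare_first_and_second_touch(size_t size,
                                          double *first, double *second)
{
    assert(size > 0);
    char *buf = malloc(size);
    if (buf == NULL)
        return -1;
    *first = time_touch_pass(buf, size);  /* soft faults + zero-fill happen here */
    *second = time_touch_pass(buf, size); /* pages already resident */
    free(buf);
    return 0;
}
```

Calling compare_first_and_second_touch(16u * 1024u * 1024u, &first, &second) and printing the two values is enough to compare them on a given machine.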

In other words, you will gain no performance increase by doing this. Just allocate the memory normally and don't do anything special to "force" it into memory. Forcing it into memory up front means that some other memory may need to be paged out in order to satisfy your immediate demand, and then when that other page is needed, it will need to be paged back in. The system as a whole will run slower than if you waited until the memory was accessed.

And if the page that you prematurely wrote to needs to get evicted before you get around to using it for real, then the system will discard it (assuming it's still all zeroes), and you're back where you started.

But what is the customer's real problem that makes them think that preloading zero pages is going to help? I had a few theories.

"We think that page faults are bad and are doing everything we can to get rid of them."

The customer is not distinguishing between hard faults and soft faults. Hard faults are bad. Soft faults not so much. The page faults you are eliminating here are soft faults.
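If you want to see the two kinds of faults counted separately, the sketch below swaps in the POSIX getrusage call, which reports minor (soft) faults in ru_minflt and major (hard) faults in ru_majflt. It is not a Windows API; it only illustrates the distinction. The struct and helper names are hypothetical, and a 4096-byte minimum page size is assumed. Touching freshly allocated memory with plenty of free RAM should show up almost entirely as minor faults.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Assumption: pages are at least 4096 bytes. */
#define ASSUMED_PAGE_SIZE 4096u

/* Hypothetical pair of counters: soft (minor) and hard (major) faults. */
struct fault_counts {
    long minor;
    long major;
};

/* Reads the process-wide fault counters via getrusage(RUSAGE_SELF). */
static struct fault_counts read_fault_counts(void)
{
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);
    struct fault_counts c = { ru.ru_minflt, ru.ru_majflt };
    return c;
}

/* Hypothetical helper: touches one byte per page of a fresh allocation and
   reports how many minor and major faults that caused. Returns -1 if malloc
   fails, 0 otherwise. */
static int count_faults_on_first_touch(size_t size, struct fault_counts *delta)
{
    volatile char *buf = malloc(size);
    if (buf == NULL)
        return -1;
    struct fault_counts before = read_fault_counts();
    for (size_t off = 0; off < size; off += ASSUMED_PAGE_SIZE)
        buf[off] = 1;
    struct fault_counts after = read_fault_counts();
    free((void *)buf);
    delta->minor = after.minor - before.minor;
    delta->major = after.major - before.major;
    return 0;
}
```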

"We have carefully profiled our program and have determined that touching at allocation will reduce performance fluctuations during processing, even if it comes at a cost in overall performance. We prefer predictability over throughput."

This is a valid concern.

"We saw the function VirtualLock and said, 'Hey, that sounds cool! How do I get a piece of that action?'"

This is a case of finding a hammer and looking for nails.

The customer liaison replied that the customer agreed that the total time is the same, but they want to do it at allocation time in order to avoid taking page faults during processing.

If it is indeed the case that the customer is sophisticated enough to be able to measure the additional cost in terms of elapsed time (as opposed to simply counting page faults because page faults are bad mmkay?), then sure, go ahead and touch all the pages at allocation time. There is no built-in setting to do this automatically, though. You'll just have to do it yourself.
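Doing it yourself amounts to writing to every page right after allocating. A minimal sketch, assuming one write per page is all you want: alloc_prefaulted and free_prefaulted are hypothetical names, the Windows branch uses VirtualAlloc and GetSystemInfo, and the other branch uses malloc and sysconf. It writes zeros, so memory from VirtualAlloc keeps its zero-initialized contents. Unlike VirtualLock, this does not pin anything: the memory manager remains free to trim the pages later, and you would take faults on them again.

```c
#ifndef _WIN32
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Returns the system page size. */
static size_t page_size(void)
{
#ifdef _WIN32
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    return si.dwPageSize;
#else
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0);
    return (size_t)ps;
#endif
}

/* Hypothetical helper: allocates size bytes and writes to every page the
   block overlaps, so the soft faults are taken now instead of later. */
static void *alloc_prefaulted(size_t size)
{
    if (size == 0)
        return NULL;
#ifdef _WIN32
    char *p = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
#else
    char *p = malloc(size);
#endif
    if (p == NULL)
        return NULL;
    size_t ps = page_size();
    volatile char *v = p; /* volatile so the writes are not optimized away */
    for (size_t off = 0; off < size; off += ps)
        v[off] = 0;
    v[size - 1] = 0; /* covers the last page when p is not page-aligned */
    return p;
}

/* Releases memory obtained from alloc_prefaulted. */
static void free_prefaulted(void *p)
{
#ifdef _WIN32
    if (p != NULL)
        VirtualFree(p, 0, MEM_RELEASE);
#else
    free(p);
#endif
}
```

Because consecutive writes are exactly one page apart and the last byte is written too, every page the block overlaps gets touched, even when malloc returns an address that is not page-aligned.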

It's possible that the customer is sophisticated enough to measure this, but the way they worded the question suggests that they simply didn't understand how the operating system manages resources. Sometimes a very advanced question is hard to distinguish from a very naïve question.

Comments (15)
  1. The MAZZTer says:

    Yeah, it sounds like the customer assumes that even if a system has infinite RAM, page faults would still happen for some reason. The best solution here is to let the OS do its job, and for goodness' sake don’t turn off the page file to stop paging. That just exchanges page faults for out-of-memory errors, which will likely crash most apps or at least put them into undefined states.

    Even if you have enough RAM, the difference between a system with a page file and one without is the one with a page file has a safety buffer in case for some reason commit size (that’s the term, right?) does exceed RAM, whereas the one without starts crashing apps.

    1. Piotr says:

      That “some reason” might be the hypervisor’s balloon driver, which simulates memory pressure specifically to force paging and reserve some memory for other machines on the same hardware.

    2. alegr1 says:

      >Yeah it sounds like the customer assumes that even if a system has infinite RAM, page faults would still happen for some reason

      Windows just LOVES to discard pages if it thinks they’ve not been touched for a while, in favor of file cache, even though those files will not be opened again anytime soon. For example, it does this when it runs an antivirus scan or indexing (those guys seriously haven’t heard of FILE_FLAG_NO_BUFFERING). Like, you have a box with enough RAM that its commit size never exceeds it, and yet, when you come back to it after a while, even Explorer is paged out.

      1. Dave says:

        +1. And the further back you go in Windows versions, the worse it gets. Under NT 4 (or 3.51) you could cause the OS to page itself to death by copying a single file as large as or larger than physical RAM: it would take more and more memory to cache the file contents until eventually it was thrashing solidly. I’ve run Win7 and Win8 boxes without a swapfile because I’ve never come remotely close to using more than a fraction of the total RAM, but there were still just enough corner cases that caused problems that I eventually created a tiny pagefile. It didn’t serve any useful purpose except to give Windows something to swap arbitrary pages to, not because it needed to but because, as you say, it just likes to page stuff out for no good reason.

        1. Alex Cohn says:

          Once (was it Windows 3.11?) it was possible to configure the system to run without a page file. But this switched Windows into a special mode that slowed it down significantly, because in order to keep as much RAM as possible available for the applications, it kept reloading all the system DLLs.

          1. You could always configure Windows to run without a pagefile. But it does sound likely that Windows 3.x/95 had special rules for how to behave when the pagefile was disabled.

  2. Joshua says:

    I remember when optimizing away soft page faults paid for itself. Today, not so much. I’m kind of surprised there’s no good call that says commit this RAM (several pages) immediately so the code only pays for one context switch to get to kernel mode rather than one per page.

    1. If page-related context switches really are killing you, you might as well just go to large pages. A one-time shortcut that only works on the first commit is pretty tiny in most applications, compared to the eventual cost of actually using that memory.

  3. kantos says:

    In my experience the vast majority of (younger) developers even for native code don’t fully understand how paging works from an OS/Kernel perspective. To be fair this included me until I made it a point to find out and understand better because I wanted to understand context switches. They understand the basic concept of paged memory usually; but not the vast array of resources the OS uses to ensure that memory can seem infinite even before swapping out to disk.

  4. Kevin says:

    Charitable assumption: The customer is a relatively new indie game studio. They want to minimize the number of page faults that happen while the game is being played, even if that means taking more faults during the loading screens.

    I say “relatively new” because the established players all know how to do this already.

  5. Mike says:

    I remember a game (Windows 95 time frame IIRC) touching its first allocated… was it 32 MB (which was quite a bit back then) to make sure it was paged in and immediately available (or more likely to make sure anything else not needed was paged _out_, so the game had its VA in RAM). This was obviously due to the real-time demand to avoid later jerkiness.

    But to do something like that today? Even for a game it’d almost be silly to do it. It’d have to be some _pretty_ specific requirements in place to even attempt it.

    I’m with Raymond here; it smells like the customer had mistakenly equated “paging” with “swapping”, without knowing there are soft vs. hard faults.

  6. JoeWoodbury says:

    Before heading down the rabbit hole of trying to be too clever, first check your algorithms and how you are using the data. In other words, if every class instance in a collection has its data scattered all over the memory space, being clever with the base collection may not really buy anything.

    I’m also reminded of “clever” code I wrote 16 years ago, leveraging something in NT 4.0. I later realized that all I’d done is shift the burden from one area to another without actually improving anything. Worse, I’d ended up making the OS do the easy part while my code was doing the hard part, which also made the code harder to maintain.

  7. Adam says:

    There’s an interesting investigation of the costs of memory allocation / deletion and soft page faults at https://randomascii.wordpress.com/2014/12/10/hidden-costs-of-memory-allocation/

    For example, freeing 32MB of memory that has been written to is several orders of magnitude slower than freeing it without writing to it.

    1. rwg says:

      But, is there possibly mileage in taking the page faults in another thread?

      If you know that your application is going to come through that block of memory with a memory intensive but CPU light operation (perhaps copying a large block of memory) *and* you know this far enough in advance to make the idea feasible, then making the OS page the memory in while you are doing something else useful in prep for the memcopy sounds plausible.

      Could be a job for ‘VirtualLockAsync’ – if it existed… More likely custom code to touch every page in region from a thread pool thread or similar.
