If I zero out my memory pages, does that make them page in faster?


In an earlier discussion of discardability, I noted that if you allocate some zero-initialized pages and never modify them, then the memory manager can just throw the memory away because it can "recover" the page full of zeros by simply zeroing out some memory. Commenter L wanted to know if this means that zeroing out memory can help program performance.

No, zeroing out memory does not help.

The reason the memory manager knows that it can throw the zero-filled memory away is that the page is not dirty.

When a page faults in (either because it's a zero-initialized page being accessed for the first time, or because it needs to be loaded from disk), the memory manager assigns a physical page, fills the page with the appropriate data (zeroes or data from disk), and points the page table entry at the page. It also clears the dirty bit, which is a special bit in the page table entry.

When you write to memory, the CPU sets the dirty bit in the page table entry for the page you wrote to. This bookkeeping is done automatically by the CPU and requires no effort from the operating system. When it comes time to page out the memory, the memory manager can do a quick check of the dirty bit, and if it's clear, then it knows that the memory was not modified since it was originally faulted in, which means that there are no changes that need to be written to disk. The next time the page faults in, it can be initialized the same way it was last time (filled with zeroes or loaded from disk).
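
To make this concrete, here is a minimal sketch (Windows, C; error handling trimmed) of the user-mode side of that story. The dirty bit itself lives in the page table and is not directly visible from user mode; the comments just narrate what the memory manager does underneath.

    #include <stdio.h>
    #include <windows.h>

    int main(void)
    {
        // Commit one page of zero-initialized memory. No physical page is
        // assigned yet; the page table entry just records it as demand-zero.
        BYTE *p = VirtualAlloc(NULL, 4096, MEM_RESERVE | MEM_COMMIT,
                               PAGE_READWRITE);
        if (p == NULL) return 1;

        // The first access faults the page in: the memory manager supplies a
        // zero-filled physical page and leaves the dirty bit clear, so the
        // page can later be discarded and recreated from nothing.
        printf("first byte = %u\n", p[0]);

        // The first write is what sets the dirty bit (the CPU does this
        // automatically); from now on, evicting the page means writing it
        // out to the page file.
        p[0] = 1;

        VirtualFree(p, 0, MEM_RELEASE);
        return 0;
    }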

If you manually zero out the page, then you set the dirty bit, and the memory manager will say, "Well, it looks like the program modified the memory, so I'll have to write it out to the page file so I don't lose it."

Now, in theory, the memory manager could add an extra step: Check if the page consists entirely of zeroes, and if so, then mark it as a zero-initialized page and discard it. The memory manager doesn't do this because it's such a low probability shot. The savings in the rare cases where a page being paged out happens to be a dirty page full of zeroes are outweighed by the cost of checking the page in all the cases where the page is not filled with zeroes.
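
For illustration, the check in question would be something along these lines (a plain C sketch of a hypothetical helper; the early exit at the first nonzero word is the only thing keeping the common, non-zero case cheap):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    // Hypothetical: could this dirty page be recovered as a demand-zero
    // page instead of being written to the page file?
    static bool page_is_all_zeroes(const void *page, size_t page_size)
    {
        const uint64_t *words = page;
        size_t count = page_size / sizeof(uint64_t);
        for (size_t i = 0; i < count; i++) {
            if (words[i] != 0) return false; // bail out at the first nonzero word
        }
        return true;
    }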

In a sense, this is a self-fulfilling prophecy. The memory manager doesn't perform this check because it pays off so rarely as not to be worth the effort of checking. But since the memory manager doesn't perform the check, programs don't bother zeroing out pages when they are done with them. This creates a feedback loop and the net result is that nobody zeroes out pages because it doesn't help.

You can imagine an alternate universe where a positive feedback loop exists: The memory manager performs this check because it pays off, and the fact that the memory manager performs the check induces more programs to zero out their pages, which increases the payoff. But that's not the world we live in today.

And we're likely never to enter that world: Programs which want to tell the memory manager, "Don't bother paging this memory back out because I don't care what's in it" can convey this message by passing the MEM_RESET flag to the VirtualAlloc function.
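
A minimal sketch of that pattern (error handling omitted; the protection parameter is required by VirtualAlloc but ignored for MEM_RESET):

    #include <windows.h>

    // Sketch: we're done with the data in this scratch buffer, but we want
    // to keep the address range committed for later reuse.
    void done_with_scratch(void *buffer, SIZE_T size)
    {
        // MEM_RESET tells the memory manager the contents are garbage: if it
        // needs the physical pages back, it may discard them rather than
        // write them to the page file. A later read may therefore return
        // either the old data or zeroes.
        VirtualAlloc(buffer, size, MEM_RESET, PAGE_NOACCESS);
    }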

Comments (20)
  1. Karellen says:

    I wonder how quick the all-zeros check would be in the average case for a dirty, non-zero-filled page? Remember, you can stop scanning as soon as you find any non-zero byte, and the checks would be embarrassingly parallelizable with e.g. SSE instructions.

    Or, for an average dirty, non-zero-filled page, what are the odds that the first, say, 64 bytes are all zeros? Almost certainly a lot more likely than 2^-512, as the average dirty page is not random, but probably still fairly unlikely. Alternatively, for the worst case, what are the odds that all but the last 64 bytes are zero?

    In comparison, what's the speed difference between scanning a page of physical memory, and writing a page to disk? Note, this would have been much higher in the past with spinning rust, especially on a system that's I/O bound because it's doing lots of swapping due to memory pressure - i.e. at the time that the most paging will be happening!

    It might not end up being a win, but the benchmarks (if they had existed) might have proved interesting.

    1. Eduardo says:

      @Karellen, the cost is low, but so is any gain.

      Resetting a page is simpler and more informative to the memory manager than zeroing it.

  2. Peter says:

    I find the MEM_RESET flag a little confusing. What happens if I call VirtualAlloc(MEM_RESET) and then later write to the page? Does the memory manager realize that this is an interesting page again? (The MEM_RESET_UNDO flag isn't available until Windows 8.)

    1. IInspectable says:

      Since it is the CPU that sets the dirty flag, the memory manager will know that you have written to the page. When it comes time to page out that memory page, the memory manager presumably checks both the dirty flag and the MEM_RESET flag to evaluate whether it needs to be paged out to disk or not. However interesting the page may be, if the MEM_RESET flag is set, it simply gets thrown out the window.

      1. Douglas says:

        Well, the VirtualAlloc documentation says this:
        "Using this value does not guarantee that the range operated on with MEM_RESET will contain zeros."

        So I imagine what happens is that the memory manager just clears the dirty bit.

        Then if you read before the page gets paged out (sorry, discarded; it would have been paged out normally), you read the old values.
        If you read from the page after it's been discarded, it's as if you're reading from a freshly committed page: you read zeros.
        Whenever you write to the page, the dirty bit gets set, so from then on, the page gets paged out instead of discarded.

        Ergo, there is no "MEM_RESET flag" that is kept around. Passing the MEM_RESET flag to VirtualAlloc just clears the dirty bit flag (which is undone by writing to the page again).

        @Peter:
        As said above, writing to the page sets the dirty bit again, so it won't get discarded. However, if it's been discarded in the meantime, the old data was lost.

        The purpose of MEM_RESET_UNDO, as far as I can tell, is twofold:
        1) Set the dirty bit without modifying the page contents. Naturally, if you can just say "*x = *x;", it doesn't matter.
        2) Tell you if any pages were discarded, or if they just happened to be kept around. You might have trouble telling this on your own.
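
        As a rough sketch of that round trip (assuming Windows 8 or later for MEM_RESET_UNDO; names are illustrative, error handling omitted):

            #include <windows.h>

            void park_cached_data(void *buffer, SIZE_T size)
            {
                // "I no longer care what's in these pages" - they may now be
                // discarded instead of paged out.
                VirtualAlloc(buffer, size, MEM_RESET, PAGE_NOACCESS);

                // ... time passes; memory pressure may or may not discard them ...

                // "Actually, I'd like that data back if you still have it."
                // Success means the contents survived intact; failure means at
                // least some pages were discarded and now read as zeroes, so
                // the data must be regenerated.
                if (VirtualAlloc(buffer, size, MEM_RESET_UNDO, PAGE_NOACCESS) == NULL) {
                    // regenerate the data here
                }
            }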

        1. Peter says:

          That sounds reasonable to me, but I wish the documentation spelled this out. The way the documentation is written, it sounds like the memory won't ever be paged again. (Which would make re-using it extremely dangerous.)

  3. The MAZZTer says:

    I would imagine the better way to do this is simply to deallocate your memory when you would want to fill it with zeroes, then reallocate it when you want to use it again. Then you're giving the memory manager all the information it needs to manage your memory as efficiently as possible, including deallocating the page and reallocating you a page full of zeroes with no dirty bit later.
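
    Roughly (a sketch assuming the buffer lives inside a region reserved with VirtualAlloc; error handling omitted):

        #include <windows.h>

        void recycle_buffer(void *buffer, SIZE_T size)
        {
            // Decommit: hand the physical pages (and any page file backing)
            // back to the system, keeping only the reserved address range.
            VirtualFree(buffer, size, MEM_DECOMMIT);

            // ... later, when the buffer is needed again ...

            // Recommit: the pages come back demand-zero, dirty bit clear.
            VirtualAlloc(buffer, size, MEM_COMMIT, PAGE_READWRITE);
        }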

    1. ErikF says:

      I would imagine that you would find MEM_RESET easier if you are managing your own memory allocations from a larger arena: you would keep your allocator but gain the benefits that the VM memory manager gives you. Whether this applies to your program likely has more to do with the language and libraries that you use than with any particular philosophical viewpoint you have regarding memory allocation, IMO.

  4. DWalker07 says:

    But but! Zeros weigh less than ones, don't they? So, zero-filled pages should be brought in faster.

    Also, if you zero out your entire hard drive, it will weigh slightly less, right?

    1. Killer{R} says:

      Of course not. Reed–Solomon coding replaces your zeroes with something else, so you would need to find the exact pattern.

      1. bmm6o says:

        Reed-Solomon is a linear code, so the all-zero message should get mapped to the all-zero codeword. It's longer, but it will weigh the same.

    2. Brian_EE says:

      I think it's the other way around, since 0's are fat and 1's are skinny.

      Anyway, the secret to data compression is to stack the 0's up like tires and stuff all the 1's inside them.

      1. DWalker07 says:

        I hadn't thought of sticking the 1's inside the zeros! That's genius.

      2. Tom West says:

        > That's genius

        Indeed. I blame not thinking of it on the fact that I'm a software engineer, and stacking 1's and 0's is perilously close to a hardware concern.

  5. Those so-called RAM defragmentation apps...

    1. Yukkuri says:

      But Dr. Reimu's patented snake oi-- I mean RAM defragmenter solves all PC Ills! Buy now!

    2. xcomcmdr says:

      Only some apps are concerned with the performance hit that heavy memory fragmentation brings to their use case, but usually they manage it themselves (i.e. AAA video games with their own memory managers/allocators).

  6. Ray Koopa says:

    There's this feeling again after reading your article: being surprised by how trivial it was to answer the question in the title.

  7. John Doe says:

    To me it looks like the memory manager does check for zeroes.
    https://msdn.microsoft.com/en-us/library/dn613877.aspx
    "In addition, the memory manager now checks for zeroed pages before it reads or writes."
    "the memory manager simply puts the page on the zero-page list and marks the page as demand-zero in the page table."

    1. Adrian says:

      The memory manager as of Vista does indeed check for all zeroes before writing to the pagefile. I wrote the original prototype that showed that it was worth it. Some of this was due to apps dirtying their zero-init pages with inits to zero. Some of it was due to user mode zeroing pages for security reasons (not kernel mode - pages on its free lists don't get paged out). But it was common enough to make it worthwhile.

