When I enable page heap, why is my one-byte buffer overrun not detected immediately?

Page Heap is a mode of the heap manager¹ that can be enabled by the Debugging Tools for Windows. When enabled, each memory allocation is placed at the end of a dedicated page, and the page that follows is left invalid. That way, if you overrun the allocation, you will crash with an access violation because you are reaching into the next (invalid) page.² When the memory is freed, the entire page is decommitted, so that any use-after-free bugs will result in an access violation. (Eventually, the heap manager will reuse the address space.)

A customer noted that when they enabled page heap, the crash did not occur on their one-byte buffer overrun. They were able to overrun the buffer by 13 bytes before the crash occurred. "We thought full page heap was supposed to catch buffer overruns immediately."

Page heap places the allocation as close to the end of the page as it can, but it may not be able to push it right up to the edge because the heap is contractually obligated to respect the MEMORY_ALLOCATION_ALIGNMENT. Without these alignment guarantees, the heap would be much harder to use because every allocation would have to be manually aligned to the desired boundary. (In practice, nearly all data structures have nontrivial alignment requirements, because they will almost always contain at least one member that is larger than a byte.)

On 64-bit systems, the allocation alignment is 16, which means that HeapAlloc promises to return a value that is evenly divisible by 16. This contractual obligation means that if you make an allocation request for 3 bytes on a 64-bit system, then the allocation will be placed at an address of the form xxxxxxxx`xxxxFFF0, with three bytes of actual data and 13 padding bytes.
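The arithmetic above can be sketched in a few lines of C. This is only an illustration of the rounding, not the heap manager's actual code: the 4096-byte page size, the constant names, and the function names are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for this example */
#define SKETCH_ALIGNMENT 16u    /* allocation alignment on 64-bit systems */

/* Round a request up to the next multiple of the allocation alignment. */
size_t round_up_to_alignment(size_t size)
{
    return (size + SKETCH_ALIGNMENT - 1) & ~(size_t)(SKETCH_ALIGNMENT - 1);
}

/* Offset within the last valid page at which the allocation starts.
   Assumes the request is nonzero and fits in a single page. */
size_t offset_in_last_page(size_t size)
{
    assert(size > 0 && size <= SKETCH_PAGE_SIZE);
    return SKETCH_PAGE_SIZE - round_up_to_alignment(size);
}

/* Bytes between the end of the caller's data and the invalid page. */
size_t slop_bytes(size_t size)
{
    return round_up_to_alignment(size) - size;
}
```

For a 3-byte request, offset_in_last_page(3) is 0xFF0 and slop_bytes(3) is 13, which matches the numbers the customer saw. A request whose size is already a multiple of 16 has no slop at all.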

That's where the 13 bytes of slop are coming from. The heap manager cannot give you a pointer of the form xxxxxxxx`xxxxFFFD, because that would violate the alignment contract. However, the heap manager does put canary bytes in those extra 13 bytes, and when you free or reallocate the memory block, the heap manager verifies that the canary bytes have not been tampered with. So the write overrun is detected eventually.
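Here is a rough sketch of the fill-then-verify idea behind that deferred check. It is a simulation on an ordinary buffer, not the heap manager's implementation: the canary value and both function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_CANARY 0xCAu  /* arbitrary placeholder, not the real fill pattern */

/* At allocation time: stamp the slop bytes that follow the caller's data. */
void fill_canary(unsigned char *block, size_t user_size, size_t block_size)
{
    assert(user_size <= block_size);
    memset(block + user_size, SKETCH_CANARY, block_size - user_size);
}

/* At free or reallocate time: returns 1 if the slop bytes are untouched,
   0 if any of them was overwritten. */
int canary_intact(const unsigned char *block, size_t user_size, size_t block_size)
{
    assert(user_size <= block_size);
    for (size_t i = user_size; i < block_size; i++) {
        if (block[i] != SKETCH_CANARY) {
            return 0;
        }
    }
    return 1;
}
```

With a 16-byte block and a 3-byte request, a write to block[3] goes unnoticed at the moment it happens, but canary_intact reports it the next time the block is checked. (An overrun that happens to write the canary value itself would slip through in this sketch.)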

If you want to break the alignment contract and make the memory go right up to the edge, you can ask for /unaligned /full. Note, however, that handing back unaligned memory is likely to result in other problems, because one of the ground rules of programming is that in the absence of explicit permission to the contrary, pointers must be properly aligned. The consequences of breaking this rule vary depending on how strictly your platform enforces alignment. The code might take alignment faults. Or the code might simply operate on the wrong memory. The x86 architecture is mostly alignment-forgiving, but there are still places (such as interlocked operations and SIMD instructions) where alignment matters.
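To see which allocations would come back misaligned when pushed flush against the page edge, you can check the resulting offset against the alignment the data needs. The sketch below assumes a 4096-byte page, and the function names are hypothetical. Because pages themselves are page-aligned, the offset within the page determines the alignment of the address for any alignment up to the page size.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for this example */

/* Offset of a block that ends exactly at the page boundary. */
size_t unaligned_offset(size_t size)
{
    assert(size > 0 && size <= SKETCH_PAGE_SIZE);
    return SKETCH_PAGE_SIZE - size;
}

/* Returns 1 if offset is a multiple of alignment, 0 otherwise.
   The alignment must be a nonzero power of two. */
int is_aligned(size_t offset, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset & (alignment - 1)) == 0;
}
```

A 4-byte block lands at offset 0xFFC, which is fine for data that needs 4-byte alignment. A 3-byte block lands at 0xFFD, which is not even 2-byte aligned, and a 24-byte block lands at 0xFE8, which is 8-byte aligned but not 16-byte aligned.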

¹ I'm using the definite article on "the heap manager" because I'm referring to the system-provided heap manager. The one that you are using when you call functions like Heap­Alloc. If your program uses a custom heap library, then the page heap settings have no effect on that custom heap library. (This sounds obvious, but sometimes customers expect the page heap settings to somehow be able to alter the behavior of code it didn't write.)

² Formally, this model is known as "full page heap". There's also a "standard page heap" which places canary bytes after the end of each allocation. When you free or reallocate the memory, the heap manager checks whether the canary bytes have been tampered with; if so, then it informs you of a heap buffer overrun. I don't know why they call this "standard page heap" because there are no pages involved.

Comments (19)
  1. Smithers says:

    “Note, however, that handing back unaligned memory is likely to result in other problems,”
    I doubt this is as likely as you imply, as I would expect the alignment requirement of nearly all allocations would be a factor of their size. One could, in theory, allocate 3 bytes, then treat the first two bytes as a 16-bit integer and the third as an 8-bit, causing a problem, but I doubt this is common. More likely someone would create a C-style struct Foo { int16_t x; int8_t y; } and then allocate sizeof(struct Foo) bytes; since the structure’s size gets rounded up to 4[1] (as otherwise arrays would not align), the resulting allocation at …FFFC is sufficiently aligned for the structure.

    As for “standard page heap”, I assume because the allocation is made from a “standard page” (i.e. one that could be shared with other allocations)?

    [1]Assuming no meddling with #pragma pack. My predecessor left too many of those behind for me to make such assumption lightly.

    1. Joshua says:

      Indeed; I have noted that if the structure is not being padded by the compiler, allocating an array would surely cause alignment disasters.

      Since this does not happen, we may normally assume that allocating an array works correctly for oddly-sized structs. In fact the linux equivalent of this does indeed put the last byte of odd sizes at the end of the page and therefore returns an odd pointer in such cases.

      I suppose this is slightly more likely to break in that it will break on an unpadded struct for which arrays cannot be allocated, but that must be rare.

      1. Torsten says:

        Consider the pattern of the variable size structure:

        struct s {
        __m128 vector;
        size_t numBytes;
        byte payload[0];
        };

        This structure requires 16 byte alignment but can have any length, including odd ones. The heap manager has no way of knowing whether any given allocation is intended for something like that.

        1. Kevin says:

          As of C99, the size of the structure is “as if the [zero-length] array member were omitted except that it may have more trailing padding than the omission would imply.” So the compiler can add padding if it likes, but otherwise the member is ignored.

          (Before C99, this was a non-standard feature, so who knows what will happen.)

          1. Joshua says:

            I suspect the dead giveaway bus error told you this application was not compatible. You never used this in production anyway so no matter.

          2. Stuart says:

            Does that mean that C99 doesn’t require a trailing zero-length array to be aligned to its own type? So that given, say,

            struct v {
            byte length;
            double entries[0];
            };

            it’d be legal for sizeof(struct v) and alignof(struct v) to be 1?

    2. ranta says:

      The alignment could be a problem with a structure that ends with an ANYSIZE_ARRAY, like SP_DEVICE_INTERFACE_DETAIL_DATA. That structure starts with a DWORD member, and applications first call SetupDiGetDeviceInterfaceDetail to get *RequiredSize = the number of bytes that you need to allocate for the structure, but that number might not be a multiple of sizeof(DWORD).

  2. AsmGuru62 says:

    There are issues with HeapValidate() also.
    On small overruns – it will report a block as OK in Released code.
    Custom heap with alignment and exact allocation sizes is a way to go.

  3. rw says:

    Shouldn’t it be a three-byte buffer that must be overrun by 13 bytes to get detected?

    1. cheong00 says:

      It’ll be detected “when you free or reallocate the memory block”. Adding a check on every write operation would probably slow your code to such an unacceptable level that you would just want to turn it off.

      That said, I wonder if there could be an option that makes the heap manager allocate 16-byte-aligned memory, but report an address computed from the tail end of the allocated block minus the requested allocation size. Of course, the heap manager would have to remember the reported address and translate it to the actual starting address when the pointer is freed or reallocated in this mode.

  4. kantos says:

    While this is a useful tool I honestly think it would be better for VC++ to support AddressSanitizer. The heap cookie system it uses is an excellent way of finding these sorts of bugs and doesn’t require an access violation to work.

    1. The nice thing about PageHeap is that you can turn it on retroactively for any binary. No recompilation needed. Recompilation may alter behavior enough to make the problem go away! And the bug might be in a component you didn’t compile, or in a component not written in C++.

  5. Michael Yang says:

    “canary bytes” protection does not help on kernel pool overrun detection, correct?

  6. sense says:

    Just the thing I was researching last week!:
    Is there any way to enable pageheap in the middle of the running application? Or only for some of allocations?
    It seems it should be possible, as pageheap can be enabled only for allocations of a specific dll. But I’m unable to find a way to control it at will.
    I need this because enabling pageheap from the start of my app exhausts the 32-bit address space completely, but I need to track down a bug in a specific part of the app.

    Anyone knows of a way?

      1. sense says:

        Tnx for the link. I’d seen it.
        It states it’s possible to enable pageheap per dll, but I want to enable pageheap only for some allocations in the main exe, not per dll. For example, some way to enable/disable it by calling a function.
        As it is possible to be enabled per dll, I think it should be possible to enable at will. Right?

        1. cheong00 says:

          I don’t know… I’m not familiar with C++, but maybe you can create a macro that on debug will replace your memory allocation with “memory allocation round up to the next page size” then set the memory on next page with page guard?

          You’ll need to write macro for each memory allocation/free function, and things can break if the arguments of those functions changes in future.

        2. voo says:

          You’re misunderstanding how this system works, the SO answer is trying to explain exactly that (if you think about it, that’s also the only way this system can work).

          You can turn the heap flag on per DLL that actually uses HeapAlloc. Since all your dlls presumably use malloc or new they all call into a single dll to allocate memory, so whatever setting exists for that dll is relevant.

          Which also suggests the solution: You’ll have to use a separate heap for your interesting dlls. This is error prone and requires that the application plays along but doable.

  7. Ismo says:

    Is there any way to query how much padding there is between the end of the reserved block and the page break? Perhaps one should tweak all reservations to be 16 bytes (or some other boundary) and, in one’s own code, bump the pointer up if possible (like a character buffer, which can be aligned at one-byte precision).

    Or just use _aligned_malloc.

Comments are closed.
