A short puzzle about heap expansion

At the 2008 PDC, somebody stopped by the Ask the Experts table with a question about the heap manager.

I don't understand why the heap manager is allocating a new segment. I allocated a bunch of small blocks, then freed nearly all of them. And then when my program makes a large allocation, it allocates a new segment instead of reusing the memory I had just freed.

Under the classical model of the heap, the heap manager allocates a large chunk of memory from lower-level operating system services, and then when requests for memory come in from the application, it carves blocks of memory from the big chunk and gives them to the application. (These blocks are called busy.) When those blocks of memory are freed, they are returned to the pool of available memory, and if there are two blocks of free memory adjacent to each other, they are combined (coalesced) to form a single larger block. That way, the block can be used to satisfy a larger allocation in the future.
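
To make the classical model concrete, here is a minimal sketch in C of what freeing and coalescing might look like. It is purely illustrative (the Block structure and function names are invented for this discussion) and is nothing like the actual Windows heap implementation.

    #include <stddef.h>   /* for size_t */

    /* Each block carved out of the big chunk carries a small header. */
    typedef struct Block {
        size_t size;          /* size of the payload that follows the header */
        int    busy;          /* nonzero while the block belongs to the application */
        struct Block *next;   /* next block in address order within the chunk */
    } Block;

    /* Freeing a block returns it to the pool of available memory and merges it
       with the block that follows, if that one is also free, so the combined
       space can satisfy a larger allocation later. */
    void classical_free(Block *b)
    {
        b->busy = 0;
        if (b->next != NULL && !b->next->busy) {
            b->size += sizeof(Block) + b->next->size;   /* coalesce forward */
            b->next = b->next->next;
        }
        /* A real implementation would also coalesce with the previous block. */
    }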

Under the classical model, allocating memory and then freeing it is a net no-operation. (Nitpicky details notwithstanding.) The allocation carves the memory out of the big slab of memory, and the free returns it to the slab. Therefore, the situation described above is a bit puzzling. After the memory is freed back to the heap, the little blocks should coalesce back into a block big enough to hold a larger allocation.

I sat and wondered for a moment, trying to think of cases where coalescing might fail, such as when the program happens to leave an allocated block right in the middle of the chunk. Or maybe there was some non-classical behavior going on. For example, maybe the look-aside list was keeping those blocks live.

As I considered the options, the person expressed disbelief in a different but telling way:

You'd think the low-fragmentation heap (LFH) would specifically avoid this problem.

Oh wait, you're using the low-fragmentation heap! This is a decidedly non-classical heap implementation: Instead of coalescing free blocks, it keeps the free blocks distinct. The idea of the low-fragmentation heap is to reduce the likelihood of various classes of heap fragmentation problems:

  • You want to make a large allocation, and you almost have room for it, except that there's a small allocation in the middle of your would-be large block that is in your way.
  • You have a lot of free memory, but it's all in the form of teeny tiny useless blocks.

That first case is similar to what I had been considering: where you allocate a lot of memory, free most of it, but leave little islands behind.

The second case occurs when you have a free block of size N, and somebody allocates a block of size M < N. The heap manager breaks the large block into two smaller blocks: a busy block of size M and a free block of size (N − M). These "leftover" free blocks aren't a problem if your program later requests a block of size N − M: The leftover block can be used to satisfy the allocation, and no memory goes wasted. But if your program never asks for a block of size N − M, then the block just hangs around as one of those useless blocks.
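
Continuing the illustrative sketch from above (still hypothetical, reusing the Block structure defined earlier), the split performed by a classical allocation looks roughly like this:

    /* Allocate m bytes out of a free block whose payload size is n >= m: the
       block becomes busy with size m, and the leftover n - m bytes (minus room
       for a header) become a new free block that waits for a future request. */
    Block *classical_alloc_from(Block *free_block, size_t m)
    {
        size_t n = free_block->size;
        if (n > m + sizeof(Block)) {
            Block *leftover = (Block *)((char *)(free_block + 1) + m);
            leftover->size = n - m - sizeof(Block);
            leftover->busy = 0;
            leftover->next = free_block->next;
            free_block->next = leftover;
            free_block->size = m;
        }
        free_block->busy = 1;
        return free_block;
    }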

Imagine, for concreteness, a program that allocates memory in a loop like this:

  • p1 = alloc(128)
  • p2 = alloc(128)
  • free(p1)
  • p3 = alloc(96)
  • (Keep p2 and p3 allocated.)
  • Repeat
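
(In real code against the Win32 heap functions, the same loop might look like the sketch below; the function name and iteration count are made up for illustration, and leaking p2 and p3 is simply part of the scenario.)

    #include <windows.h>

    void run_scenario(HANDLE heap, int iterations)
    {
        for (int i = 0; i < iterations; i++) {
            void *p1 = HeapAlloc(heap, 0, 128);
            void *p2 = HeapAlloc(heap, 0, 128);
            HeapFree(heap, 0, p1);
            void *p3 = HeapAlloc(heap, 0, 96);
            /* p2 and p3 are intentionally kept allocated for the rest of the
               run, matching the "keep p2 and p3 allocated" step above. */
            (void)p2;
            (void)p3;
        }
    }

For example, run_scenario(GetProcessHeap(), 1000) sets up the thousand-iteration case discussed below.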

Under the classical model, when the request for 96 bytes comes in, the memory manager sees that 128-byte block (formerly known as p1) and splits it into two parts, a 96-byte block and a 32-byte block. The 96-byte block becomes block p3, and the 32-byte block sits around waiting for somebody to ask for 32 bytes (which never happens).

Each time through this loop, the heap grows by 256 bytes. Of those 256 bytes, 224 are performing useful work in the application, and 32 are sitting around as one of those tiny useless free blocks that contribute to fragmentation.

The low-fragmentation heap tries to avoid this problem by keeping similar-sized allocations together. A heap block only gets re-used for the same size allocation it was originally created for. (This description is simplified for the purpose of the discussion.) (I can't believe I had to write that.)
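
As a rough mental model, and only as that (the real low-fragmentation heap is considerably more sophisticated), you can picture one free list per size class, where a freed block simply waits to be reused by a later request of the same size. The granularity, bucket count, and names below are invented for illustration:

    #include <stdlib.h>

    #define GRANULARITY  8
    #define NUM_BUCKETS  256            /* size classes up to 2048 bytes (hypothetical) */

    typedef struct FreeNode { struct FreeNode *next; } FreeNode;
    static FreeNode *buckets[NUM_BUCKETS];

    /* Round a request up to its size class. */
    static size_t bucket_for(size_t size)
    {
        if (size == 0) size = 1;
        return (size + GRANULARITY - 1) / GRANULARITY;
    }

    void *bucketed_alloc(size_t size)
    {
        size_t b = bucket_for(size);
        if (b < NUM_BUCKETS && buckets[b] != NULL) {
            FreeNode *node = buckets[b];   /* reuse a freed block of this exact size class */
            buckets[b] = node->next;
            return node;
        }
        return malloc(b * GRANULARITY);    /* otherwise carve out a brand-new block */
    }

    void bucketed_free(void *p, size_t size)
    {
        size_t b = bucket_for(size);
        if (b >= NUM_BUCKETS) {            /* oversized blocks take the ordinary path */
            free(p);
            return;
        }
        FreeNode *node = (FreeNode *)p;    /* the block is not coalesced with its neighbors; */
        node->next = buckets[b];           /* it just sits here until a same-size request    */
        buckets[b] = node;
    }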

In the above scenario, the low-fragmentation heap would respond to the request to allocate 96 bytes not by taking the recently-freed 128-byte block and splitting it up, but rather by making a brand new 96-byte allocation. This seems wasteful. After all, the heap has now carved out 128 + 128 + 96 = 352 bytes even though the application is using only 128 + 96 = 224 of them. (The classical heap would have re-used the first 96 bytes of the freed 128-byte block, for a total allocation of 128 + 128 = 256 bytes.)

This seemingly wasteful use of memory is really an investment in the future. (I need to remember to use that excuse more. "No, I'm not being wasteful. I'm just investing in the future.")

The investment pays off at the next loop iteration: When the request for 128 bytes comes in, the heap manager can return the 128-byte block that was freed by the previous iteration. Now there is no waste in the heap at all!

Suppose the above loop runs 1000 times. A classical heap would end up with a thousand 128-byte allocations, a thousand 96-byte allocations, and a thousand 32-byte free blocks on the heap. That's 31KB of memory in the heap lost to fragmentation, or about 12%. On the other hand, the low-fragmentation heap would end up with a thousand 128-byte allocations, a thousand 96-byte allocations, and one 128-byte free block. Only 128 bytes has been lost to fragmentation, or just 0.06%.

Of course, I exaggerated this scenario in order to make the low-fragmentation heap look particularly good. The low-fragmentation heap operates well when heap allocation sizes tend to repeat, because the repeated-size allocation will re-use a freed allocation of the same size. It operates poorly when you allocate blocks of a certain size, free them, then never ask for blocks of that size again (since those blocks just sit around waiting for their chance to shine, which never comes). Fortunately, most applications don't fall into this latter category: Allocations tend to be for a set of fixed sizes (fixed-size objects), and even allocations for variable-sized objects tend to cluster around a few popular sizes.

Generally speaking, the low-fragmentation heap works pretty well for most classes of applications, and you should consider using it. (In fact, I'm told that the C runtime libraries have converted the default C runtime heap to be a low-fragmentation heap starting in Visual Studio 2010.)
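
On older systems where the LFH is not already the default (Windows Vista and later enable it automatically), a heap can be opted in with HeapSetInformation. A minimal example; the wrapper name is mine, and the value 2 is the documented HeapCompatibilityInformation value that selects the LFH:

    #include <windows.h>

    /* Ask the heap manager to turn a heap into a low-fragmentation heap.
       Returns nonzero on success; the call can fail, for example for heaps
       created with HEAP_NO_SERIALIZE or when heap debugging is in effect. */
    BOOL EnableLowFragmentationHeap(HANDLE heap)
    {
        ULONG info = 2;   /* 2 = low-fragmentation heap */
        return HeapSetInformation(heap, HeapCompatibilityInformation,
                                  &info, sizeof(info));
    }

For instance, EnableLowFragmentationHeap(GetProcessHeap()) opts in the process default heap.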

On the other hand, it's also good to know a little of how the low-fragmentation heap operates, so that you won't be caught out by its non-classical behavior. For example, you should now be able to answer the question which was posed at Ask the Experts. As you can see, it often doesn't take much to be an expert. You can do it, too.

Sidebar: Actually, I was able to answer the customer's question even without knowing anything about the low-fragmentation heap prior to the customer mentioning it. (Indeed, I had barely even heard of it until that point.) Just given the name low-fragmentation heap, I was able to figure out roughly how such a beast would have operated. I wasn't correct on the details, but the underlying principles were good. So you see, you don't even have to know what you're talking about to be an expert. You just have to be able to take a scenario and think, "How would somebody have designed a system to solve this problem?"

Comments (18)
  1. Matt Fisher says:

    The puzzle is short, but the answer is long.

  2. Yuhong Bao says:

    It does have another point, though. In Vista and later, you are always using the low-fragmentation heap; there is no other option.

  3. nathan_works says:

    Custom memory managers, or dinking with the parameters of the one you have, is something that only the very, very wise or utterly foolish would do. But often, the person doing it is both.

  4. Adrian says:

    One of the hardest bugs I ever had to track down was in a homemade, classical memory manager.

    It was back in the 16-bit Windows days.  If you did a particular (long) sequence of steps, the app would crash.  In the debugger, we could see that one data structure was thoroughly trashed, but we couldn’t tell why.  Worse, setting a breakpoint, *any* breakpoint, prevented the problem from reproducing.

    Our homemade memory manager allocated 64KB blocks from the OS and carved them up as needed by our application.  If it ran out of that space (which was rare), it would allocate another 64KB block.  The long sequence of steps had the effect of using up the first 64KB block and a little bit of the second one, and then freeing some of those recent allocations, and then allocating some new blocks.

    The bug was that the memory manager would accidentally coalesce a free block at the tail end of the first 64KB block with one at the front end of the next 64KB block, *if* the 64KB blocks were at consecutive memory locations.  That was a huge problem, since our memory manager doled out near (16-bit) pointers.  The long sequence of steps resulted in a buffer that supposedly spanned one of these 64KB boundaries.  As the buffer filled up, the near pointer wrapped around to the beginning of the first 64KB block, trashing one of the very first data structures the program had allocated.

    In the bad old 16-bit days, the debuggers were more invasive.  If you set a breakpoint, the debugger made an allocation off your OS heap.  It happened that that block was always between the first and second 64KB blocks that our memory manager got.  Since the 64KB blocks were no longer adjacent in memory, the coalescing bug never manifested if you had a breakpoint set.  It took days of printf debugging to finally crack that nut.

  5. Arlie says:

    I know this is a Windows blog (chiefly), but I’m so glad I don’t have to worry about heap fragmentation any more.  I do nearly all of my work in C#, with some pretty stringent performance requirements.  The performance benefits of a compacting GC are just amazing.  No VA fragmentation!  Your cache locality *increases* over time!

    When I go back to plain old C/C++ development now, it feels like bear-skins-and-stone-knives time.  Don’t get me wrong — I appreciate all the work that went into Windows (hell, I was a Windows Networking dev for years), and into the C/C++ environment.  But I’m through with it, and I love it.

  6. wojo says:

    @yuhong2, mostly true. I believe it disables the LFH in certain scenarios, like fixed-size heaps or heaps created with the HEAP_NO_SERIALIZE flag (which is often used when you take care of more details in your own memory manager). Also, when the LFH is disabled, a very naive (read: slow) heap manager is used that is even slower than the pre-LFH memory manager. Watch out for that!

  7. Goran says:

    Raymond, it looks like you don’t like Lisp! I mean, look, this snippet:

    “… (This description is simplified for the purpose of the discussion.) (I can’t believe I had to write that.)”

    should really be:

    “… (This description is simplified for the purpose of the discussion. (I can’t believe I had to write that.))”

    [News flash: English and Lisp are different languages. -Raymond]
  8. Adam says:

    “So you see, you don’t even have to know what you’re talking about to be an expert.”

    Slightly dangerous comment given past gripes :)


    [Um, that was the case of somebody who didn’t care about the logic behind the answer, the opposite of somebody who tries to reason it out. -Raymond]
  9. Semi Essessi says:

    "So you see, you don’t even have to know what you’re talking about to be an expert. You just have to be able to take a scenario and think, "How would somebody have designed a system to solve this problem?""

    This is a good attitude to have, and it’s not specific to programming or any field in particular.

    Logic >> experience.

    I can’t count the number of times I’ve solved a problem or improved a process with little or no knowledge, simply because what would be an obvious bug in code (due to logic) stands out like a flashing red warning light.

    The downside is that some people get really offended when this happens…

  10. Mark Sowul says:

    Arlie: I agree, but it’s not something you can put completely out of your mind, because the large-object heap is not compacted…

  11. Adam says:

    "[Um, that was the case of somebody who didn’t care about the logic behind the answer, the opposite of somebody who tries to reason it out. -Raymond]"

    You weren’t reasoning – you were providing an educated guess that proved to be correct.

    But on the up side, unlike that other person, you did go to confirm that you were correct – so kudos there.

  12. AsmGuru62 says:

    @wojo: HeapAlloc is only slow in DEBUG. In RELEASE it is quite fast.

  13. Gabe says:

    Adam: Isn’t the difference between an educated guess and a regular guess that an educated one has reasoning behind it and a regular one is effectively random?

  14. Yuhong Bao says:

    BTW, on enabling the low-frag heap in Windows 2000: just because HeapSetInformation is exported from kernel32.dll doesn’t mean the low-frag heap is actually available, as all it does is GetProcAddress RtlHeapSetInformation from ntdll.dll, which is the DLL that actually contains the heap code. If this function is not actually exported from ntdll.dll, HeapSetInformation will fail and the low-frag heap is not available. I found this out by stepping into kernel32!HeapSetInformation using a debugger.

    [I don’t remember anybody asking about the low-fragmentation heap on Windows 2000. I’m further convinced you comment just to hear yourself talk. -Raymond]
  15. wojo says:

    @AsmGuru62, I believe that even in release mode it is. I received feedback from someone on the Microsoft team that worked on the heap manager, and he mentioned that the slowdown I’m describing (with HEAP_NO_SERIALIZE) was due to now-slower code paths in the heap manager. It can happen with certain debugging flags and with the HEAP_NO_SERIALIZE flag. These paths are slower because of the reorganization during Vista development, supposedly. If you are interested in more, let me know.

  16. 640k says:

    "When I go back to plain old C/C++ development now, it feels like bear-skins-and-stone-knives time."

    C/C++ doesn’t necessarily exclude a GC. In fact, VS does combine these if you want.

  17. Neil says:

    Does "Low-fragmentation" refer to internal fragmentation, external fragmentation, or both?

  18. Jules says:

    There are, of course, other reasons to allocate memory blocks of the same size together.  I’ve used allocators before that decide on a single size-of-allocation for each page, and then just use a bitmap to decide which blocks within the page are allocated.  This makes allocation faster and reduces memory overhead compared to the traditional linked-list-of-blocks approach.  I’m actually quite surprised that any modern allocator isn’t taking this approach.

Comments are closed.