Constant Soft Faults per second -- another old Sidewalk Story

There was one other Real Gem from my experience with Sidewalk, so I thought I'd share that one too. I originally wrote this back in 2000 for an internal audience, but here it is for you all now. Remember, this is discussing performance on 1998-era hardware, but I think some of these phenomena are still relevant.

... time shift to the year 2000 ....

While doing performance tests [in 1998] we noticed an interesting phenomenon which motivated us to think about how the [Windows] NT heap was working for us. What we noticed was that regardless of our web page throughput, page faults per second stayed pretty much constant. In the same way that clock speed is ultimately binding, it seemed that the OS's ability to handle page faults was forever our bottleneck. And so, much of what we did was work to reduce the number of page faults per HTTP request. If faults/sec is constant, then it's easy to see that your throughput is determined almost completely by the number of page faults per request.
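
To put rough numbers on that relationship (using the ~2500 faults/sec ceiling I mention below, and some made-up per-request costs purely for illustration), the arithmetic is just a division:

    #include <cstdio>

    int main()
    {
        // Illustrative only: if the box retires a roughly fixed number of
        // soft faults per second, throughput falls directly out of this division.
        const double faultsPerSecond = 2500.0;                  // observed ceiling (approximate)
        const double faultsPerRequest[] = { 50.0, 38.0, 25.0 }; // hypothetical per-request costs

        for (double cost : faultsPerRequest)
        {
            // requests/sec = (faults/sec the OS will handle) / (faults each request incurs)
            printf("%5.1f faults/request -> ~%5.1f requests/sec\n",
                   cost, faultsPerSecond / cost);
        }
        return 0;
    }

Cut the per-request cost and the requests per second go up in direct proportion; that's the whole game.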

Well, the good news is that these page faults were almost 100% "soft faults". And a good thing too or we'd have been dead (2500 page faults/sec was about what we were seeing as I recall).

At this point you might be asking the same things we were. "We're the only thing running on this box, we've got only a tiny amount of code (compared to any app you care to name) that should be all resident. Our memory usage is basically flat. Why on earth are we soft-faulting all over the place?"

Analysis:

We're using the NT heaps for our memory management in virtually all cases. Sometimes they are private heaps that are thread-specific, sometimes they have to be thread-safe, sometimes it's the global heap, but in any case it's the NT allocator. The NT allocator wants to be a good citizen, so when it can, and when it deems appropriate, it returns memory to the OS for use by other processes. There's the problem right there. If you give the memory back to the OS, it's out of your working set... when you reallocate that memory a few milliseconds later (which naturally our service would do as the next request arrived) you'll get a nice clean zeroed-out chunk of virtual memory from the OS. There's the source of the soft faults right there... the first time we write to that nice new memory, the OS has to back that demand-zero page with a real read/write page, and that's a soft fault.
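
If you want to see the effect in isolation, a tiny Win32 program like the sketch below makes the point (this is not Sidewalk code, just an illustration, and it assumes 4K pages for brevity): commit a fresh chunk of memory, touch each page once, and the process page fault count climbs by roughly one per page even though nothing was ever paged out.

    #include <windows.h>
    #include <psapi.h>      // link with psapi.lib
    #include <cstdio>

    // Returns the process's cumulative page fault count (soft + hard).
    static DWORD FaultCount()
    {
        PROCESS_MEMORY_COUNTERS pmc;
        pmc.cb = sizeof(pmc);
        GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc));
        return pmc.PageFaultCount;
    }

    int main()
    {
        const SIZE_T size = 16 * 1024 * 1024;   // 16MB, i.e. 4096 pages of 4K

        // Commit fresh, zero-filled virtual memory -- the same kind of pages
        // you get when an allocator has handed its memory back to the OS.
        char* p = (char*)VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

        DWORD before = FaultCount();

        // The first touch of each page forces the OS to materialize a real
        // zeroed page: one soft fault per page, no disk I/O involved.
        for (SIZE_T i = 0; i < size; i += 4096)
            p[i] = 1;

        DWORD after = FaultCount();
        printf("faults taken touching %u pages: %u\n",
               (unsigned)(size / 4096), (unsigned)(after - before));

        VirtualFree(p, 0, MEM_RELEASE);
        return 0;
    }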

Conclusion:

Well, what can I say but, "ACK!" Since we're churning so much, giving the memory back to the OS is a mistake; we know darn well we're going to need it again "real soon". So we wrote a little custom memory allocator which we used to front-end the NT heaps in those cases where we could (there's a sketch of the idea below). Our memory recycler doesn't actually free the memory when you free it. It keeps it on some linked lists... Nothing too exciting there, but doing it this way means no attempt is made to give the memory back to the OS, which saves a ton of soft faults. The recycler "heats up" over the course of a few minutes and then stops doing "real" allocations at all.
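
The original recycler is long gone, but the shape of it was roughly like this minimal sketch (single-threaded for brevity, not the actual Sidewalk code; the bucket sizes and the caller-supplied size on Free are simplifications): freed blocks get pushed onto per-size free lists, and allocation only falls through to HeapAlloc on a miss.

    #include <windows.h>
    #include <cstddef>

    // Sketch of a "recycler" that fronts an NT heap: freed blocks are kept on
    // per-size free lists instead of being returned, so the heap (and the OS)
    // never sees the churn.  A real one would need a lock or per-thread lists.
    class Recycler
    {
        // The free list is threaded through the freed blocks themselves.
        struct FreeNode { FreeNode* next; };

        static const size_t kBuckets = 8;
        static const size_t kGrain   = 64;   // bucket i holds blocks of (i+1)*64 bytes

        FreeNode* m_free[kBuckets];
        HANDLE    m_heap;

        static size_t BucketFor(size_t bytes) { return (bytes - 1) / kGrain; }

    public:
        Recycler() : m_heap(GetProcessHeap())
        {
            for (size_t i = 0; i < kBuckets; i++) m_free[i] = NULL;
        }

        void* Alloc(size_t bytes)
        {
            if (bytes == 0) bytes = 1;
            size_t b = BucketFor(bytes);
            if (b >= kBuckets)                      // too big to recycle
                return HeapAlloc(m_heap, 0, bytes);

            if (m_free[b])                          // recycle: pop the free list
            {
                FreeNode* n = m_free[b];
                m_free[b] = n->next;
                return n;
            }
            // Miss: do a "real" allocation, rounded up to the bucket size so
            // the block can later satisfy any request that maps to this bucket.
            return HeapAlloc(m_heap, 0, (b + 1) * kGrain);
        }

        void Free(void* p, size_t bytes)
        {
            if (!p) return;
            size_t b = BucketFor(bytes ? bytes : 1);
            if (b >= kBuckets) { HeapFree(m_heap, 0, p); return; }

            // Don't give it back -- push it on the list for next time.
            FreeNode* n = (FreeNode*)p;
            n->next = m_free[b];
            m_free[b] = n;
        }
    };

Once the free lists are warm, Alloc almost never reaches HeapAlloc at all, which is exactly the "heats up" behavior I described above.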

Naturally making this change had no effect on page faults per second, but our throughput went up by around 30% (i.e. page faults per request went down and requests per second went up).

Postscript:

I wonder how/if this situation is different in Win2000. Our experience was (again) with NT4 SP3.

.... return to the present day ...

Now that it's 2006 you can imagine how different things are, with the heap strategies in Windows Server 2003 and of course the garbage collected heaps in the managed stack. I used to spend a lot of time working on these kinds of issues, and these days I don't have to nearly so much. I think that's a good thing :)