ASP.NET 2.0 Cache Objects get trimmed when you have low Available Memory

In my line of business I normally do a fair amount of debugging but not as much as an Escalation Engineer would do. This was just one of those problems I was running into and since I really didn't have a clue on what would be the next steps, I resorted to gathering dumps (which didn't really help). But in this blog i am going to try and approach this problem from a troubleshooting perspective to understand the issue and maybe give you an insight on how PSS works.

PROBLEM DESCRIPTION
==================
The title would normally tell you where i am coming from, but here is a brief description.
Customer had written a brand new application in ASP.NET 2.0 and while running a stress test he observed that the Cache objects were getting trimmed almost immediately once they got created. Interestingly there really wasn't a "Memory Pressure"

A bit of clarification here. When I say that "Memory Pressure" wasn't high, what I really mean is that the total committed memory for the worker process was showing up as 100 MB ONLY. This does have to do something with memory pressure, but I will open that card when I come to the end of this blog :-)

 

TROUBLESHOOTING METHODOLOGY
============================
As usual when a customer calls in and says "I am running into such and such problem" and this is my environment, I get skeptical, not because I don't believe the customer, but just because I don't believe anything until I see it. There are too many cases I have dealt with where we were focused on a particular issue and were getting nowhere only to realize later that the information provided was insufficient or incorrect. And a lot of times, its not the customer to blame but someone else down the line who provided the wrong information.

So the first thing i told Punit (the SE who got this case) was to verify what the customer means when he says he is running into a cache problems getting trimmed, what does he really mean? In this case the customer really knew what he was talking about and to isolate the issue he created a simple application which added about 1000 items to the cache and he had implemented a callback method for these cache items. This callback method would then enumerate the items removed from the cache and display the reason for expiration.

On the customers environment we could see that the cache items were almost immediately trimmed in less than a minute and the reason showing up was "Underused". If we take a look at the documentation (https://msdn2.microsoft.com/en-gb/library/system.web.caching.cacheitemremovedreason(VS.80).aspx) this mentions that "Underused" is when "The item is removed from the cache because the system removed it to free memory"

This was a bit surprising to me because when we looked at the worker process in the Task Manager it was only showing up to be around 100 MB in size. We asked for the sample application and tried to reproduce this behavior and it worked perfectly in our environment. With some experimenting and playing around with the config file at the customers end, we found out that if we increase the value of percentagePhysicalMemoryUsedLimit to 90 or above it doesn't trim the cache items.

So the question now would be why is the Worker Process thinking that its under "Memory Pressure" and trimming the cache objects? First thing that came to my mind was that since this is only specific to the customers environment maybe he just doesn't have enough RAM on this machine. This theory fell to the ground when we found out that he has 2 GB of RAM.

OK, so what else? Dependencies? Maybe the cache items we are inserting have dependencies like File Dependencies or SQL Dependencies. Well from theory we know that the dependent item (cache object) will automatically be removed from the Cache if the dependency changes. So if I have a File Dependency set up and if something is modifying this file like for example a Virus Scan it would cause the Cache item to be removed. But this wasn't the case either, there were no dependencies associated with these items.

Application Restart's? This could be another possibility, Cache is global to the application domain itself, so if due to some circumstances like for example the web.config or global.asax is touched by a virus scan, it could cause a App Domain Recycle resulting in the cache getting trimmed. There are various other reasons for this to happen and its beyond the scope of this blog so I will keep it short for now.

So at this point, I was running out of ideas and I did the next logical thing we do here at PSS, take dumps and Perfmon data. The counters in perfmon that i was looking for were around cache, specifically under the "ASP.NET Application Counters". I was also interested in looking at the counters under .NET Memory to see if I can find anything in there.

When i looked at the perfmon data, it just proved what we already knew. The counters i was looking at were, the Cache Total Entries, Cache Total Turnover Rate, Cache API Entries, Cache API Turnover Rate.  Interestingly the turnover was large and the total entries was going up and then suddenly falling down.

Next thing I did  was to look at the memory dumps (we had taken three consecutive dumps and when i looked at the Private Bytes Pressure this is how it looked.

0:000> !do 05597268
Name: System.Web.Caching.CacheMemoryPrivateBytesPressure
MethodTable: 68a16d9c
EEClass: 68a16d2c
Size: 52(0x34) bytes
GC Generation: 2
 (C:\WINDOWS\assembly\GAC_32\System.Web\2.0.0.0__b03f5f7f11d50a3a\System.Web.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
790fed1c  4001345        8         System.Int32  0 instance       81_pressureHigh
790fed1c  4001346        c         System.Int32  0 instance       80 _pressureMiddle
790fed1c  4001347       10         System.Int32  0 instance       79 _pressureLow
790fed1c  4001348       14         System.Int32  0 instance        0 _i0
791240f0  4001349        4       System.Int32[]  0 instance 0559729c _pressureHist
790fed1c  400134a       18         System.Int32  0 instance       12 _pressureTotal
790fed1c  400134b       1c         System.Int32  0 instance        2 _pressureAvg
790fcb80  4001350       20         System.Int64  0 instance 1286602752 _memoryLimit
79105ef0  4001358       28      System.DateTime  1 instance 05597290 _startupTime
79104f64  4001351      808       System.Boolean  0   shared   static s_isIIS6
    >> Domain:Value  0016cc10:NotInit  0019ef10:0 <<
790fcb80  4001352      7f0         System.Int64  0   shared   static s_autoPrivateBytesLimit
    >> Domain:Value  0016cc10:NotInit  0019ef10:-1 <<
790fcb80  4001353      7f8         System.Int64  0   shared   static s_totalPhysical
    >> Domain:Value  0016cc10:NotInit  0019ef10:0 <<
790fcb80  4001354      800         System.Int64  0   shared   static s_lastReadPrivateBytes
    >> Domain:Value  0016cc10:NotInit  0019ef10:26214400 <<
79105ef0  4001355      1c8      System.DateTime  1   shared   static s_lastTimeReadPrivateBytes
    >> Domain:Value  0016cc10:NotInit  0019ef10:055af9e8 <<
79105474  4001356      80c        System.UInt32  0   shared   static s_pid
    >> Domain:Value  0016cc10:NotInit  0019ef10:6104 <<
790fed1c  4001357      810         System.Int32  0   shared   static s_pollInterval
    >> Domain:Value  0016cc10:NotInit  0019ef10:120000 <<

 

So surprisingly from the above results, it looks like the the memory limit is 1.2 GB. I did a lot more digging around in the dump to make some sense of the results I was seeing, but didnt make headway, so I took the next step at this point and engaged the Product Team by raising an internal bug. I went  back and forth with Thomas from the product team and finally found out the problem was related to the "Available Memory" being low. Now Cache objects are trimmed based on the algorithm

Pressure = (Total Memory - Available Memory)/Total Memory

 

So in the customers case the total memory was 2 GB, Available was 350 MB and this expression evaluates to
(2048-350)/2048 = 0.82 which means 82%.

The customer had set the percentagePhysicalMemoryUsedLimit to 80% and thats ASP.NET was trimming the cache items. As a rule when we go over the _preassureHigh value we drop half the cache entries and if we go over _preassureMiddle we drop a small percentage.

ASP.NET 2.0 attempts to flush roughly 50% of the memory consumed by cached entries (both output cached and objects cached using the Cache object) when memory consumption for a process approached 90% of the memory consumption limits.  On a 2GB 32-bit machine, the default memory limit is 800MB and on a /3GB 32-bit machine (or WOW64) the limit is 1.8 GB.