It's been about a week since I posted the debugging challenge for this lab; things have been a bit busy lately, so sorry about the delay in posting the lab itself.
I have a love-hate relationship with statistics. I like them because you can use them to get a point across and they can help you analyse things, but at the same time I hate them because people have a tendency to rely on them blindly and use them out of context without understanding the meaning behind them. The classic example is of course the sentence "More than 98 percent of convicted felons are bread eaters". That sentence is excellent for combating badly used statistics in a discussion :)
Since the start of the debugging demos, over 5000 people have downloaded the labs. (wow, there are a lot of debuggers out there :)) If half of them have done the labs, that means the buggy bits site has gotten close to 30 million hits, given how much we are stressing it with tinyget... that's more than some of the biggest online newspapers in Sweden have gotten in that time... YAY!
Of course, looked at like this, knowing the details, these statistics are completely useless, but if you didn't have this context you could end up with something along the lines of the WTF story "I've got the monkey now". A really funny read :)
Anyways, on to today's lab... we are going to look at the shipping details for bugspray 1000 times and watch it create the memory leak posted in Lab 6: Debugging Challenge.
Previous demos and setup instructions
If you are new to the debugging labs, here you can find information on how to set up the labs as well as links to the previous labs in the series.
Information and setup instructions
Lab 1: Hang
Lab 1: Hang - review
Lab 2: Crash
Lab 2: Crash - review
Lab 3: Memory
Lab 3: Memory - review
Lab 4: High CPU hang
Lab 4: High CPU hang - review
Lab 5: Crash
Lab 5: Crash - review
We have started getting out-of-memory exceptions on the buggy bits site, and we have been able to determine a scenario in which we think we are leaking memory, but we can't seem to figure out where the memory is going.
The leak seems to be occurring on our ProductInfo page, for example, and we can reproduce it by stress testing.
It seems to leak just a small bit each time, but since it is a page that customers look at a lot, over time the process will crash with an out-of-memory exception.
Reproduce the issue and gather data:
1. Restart IIS (iisreset)
2. Set up performance monitoring per Lab 3, but also add the .NET CLR Loading counters, and start monitoring the performance
3. Stress the application with tinyget (tinyget -srv:localhost -uri:/BuggyBits/ProductInfo.aspx?ProductName=Bugspray -threads:50 -loop:20)
4. After tinyget has finished, get a hang dump with adplus (adplus -hang -pn w3wp.exe -quiet)
5. Stop the performance monitor log
Review the performance monitor log to figure out what we are leaking:
1. Open up the performance monitor log in performance monitor and look at the following counters (set the scale appropriately so that you can see the graphs in the window)
.NET CLR Memory\# Bytes in all Heaps
.NET CLR Memory\# Total committed Bytes
.NET CLR Memory\# Total reserved Bytes
.NET CLR Loading\Current Assemblies
2. Compare Private Bytes, Virtual Bytes and #Bytes in all heaps
Q: Do the graphs for these three counters follow each other or do they diverge? Based on this, can you tell if the issue we are facing is a virtual bytes leak, a native leak, or a .NET leak?
3. Look at the Current Assemblies counter
Q: Should this counter stay flat or is it ok for this counter to increase like this? What does it mean?
Debug the memory dump
If the curves for private bytes and bytes in all heaps diverge, we either have a "native leak", meaning a native component is leaking (in which case DebugDiag would be the next step), or we have an assembly leak.
1. Open the memory dump, load up the symbols and load sos.dll (see information and setup instructions for more info)
Q: What is the size of the memory dump (on disk)?
2. Run !eeheap -gc and !dumpheap -stat
Q: What is the size of the .NET heap according to !eeheap -gc, why is it different from #Bytes in all heaps?
We saw from performance monitor that we appeared to be leaking assemblies, so the next step is to determine where these assemblies are created and why we are leaking them.
3. Run !dumpdomain to look at the assemblies loaded in the process
Q: Which domain has the most assemblies loaded?
Q: Are these dynamic assemblies or assemblies loaded from disk? (is there a path associated with them)
4. Dump the module contents using !dumpmodule <moduleaddress>, where <moduleaddress> is the address given right after Module Name for one or a few of the dynamic assemblies. E.g., in the example below you would run !dumpmodule 11b7e900
11b7e900 gyq9ceq2, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null
5. Run dc <MetaDataStart> <MetaDataEnd> to dump out the metadata for the module and find out what is implemented in this dynamic assembly. E.g., in the example below you would run dc 114d09e4 114d09e4+0n4184
Note: We use the start address + 0n4184 because the metadata is 4184 bytes, and the 0n prefix means the number is in decimal.
0:000> !dumpmodule 11b7e900
Name: gyq9ceq2, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null
MetaData start address: 114d09e4 (4184 bytes)
Q: What type of assembly was this? What is it used for? How is it generated?
Putting it all together and determining the cause of the assembly leak
If we look at the MSDN documentation for XmlSerializer we get the following information about dynamically generated assemblies related to XmlSerialization:
Dynamically Generated Assemblies
To increase performance, the XML serialization infrastructure dynamically generates assemblies to serialize and deserialize specified types. The infrastructure finds and reuses those assemblies. This behavior occurs only when using the following constructors:
If you use any of the other constructors, multiple versions of the same assembly are generated and never unloaded, which results in a memory leak and poor performance. The easiest solution is to use one of the previously mentioned two constructors. Otherwise, you must cache the assemblies in a Hashtable...
From this, and the fact that our performance logs and dump show that we are continuously generating new XML serialization assemblies, we can conclude that it is very likely that we are not using one of the standard constructors. Search the project code for new XmlSerializer, or use Reflector as in this example, to determine where we are generating these dynamic assemblies.
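As an illustration of what to look for (the method name and class here are hypothetical, not taken from the lab code), the leaky pattern looks something like this: any XmlSerializer constructor other than XmlSerializer(Type) and XmlSerializer(Type, String) emits a brand-new dynamic assembly on every call, and those assemblies are never unloaded.

```csharp
using System.IO;
using System.Xml.Serialization;

public class Product
{
    public string Name;
    public string ShippingDetails;
}

public class ProductXmlHelper
{
    // Hypothetical example of the leaky pattern: because an
    // XmlRootAttribute override is passed in, this constructor overload
    // is NOT cached by the framework, so a new serialization assembly
    // is generated and loaded every time this method runs.
    public string GetProductXml(Product product)
    {
        XmlRootAttribute root = new XmlRootAttribute("Product");
        XmlSerializer serializer = new XmlSerializer(typeof(Product), root);

        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, product);
            return writer.ToString();
        }
    }
}
```

Under stress, each hit to a page calling a method like this loads one more dynamic assembly, which is exactly the steadily climbing Current Assemblies counter we saw in performance monitor.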
Q: What method / line of code is causing the problem?
Resolve the issue and rerun the test to verify the solution
1. Resolve the issue by caching the XmlSerializer using the sample in the MSDN documentation for XmlSerializer.
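A minimal sketch of that caching approach, following the pattern in the MSDN documentation (the class and key format here are my own illustration): build the expensive serializer once per type/root combination, store it in a Hashtable, and hand out the cached instance on subsequent calls.

```csharp
using System;
using System.Collections;
using System.Xml.Serialization;

public static class SerializerCache
{
    private static readonly Hashtable serializers = new Hashtable();

    // Return a cached XmlSerializer for this type/root combination,
    // so the dynamic serialization assembly is generated only once
    // per key instead of once per call.
    public static XmlSerializer GetSerializer(Type type, string rootName)
    {
        string key = type.FullName + ":" + rootName;
        lock (serializers)
        {
            if (serializers[key] == null)
            {
                XmlRootAttribute root = new XmlRootAttribute(rootName);
                serializers[key] = new XmlSerializer(type, root);
            }
            return (XmlSerializer)serializers[key];
        }
    }
}
```

Alternatively, .NET 2.0 and later ship an XmlSerializerFactory whose CreateSerializer methods do similar caching for you, which saves you from maintaining your own Hashtable.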
2. Rerun the test to verify that the assembly "leak" no longer exists.