Web application memory leak caused by XML operations - GetElementsByTagName()

 

Symptom

=============

In an ASP.NET web application, if you call GetElementsByTagName() many times on an XML document that is stored in ASP.NET Application state, CLR memory usage increases continuously and eventually leads to an OOM (Out Of Memory) condition.

 

Root Cause

=============

This problem occurs because the GetElementsByTagName method returns an XmlNodeList collection that registers listeners (instances of XmlNodeChangedEventHandler) on the NodeInserted and NodeRemoved events of the document. For example, after you call GetElementsByTagName ten times, the NodeInserted and NodeRemoved events each have ten listeners. Therefore, when you call GetElementsByTagName many times, many XmlNodeChangedEventHandler objects are created, and they are released only when the XmlDocument itself is released.
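The call pattern described above can be sketched as follows. This is only an illustration, not code from the affected application: the class name, the static SharedDoc field (standing in for a document kept in ASP.NET Application state), and the sample XML are all hypothetical. Whether memory actually grows depends on the framework version in use, so the sketch shows the pattern and makes no claim about heap size.

```csharp
using System;
using System.Xml;

public static class LeakPatternSketch
{
    // Hypothetical stand-in for an XmlDocument stored in Application state:
    // it lives as long as the process, so anything attached to it does too.
    public static readonly XmlDocument SharedDoc = BuildDocument();

    static XmlDocument BuildDocument()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<catalog><item>a</item><item>b</item></catalog>");
        return doc;
    }

    // Every call asks the long-lived document for a new node list.
    // Per the root cause above, each list registers its own listeners
    // on the document's NodeInserted and NodeRemoved events.
    public static int CountItems()
    {
        XmlNodeList items = SharedDoc.GetElementsByTagName("item");
        return items.Count;
    }

    public static void Main()
    {
        // Simulates ten requests hitting the same shared document.
        for (int i = 0; i < 10; i++)
        {
            CountItems();
        }
        Console.WriteLine(CountItems());
    }
}
```

In a real web application the loop is replaced by incoming requests, which is why the growth only shows up after the site has been running for some time.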

 

Analysis

=============

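In the memory user dump, we can see that most of the memory is consumed by XmlNodeChangedEventHandler and XmlElementList objects. The XmlElementList objects can be ignored here, because they are created together with the XmlNodeChangedEventHandler objects. In the statistics below, each row shows the method table address, the object count, the total size in bytes, and the class name. The number of XmlNodeChangedEventHandler objects (8,319,114) is almost exactly twice the number of XmlElementList objects (4,159,551), which means that two listeners (one on NodeInserted and one on NodeRemoved) serve each XmlElementList.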

 

0:000> !DumpHeap -stat

Using our cache to search the heap.

   Address MT Size Gen

0x79bff564 1 12 System.Runtime.Remoting.Activation.ActivationListener

……

……

0x16b111c4 92,970 1,859,400 System.Xml.XmlText

0x0221236c 767 2,005,896 System.Char[]

0x0221209c 56,987 3,424,876 System.Object[]

0x79b94690 163,304 15,341,816 System.String

0x17c4f0c4 4,159,551 183,020,244 System.Xml.XmlElementList

0x16adcc14 8,319,114 232,935,192 System.Xml.XmlNodeChangedEventHandler

Total 13,363,835 objects, Total size: 456,945,968

 

If you never add listeners to the XmlDocument object manually, then these handlers are most likely created by GetElementsByTagName() calls. We can also see that memory usage keeps increasing as time goes on.

 

However, we cannot say this is a bug in GetElementsByTagName(). The Microsoft implementation of this function conforms to the W3C DOM Level 1 specification: NodeLists and NamedNodeMaps in the DOM are "live", that is, changes to the underlying document structure are reflected in all relevant NodeLists and NamedNodeMaps. In other words, according to the specification, GetElementsByTagName is supposed to return a ‘live list’ in which changes to the underlying DOM are reflected, which is why each returned NodeList has to listen for changes to the document.

 

For details please refer to https://msdn.microsoft.com/en-us/library/system.xml.xmlelement(VS.80).aspx
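The "live" behaviour described above can be observed with a small sketch. The class name, method name, and sample XML are made up for illustration. The list is obtained once, the document is changed afterwards, and the same list object then reports the new element count.

```csharp
using System;
using System.Xml;

public static class LiveListSketch
{
    // Returns the item count seen through the same node list
    // before and after the document is modified.
    public static int[] CountsBeforeAndAfterAppend()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<catalog><item>a</item><item>b</item></catalog>");

        // The list is requested once, while the document has two <item> elements.
        XmlNodeList items = doc.GetElementsByTagName("item");
        int before = items.Count;

        // Change the underlying document after the list was created.
        doc.DocumentElement.AppendChild(doc.CreateElement("item"));

        // The existing list reflects the change without being re-queried.
        int after = items.Count;
        return new[] { before, after };
    }

    public static void Main()
    {
        int[] counts = CountsBeforeAndAfterAppend();
        Console.WriteLine(counts[0] + " -> " + counts[1]); // prints "2 -> 3"
    }
}
```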

 

Solution

=============

To avoid this problem, replace GetElementsByTagName with SelectNodes or SelectSingleNode. Alternatively, do not keep the XmlDocument in memory for a long time.
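A minimal sketch of both options follows, assuming a hypothetical shared document with item elements that are not in any XML namespace. The class name, method names, and sample XML are illustrative only. Note that the XPath expression "//item" matches by local name in no namespace, so documents that use namespaces or prefixes would need an XmlNamespaceManager and a different expression.

```csharp
using System;
using System.Xml;

public static class SelectNodesSketch
{
    // Hypothetical stand-in for a document stored in Application state.
    public static readonly XmlDocument SharedDoc = Load(
        "<catalog><item>a</item><item>b</item></catalog>");

    static XmlDocument Load(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc;
    }

    // Option 1: query the shared document with XPath instead of
    // GetElementsByTagName("item").
    public static int CountItems()
    {
        XmlNodeList items = SharedDoc.SelectNodes("//item");
        return items.Count;
    }

    // SelectSingleNode returns the first match, or null if there is none.
    public static string FirstItemText()
    {
        XmlNode first = SharedDoc.SelectSingleNode("//item");
        return first == null ? null : first.InnerText;
    }

    // Option 2: use a short-lived document, so that everything attached
    // to it becomes collectable once the method returns.
    public static int CountItemsInFreshDocument(string xml)
    {
        XmlDocument doc = Load(xml);
        return doc.GetElementsByTagName("item").Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountItems());
        Console.WriteLine(FirstItemText());
        Console.WriteLine(CountItemsInFreshDocument("<catalog><item>x</item></catalog>"));
    }
}
```

Which option fits better depends on the application: the XPath calls keep the cached document, while the short-lived document trades repeated parsing for a smaller long-term footprint.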

 

Regards,

 

ZhiXing Lv