INF: How to retrieve data from a large XML string

Article
03/18/2011

We encountered an interesting issue some time back. We already had XML data inserted into a SQL Server 2008 R2 database however we were getting below error when retrieving (also called shredding) values from the XML data.

Msg 8621, Level 17, State 1, Line 1
The query processor ran out of stack space during query optimization. Please simplify the query.

Based on error, one obvious solution would be to rewrite the query by changing XML data but since we were getting the XML data from a different team/source so we did not have much control on the source XML data.

Example
======
We can also get the error reproduced in SQL Server 2005 (even in SQL Server 2008 and in SQL Server 2008 R2). The source XML was a single level (one element below base/root element) XML of below format however had around 1800 elements in this single level. For example, below XML has around 2000 elements called element0, element1…element1999. This is not a multi- level XML since it does not contain further nested XML:

<root>
<element0> dummy data 0</element0>
<element1> dummy data 1</element1>
--removed elements in between so as to keep this brief
<element1999> dummy data 1999</element1999>
</root>

The expected output was in below format (the “…” indicates columns between element1 and element1999) however we were not reaching the output phase because query was not even getting compiled due to the 8621 error:

element0

element1

…

element1999

dummy data 0

dummy data 1

…

dummy data 1999

As an example with two elements and retrieving values from XML variable, we can re-write the above XML data into an XML variable using below and retrieve results using a SELECT query:

 DECLARE @x xml 
SET @x = CAST(N' 



<root> 



  <element0> dummy data 0</element0> 



  <element1> dummy data 1</element1> 



</root> 



' 



AS xml) 



--SELECT @x AS 'XmlData' /* uncomment this line if you want to see the XML */ 




 


Select a.value('element0[1]', 'nvarchar(max)') element0 



, a.value('element1[1]', 'nvarchar(max)') element1 



FROM @x.nodes('/*') nodeset(a)

As another example with two elements and retrieving values from a XML column in a table, we can insert the XML data into a XML columns using below and retrieve results using a SELECT query:

 USE tempdb 




-- drop the table if it already exists else create it. 




IF OBJECT_ID('XmlData') IS NOT NULL 




      DROP TABLE XmlData 




CREATE TABLE XmlData(Data xml) 




go 





 



-- insert xml data. 




INSERT INTO XmlData (Data) 




SELECT CAST(N' 




<root> 




  <element0> dummy data 0</element0> 




  <element1> dummy data 1</element1> 




</root> 




' AS xml) 




go 





 



-- Select data from table 




Select      Data.value('(/root/element0)[1]', 'nvarchar(max)') 'element0' 




            ,Data.value('(/root/element1)[1]', 'nvarchar(max)') 'element1' 




from XmlData 




go 




-- cleanup 




DROP TABLE XmlData

Another “optimal” way would be to use below query. It is optimal since combination of the nodes() and value() methods can be more efficient in generating the rowset when it has several columns and, perhaps, when the path expressions used in its generation are complex.

 USE tempdb 




-- drop the table if it already exists else create it. 




IF OBJECT_ID('XmlData') IS NOT NULL 




      DROP TABLE XmlData 




CREATE TABLE XmlData(Data xml) 




go 





 



-- insert dummy xml data. 




INSERT INTO XmlData (Data) 




SELECT CAST(N' 




<root> 




  <element0> dummy data 0</element0> 




  <element1> dummy data 1</element1> 




</root> 




' AS xml) 




go 





 



-- Select data from table 




SELECT nref.value('(/root/element0)[1]', 'nvarchar(max)') 'element0', 




       nref.value('(/root/element1)[1]', 'nvarchar(max)') 'element1' 




FROM   XmlData 




       CROSS APPLY data.nodes('/root') AS t2(nref) --'Data' is column of XML data type in table XmlData 





 



GO  




-- cleanup 




DROP TABLE XmlData

Simple! Huh? However, as mentioned earlier, our XML data had around 1800 elements. The SELECT query was around 100,000 characters since it retrieved each value with “.value” for each of the 1800 elements in the XML data. The output of this SELECT query was to be consumed by application per the requirement.

When we run a query retrieving XML data, SQL Server 2005 (and above SQL Server 2008 / SQL Server 2008 R2) use SQLXML. Beginning with SQL Server 2005, SQL Server natively supports XML typed data using the xml data type. Prior to SQL Server 2005, SQL Server 2000 did not have a XML data type and could only parse XML data so as to extract XML values using sp_xml_preparedocument / sp_xml_removedocument / OPENXML. This parsing was done using MSXML parser (Msxml2.dll) however had memory limitations as mentioned in https://msdn.microsoft.com/en-us/library/aa260385(SQL.80).aspx. Also SQL Server 2000 did not have an inbuilt XML data type which is present from SQL Server 2005. However, for this issue, despite using SQL Server 2005 (and even SQL Server 2008 or SQL Server 2008 R2), we were getting the 8621 error when SQL Server query optimizer parsed the query to retrieve values of the 1800 elements in the XML data.

Why we are getting error?
We were getting this error because SQL Server was running out of stack space when SQL Server query optimizer parsed the query. Processing a single SQL statement is the most basic way that SQL Server executes SQL statements and this is done using ‘SQL Server Query Optimizer’. ‘Parsing’ is one of the phases in query optimization prior to query execution.

For example, when were run below query, it returns two values (of element0 and of element1) and also displays the execution plan (in text format) used by SQL query optimizer:

 USE tempdb 




-- drop the table if it already exists else create it. 




IF OBJECT_ID('XmlData') IS NOT NULL 




      DROP TABLE XmlData 




CREATE TABLE XmlData(Data xml) 




Go 




-- insert dummy xml data. 




INSERT INTO XmlData (Data) 




SELECT CAST(N' 




<root> 




  <element0> dummy data 0</element0> 




  <element1> dummy data 1</element1> 




</root> 




' AS xml) 




GO 




SET STATISTICS PROFILE ON     --this to display the execution plan(text) 




GO 




SELECT nref.value('(/root/element0)[1]', 'nvarchar(max)') element0 




      , nref.value('(/root/element1)[1]', 'nvarchar(max)') element1 




FROM   XmlData 




      CROSS APPLY Data.nodes('/root') as T2(nref) --‘Data’ is column of XML data type in table XmlData 




GO 




SET STATISTICS PROFILE OFF 




GO 




DROP TABLE XmlData

The above query executed using SQL Server Management Studio will also display the execution plan which includes the below two Stream Aggregate / UDX nodes. The UDX (Extended Operators) nodes indicate implementation of XQuery and XPath operations in SQL Server:

  --Nested Loops(Inner Join, OUTER REFERENCES:([tempdb].[dbo].[XmlData].[Data]))
    |--Nested Loops(Inner Join, OUTER REFERENCES:([tempdb].[dbo].[XmlData].[Data]))
    | |--Nested Loops(Inner Join, OUTER REFERENCES:([tempdb].[dbo].[XmlData].[Data]))
    | | |--Table Scan(OBJECT:([tempdb].[dbo].[XmlData]))
    | | |--Filter(WHERE:(STARTUP EXPR([tempdb].[dbo].[XmlData].[Data] IS NOT NULL)))
    | | |--Table-valued function
    | |--Stream Aggregate(DEFINE:([Expr1021]=MIN(CASE WHEN [tempdb].[dbo].[XmlData].[Data]
    | |--UDX((XML Reader.[id], XML Reader.[nid], XML Reader.[tid], XML Reader.[value],
    | |--Nested Loops(Inner Join, OUTER REFERENCES:(XML Reader with XPath filter.
    | |--Top(TOP EXPRESSION:((1)))
    | | |--Compute Scalar(DEFINE:([Expr1012]=0x58, [Expr1038]=getdescenda
    | | |--Table-valued function
| |--Table-valued function
    |--Stream Aggregate(DEFINE:([Expr1036]=MIN(CASE WHEN [tempdb].[dbo].[XmlData].[Data] IS NU
         |--UDX((XML Reader.[id], XML Reader.[nid], XML Reader.[tid], XML Reader.[value], XML
              |--Nested Loops(Inner Join, OUTER REFERENCES:(XML Reader with XPath filter.[id],
                   |--Top(TOP EXPRESSION:((1)))
                   | |--Compute Scalar(DEFINE:([Expr1027]=0x58, [Expr1039]=getdescendantlim
                   | |--Table-valued function
                   |--Table-valued function

However the above execution plan is for query that’s only retrieving two values from XML data, namely element0 and element1 in our example.

Imagine a query that requires to retrieve 1800 values from XML data! This was our requirement. For retrieving such a large number of values, SQL Server would create one UDX node for each of the 1800 values and so we could potentially see around 1800 nodes. However, when SQL Server query optimizer creates these nodes, it uses a section of memory called the stack. Stack is memory space reserved for each thread that’s executed in Windows and this is a limited amount of space which can reach its limit for 1800 element query due to multiple nodes mentioned above and thus give error 8621. More details on Stack at https://msdn.microsoft.com/en-us/library/ms686774(VS.85).aspx .

This error 8621 means that the SQL Query Processor does not have enough space in the stack to accommodate a new row/frame for a new operator during processing. The issue happens because each call to .value() method for the element at the first hierarchical level of the XML document causes the stack size to grow during the query optimization until it runs out of stack space. This is the reason why when retrieving lesser number of values, the issue doesn’t occur:

User Mode default Stack Size Limitations in Windows

Architecture

OS thread stack size

SQL thread stack size

32bit

1 MB

0.5 MB

X64

2 MB

IA64

4 MB

A manual breakpoint memory dump would show that the XML Reader with XPath filter is actually making a remote query to the sqlxml DLL to parse and get the required data similar to a FTS query, and this query get repeated over and over again for each .value call in the TSQL. So, more the number of nodes, more the repetition of the above function code which will generate the following in the plan:

Resolution/Workaround

In this case, we were already on 64bit and the best/only way around is normally to rewrite the query (remember we can’t change source XML in our scenario). Re-writing query for XML data required using a workaround. We first retrieved the element values in an unpivoted form and then PIVOTed the data to give required output. PIVOT rotates a table-valued expression by turning the unique values from one column in the expression into multiple columns in the output, and performs aggregations where they are required on any remaining column values that are wanted in the final output. Please refer below for more details. Please feel free to test and run below query:

You can run below to get the 8621 error:

 /************ BELOW TO GET THE 8621 ERROR  ***************/ 




--<1> Generate large XML of single-level 




--this is an untyped XML since it is not associated with an XSD schema 




SET NOCOUNT ON 




DECLARE @GenerateXML nvarchar (max) 




DECLARE @ExecStr nvarchar(max) 




DECLARE @i int 




SET @GenerateXML = '<root>' 




SET @ExecStr = 'Select' 




SET @i = 0 




--create 2000 elements 




WHILE @i < 2000 




BEGIN 




      SET @GenerateXML = @GenerateXML + '<element' + CAST(@i AS nvarchar(4)) + '> dummy data ' + CAST(@i AS nvarchar(4)) + '</element' + CAST(@i AS nvarchar(4))+'>' 




      IF @i <> 0  --not 1st iteration 




            SET @ExecStr=@ExecStr+ ',' 




      SET @ExecStr = @ExecStr+ ' Data.value(' + CHAR(39) + 'element' + CAST(@i AS nvarchar(4)) + '[1]' + CHAR(39)+ ', ' + CHAR(39)+ 'nvarchar(max)' + CHAR(39)+ ') element' + CAST(@i AS nvarchar(4)) 




      SET @i = @i + 1 




END 




SET @GenerateXML = @GenerateXML + '</root>' 





 



--<2> Insert XML data into table 




USE tempdb 




IF OBJECT_ID('XmlData') IS NOT NULL 




      DROP TABLE XmlData 




CREATE TABLE XmlData(Data xml) 




INSERT INTO XmlData (Data) 




SELECT CAST(@GenerateXML AS xml) 





 



--SELECT * FROM XmlData 





 



--<3> Generate and XML query 




SET @ExecStr = @ExecStr + ' FROM XmlData' 




--SELECT @ExecStr 




EXEC sp_executesql @ExecStr 




DROP TABLE XmlData

--returned below error in around 35 seconds on my desktop

Msg 8621, Level 17, State 1, Line 1
The query processor ran out of stack space during query optimization. Please simplify the query.

Please run below workaround script to avoid the 8621 error:

 /************ BELOW IS THE RE-WRITTEN QUERY     ***************/ 




--restarted SQL Server service to ensure clean repro (tempdb initial size is 8MB default,pre-sizing may help) 




--<1> Generate large XML of single-level 




--this is an untyped XML since it is not associated with an XSD schema 




SET NOCOUNT ON 




DECLARE @GenerateXML nvarchar (max) 




DECLARE @ExecStr nvarchar(max) 




DECLARE @i int 




SET @GenerateXML = '<root>' 




SET @ExecStr = 'Select' 




SET @i = 0 




--create 2000 elements 




WHILE @i < 2000 




BEGIN 




      SET @GenerateXML = @GenerateXML + '<element' + CAST(@i AS nvarchar(4)) + '> dummy data ' + CAST(@i AS nvarchar(4)) + '</element' + CAST(@i AS nvarchar(4))+'>' 




      IF @i <> 0  --not 1st iteration 




            SET @ExecStr=@ExecStr+ ',' 




      SET @ExecStr = @ExecStr+ ' Data.value(' + CHAR(39) + 'element' + CAST(@i AS nvarchar(4)) + '[1]' + CHAR(39)+ ', ' + CHAR(39)+ 'nvarchar(max)' + CHAR(39)+ ') element' + CAST(@i AS nvarchar(4)) 




      SET @i = @i + 1 




END 




SET @GenerateXML = @GenerateXML + '</root>' 





 



--<2> Insert XML data into table 




USE tempdb 




IF OBJECT_ID('XmlData') IS NOT NULL 




      DROP TABLE XmlData 




CREATE TABLE XmlData(Data xml, DataID bigint IDENTITY(1, 1))      --*****add a unique column 




INSERT INTO XmlData (Data) 




SELECT CAST(@GenerateXML AS xml) 





 



--SELECT * FROM XmlData 





 



--<3> Generate and XML query 





 



--update statistics of source table if any 




UPDATE STATISTICS XmlData WITH FULLSCAN 





 



--above statement completes in 0 seconds 





 



--all below took around 1minute 13 seconds on my Core 2 Duo 3.00GHz x64 with 4 GB RAM) 





 



/**** Query table XmlData for each column and save result in keep_results     ****/ 




--we're saving the results so we can create an INDEX so as to optimize the PIVOT 





 



IF OBJECT_ID('keep_results') IS NOT NULL 




      DROP TABLE keep_results 





 



CREATE TABLE [dbo].[keep_results]( 




      [target_column_name] [nvarchar](250) NULL, 




      [value] [nvarchar](MAX) NULL, --this is data type of all columns in given query 




      [dataid] bigint NOT NULL      --this column so as to have an unique identifier for each row. This required for the PIVOT. 




) ON [PRIMARY]; 





 



--below is the workaround query however returns results which need to be PIVOTed through query further below 




WITH Data_CTE (name, value, dataid) --if using an unique column with existing data in original table 




AS 




( 




SELECT      nref.value('local-name(.)', 'nvarchar(max)') target_column_name 




      , nref.value('(./text())[1]', 'nvarchar(max)') value 




      , dataid    --can use existing column if its unique 




      FROM  XmlData CROSS APPLY Data.nodes('root/*')  AS R(nref)  --'Data' is name of column with XML data type in table XmlData 




      --PLEASE NOTE THAT "root" mentioned above per the original XML data is case-sensitive so "root" is not equal to "Root" 




      WHERE nref.value('(./text())[1]', 'nvarchar(max)') IS NOT NULL    --filtering NULL values if any since it drastically reduces time required for Index by PIVOT later since most of PIVOT cost is on Index Scan which is proportional to number of rows in this scenario 




) 




INSERT INTO tempdb.dbo.[keep_results] SELECT * FROM Data_CTE 





 



--optimize this table for PIVOT 




CREATE CLUSTERED INDEX idx_keep_results_DataID ON keep_results(DataID) 




CREATE NONCLUSTERED INDEX idx_keep_results_target_column_name ON keep_results(target_column_name) 




--above two queries complete in 2minutes 





 



/**** Use PIVOT to return the results     ****/ 





 



DECLARE @Xml xml 




SET @Xml = (SELECT TOP 1 Data FROM XmlData) 





 



DECLARE @cols NVARCHAR(MAX) 




SELECT  @cols = STUFF(( SELECT TOP 100 PERCENT 




                                '],[' + t2.target_column_name 




                        FROM   




                        ( 




                                          --below dynamically generates the columns for the PIVOT however assumes that the 1st XML has all required elements 




                                          SELECT      nref.value('local-name(.)', 'nvarchar(max)') target_column_name 




                                                FROM  @Xml.nodes('root/*')  AS R(nref) 




                        ) 




                         AS t2 




                        ORDER BY '],[' + t2.target_column_name 




                        FOR XML PATH('') 




                      ), 1, 2, '') + ']' 




--SELECT @cols 




--SELECT LEN(@cols) 





 



DECLARE @query NVARCHAR(MAX) 




SET @query = 




      N'SELECT * 




            FROM  




            (  




            SELECT dataid, target_column_name, value  




                  FROM tempdb.dbo.[keep_results] 




            ) AS source_data 




            PIVOT  




            ( 




            MIN(value) 




                  FOR target_column_name  




                  IN (' + @cols + ') 




            ) AS pvt' 




--SELECT @query 




EXECUTE(@query) 




DROP TABLE keep_results 




DROP TABLE XmlData

/************ please note ***************/

I’ve tested above on SQL Server 2005 Express and on SQL Server 2008 R2. If you get any errors, please retry above after running below:

 USE tempdb 




DROP TABLE keep_results 




DROP TABLE XmlData

Additional information unrelated to this issue

From https://msdn.microsoft.com/en-us/library/ms345117(SQL.90).aspx
Up to 128 levels of the XML hierarchy are accommodated; XML instances containing longer paths are rejected during insertion and modification.
Similarly, up to the first 128 bytes of a node's value are indexed; longer values are accommodated within the system and are not indexed.
But the above doesn’t apply directly to above example because everything in that XML is at level 1.

Happy Learning!

Regards,
Vijay Rodrigues
SE, SQL Server Support

Reviewed by
Balmukund Lakhani
TL, SQL Server Support

Shamik Ghosh
TL, SQL Server Support

Amit Banerjee
SEE, SQL Server Support

INF: How to retrieve data from a large XML string

Additional resources