Sub-tree pruning


While
preparing my talk for Teched, I figured out a trick for getting simple sub-tree pruning
when running XPath queries via SqlXml.  Sub-tree
pruning is the ability to filter sub-trees of an Xml hierarchy.  For
example, let’s say I had an EmployeeList element
with Employee sub
elements. I would like to return the EmployeeList element,
but only include Employee sub-elements
where the employee had greater then 10 years of experience.  "urn:schemas-microsoft-com:office:office" />


 

My first
attempt would be the following:


 

/EmployeeList/Employee[Experience>10]


 

But
that just returns Employee elements,
not the EmployeeList.  Next
I would try:


 

/EmployeeList[Employee[Experience>10]]


 

But
that actually returns all Employees from
the database since the XPath literally means return
all EmployeeList elements where there exists an Employee sub-element with an Experience
sub element where the text value is greater then ten
.


 

In fact,
you might not even be able to get that far with the current SqlXml support since most
likely the EmployeeList element
is marked as is-constant in the mapping schema.  In
that case, the “XPath: predicate on is-constant element (/EmployeeList) is not supported”,  Note,
that limitation could be removed from SqlXml, but doing so would really not solve
the initial problem.


 

So what
is the alternative?  Well, one could post
process the results with an XSLT, however that would mean bringing all the data down
from the server and would complicate the code.  However,
for simple cases where the containing element is-constant, the solution is to make
the sub-element (Employee) an top level element in the mapping schema and then query
the Employee elements directly.  Then,
via the root property, wrap the query results with an EmployeeList element.  E.g.


 

SqlXmlCommand
command = new SqlXmlCommand(_SqlXmlconnectionString);

command.CommandType
= SqlXmlCommandType.XPath;

command.SchemaPath
= “Employee.xsd”;

command.RootTag
= “EmployeeList”;

command.CommandText
= “/Employee[Experience>10]”;

XmlReader
reader = command.ExecuteXmlReader());


 

It
general, this solution is very limited since it only works when applying the predicate
one level under the root, plus the root needs to be a sqlxml constant element.  However,
at least in my experiences this pattern comes up a lot, and will be performant since
SqlXml applies the root while streaming out the results.  I
suppose it would be the not hard to write an XSLT to do the same thing, but in general
I prefer to only introduce the complexity of XSLT into my design when it solves and
significant problem.  In this case, the
“root hack” is adequate enough.  You can
also use the sql:hide annotation for cases when one wants to prune the entire sub-tree
but use content of the sub-tree in the original query.  See sql:hide
 for
more information.


 

In
general though, it seems that a more general solution is required.  Really,
I would prefer to exclude arbitrary parts of the query results without having to post
process with an XSLT and not have the data from then pruned sub-trees filtered out
on the server.   XQuery
will
achieve this, but the XQuery will be just as complicated as the XSLT.  So
is there a better solution?  Stay tuned.

 


Comments (8)

  1. Anonymous says:

    The solution should be the one that works in XPath, right?
    /EmployeeList[Employee/Experience>10]

  2. Anonymous says:

    The problem you describe is the reason SQLX is needed so that queries can directly return hierarchical results.
    I’m not sure when Microsoft plans to support SQLX, but it Oracle, I would write the query similar to the following and return the result set directly as a CLOB with your desired result set:

    SELECT
    xmlelement ("EmployeeList"
    , XMLAGG (
    xmlelement ("EmployeeList", xmlattributes (t1.EmpID)
    , xmlelement ("Employee", xmlattributes (t1.EmpID, t1.FName, t1.LName, t1.Experience))
    )
    )).getclobval () as result
    FROM
    EmployeeList t1
    , Employee t2
    WHERE
    t2.Experience > 10
    AND t2.EmpID = t1.EmpID

  3. Anonymous says:

    Please ignore my above post.

  4. Anonymous says:

    Hi Jonathan,

    SQL Server does support building hierarchy from SQL, and of course it would have been easy to do the above using SQL Server "FOR XML". However, the point of Andrew’s example was to do the subtree pruning on an XPath query — not a SQL query. It’s really a general issue with XPath, not a specific issue with SQL Server’s implementation.

  5. Anonymous says:

    Hi Josh,

    I must be missing something here. I looked at the documentation of "FOR XML" and only saw an example of returning a single rowset. The SQLX query that I suggested would return a hierarchical result with nested attributes and elements. How would I accomplish this with "FOR XML" without SQLX operator support?

    I understand that the example is designed to illustrate an XPATH technique, but his example also relies on using the SqlXml driver which may be doing something similar to SQLX. The issue is to have the ability to push the query down into the database rather than retrieving excess data and pruning the nested results in memory afterwards.

    To accomplish this performance for large databases, SQLXML would need to somehow compile the XPath expression into a native SQL queries first and then retag the result set back into XML. It would be more efficient to directly specify the SQL expression using the SQLX operator extensions so the database could take advantage of the SQLX operators to optimize the query.

    IBM has an approach for compiling nested XQueries and XQuery views and pushing them down into native database queries for DB2 that accomplishes this. You might want to look at the approach. You can download an implementation at (http://www.alphaworks.ibm.com/tech/xtable?Open&ca=daw-xmlp-01).

    There is also a similar research project at WPI called "Rainbow" (http://davis.wpi.edu/dsrg/rainbow/index.htm) that is database independant and will be released open source, but uses Oracle for the prototype. Their solution also supports updates although they haven’t released their implementation yet.

    Jonathan

  6. Anonymous says:

    This is a good example that demonstrates that XPATH is only useful for simple queries. XQuery might be complicated but it will at least do the job when you’re filtering grandchildren and it’s an improvement over XSLT. Thanks for taking the time to write this tip.