SQL Server XQuery: .nodes() method

Revisiting my XQuery discussion that I abandoned so long ago; with PASS Summit 2013 quickly approaching, I need to get back on the stick, and make some headway in this series. 

The .nodes() method is different than the other xml methods in SQL Server in that it:

  1. returns tabular data, and so must be used in the FROM clause as a table-valued function, and
  2. requires that  the .query() ,.value(), or .exist() method be used in the SELECT clause of a SQL statement.

I don’t want to spend a lot of time on explaining a table-valued function, so if you’re unfamiliar with them, I did a short writeup on them for my Stuff in the From Clause series.

The basic concept behind this method is that it splits an XML document into smaller documents according to the node specified in the method; if you have multiple instances of a node, an XML document is returned for each node.  That XML document must then be queried using one of the other XML methods.  For example, executing the following:

DECLARE @x XML
SET @x = 
'<root>
  <plant>
    <fruit>tomato</fruit>
    <fruit>banana</fruit>
    <vegetable>tomato</vegetable>
  </plant>
</root>' 

SELECT t.c.query('.') 
FROM @x.nodes('//fruit') AS T(c)

will return a data set that looks like this:

image

How does this work?  The xQuery statement (“//fruit”) provided to the .nodes() method found two nodes; the method returned to the SELECT statement two fragments containing the nodes from the original XML document (one for each instance of fruit).  The query method then looks at the root of those two new XML fragments, and returned them as an xml datatype, one on each row.  A lot of work got done for a very simple concept; taking a look at the execution plan, you can see how much work got done:

image

Notice the multiple calls to XML readers; using our bridge analogy, the SQL query first asked the XML engine to find the nodes that matched the xQuery.  The XML engine replied saying “I have 2 instances”; the SQL query then asked the XML engine to split them into rows, and the XML engine did so, returning those rows to the SQL engine.  The SQL engine then asked the XML engine to query those two documents, and the XML engine did so and returned the results.  A lot of back and forth between the engines.

The syntax changes slightly when you’re dealing with a table instead of a scalar value; in this case you must use the CROSS APPLY functionality in order to parse the information.  For example:

DECLARE @T TABLE ( ID INT, x XML )


INSERT  INTO @T
        ( ID, x )
VALUES    ( 1, '<root><plant><fruit>tomato</fruit><fruit>banana</fruit><vegetable>tomato</vegetable></plant></root>' )

SELECT ID, c.value('.', 'varchar(max)')
FROM @T
    CROSS APPLY x.nodes('//fruit') T(c)  

will return a two-column dataset:

image

and the execution plan looks similar in regards to the handoff between the engines; the only addition is the scan of our table variable:

image

Summary:

  1. The .nodes() method is an easy way to shred XML documents into rows based on an xQuery, and
  2. it’s expensive, since it requires the use of another XML method to return data.
Share