September 2013

SQL Server XQuery: .nodes() method

Revisiting my XQuery discussion that I abandoned so long ago; with PASS Summit 2013 quickly approaching, I need to get back on the stick, and make some headway in this series. 

The .nodes() method is different than the other xml methods in SQL Server in that it:

  1. returns tabular data, and so must be used in the FROM clause as a table-valued function, and
  2. requires that  the .query() ,.value(), or .exist() method be used in the SELECT clause of a SQL statement.

I don’t want to spend a lot of time on explaining a table-valued function, so if you’re unfamiliar with them, I did a short writeup on them for my Stuff in the From Clause series.

The basic concept behind this method is that it splits an XML document into smaller documents according to the node specified in the method; if you have multiple instances of a node, an XML document is returned for each node.  That XML document must then be queried using one of the other XML methods.  For example, executing the following:

DECLARE @x XML
SET @x = 
'<root>
  <plant>
    <fruit>tomato</fruit>
    <fruit>banana</fruit>
    <vegetable>tomato</vegetable>
  </plant>
</root>' 

SELECT t.c.query('.') 
FROM @x.nodes('//fruit') AS T(c)

will return a data set that looks like this:

image

How does this work?  The xQuery statement (“//fruit”) provided to the .nodes() method found two nodes; the method returned to the SELECT statement two fragments containing the nodes from the original XML document (one for each instance of fruit).  The query method then looks at the root of those two new XML fragments, and returned them as an xml datatype, one on each row.  A lot of work got done for a very simple concept; taking a look at the execution plan, you can see how much work got done:

image

Notice the multiple calls to XML readers; using our bridge analogy, the SQL query first asked the XML engine to find the nodes that matched the xQuery.  The XML engine replied saying “I have 2 instances”; the SQL query then asked the XML engine to split them into rows, and the XML engine did so, returning those rows to the SQL engine.  The SQL engine then asked the XML engine to query those two documents, and the XML engine did so and returned the results.  A lot of back and forth between the engines.

The syntax changes slightly when you’re dealing with a table instead of a scalar value; in this case you must use the CROSS APPLY functionality in order to parse the information.  For example:

DECLARE @T TABLE ( ID INT, x XML )


INSERT  INTO @T
        ( ID, x )
VALUES    ( 1, '<root><plant><fruit>tomato</fruit><fruit>banana</fruit><vegetable>tomato</vegetable></plant></root>' )

SELECT ID, c.value('.', 'varchar(max)')
FROM @T
    CROSS APPLY x.nodes('//fruit') T(c)  

will return a two-column dataset:

image

and the execution plan looks similar in regards to the handoff between the engines; the only addition is the scan of our table variable:

image

Summary:

  1. The .nodes() method is an easy way to shred XML documents into rows based on an xQuery, and
  2. it’s expensive, since it requires the use of another XML method to return data.

#passvotes Why I voted for Allen Kinsel–three times….

I’m a little late with this endorsement, in part because I debated whether or not that I should admit that I’m a serial voter.  In case you haven’t heard, PASS is holding it’s elections for the Board of Directors again; ballots went out on Wednesday, September 25 to “all members in good standing as of June 2013”.  Apparently, I’m a member thrice over, since I received three ballots at three different email addresses.

This has happened in every PASS election that I can remember, and I’ve always taken the moral high ground before, casting one vote and ignoring the other two; however, this time is different.  It’s different because Allen Kinsel’s on the ballot.

Lots of people have lots of good things to say about Allen, so I won’t rehash those; however, the thing that swayed me to vote for him (multiple times) was the following bullet point from his platform page:

    • Making an additional IT investment to bring PASS’s membership roster up-to-date. Updating the roster will allow PASS to have better knowledge of our worldwide membership and to provide members with better benefits.

Allen wants to update the database, and clean it up.  As a database professional, data integrity appeals to me; it’s a little sad that my professional organization has dirty data, and is relying on that dirty data to make key business decisions (like generating ballots).   So this time, I decided to leverage all of my electoral power and cast all of my votes for Allen because I believe he’s going to try and do something to prevent this from happening in the future.

I know; it seems wrong for me to cast three votes, but I figure I’m not the only one with multiple ballots.  Maybe some other members of the organization don’t have the same moral hesitation about “pulling the lever” at every opportunity.  Perhaps some devious prankster has registered 500 email addresses, and is slowly stealing the election away.  All I know is that this should change, and Allen’s the only candidate which has mentioned this as a priority.

Vote.

SQL In The City Atlanta, Oct 11, 2013

SITC13_Banner_300x250_SpeakerJust a quick note to say that I’m super excited to be presenting at the Red Gate SQL In the City event in Atlanta this year; I’ll be covering some of their development tools (many of which I use daily).  As far as I know, there are still some seats left, but I wouldn’t count on them being available too much longer.

It’s free, and you’ll get a chance to hear lots of great topics presented by Grant Fritchey, Steve Jones, and Tim Radney, as well as me. Click on the image for more information, and I hope to see you there.