XML

#SQLServer – Where does my index live?

Today, I got asked by one of my DBA’s about a recently deployed database that seemed to have a lot of filegroups with only a few tables.  He wanted to verify that one of the tables was correctly partition-aligned, as well as learn where all of the indexes for these tables were stored.  After a quick search of the Internets, I was able to fashion the following script to help.  The script below will find every index on every user table in a database, and then determine if it’s partitioned or not.  If it’s partitioned, the scheme name is returned; if not, the filegroup name.  The final column provides an XML list of filegroups (because schemes can span multiple filegroups) and file locations (because filegroups can span multiple files).

 WITH C AS ( SELECT ps.data_space_id
, f.name
, d.physical_name
FROM sys.filegroups f
JOIN sys.database_files d ON d.data_space_id = f.data_space_id
JOIN sys.destination_data_spaces dds ON dds.data_space_id = f.data_space_id
JOIN sys.partition_schemes ps ON ps.data_space_id = dds.partition_scheme_id
UNION
SELECT f.data_space_id
, f.name
, d.physical_name
FROM sys.filegroups f
JOIN sys.database_files d ON d.data_space_id = f.data_space_id
)
SELECT [ObjectName] = OBJECT_NAME(i.[object_id])
, [IndexID] = i.[index_id]
, [IndexName] = i.[name]
, [IndexType] = i.[type_desc]
, [Partitioned] = CASE WHEN ps.data_space_id IS NULL THEN 'No'
ELSE 'Yes'
END
, [StorageName] = ISNULL(ps.name, f.name)
, [FileGroupPaths] = CAST(( SELECT name AS "FileGroup"
, physical_name AS "DatabaseFile"
FROM C
WHERE i.data_space_id = c.data_space_id
FOR
XML PATH('')
) AS XML)
FROM [sys].[indexes] i
LEFT JOIN sys.partition_schemes ps ON ps.data_space_id = i.data_space_id
LEFT JOIN sys.filegroups f ON f.data_space_id = i.data_space_id
WHERE OBJECTPROPERTY(i.[object_id], 'IsUserTable') = 1
ORDER BY [ObjectName], [IndexName] 

SQL Server XQuery: MORE deleting nodes using .modify()

So after my last post, my developer friend came back to me and noted that I hadn’t really demonstrated the situation we had discussed; our work was a little more challenging than the sample script I had provided.  In contrast to what I previously posted, the challenge was to delete nodes where a sub-node contained an attribute of interest.  Let me repost the same sample code as an illustration:

DECLARE @X XML = 
'<root>
  <class teacher="Smith" grade="5">
    <student name="Ainsworth" />
    <student name="Miller" />
  </class>
  <class teacher="Jones" grade="5">
    <student name="Davis" />
    <student name="Mark" />
  </class>
  <class teacher="Smith" grade="4">
    <student name="Baker" />
    <student name="Smith" />
  </class>
</root>'

SELECT  @x

If I wanted to delete the class nodes which contain a student node with a name of “Miller”, there are a couple of ways to do it; the first method involves two passes:

SET @X.modify('delete /root/class//.[@name = "Miller"]/../*')
SET @X.modify('delete /root/class[not (node())]')
SELECT @x

In this case, we walk the axis and find a node test of class (/root/class); we then apply a predicate to look for an attribute of name with a value of Miller ([@name=”Miller”]) in any node below the node of class (//.).  We then walk back up a node (/..), and delete all subnodes (/*).

That leaves us with an XML document that has three nodes for class, one of which is empty (the first one).  We then have to do a second pass through the XML document to delete any class node that does not have nodes below it (/root/class[not (node())]).

The second method accomplishes the same thing in a single pass:

SET @x.modify('delete /root/class[student/@name="Miller"]')
SELECT @x

In this case, walk the axis to class (/root/class), and then apply a predicate that looks for a node of student with an attribute of name with a value of Miller ([student/@name=”Miller”); the difference in this syntax is that the pointer for the context of the delete statement is left at the specific class as opposed to stepping down a node, and then back up.

SQL Server XQuery: deleting nodes using .modify()

Quick blog post; got asked today by a developer friend of mine about how to delete nodes in an XML fragment using the .modify() method.  After some head-scratching and some fumbling around (its been a few months since I’ve done any work with XML), we came up with a version of the following script:

DECLARE @X XML = 
'<root>
  <class teacher="Smith" grade="5">
    <student name="Ainsworth" />
    <student name="Miller" />
  </class>
  <class teacher="Jones" grade="5">
    <student name="Davis" />
    <student name="Mark" />
  </class>
  <class teacher="Smith" grade="4">
    <student name="Baker" />
    <student name="Smith" />
  </class>
</root>'

SELECT  @x

--delete the classes that belong to teacher Smith
SET @X.modify('delete /root/class/.[@teacher="Smith"]')
SELECT @X 

Now, let me try to explain it:

  1. Given a simple document that has a root with classes, and students in each class, we want to delete all classes that are being taught by a teacher named “Smith”.
  2. First, we delete the nodes under those classes that belong to Smith
    1. Using XPath, we walk the axis and use a node test to restrict to /root/class/. (the current node under class).
    2. We then apply a predicate looking for a teacher attribute with a value of “Smith”
    3. The .modify() clause applies the delete command to the @X variable, and updates the XML

PASS 2013 Summit Evals are out!

And I didn’t do too bad; wish I had done better.  I said that when I was done, I felt like it was a “B” level presentation, and it was; I got a 4 out of 5 on my evals.  If I had been a less experienced speaker, I would be thrilled with that; as it stands, I’m a little bummed.  I know that it’s tough to get accepted to speak at Summit, and I feel bad that I didn’t hit this one out of the park.

However, it was a great experience; 73 people attended my session, which is a big audience for me.  I struggled with my demos throughout (I don’t even want to listen to the audio because I’m worried about how bad it was), and I should have worked on finding ways to better connect with my audience.  The feedback I got was really constructive:

Was a good intro, just would have liked to have seen some broader examples. For example converting XML into relational tables, not in detail but just at a high level.

Lots of demos geared towards people who have already written a lot of XQuery. This should have been a 201 session. A discussion on why you’d even use the XML datatype would have been useful. What problem does the XML datatype even solve for people?

I think I would have benefitted from a hard copy (gasp) of the XML data.  I would have been able to see the data and compared it to your on screen results

Way too fast, too ambitious for a 101 session

Well put together and paced. Very clear and coherent

Scale back expectations if it really is a 101 level session

So it sounds like I didn’t do the best job of making my abstract clear; people had different expectations than what I had for what a 100 level course was supposed to be.  I do agree that it was too much content, and if I present on the topic again, I’ll be sure to go back to splitting this up to focus on the basics of XPath, and save a discussion of FLWOR for later.  Also, I really should have used demos much more judiciously; I kept running code and trying to work the magnifier, when I should have just used slides for the basics, and then done a much more thorough demo.

So what did I learn?  Connect with the audience first and foremost.  If I could have kept them engaged and entertained, I may have covered less material, but may have inspired them to do more research on their own (which in the end, is the point of this whole exercise).

quick blog from #sqlpass #summit13

Been a busy couple of days; hell, the last few weeks have been just nuts. I’m pausing for a few seconds to type a quick post from sunny Charlotte, and just fill in a few thoughts.

First, I think my XQuery session went reasonably well; I got bit by the demo gods, and shouldn’t have tried to use the magnifier without more practice, but I had a lot of questions and a few nods of approval. Overall, I think it was a B effort, and am hopeful that I can improve it.

Second, it’s always exciting to be back at Summit; kind of like New Year’s Day. I make lots of resolutions about how I want to get involved with an active and dynamic community. Let’s see how many of them stick. Mostly, I like being around smart people, and it’s been quite exciting to talk to some of the smartest people I know. I’ve had some great technical conversations with lots of people, and its given me a lot of things to mull over in terms of where I want to go and grow.

Third, I also got sucked into a lot of conversations about the whole PASS election/membership issue. My post about Allen Kinsel’s campaign seem to have kicked off more of a firestorm than I realized. I’ve had lots of people ask me what my thoughts were on the issue, and really, it’s kind of simple: We’re database people, and we need a plan to fix a data problem. I don’t have that plan, but there are lots of people who do (see my second point above about smart people).

Fourth, keynotes\sessions are awesome. I’m learning a lot, and I hope others are as well.

More to come soon.

SQL Server XQuery: Functions (sql:variable() & sql:column())

 

Like most query languages, XQuery has several functions that can be used to manipulate and query data.  SQL Server’s implementation supports a limited subset of the XQuery specification, but there’s a lot of power in the functions provided.  I hope to cover some of those functions in more detail at a later date, but for now I’d like to focus on a couple of very specific functions (sql:variable() & sql:column()).  These are proprietary extensions to XQuery, and they (like the xml methods I previously discussed) provide a bridge between the SQL engine and the XQuery engine.

For example, if you wanted to find the value of the third node of a simple XML document in SQL Server, you could do the following:

DECLARE @x XML ='<alpha>a</alpha><alpha>b</alpha><alpha>c</alpha>'
SELECT @x.value('(//alpha)[3]', 'varchar(1)')

The .value() method would return the letter “c” in the form of a varchar to the SQL engine.  However, if you wanted to do this dynamically, and specify which node to return based on a parameter, you would use the sql:variable() function, like so:

DECLARE @x XML ='<alpha>a</alpha><alpha>b</alpha><alpha>c</alpha>'

DECLARE @node INT = 3
SELECT @x.value('(//alpha)[sql:variable("@node")][1]', 'varchar(1)')

The sql:variable() function uses a string literal (a value surrounded by double quotes) to reference a SQL parameter (in this case, @node) and concatenates it to the XQuery string.  The above query is seen as:

(//alpha)[3][1]

by the XQuery engine.  In English, we are looking for the 3rd node named alpha.  You may wonder about the extra positional reference (“[1]”) ; the .value() method requires that a positional reference be explicitly defined.  In this situation, we are telling the XQuery engine to return the first instance of the third node of the alpha node.  Seems a bit clunky, but it works.  Looking at the execution plan, we can see that this is a relatively complex process, with multiple calls between the two sides of the query processor:

image

The sql:column() function is similar, but is used to refer to a column instead of a parameter; this allows for the dynamic querying of an XML column on a row by row basis.  For example:

DECLARE @T TABLE ( ID INT, x XML )

INSERT  INTO @T
        ( ID, x )
VALUES  ( 1, '<alpha>a</alpha><alpha>b</alpha><alpha>c</alpha>' ),
        ( 2, '<alpha>a</alpha><alpha>b</alpha><alpha>c</alpha>' ),
        ( 3, '<alpha>a</alpha><alpha>b</alpha><alpha>c</alpha>' )

SELECT  ID, v=x.value('(//alpha)[sql:column("ID")][1]', 'varchar(1)')
FROM    @T

The above query will return a dataset like so:

image

Summary

SQL Server provides two functions for sharing information from the SQL engine to the XQuery engine: sql:variable() & sql:column().  The nature of these functions is pretty straight-forward; you pass the value of either a parameter or a column to an XML method, and it builds an XQuery string using the values of these functions. 

SQL Server XQuery: .modify() method

Continuing with XQuery, today’s post addresses the .modify() method against the xml data type; this method is used to update XML documents by applying one of three XML DML statements (insert, delete, or replace) via the use of a SQL SET statement.  Again, the idea is hand off the work between the XML engine and the SQL engine.  For example, given the following example:

DECLARE @X XML
SET @x = '<root><id>1</id></root>'

SET  @x.modify('insert <splat>hello world </splat>
                into /root[1]')

SELECT @x

the output of this SQL is the following XML:

<root>
  <id>1</id>
  <splat>hello world </splat>
</root>

So what happened?  We took a very simple XML document, and passed XML DML into the .modify() method which stated that we should:

  1. insert a new node “<splat>hello world </splat>” into the XML document
  2. into the first instance of the root node (/root[1]).

That’s it; welcome to the world of XML DML (Data Manipulation Language).  The syntax pattern for insert is pretty simple; action (insert) followed by an XML expression (<splat>hello world </splat> followed by the location of a second XML expression (into /root[1])With some minor variation in pattern, these elements hold true for the other two XML DDL statements (delete; replace value of):

action XML expression (1) XML expression (2) Sample
insert node {as first | as last}

into | after | before

XQuery

SET @x.modify(‘insert <splat>hello world!</splat>

                before (/root/id)[1]’)

delete node **not used** SET @x.modify(‘delete /root/splat[1]’)
replace value of node/value with value SET @x.modify(‘replace value of (/root/id/text())[1] with "2"’)

 

Of course, the variation in the pattern is what makes this set of commands a challenge to keep straight, so I’ll try to briefly explain the patterns in order of increasing complexity:

  • the delete pattern requires that a node singleton be identified for deletion.  The XML expression must evaluate to a node in the XML document that is being updated.
  • the insert pattern identifies a node to be added either into, before, or after a specific location path.  If an instance of that node already exists, you should specify as first or as last in order to place the new node in the correct sequence.
  • the replace value of requires two XML expressions; the first must evaluate to an existing value (as either an attribute or a text() value of an element).  The second must evaluate to a specific value; a later blog post will show how to use XQuery to dynamically set that value.

One more thing; these examples all use a single XML variable, and issue the SET command against it.  If you’re dealing with a table of XML documents, you can use the SQL UPDATE command to SET the XML column using this XQuery method and XML DML like so:

DECLARE @T TABLE (x XML)
INSERT INTO @T
VALUES ('<id>1</id>'), ('<id>2</id>')

UPDATE @T
SET x.modify('insert <new>sample</new>
                into /id[1]')

SELECT *
FROM @T

will return a result set like so:

image

Summary

To sum up:

  1. the .modify() method may be used either against an XML variable or an XML column in a table by using the SET statement (alone or in an UPDATE command, respectively), and
  2. there are three basic patterns for XML DML that are passed to the .modify() method, and
  3. the syntax can be confusing but allows for a great deal of fine-grained manipulation of the XML document when needed.

SQL Server XQuery: .nodes() method

Revisiting my XQuery discussion that I abandoned so long ago; with PASS Summit 2013 quickly approaching, I need to get back on the stick, and make some headway in this series. 

The .nodes() method is different than the other xml methods in SQL Server in that it:

  1. returns tabular data, and so must be used in the FROM clause as a table-valued function, and
  2. requires that  the .query() ,.value(), or .exist() method be used in the SELECT clause of a SQL statement.

I don’t want to spend a lot of time on explaining a table-valued function, so if you’re unfamiliar with them, I did a short writeup on them for my Stuff in the From Clause series.

The basic concept behind this method is that it splits an XML document into smaller documents according to the node specified in the method; if you have multiple instances of a node, an XML document is returned for each node.  That XML document must then be queried using one of the other XML methods.  For example, executing the following:

DECLARE @x XML
SET @x = 
'<root>
  <plant>
    <fruit>tomato</fruit>
    <fruit>banana</fruit>
    <vegetable>tomato</vegetable>
  </plant>
</root>' 

SELECT t.c.query('.') 
FROM @x.nodes('//fruit') AS T(c)

will return a data set that looks like this:

image

How does this work?  The xQuery statement (“//fruit”) provided to the .nodes() method found two nodes; the method returned to the SELECT statement two fragments containing the nodes from the original XML document (one for each instance of fruit).  The query method then looks at the root of those two new XML fragments, and returned them as an xml datatype, one on each row.  A lot of work got done for a very simple concept; taking a look at the execution plan, you can see how much work got done:

image

Notice the multiple calls to XML readers; using our bridge analogy, the SQL query first asked the XML engine to find the nodes that matched the xQuery.  The XML engine replied saying “I have 2 instances”; the SQL query then asked the XML engine to split them into rows, and the XML engine did so, returning those rows to the SQL engine.  The SQL engine then asked the XML engine to query those two documents, and the XML engine did so and returned the results.  A lot of back and forth between the engines.

The syntax changes slightly when you’re dealing with a table instead of a scalar value; in this case you must use the CROSS APPLY functionality in order to parse the information.  For example:

DECLARE @T TABLE ( ID INT, x XML )


INSERT  INTO @T
        ( ID, x )
VALUES    ( 1, '<root><plant><fruit>tomato</fruit><fruit>banana</fruit><vegetable>tomato</vegetable></plant></root>' )

SELECT ID, c.value('.', 'varchar(max)')
FROM @T
    CROSS APPLY x.nodes('//fruit') T(c)  

will return a two-column dataset:

image

and the execution plan looks similar in regards to the handoff between the engines; the only addition is the scan of our table variable:

image

Summary:

  1. The .nodes() method is an easy way to shred XML documents into rows based on an xQuery, and
  2. it’s expensive, since it requires the use of another XML method to return data.

SQL Server XQuery: .value() method

Back to XQuery; the third method I wanted to discuss in this series is the .value() method. Like the .query() method, .value() bridges the SQL Server database engines and XQuery engines by extracting information from an XML document.  The difference is that while .query() returns XML, .value() translates the specific information requested into a scalar SQL value.  For example:

DECLARE @T TABLE ( ID INT, x XML )

INSERT  INTO @T
        ( ID, x )
VALUES  ( 1, '<root><plant><fruit>tomato</fruit></plant></root>' )
,       ( 2, '<root><plant><fruit>banana</fruit></plant></root>' )
,       ( 3, '<root><plant><vegetable>tomato</vegetable></plant></root>' )

SELECT ID, x.value('(.)[1]', 'varchar(max)')
FROM @T t

Note that the .value() method has two parameters, an XQuery statement and a string representing the expected SQL Server type for the scalar value returned.  The results are as follows:

image

So, what is the XQuery statement requesting?  The axis is referring to the self-node of the XML fragment (and remember that in SQL Server’s implementation of XQuery, the query begins at the top of the document), there is no node test (so any node will satisfy the test), and the predicate is referring to the first position [1] of the path.  The .value() function returns the value found at that location, and converts it to a sql value of type varchar(max).  So what happens if there’s more than one value at a given XPath?

INSERT  INTO @T
        ( ID, x )
VALUES    ( 4, '<root><plant><fruit>tomato</fruit><fruit>banana</fruit><vegetable>tomato</vegetable></plant></root>' )


SELECT ID, x.value('(.)[1]', 'varchar(max)')
FROM @T t

The above script inserts a slightly more complex XML document than used before; there are now multiple nodes under the <plant> node (in addition to the three rows we already have); the output looks like:

image 

Note how row 4 has a single string that represents all of the values of the XML document.  Without a specific XQuery, the value() function returns the values that satisfy the request; in this case, all of them.  In the case where the XQuery cannot be satisfied, .value() will return a NULL value to the SQL Server engine, as illustrated by the following code examples:

SELECT ID, x.value('(//plant/fruit)[1]', 'varchar(max)')
FROM @T t 

SELECT ID, x.value('(//plant/fruit)[2]', 'varchar(max)')
FROM @T t 

The output should look like:

image

In the first query, we are looking for an axis of any node in the document, a node test of fruit under the <plant> node, and a positional predicate of 1.  Rows 1,2, and 4 satisfy this query; the XML document in row 3 doesn’t have a <fruit node>, and thus returns a NULL.  The second query is similar in that several of the XML documents in our table don’t satisfy the query; only the document in row 4 has node in the second [2] position of the <fruit> node.

Summary

Key takeaways:

  1. The .value() function allows you to specify an XQuery statement against an XML datatype looking for a specific value, and
  2. that returned value is cast to a SQL datatype.

Next up: .nodes()

SQL Server XQuery: .query() method

Continuing in my discussion of XQuery in SQL Server, let me discuss another XML data type method: .query().  The .query() method is a great illustration of the bridging concept I discussed earlier.  Simply put, this method allows you to run an XQuery against the XML data type, and return the results of that query as an XML document to SQL Server.  Using our example from the .exists() post:

DECLARE @T TABLE ( ID INT, x XML )

INSERT  INTO @T
        ( ID, x )
VALUES  ( 1, '<root><plant><fruit>tomato</fruit></plant></root>' )
,       ( 2, '<root><plant><fruit>banana</fruit></plant></root>' )
,       ( 3, '<root><plant><vegetable>tomato</vegetable></plant></root>' )

SELECT x.query('.') 
FROM @T 

Note that the parameter supplied to the method is an XQuery statement consisting of a single axis (.), which refers to the current node.  Since SQL Server’s implementation of XQuery starts at the top and works it’s way down the document, the current node is the entire document.  Running that query will give you the following output:

image

which isn’t very exciting.  However, if we begin to build out a XPath statement with a node test and a predicate, we can begin to narrow down the results quickly:

SELECT ID, x.query('.//fruit[text()="banana"]')
FROM @T t

And here’s the results:

image

While the results may be simple, there’s a lot going on behind the scenes, and it may have returned something different than what you expected.  Remember that when you execute an XQuery method against an XML column, there’s actually two queries happening:

For the XQuery portion, the XPath axis, node test, and predicate specified that we were looking for an XML node that was a fruit and had a value of a banana; this only existed in the XML in row 2, so the node was returned for row 2, but row 1 and row 3 were empty.  The SQL query then ran, and returned all rows in the table because there was no WHERE clause filtering the results.  If we only wanted to return the results where the XQuery returned a node, we could move the .query() method to the where clause.

One thing to always remember is that regardless of how specific your XQuery is, the type of the data being returned will always be XML.  For example, if you specify the text() function as part of the axis, and not the predicate:

SELECT ID, x.query('.//text()')
FROM @T t 

You’ll get what looks like a table with a column of character values representing the values of the various nodes:

image

However, the visual representation of the result set uses hyperlinks to indicate that it’s an XML column; if you click on the value (banana), the XML explorer will open up, showing you that it is an XML fragment, albeit one that has no tags.  If you need to return the data as the actual datatype, you’ll either need to use a SQL method (like CAST or CONVERT), or the .value() method (to be discussed soon).

Summary

Wrapping up with a few key points:

  1. The query() method allows you to specify an XQuery statement against an XML document, and
  2. It always returns an XML document.