Development

SQL Server XQuery: deleting nodes using .modify()

Quick blog post; got asked today by a developer friend of mine about how to delete nodes in an XML fragment using the .modify() method.  After some head-scratching and some fumbling around (it’s been a few months since I’ve done any work with XML), we came up with a version of the following script:

DECLARE @X XML = 
'<root>
  <class teacher="Smith" grade="5">
    <student name="Ainsworth" />
    <student name="Miller" />
  </class>
  <class teacher="Jones" grade="5">
    <student name="Davis" />
    <student name="Mark" />
  </class>
  <class teacher="Smith" grade="4">
    <student name="Baker" />
    <student name="Smith" />
  </class>
</root>'

SELECT  @x

--delete the classes that belong to teacher Smith
SET @X.modify('delete /root/class/.[@teacher="Smith"]')
SELECT @X 

Now, let me try to explain it:

  1. Given a simple document that has a root with classes, and students in each class, we want to delete all classes that are being taught by a teacher named “Smith”.
  2. To do so, we delete the class nodes (along with everything under them) that belong to Smith; an equivalent form is sketched just after this list.
    1. Using XPath, we walk the axis and use a node test to restrict the selection to /root/class/. (the current node under class).
    2. We then apply a predicate looking for a teacher attribute with a value of “Smith”.
    3. The .modify() method applies the delete command to the @X variable, and updates the XML.
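
As a footnote (this variant isn’t in the script above, just a sketch of an equivalent form): the predicate can also be applied directly to the class step, which reads a bit more conventionally and removes the same <class> elements:

SET @X.modify('delete /root/class[@teacher="Smith"]')

--and if you only wanted to remove the students while keeping the (now empty) <class> elements:
--SET @X.modify('delete /root/class[@teacher="Smith"]/student')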

First few bites of the elephant: working with Hortonworks Hadoop

So a few weeks ago, I mentioned that I was starting to diversify my data interests in hopes of steering my career path a bit; I’ve built a home-brewed server, and downloaded a copy of the Hortonworks Sandbox for Hadoop.  I’ve started working through a few tutorials, and thought I would share my experiences so far.

My setup….

I don’t have a lot of free cash to set up a super-duper learning environment, but I wanted to do something on-premise.  I know that Microsoft has HDInsight, the cloud-based version of the Hortonworks platform, but I’m trying to understand the administrative side of Hadoop as well as the general interface.  I opted to upgrade my old fileserver to a newer rig; costs ran about $600 for the following:

ASUS M5A97 R2.0 970 AM3+ motherboard
AMD FX-8350 8-core 4.0 GHz (8 MB cache) CPU
G.SKILL F3-1600C9Q-32GSR 32 GB (4 x 8 GB) memory
Samsung SH-224DB/BEBE DVD burner

I already had a case, power supply, and a couple of SATA drives (sadly, my IDE drives no longer work, which is also the reason for purchasing a DVD burner).  I also had a licensed copy of Windows 7 64-bit, as well as a few development copies of Microsoft applications from a few years ago (oh, how I wish I was an MVP….).

As a sidebar, I will NEVER purchase computer equipment from Newegg again; their customer service was horrible.  A few pins were bent on the CPU, and it took nearly 30 days to get a replacement, and most of that time was spent with little or no notification.

I downloaded and installed the Hortonworks Sandbox using the VirtualBox version.  Of course, I had to reinstall after a few tutorials because I had skipped a few steps; after going back and following the instructions, everything is just peachy.  One of the nice benefits of the VirtualBox setup is that once I fire up the Hortonworks VM on my server, I can use a web browser on my laptop pointed to the server’s IP address with the appropriate port added (e.g., xxx.xxx.xxx.xxx:8888), and bam, I’m up and running.

Working my way through a few tutorials

First, I have to say, I really like the way the Sandbox is organized; it’s basically two frames: the tutorials on the left, and the actual interface into a working version of Hadoop on the right.  It makes it very easy to go through the steps of the tutorial.

[Screenshot: the Sandbox interface, with the tutorials on the left and the working Hadoop interface on the right]

The Sandbox has lots of links and video clips to help augment the experience, but it’s pretty easy to get up and running on Hadoop; after only a half-hour or so of clicking through the first couple of tutorials, I got some of the basics down for understanding what Hadoop is (and is not); below is a summary of my initial thoughts (WARNING: these may change as I learn more).

Summary:

  • Hadoop comprises several different data access components, each with its own history.  Unlike a tool like SQL Server Management Studio, the experience may vary depending on which tool you are using at a given time.  The tools include (but are not limited to):
    • Beeswax (Hive UI): Hive provides a SQL-like language (HiveQL), so its UI is probably the most familiar to those of us with RDBMS experience.  It’s a query editor (a short sketch of a Hive query follows this list).
    • Pig is a procedural language that abstracts the data manipulation away from MapReduce (the underlying engine of Hadoop).  Pig and Hive have some overlapping capabilities, but there are differences (many of which I’m still learning).
    • HCatalog is a relational abstraction of data across HDFS (Hadoop Distributed File System); think of it like the DDL of SQL.  It defines databases and tables from the files where your actual data is stored; Hive and Pig are like DML, interacting with the defined tables.
  • A single-node Hadoop cluster isn’t particularly interesting; the fun part will come later when I set up additional nodes.
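
For flavor, here’s a minimal sketch of what a Hive query looks like (the table name is hypothetical, not one of the tutorial tables); HiveQL reads very much like T-SQL:

-- count classes per teacher from a hypothetical Hive table named classes
SELECT teacher, COUNT(*) AS class_count
FROM classes
GROUP BY teacher;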

The Evolution of the DBA

Recently, there have been a couple of great posts about the Death of the Database Administrator, including a response by Steve Jones and several reactions by the staff of SQL Server Pro; the central premise behind the supposed demise revolves around this one major thought:


The evil cloud has reduced the need for internal systems infrastructure, including database administration.  It’s a perfect storm: the need for faster development (agility) meets the rise of hosted services; who needs a database server when you can rent space on Azure?  Please note that I’m not specifically anti-cloud, but I’m casting it as the villain when careers are on the line.

Furthermore, in shops where the cloud is banned (e.g., financial services), developers are using tools like Entity Framework to write SQL for them.  Tuning SQL thus becomes an application change as opposed to a stored procedure change; DBAs who do performance tuning have to focus on index maintenance and hardware acquisition.  Code tuning is now part of the development domain, and the career of the pure SQL developer is gasping by comparison.

Like all great controversial statements, there’s an element of truth; the cloud, agile approaches, and new technologies are reducing the need for traditional database administrators, but I think we’re a long way away from pulling the plug.  However, I will say that over the next decade, these trends will probably continue to grow, eating away at the availability of jobs that do strict database administration (and the SQL developer will probably expire altogether).  But not yet.

What this does mean is that if you are intending to be employed 10 years from now, and you’re a database administrator, you’ve got two choices to make today:

  1. Master a specialty.  If you’re planning on consulting for a living, this is a great choice.  Get so intimate with the database product of your choice that you become the go-to person for problem-solving.  Companies that have large installations of SQL Server will need secondary support as the product becomes easier to maintain (and big problems get obfuscated by GUIs).
  2. Expand your horizon.  Instead of focusing on super in-depth mastery of your database platform, broaden your perspective; if you’re a SQL Server guy like me, start learning a little bit about SSRS, SSAS, and SSIS (if you don’t already know them).  Spread out into Hadoop and NoSQL; dabble in MySQL and SQLite.  Understand what the cloud can do, and where it makes sense to use it.

So go deep or go broad, but go.  I wouldn’t start quaking in my boots just yet about the demise of your career, but change is coming; those who adapt, survive.

For me? I’m going broad.  I’ve built a home-brewed server, and downloaded a copy of the Hortonworks Hadoop Sandbox.  Stay tuned for my adventures with Hadoop.

SQLSaturday 285 (#sqlsatatl) pre-cons are now live!

SQL Saturday #285 is offering three preconference sessions on Friday, May 2 at the GSU campus in Alpharetta, the site of the main event on Saturday, May 3:
Kalen Delaney: What the Hekaton!? A Whole New Way to Think About Data Management

SQL Server Hekaton, Microsoft’s new In-Memory table technology being shipped as part of SQL Server 2014, will completely change the way you think about data management. As a DBA, you’ll need to analyze your memory and storage needs completely differently. All Hekaton data is always stored in memory, and the data stored on disk is basically just a REDO log used to regenerate the contents of your memory-optimized tables. In this full-day seminar, Kalen Delaney (a SQL Server MVP for over 20 years) will show you the in-memory architecture for your Hekaton data and indexes, and discuss what gets written to disk during checkpoints, as well as what gets logged. She will explain how the recovery process recreates your Hekaton tables. Finally, she’ll go into detail on just what it is that makes Hekaton so much FASTER!


Denny Cherry: SQL Performance Tuning & Optimization

In this session you will learn about SQL Server 2008 R2 and SQL Server 2012 performance tuning and optimization. Industry expert Denny Cherry will guide you through tools and best practices for tuning queries and improving performance within Microsoft SQL Server.  The session works through real-life performance problems which have been gathered and tuned using industry-standard best practices and real-world skills.


Teo Lachev: Deep Dive into the Microsoft BI Semantic Model (BISM)

The chances are that your organization has a centralized data repository, such as an ODS or a data warehouse, but you might not use it to the fullest. Join this insightful full-day event to understand the importance of having a semantic layer that bridges users and data. In the Microsoft BI world, BISM consists of Power Pivot, Tabular, and Multidimensional.

All 3 presenters are published authors and Microsoft MVPs many times over; these sessions are a huge value for a day spent with an acknowledged SQL Server thought leader.

Early registration is only $129 until March 1, when the price of any remaining seats will go up to $149. Follow @AtlantaMDF on Twitter and get a promo code for $10 off the early registration price (for Kalen Delaney or Denny Cherry)! We’ll tweet the promo code at 9am Thursday (Jan 23) – it’s only good for 10 uses (for each session), so be sure to check your Twitter feed tomorrow morning!

Back on the trail…. #sqlsatnash

I realize that I should probably be blogging about my New Year’s resolutions, but meh… I’ve been super busy surviving the holidays.  So busy, in fact, that I’ve failed to mention that I’ll be presenting at the SQLSaturday in Nashville on January 18, 2014.  I actually got selected to present TWO topics, which is HUGE for me.  I’m hoping I can refine a presentation and get ready for our own SQLSaturday in Atlanta.

Working with “Biggish Data”

Most database professionals know (from firsthand experience) that there continues to be a “data explosion”, and there’s been a lot of focus lately on “big data”. But what do you do when your data’s just kind of “biggish”? You’re managing Terabytes, not Petabytes, and you’re trying to squeeze as much performance as possible out of your aging servers. The focus of this session is to identify some key guidelines for the design, management, and ongoing optimization of “larger-than-average” databases. Special attention will be paid to the following areas:

  • query design
  • logical and physical data structures
  • maintenance & backup strategies

Managing a Technical Team: Lessons Learned

I got promoted to management a year ago, and despite what I previously believed, there were no fluffy pillows and bottles of champagne awaiting me. My team liked me, but they didn’t exactly stoop and bow when I entered the room. I’ve spent the last year relearning everything I thought I knew about management, and what it means to be a manager of a technical team. This session is intended for new managers, especially if you’ve come from a database (or other technical) background; topics we’ll cover include:

  • How to let go of your own solutions,
  • Why you aren’t the model you think you are, and
  • Why Venn diagrams are an effective tool for management.

PASS 2013 Summit Evals are out!

And I didn’t do too badly; I wish I had done better.  I said that when I was done, I felt like it was a “B” level presentation, and it was; I got a 4 out of 5 on my evals.  If I had been a less experienced speaker, I would be thrilled with that; as it stands, I’m a little bummed.  I know that it’s tough to get accepted to speak at Summit, and I feel bad that I didn’t hit this one out of the park.

However, it was a great experience; 73 people attended my session, which is a big audience for me.  I struggled with my demos throughout (I don’t even want to listen to the audio because I’m worried about how bad it was), and I should have worked on finding ways to better connect with my audience.  The feedback I got was really constructive:

Was a good intro, just would have liked to have seen some broader examples. For example converting XML into relational tables, not in detail but just at a high level.

Lots of demos geared towards people who have already written a lot of XQuery. This should have been a 201 session. A discussion on why you’d even use the XML datatype would have been useful. What problem does the XML datatype even solve for people?

I think I would have benefitted from a hard copy (gasp) of the XML data.  I would have been able to see the data and compared it to your on screen results

Way too fast, too ambitious for a 101 session

Well put together and paced. Very clear and coherent

Scale back expectations if it really is a 101 level session

So it sounds like I didn’t do the best job of making my abstract clear; people had different expectations than I did about what a 100-level course was supposed to be.  I do agree that it was too much content, and if I present on the topic again, I’ll be sure to go back to splitting this up to focus on the basics of XPath, and save a discussion of FLWOR for later.  Also, I really should have used demos much more judiciously; I kept running code and trying to work the magnifier, when I should have just used slides for the basics, and then done a much more thorough demo.

So what did I learn?  Connect with the audience first and foremost.  If I could have kept them engaged and entertained, I may have covered less material, but may have inspired them to do more research on their own (which in the end, is the point of this whole exercise).

quick blog from #sqlpass #summit13

Been a busy couple of days; hell, the last few weeks have been just nuts. I’m pausing for a few seconds to type a quick post from sunny Charlotte, and just fill in a few thoughts.

First, I think my XQuery session went reasonably well; I got bit by the demo gods, and shouldn’t have tried to use the magnifier without more practice, but I had a lot of questions and a few nods of approval. Overall, I think it was a B effort, and am hopeful that I can improve it.

Second, it’s always exciting to be back at Summit; kind of like New Year’s Day. I make lots of resolutions about how I want to get involved with an active and dynamic community. Let’s see how many of them stick. Mostly, I like being around smart people, and it’s been quite exciting to talk to some of the smartest people I know. I’ve had some great technical conversations with lots of people, and it’s given me a lot of things to mull over in terms of where I want to go and grow.

Third, I also got sucked into a lot of conversations about the whole PASS election/membership issue. My post about Allen Kinsel’s campaign seems to have kicked off more of a firestorm than I realized. I’ve had lots of people ask me what my thoughts were on the issue, and really, it’s kind of simple: we’re database people, and we need a plan to fix a data problem. I don’t have that plan, but there are lots of people who do (see my second point above about smart people).

Fourth, keynotes and sessions are awesome. I’m learning a lot, and I hope others are as well.

More to come soon.

SQL Server XQuery: Functions (sql:variable() & sql:column())


Like most query languages, XQuery has several functions that can be used to manipulate and query data.  SQL Server’s implementation supports a limited subset of the XQuery specification, but there’s a lot of power in the functions provided.  I hope to cover some of those functions in more detail at a later date, but for now I’d like to focus on a couple of very specific functions (sql:variable() & sql:column()).  These are proprietary extensions to XQuery, and they (like the xml methods I previously discussed) provide a bridge between the SQL engine and the XQuery engine.

For example, if you wanted to find the value of the third node of a simple XML document in SQL Server, you could do the following:

DECLARE @x XML ='<alpha>a</alpha><alpha>b</alpha><alpha>c</alpha>'
SELECT @x.value('(//alpha)[3]', 'varchar(1)')

The .value() method would return the letter “c” in the form of a varchar to the SQL engine.  However, if you wanted to do this dynamically, and specify which node to return based on a parameter, you would use the sql:variable() function, like so:

DECLARE @x XML ='<alpha>a</alpha><alpha>b</alpha><alpha>c</alpha>'

DECLARE @node INT = 3
SELECT @x.value('(//alpha)[sql:variable("@node")][1]', 'varchar(1)')

The sql:variable() function uses a string literal (a value surrounded by double quotes) to reference a SQL parameter (in this case, @node) and concatenates it to the XQuery string.  The above query is seen as:

(//alpha)[3][1]

by the XQuery engine.  In English, we are looking for the 3rd node named alpha.  You may wonder about the extra positional reference (“[1]”); the .value() method requires an expression that returns a single value, so the positional reference must be explicitly defined.  In this situation, we are telling the XQuery engine to return the first instance of the third alpha node.  Seems a bit clunky, but it works.  Looking at the execution plan, we can see that this is a relatively complex process, with multiple calls between the two sides of the query processor:

[Screenshot: execution plan for the query above, showing the calls between the SQL and XQuery sides of the query processor]
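
As an aside (my own hedged sketch, not from the original post), sql:variable() works the same way in the other XML methods; for example, it can drive a predicate inside the .exist() method:

DECLARE @name VARCHAR(20) = 'Miller'
DECLARE @doc XML = '<class teacher="Smith"><student name="Ainsworth" /><student name="Miller" /></class>'

--returns 1 because a student element with a name attribute of "Miller" exists
SELECT @doc.exist('/class/student[@name = sql:variable("@name")]')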

The sql:column() function is similar, but is used to refer to a column instead of a parameter; this allows for the dynamic querying of an XML column on a row by row basis.  For example:

DECLARE @T TABLE ( ID INT, x XML )

INSERT  INTO @T
        ( ID, x )
VALUES  ( 1, '<alpha>a</alpha><alpha>b</alpha><alpha>c</alpha>' ),
        ( 2, '<alpha>a</alpha><alpha>b</alpha><alpha>c</alpha>' ),
        ( 3, '<alpha>a</alpha><alpha>b</alpha><alpha>c</alpha>' )

SELECT  ID, v=x.value('(//alpha)[sql:column("ID")][1]', 'varchar(1)')
FROM    @T

The above query will return a dataset like so:

ID   v
1    a
2    b
3    c

Summary

SQL Server provides two functions for sharing information from the SQL engine to the XQuery engine: sql:variable() & sql:column().  The nature of these functions is pretty straightforward; you pass the value of either a parameter or a column to an XML method, and it builds the XQuery expression using those values.

SQL Server XQuery: .modify() method

Continuing with XQuery, today’s post addresses the .modify() method against the xml data type; this method is used to update XML documents by applying one of three XML DML statements (insert, delete, or replace value of) via a SQL SET statement.  Again, the idea is to hand off the work between the XML engine and the SQL engine.  For example, given the following:

DECLARE @X XML
SET @x = '<root><id>1</id></root>'

SET  @x.modify('insert <splat>hello world </splat>
                into /root[1]')

SELECT @x

the output of this SQL is the following XML:

<root>
  <id>1</id>
  <splat>hello world </splat>
</root>

So what happened?  We took a very simple XML document, and passed XML DML into the .modify() method which stated that we should:

  1. insert a new node “<splat>hello world </splat>” into the XML document
  2. into the first instance of the root node (/root[1]).

That’s it; welcome to the world of XML DML (Data Manipulation Language).  The syntax pattern for insert is pretty simple: an action (insert), followed by an XML expression (<splat>hello world </splat>), followed by the location of a second XML expression (into /root[1]).  With some minor variation in pattern, these elements hold true for the other two XML DML statements (delete; replace value of):

  • insert: XML expression (1) is the node to add {as first | as last}; XML expression (2) is into | after | before followed by an XQuery location.
    Sample: SET @x.modify('insert <splat>hello world!</splat> before (/root/id)[1]')
  • delete: XML expression (1) is the node to remove; XML expression (2) is not used.
    Sample: SET @x.modify('delete /root/splat[1]')
  • replace value of: XML expression (1) is the node/value to change; XML expression (2) is with followed by the new value.
    Sample: SET @x.modify('replace value of (/root/id/text())[1] with "2"')


Of course, the variation in the pattern is what makes this set of commands a challenge to keep straight, so I’ll try to briefly explain the patterns in order of increasing complexity (a short working sketch of all three follows this list):

  • the delete pattern requires that a node singleton be identified for deletion.  The XML expression must evaluate to a node in the XML document that is being updated.
  • the insert pattern identifies a node to be added either into, before, or after a specific location path.  If an instance of that node already exists, you should specify as first or as last in order to place the new node in the correct sequence.
  • the replace value of pattern requires two XML expressions; the first must evaluate to an existing value (either an attribute or the text() value of an element).  The second must evaluate to a specific value; a later blog post will show how to use XQuery to dynamically set that value.
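
Here’s a minimal sketch (my own, reusing the @x document built earlier in this post) that runs all three patterns in sequence:

DECLARE @x XML = '<root><id>1</id><splat>hello world </splat></root>'

--insert with an explicit position: as first places <note> before <id>
SET @x.modify('insert <note>first!</note> as first into /root[1]')

--replace value of: change the text of <id> from 1 to 2
SET @x.modify('replace value of (/root/id/text())[1] with "2"')

--delete: remove the <splat> node
SET @x.modify('delete /root/splat[1]')

SELECT @x --returns <root><note>first!</note><id>2</id></root>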

One more thing; these examples all use a single XML variable, and issue the SET command against it.  If you’re dealing with a table of XML documents, you can use the SQL UPDATE command to SET the XML column using this XQuery method and XML DML like so:

DECLARE @T TABLE (x XML)
INSERT INTO @T
VALUES ('<id>1</id>'), ('<id>2</id>')

UPDATE @T
SET x.modify('insert <new>sample</new>
                into /id[1]')

SELECT *
FROM @T

The script above will return a result set like so:

x
<id>1<new>sample</new></id>
<id>2<new>sample</new></id>

Summary

To sum up:

  1. the .modify() method may be used either against an XML variable or an XML column in a table by using the SET statement (alone or in an UPDATE command, respectively), and
  2. there are three basic patterns for XML DML that are passed to the .modify() method, and
  3. the syntax can be confusing but allows for a great deal of fine-grained manipulation of the XML document when needed.

Cleaning up orphaned users in SQL Server

Short blog post here; I’m working with my team to clean up some older SQL Servers, and we’re removing several logins that have been dormant or disabled for some time.  Removing the logins is easy, but it leaves orphaned users in several of our databases.  I created the script below to cycle through each of the databases on the server and generate a series of DROP USER commands for each database.

DECLARE @t TABLE
    (
      db VARCHAR(100)
    , u VARCHAR(1000)
    , s INT
    )

DECLARE @sql NVARCHAR(4000)
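-- dynamic SQL run in each database by sp_msforeachdb below; it finds SQL and Windows users
-- that have no matching server login, and flags whether the user owns a schema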
SET @sql = 'USE [?];

SELECT  DB_NAME()
      , dp.name
      , s = CASE WHEN s.Schema_id IS NOT NULL THEN 1
                 ELSE 0
            END
FROM    sys.database_principals AS dp
        LEFT JOIN sys.server_principals AS sp ON dp.sid = sp.sid
        LEFT JOIN sys.schemas s ON s.principal_id = dp.principal_id
WHERE   dp.type IN ( ''U'', ''S'' )
        AND sp.sid IS NULL
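        -- authentication_type 1 = instance (SQL) and 3 = Windows; skips contained users and users created WITHOUT LOGIN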
        AND dp.authentication_type IN ( 1,  3 )
        AND dp.name <> ''dbo'''

INSERT  INTO @t
        EXEC sp_msforeachdb @sql
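
-- generate one statement per orphaned user: transfer any owned schema to dbo, then drop the user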

SELECT  'USE [' + db + ']; '
        + CASE WHEN s = 1
               THEN 'ALTER AUTHORIZATION ON SCHEMA::[' + u + '] TO dbo; '
               ELSE ''
          END + ' DROP USER [' + u + ']'
FROM    @t

If the script discovers an orphaned user, the output will look something like:

USE [AdventureWorks2012]; DROP USER [olduser]

If that user owns a schema in the database, an ALTER AUTHORIZATION step is added to first transfer the schema to dbo, and then drop the user:

USE [AdventureWorks2012]; ALTER AUTHORIZATION ON SCHEMA::[olduser] TO dbo; DROP USER [olduser]

I’m sure there are other ways to accomplish the same thing, but it works well for this task.
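
For example, if you’d rather not copy and paste the generated statements, one variation (my own sketch, assuming the @t table from the script above has already been populated in the same batch) is to concatenate the commands and execute them directly:

DECLARE @cmd NVARCHAR(MAX) = N''

SELECT  @cmd = @cmd + 'USE ' + QUOTENAME(db) + '; '
        + CASE WHEN s = 1
               THEN 'ALTER AUTHORIZATION ON SCHEMA::' + QUOTENAME(u) + ' TO dbo; '
               ELSE ''
          END
        + 'DROP USER ' + QUOTENAME(u) + '; '
FROM    @t

EXEC sys.sp_executesql @cmd

As always, review the generated commands before executing anything like this against a production server.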