SQL

Steel City SQL Users Group–March 18, 2014– @SteelCitySQL

Next Tuesday, I’m loading up Big Blue, and driving over to Birmingham to present at the Steel City SQL Users Group.  I’ll be talking about the Agile DBA.  Should be fun!

http://www.steelcitysql.org/

Featured Presentation

The Agile DBA: Managing your To-Do List

Speaker: Stuart Ainsworth

Summary: Agile development is all the rage, but how do the principles apply to database administrators? This presentation will introduce the basics of the Agile Manifesto, and explain how they can be applied to non-development IT work, such as database administration, maintenance, and support. We’ll cover scrum (one of the most popular development methodologies) and kanban, and identify some of the common struggles with implementing them in an organization. This is an interactive discussion; please bring your tales of success and your horror stories.

About Stuart: Stuart Ainsworth (MA, MEd) is a manager working in the realm of financial information security. Over the past 15 years, he’s worked as a research analyst, a report writer, a DBA, a programmer, and a public speaking professor. In his current role, he’s responsible for the maintenance of a data analysis operation that processes several hundred million rows of data per day.

SQL Server XQuery: deleting nodes using .modify()

Quick blog post; I got asked today by a developer friend of mine about how to delete nodes in an XML fragment using the .modify() method. After some head-scratching and fumbling around (it’s been a few months since I’ve done any work with XML), we came up with a version of the following script:

DECLARE @X XML = 
'<root>
  <class teacher="Smith" grade="5">
    <student name="Ainsworth" />
    <student name="Miller" />
  </class>
  <class teacher="Jones" grade="5">
    <student name="Davis" />
    <student name="Mark" />
  </class>
  <class teacher="Smith" grade="4">
    <student name="Baker" />
    <student name="Smith" />
  </class>
</root>'

SELECT  @x

--delete the classes that belong to teacher Smith
SET @X.modify('delete /root/class/.[@teacher="Smith"]')
SELECT @X 

Now, let me try to explain it:

  1. Given a simple document that has a root with classes, and students in each class, we want to delete all classes that are being taught by a teacher named “Smith”.
  2. First, we delete the class nodes (along with the student nodes under them) that belong to Smith:
    1. Using XPath, we walk the axis and use a node test to restrict to /root/class/. (the current node under class).
    2. We then apply a predicate looking for a teacher attribute with a value of “Smith”
    3. The .modify() clause applies the delete command to the @X variable, and updates the XML
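
As an aside, the explicit self node isn’t strictly required here; the predicate can hang directly off the class step. This shorter form should produce the same result against the document above:

SET @X.modify('delete /root/class[@teacher="Smith"]')
SELECT @X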

Back on the trail… #sqlsatnash

I realize that I should probably be blogging about my New Year’s resolutions, but meh… I’ve been super busy surviving the holidays. So busy, in fact, that I’ve failed to mention that I’ll be presenting at SQLSaturday in Nashville on January 18, 2014. I actually got selected to present TWO topics, which is HUGE for me. Hoping that I can refine my presentations and get ready for our own SQLSaturday in Atlanta.

Working with “Biggish Data”

Most database professionals know (from firsthand experience) that the “data explosion” continues, and there’s been a lot of focus lately on “big data”. But what do you do when your data’s just kind of “biggish”? You’re managing terabytes, not petabytes, and you’re trying to squeeze as much performance as possible out of your aging servers. The focus of this session is to identify key guidelines for the design, management, and ongoing optimization of “larger-than-average” databases. Special attention will be paid to the following areas:

  • query design
  • logical and physical data structures
  • maintenance & backup strategies

Managing a Technical Team: Lessons Learned

I got promoted to management a year ago, and despite what I previously believed, there were no fluffy pillows and bottles of champagne awaiting me. My team liked me, but they didn’t exactly stoop and bow when I entered the room. I’ve spent the last year relearning everything I thought I knew about management, and what it means to be a manager of a technical team. This session is intended for new managers, especially if you’ve come from a database (or other technical) background; topics we’ll cover include:

  • How to let go of your own solutions,
  • Why you aren’t the model you think you are, and
  • Why Venn diagrams are an effective tool for management.

SQL Server XQuery: Functions (sql:variable() & sql:column())


Like most query languages, XQuery has several functions that can be used to manipulate and query data.  SQL Server’s implementation supports a limited subset of the XQuery specification, but there’s a lot of power in the functions provided.  I hope to cover some of those functions in more detail at a later date, but for now I’d like to focus on a couple of very specific functions (sql:variable() & sql:column()).  These are proprietary extensions to XQuery, and they (like the xml methods I previously discussed) provide a bridge between the SQL engine and the XQuery engine.

For example, if you wanted to find the value of the third node of a simple XML document in SQL Server, you could do the following:

DECLARE @x XML ='<alpha>a</alpha><alpha>b</alpha><alpha>c</alpha>'
SELECT @x.value('(//alpha)[3]', 'varchar(1)')

The .value() method would return the letter “c” in the form of a varchar to the SQL engine.  However, if you wanted to do this dynamically, and specify which node to return based on a parameter, you would use the sql:variable() function, like so:

DECLARE @x XML ='<alpha>a</alpha><alpha>b</alpha><alpha>c</alpha>'

DECLARE @node INT = 3
SELECT @x.value('(//alpha)[sql:variable("@node")][1]', 'varchar(1)')

The sql:variable() function uses a string literal (a value surrounded by double quotes) to reference a SQL parameter (in this case, @node), and the XQuery engine substitutes the parameter’s value into the expression. The above query is evaluated as:

(//alpha)[3][1]

by the XQuery engine. In English, we are looking for the third node named alpha. You may wonder about the extra positional reference (“[1]”); the .value() method requires an expression that is guaranteed to return a single value, so we explicitly ask for the first instance of the third alpha node. Seems a bit clunky, but it works. Looking at the execution plan, we can see that this is a relatively complex process, with multiple calls between the two sides of the query processor:

[image: execution plan showing the calls between the SQL engine and the XQuery engine]

The sql:column() function is similar, but is used to refer to a column instead of a parameter; this allows for the dynamic querying of an XML column on a row-by-row basis. For example:

DECLARE @T TABLE ( ID INT, x XML )

INSERT  INTO @T
        ( ID, x )
VALUES  ( 1, '<alpha>a</alpha><alpha>b</alpha><alpha>c</alpha>' ),
        ( 2, '<alpha>a</alpha><alpha>b</alpha><alpha>c</alpha>' ),
        ( 3, '<alpha>a</alpha><alpha>b</alpha><alpha>c</alpha>' )

SELECT  ID, v=x.value('(//alpha)[sql:column("ID")][1]', 'varchar(1)')
FROM    @T

The above query will return a dataset like so:

ID          v
1           a
2           b
3           c

Summary

SQL Server provides two functions for passing information from the SQL engine to the XQuery engine: sql:variable() and sql:column(). The nature of these functions is pretty straightforward: you reference either a parameter or a column inside an XML method, and the XQuery engine uses its value when evaluating the expression.

SQL Server XQuery: .modify() method

Continuing with XQuery, today’s post addresses the .modify() method of the xml data type; this method is used to update XML documents by applying one of three XML DML statements (insert, delete, or replace value of) via a SQL SET statement. Again, the idea is to hand off the work between the XML engine and the SQL engine. For example, given the following script:

DECLARE @X XML
SET @x = '<root><id>1</id></root>'

SET  @x.modify('insert <splat>hello world </splat>
                into /root[1]')

SELECT @x

the output of this SQL is the following XML:

<root>
  <id>1</id>
  <splat>hello world </splat>
</root>

So what happened?  We took a very simple XML document, and passed XML DML into the .modify() method which stated that we should:

  1. insert a new node “<splat>hello world </splat>” into the XML document
  2. into the first instance of the root node (/root[1]).

That’s it; welcome to the world of XML DML (Data Manipulation Language). The syntax pattern for insert is pretty simple: an action (insert), followed by an XML expression (<splat>hello world </splat>), followed by the location of a second XML expression (into /root[1]). With some minor variation in pattern, these elements hold true for the other two XML DML statements (delete; replace value of):

  • insert: XML expression (1) is the node to insert; XML expression (2) is the target location path, introduced by into | after | before (with an optional as first | as last modifier). Sample: SET @x.modify('insert <splat>hello world!</splat> before (/root/id)[1]')
  • delete: XML expression (1) is the node to delete; XML expression (2) is not used. Sample: SET @x.modify('delete /root/splat[1]')
  • replace value of: XML expression (1) is the node or value to be replaced; XML expression (2) is the new value, introduced by with. Sample: SET @x.modify('replace value of (/root/id/text())[1] with "2"')

Of course, the variation in the pattern is what makes this set of commands a challenge to keep straight, so I’ll try to briefly explain the patterns in order of increasing complexity:

  • the delete pattern requires that a node singleton be identified for deletion. The XML expression must evaluate to a node in the XML document being updated.
  • the insert pattern identifies a node to be added into, before, or after a specific location path. If an instance of that node already exists, specify as first or as last to place the new node in the correct sequence.
  • the replace value of pattern requires two XML expressions; the first must evaluate to an existing value (either an attribute or the text() value of an element), and the second must evaluate to a specific value. A later blog post will show how to use XQuery to dynamically set that value.
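
To see all three patterns side by side, here’s a quick end-to-end sketch of my own (same variable-based approach as above):

DECLARE @x XML = '<root><id>1</id><splat>hello world </splat></root>'

--insert a node as the first child of root
SET @x.modify('insert <new>first!</new> as first into /root[1]')

--replace the text value of the id element
SET @x.modify('replace value of (/root/id/text())[1] with "2"')

--delete the splat node
SET @x.modify('delete /root/splat[1]')

SELECT @x --returns <root><new>first!</new><id>2</id></root>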

One more thing: these examples all use a single XML variable, and issue the SET command against it. If you’re dealing with a table of XML documents, you can use the SQL UPDATE command to SET the XML column using this method and XML DML, like so:

DECLARE @T TABLE (x XML)
INSERT INTO @T
VALUES ('<id>1</id>'), ('<id>2</id>')

UPDATE @T
SET x.modify('insert <new>sample</new>
                into /id[1]')

SELECT *
FROM @T

will return a result set like so:

x
<id>1<new>sample</new></id>
<id>2<new>sample</new></id>

Summary

To sum up:

  1. the .modify() method may be used against either an XML variable or an XML column in a table by using the SET statement (alone or in an UPDATE command, respectively),
  2. there are three basic patterns for XML DML that are passed to the .modify() method, and
  3. the syntax can be confusing but allows for a great deal of fine-grained manipulation of the XML document when needed.

Cleaning up orphaned users in SQL Server

Short blog post here; I’m working with my team to clean up some older SQL Servers, and we’re removing several logins that have been dormant or disabled for some time.  Removing the logins is easy, but it leaves orphaned users in several of our databases.  I created the script below to cycle through each of the databases on the server and generate a series of DROP USER commands for each database.

DECLARE @t TABLE
    (
      db VARCHAR(100)
    , u VARCHAR(1000)
    , s INT
    )

DECLARE @sql NVARCHAR(4000)
SET @sql = 'USE [?];

SELECT  DB_NAME()
      , dp.name
      , s = CASE WHEN s.Schema_id IS NOT NULL THEN 1
                 ELSE 0
            END
FROM    sys.database_principals AS dp
        LEFT JOIN sys.server_principals AS sp ON dp.sid = sp.sid
        LEFT JOIN sys.schemas s ON s.principal_id = dp.principal_id
WHERE   dp.type IN ( ''U'', ''S'' )
        AND sp.sid IS NULL
        AND dp.authentication_type IN ( 1,  3 )
        AND dp.name <> ''dbo'''

INSERT  INTO @t
        EXEC sp_MSforeachdb @sql

SELECT  'USE [' + db + ']; '
        + CASE WHEN s = 1
               THEN 'ALTER AUTHORIZATION ON SCHEMA::[' + u + '] TO dbo; '
               ELSE ''
          END + ' DROP USER [' + u + ']'
FROM    @t

If the script discovers an orphaned user, the output will look something like:

USE [AdventureWorks2012]; DROP USER [olduser]

If that user owns a schema in the database, an ALTER AUTHORIZATION step is added to first transfer the schema to dbo, and then drop the user:

USE [AdventureWorks2012]; ALTER AUTHORIZATION ON SCHEMA::[olduser] TO dbo; DROP USER [olduser]

I’m sure there are other ways to accomplish the same thing, but it works well for this task.
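
If you’d rather execute the cleanup in one pass instead of copying and pasting the output, a variation like the following should work when run in the same batch (my own sketch; PRINT and review the generated commands before executing them):

DECLARE @cmd NVARCHAR(MAX) = N''

--build one long command string from the same @t table as above
SELECT  @cmd += 'USE [' + db + ']; '
        + CASE WHEN s = 1
               THEN 'ALTER AUTHORIZATION ON SCHEMA::[' + u + '] TO dbo; '
               ELSE ''
          END + 'DROP USER [' + u + ']; '
FROM    @t

PRINT @cmd --sanity check first
EXEC sp_executesql @cmd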

SQL Server XQuery: .value() method

Back to XQuery; the third method I wanted to discuss in this series is the .value() method. Like the .query() method, .value() bridges the SQL Server database engine and the XQuery engine by extracting information from an XML document. The difference is that while .query() returns XML, .value() translates the specific information requested into a scalar SQL value. For example:

DECLARE @T TABLE ( ID INT, x XML )

INSERT  INTO @T
        ( ID, x )
VALUES  ( 1, '<root><plant><fruit>tomato</fruit></plant></root>' )
,       ( 2, '<root><plant><fruit>banana</fruit></plant></root>' )
,       ( 3, '<root><plant><vegetable>tomato</vegetable></plant></root>' )

SELECT ID, x.value('(.)[1]', 'varchar(max)')
FROM @T t

Note that the .value() method takes two parameters: an XQuery statement, and a string naming the SQL Server data type expected for the returned scalar value. The results are as follows:

ID          (No column name)
1           tomato
2           banana
3           tomato

So, what is the XQuery statement requesting? The axis refers to the self node of the XML fragment (and remember that in SQL Server’s implementation of XQuery, the query begins at the top of the document), there is no explicit node test (so any node satisfies it), and the predicate refers to the first position [1] of the path. The .value() method returns the value found at that location, and converts it to a SQL value of type varchar(max). So what happens if there’s more than one value at a given XPath?

INSERT  INTO @T
        ( ID, x )
VALUES    ( 4, '<root><plant><fruit>tomato</fruit><fruit>banana</fruit><vegetable>tomato</vegetable></plant></root>' )


SELECT ID, x.value('(.)[1]', 'varchar(max)')
FROM @T t

The above script inserts a slightly more complex XML document than the earlier ones; there are now multiple nodes under the <plant> node. Combined with the three rows we already had, the output looks like:

ID          (No column name)
1           tomato
2           banana
3           tomato
4           tomatobananatomato

Note how row 4 has a single string that represents all of the values in the XML document. Without a more specific XQuery path, the .value() method returns everything that satisfies the request; in this case, all of the text values concatenated together. In the case where the XQuery cannot be satisfied, .value() returns a NULL value to the SQL Server engine, as illustrated by the following code examples:

SELECT ID, x.value('(//plant/fruit)[1]', 'varchar(max)')
FROM @T t 

SELECT ID, x.value('(//plant/fruit)[2]', 'varchar(max)')
FROM @T t 

The output should look like:

ID          (No column name)
1           tomato
2           banana
3           NULL
4           tomato

ID          (No column name)
1           NULL
2           NULL
3           NULL
4           banana

In the first query, we are looking anywhere in the document for a <fruit> node under a <plant> node, with a positional predicate of 1. Rows 1, 2, and 4 satisfy this query; the XML document in row 3 doesn’t have a <fruit> node, and thus returns a NULL. The second query is similar in that most of the XML documents in our table don’t satisfy it; only the document in row 4 has a node in the second position [2] of the <fruit> path.

Summary

Key takeaways:

  1. The .value() function allows you to specify an XQuery statement against an XML datatype looking for a specific value, and
  2. that returned value is cast to a SQL datatype.
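
To make the cast explicit, here’s a trivial sketch of my own:

DECLARE @x XML = '<root><id>42</id></root>'

SELECT @x.value('(/root/id)[1]', 'int')          --returns the integer 42
SELECT @x.value('(/root/id)[1]', 'varchar(10)')  --returns the string '42'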

Next up: .nodes()

#TSQL2sDay.. For Whom Do You Log?

This month’s T-SQL Tuesday is being hosted by Aaron Nelson [blog | twitter]; since I’m posting this late in the day, I got a chance to sneak a peek at some of the other entries, and most of them were great technical discussions on how to log, what to log, and why you should log. I’m not feeling technical today; today’s a conceptual day. So, rather than write about the pros and cons of logging, I thought I would ask you to step back and consider: who is your audience?

At my company, we monitor logs for our customers; I’ve had to reverse engineer a bunch of different log formats in my day, and there are some basic principles behind good logging practices; here’s my stab at them:

1. Logging without analysis is useless storage of information.

I realize that our jobs as data professionals have gotten more exciting because of policy requirements that insist upon the storage of all kinds of information for exceedingly long periods of time.  I recently read a requirement that said a company must maintain source code for 20 years; that’s a heckuva long time to keep a log of changes around.  Unfortunately, if no one ever looks at the log, then why store it?  If you’re going to be storing information, you need to have some process that consumes that information in order to get the most value out of it.  Good logging processes assume that someone will be reviewing the log at some point, and using that information to act. 

2. Logging structures should be flexible.

If you are logging information with the mindset that someone will be reviewing the log in the future, then you need to balance precision (i.e., gathering adequate information to describe the logged event) with saturation (don’t over-log; not every event is important). For example, if you’re building an audit log to track changes to a customer account, you want to be able to isolate “risky” behavior from normal account maintenance. If your logs become lengthy result sets filled with routine change statements, it’s easy to overlook important events such as a bad command.

Most logging structures attempt to overcome this by having some sort of categorical typing appended to the log; in other words, if we think in tabular terms, the columns of a log dataset might look like:

  • DateOfEvent – datetime
  • Category – Classification of the event (Error, Information, etc)
  • Severity – Some warning of how important the event is
  • ErrorCode – Some short (usually numeric) code that has meaning extrinsic to the system
  • Message – The payload; a string description of what happened.
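
In T-SQL terms, a minimal version of that structure might look something like this (hypothetical table and column names):

CREATE TABLE dbo.EventLog
    (
      DateOfEvent DATETIME NOT NULL
    , Category VARCHAR(20) NOT NULL --Error, Information, etc.
    , Severity TINYINT NOT NULL
    , ErrorCode INT NULL --meaning is extrinsic to the system
    , Message NVARCHAR(MAX) NOT NULL --the payload
    )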

It becomes relatively easy to isolate the error messages from informational messages; however, how do you search for non-categorized information within the message itself? For example, if you want to determine that a specific error code was associated with a specific logon, and that logon information is embedded in the message of your log file, how do you search it? The easy answer is to use wildcards, but is there a better way? In my experience, good logs use some form of intra-message tagging to isolate key elements within the message; the log file remains simple for quick searches, but can easily be adapted for more in-depth searches. I like XML attributes for payload logging; they’re easy to implement, and can be parsed quickly. For example:

acct="Stuart Ainsworth" msg="Access Denied" object="SuperSecretPage.aspx"

is an easy message to shred and look for all denied attempts on SuperSecretPage.aspx.  If I wanted to look for all activity by Stuart Ainsworth, I could do that as well.
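
In fact, because that message is just a set of XML attributes, one quick way to shred it is to wrap it in an element and reuse the xml methods from earlier posts (a sketch of my own, with hypothetical names):

DECLARE @msg VARCHAR(200) = 'acct="Stuart Ainsworth" msg="Access Denied" object="SuperSecretPage.aspx"'
DECLARE @x XML = CAST('<logline ' + @msg + ' />' AS XML)

SELECT  acct = @x.value('(/logline/@acct)[1]', 'varchar(100)')
      , msg = @x.value('(/logline/@msg)[1]', 'varchar(100)')
      , [object] = @x.value('(/logline/@object)[1]', 'varchar(100)')

Searching for all denied attempts then becomes a simple WHERE clause against the shredded columns instead of a wildcard scan of the raw message.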

3. Logging should have a maintenance plan.

If you log a lot, you know that logs fill up quickly.  How long do you retain your logs (and the answer shouldn’t be “until the drive fills up”)?  Too much information that is easily accessible is both a security risk and a distraction; if you’re trying to find out about a recent transaction by a user, do you really need to know that they’ve been active for the last 10 years?  Also, if your log file does fill up, is your only option to “nuke it” in order to keep it running and collecting new information?

A good logging strategy will have some sort of retention plan, and a method of maintaining older log records separately from new ones.  Look at SQL Server error logs, for example; every time the service starts, a new log file is created.   Is that the best strategy?  I don’t think so, but it does isolate older logs from newer ones.   If you’re designing a logging method, be sure to figure out a way to keep old records separate from new ones; periodically archive your logs, and find a way to focus your resources on the most recent and relevant information.

#SQLSat111 “Biggish Data”

Trying not to be too obnoxious about this, but I’m presenting on “Biggish Data” at SQL Saturday 111 in Atlanta tomorrow; if you’re there, swing by and see me. I presented this on Monday at AtlantaMDF, and am attempting to incorporate the feedback from that presentation into this one. Should be good, but I’m still working on the slides.

However, if you attend my session, you’ll get a free koozie!


#TSQL2sDay 28–Jack of All Trades, Master of None

This month’s topic was chosen by Argenis Fernandez (blog | twitter): "Jack of All Trades, Master of None?".  It’s interesting to me lately because I seem to keep encountering blog posts and articles that ask some variant of this question.  I’m not sure if it’s spurred in part by the recent release of SQL Server 2012 (which is going to change a lot of the ways we manage data), or if it’s something in the collective waters bubbling under the Internet.  Or perhaps, it’s just the fact that I’m trying to define myself in a new role at my work.  Don’t know. 

I’ve always been a data person; one of my first jobs in IT was working for a forestry agent doing data entry; my job was to take the handwritten note cards from the tree counter, translate their coding system into digits, and enter them into an old Macintosh computer (this was the late ’80s, and Macs were super sophisticated at the time). These numbers were important; they represented an estimate of how much timber (and what kind) was on a property, so that the value of that property could be accurately assessed. It had nothing to do with SQL; it wasn’t really even any sort of programming. Just look at a card, mentally transform the code into a number and tree type, and punch it in.

Later, in college, I got involved in social science research; this eventually led to a position in public health & communication research. Still digging on numbers; I was using SPSS at the time. I went to bed at night dreaming of statistical relationships and discovering new stuff. When our department encountered resistance from the IT department over managing one of our studies, I started hacking around in Microsoft Access to build a contact management platform. This was a transition point for me; I went from entering numbers, to evaluating numbers, to finally programming. However, it was always data.

I made a career change and dove into true IT work shortly after that; my job title said “Access Report Developer”, but I became the DBA (because there wasn’t one).  I loved SQL 2000, but when the IT department at that company shrank, I became not only a programmer, a report developer, and the DBA, but also the guy that fixed printers.  When I had an opportunity to jump ship to another company to become strictly a DBA, I took it.

I’ve been here for 9 years now. In those 9 years, I’ve been the DBA, a database developer, a data architect, and now a manager. Although I’ve primarily worked with SQL Server, I’ve dabbled in MySQL, XML, and Visual Basic. I’ve worked with backup jobs and maintenance plans, DTS and SSIS (and other solutions); my T-SQL is strong. I used to think I was a specialist in database development, but I’ve come to realize that there are people who are a lot more specialized than I am. I’ve also come to realize that data without context is meaningless.

I guess I’m growing up a little; when I was a kid counting trees, I didn’t really care about how efficient the system was. I was just earning spending money at a job that was better than mowing the lawn. When I was doing research, I realized how important my work was, but even then there was little that I could do to change long-term outcomes; I was just an analyst. In IT, I could finally get to a point where I could take the numbers that I saw and do something about them. I’m not just collecting data; I’m using that data to help make decisions that make my company more profitable (which ultimately leads to more money in my pocket). I collect, I analyze, I interpret, and I react.

So what does this little trip down my history mean in terms of the post topic? For me, it means that I’ve got a niche, but it’s a pretty big one. Most of my work is done with a certain tool, but the skills I’ve learned along the way don’t restrict me to that tool. I’ve worked with people who get the specifics of the platform a lot better than I do, but miss the big picture; I’ve also worked with visionaries who get lost when I discuss normalization and efficiencies. I’m neither a Jack of All Trades, nor a Master of just One. I’m just a data guy, and I like it.