#SQLServer – Where does my index live?

Today, I got asked by one of my DBA’s about a recently deployed database that seemed to have a lot of filegroups with only a few tables.  He wanted to verify that one of the tables was correctly partition-aligned, as well as learn where all of the indexes for these tables were stored.  After a quick search of the Internets, I was able to fashion the following script to help.  The script below will find every index on every user table in a database, and then determine if it’s partitioned or not.  If it’s partitioned, the scheme name is returned; if not, the filegroup name.  The final column provides an XML list of filegroups (because schemes can span multiple filegroups) and file locations (because filegroups can span multiple files).

 WITH C AS ( SELECT ps.data_space_id
, f.name
, d.physical_name
FROM sys.filegroups f
JOIN sys.database_files d ON d.data_space_id = f.data_space_id
JOIN sys.destination_data_spaces dds ON dds.data_space_id = f.data_space_id
JOIN sys.partition_schemes ps ON ps.data_space_id = dds.partition_scheme_id
UNION
SELECT f.data_space_id
, f.name
, d.physical_name
FROM sys.filegroups f
JOIN sys.database_files d ON d.data_space_id = f.data_space_id
)
SELECT [ObjectName] = OBJECT_NAME(i.[object_id])
, [IndexID] = i.[index_id]
, [IndexName] = i.[name]
, [IndexType] = i.[type_desc]
, [Partitioned] = CASE WHEN ps.data_space_id IS NULL THEN 'No'
ELSE 'Yes'
END
, [StorageName] = ISNULL(ps.name, f.name)
, [FileGroupPaths] = CAST(( SELECT name AS "FileGroup"
, physical_name AS "DatabaseFile"
FROM C
WHERE i.data_space_id = c.data_space_id
FOR
XML PATH('')
) AS XML)
FROM [sys].[indexes] i
LEFT JOIN sys.partition_schemes ps ON ps.data_space_id = i.data_space_id
LEFT JOIN sys.filegroups f ON f.data_space_id = i.data_space_id
WHERE OBJECTPROPERTY(i.[object_id], 'IsUserTable') = 1
ORDER BY [ObjectName], [IndexName] 

Hadoop for the SQL Server DBA – Initial Challenges

I’ve been intrigued by the whole concept of Big Data lately, and have started actually presenting a couple of different sessions on it (one of which was accepted for PASS Summit 2014).  Seems only right that I should actually *gasp* blog about some of the concepts in order to firm up some of my explanations.  Getting started with Hadoop can be quite daunting, especially if you’re used to relational databases (especially the commercial ones); I hope that this series of posts can help clear up some of the mystery for the administrative side of the house.  Before we dive in, I think it’s only fair to lay out some of the initial challenges with discussing Big Data in general, and Hadoop specifically.  Depending on your background, some of these may be more challenging than others.

Rapid Evolution

Welcome to the wild, wild west.  If you come from a commercial database background (like SQL Server), you’re probably accustomed to a mature product.  For Microsoft SQL Server, a new version gets released on what appears to be a 2-4 year schedule (SQL 2005 -> 2008 -> 2012 -> 2014); of course, there’s always the debate as to what constitutes a major release (2008 R2?), but in general, the core product gets shipped with new functionality, and there’s some time before additional new functionality is released.

Hadoop’s approach to the release cycle is much looser; in 2014 alone, there have been two “major” releases with new features and functionality included.  Development for the Hadoop engine is distributed, so the release and packaging of new functions may vary within the ecosystem (more on that in a bit).  For developers, this is exciting; for admins, this is scary.   Depending on how acceptable change is within your operational department, the concept of rolling out an upgraded database engine every 3-4 months may be daunting.

Ecosystems, not products

Hadoop is an open-source product, so if you’re experienced with other open-source products like Linux, you probably already understand what that means; open-source licensing means that vendors can package the core product into their product, as long as they allow open access to the final package.  This usually means that commercial providers will either bundle an open-source product with their own proprietary side-by-side software (“we interface with MySQL” or “we run on Linux”), or they release their modified version of the software in a completely open fashion and earn revenue from a support contract (e.g., Red Hat).  In either case, it’s an ecosystem, not a canned product.

Hadoop technically consists of four modules:

  • Hadoop Common: The common utilities that support the other Hadoop modules.
  • Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
  • Hadoop YARN: A framework for job scheduling and cluster resource management.
  • Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

However, take a look at the following framework from Hortonworks (the Microsoft partner for Hadoop):

hortonworks Lots of stuff in there that’s being developed, but isn’t officially Hadoop.  it could become part of this official stack at some point, or it may not.  Other vendors may adopt it, or they may not.   Each of these components has their own update schedule (again, change!), but there is some flexibility in this approach (you can upgrade only the individual components); it does make the road map complex compared to traditional database platforms.

Big Data doesn’t always mean Big Data.

Perhaps the hardest thing to embrace about Big Data in general (not just Hadoop) is that the nomenclature doesn’t necessarily line up with the driving factors; a Big Data approach may be the best approach for smaller data sets as well.   In essence, data can be described in terms of the 4 V’s:

  1. Volume – The amount of data held
  2. Velocity – The speed at which the data should be processed
  3. Variety – The variable sources, processing mechanisms and destinations required
  4. Value – The amount of data that is viewed as not redundant, unique, and actionable

A distributed approach (like Hadoop) is usually appropriate when tackling more than 1 of these four v’s; if your data’s just large, but low velocity, variety, or value, a single installation of SQL Server (with a lot of disk space) may be appropriate.  However, if your data has a lot of variety and a lot of velocity even if it’s small, a Big Data approach may yield considerable efficiency.  The point is that big data alone is not necessarily the impetus for using Hadoop at all.

Summary

Big Data & Hadoop are complex topics, and they’re difficult to understand if you approach them from a traditional RDBMS mentality.  However, understanding the fundamentals of how Big Data approaches are evolving, disparate, and generally applicable to more than just volume can lay a foundation for tackling the platforms.

 

 

Managing a Technical Team: Act Like a Good Developer

This is one of my favorite pieces of advice from my Managing a Technical Team presentation that I’ve been doing at several SQLSaturdays and other conferences: act like a good developer, with a different focus.  Most new managers, especially if they’ve been promoted from within (the Best Operator) model don’t know how to improve their management skills.  However, if you were to ask managers what makes a good developer, you’ll probably get a series of answers that are similar to the following broad categories:

Good Developers have:

  • a desire to learn,
  • a desire to collaborate, and
  • a desire for efficiency.

I could probably say that this is true for all good employees, but as a former developer, I know that the culture in software development places a lot of focus on these traits; system administrators usually have different focus points.  However, all technical managers SHOULD emulate these three traits in order to be effective.  Let me explain.

Desire to Learn

Let’s imagine Stacy, a C# developer in your company; by most accounts, she’s successful at her job.  She always seems to be up on the latest technology, has great ideas, and always seems to have a new tool in her toolkit.  If you ask her how she got started programming, she’d tell you that she picked it up as hobby while in college, and then figured out how to make a career out of it.  She’s an active member of her user group, and frequently spends her weekends reading and polishing her craft; while not a workaholic, she does spend a great deal of her personal time improving her skills.  She’s on a fast track to managing a team, in part because of her desire to learn.

One day, she gets promoted, and is now managing the development team; she struggles with the corporate culture, the paperwork, laying out a vision, and can’t seem to figure out how to motivate her team to the same level of success that she was achieving as a developer.  The problem is that her desire to learn no longer syncs up with her career objectives;  Stacy needs to invest her educational energies into learning about management.

Ask a new IT manager what books they’re reading, and typically the response will be either none at all, or a book on the latest technology.  We tend to cling to that which is familiar, and if you’ve got a technical background, it’s easy and interesting to try and keep focusing on that background.  However, if you’re serious about being a manager, you need to commit to applying the same desire to learn that you had as an employee to learning more about management.  Sure, pick up a book on Big Data, but balance it out with a book on Relationship Development.  Podcasts?  There’s management ones out there that are just as fun as the development ones.  Webinars? Boom.

Desire to Collaborate

Bob’s a data architect.  Everybody loves Bob, because he really listens to your concerns, and tries to design solutions that meet those concerns; if he’s wrong about something, he’s quick to own up to the mistake, and moves on.  He works well with others, acknowledging their contributions and adapting to them.  In short, Bob is NOT a jerk; nobody wants to work with a jerk.

Bob gets promoted to a management position, and he too struggles; he’s still hanging out with his former teammates, and is still going to the same conferences.  Everybody still likes Bob, but he’s having trouble guiding his team in an effective manner.  He hasn’t really built relationships with his new peers (other managers that report to his director), and hasn’t found ways to manage more effectively.  He’s collaborating, but with the wrong people.

As a new manager, you should continue to maintain relationships with your directs, but you need to build a relationship with your new team of peers.  Understand their visions, and find ways to make your team valuable resources to them. Reach out to other managers at user groups and conferences; build a buddy system of people based on your management path, not just your technical one.

Desire for Efficiency

If you sat down and had a conversation with any development team that was effective and producing results and asked them about their methodology, it wouldn’t be long before they started talking about frameworks.  Efficiency in development is derived from reusable patterns and approaches to problems; they’re tough to implement at first, but the long term gain is enormous.

As you’ve probably guessed, there’s management frameworks that can be very effective in a technical environment; investing time in implementing them can yield great efficiencies when faced with making decisions.  In my current environment, I use three:

  1. MARS – my own self-rolled approach to system operations; it’s not perfect, but it helps focus efforts.
  2. Kanban – allows me to see what our WIP (Work In Progress) is, and helps queue up items for work
  3. ITIL – we’re just starting to adopt this, but we’re working on isolating Incident Management from root cause analysis, as well as implementing robust change control processes.

The challenge with management frameworks is similar to that of development frameworks: bloat.  It’s too easy to get bound up in process and procedures when lighter touches can be used, but in most cases, the efficiency gained by having a repeatable approach to decisions allows you to respond quickly to a changing environment.

Summary

Management is tough, but it’s especially tough if you continue to focus on your technical chops as opposed to your leadership abilities.  Act like a good developer, and apply those same basic principles to your team.

Determining the Primary Key of a table without knowing it’s name

So, I’ve been trying to find more technical fodder for blog posts lately in order to get in the habit of blogging on a regular basis, so I thought I would explore a few of my higher-ranked answers on StackOverflow and provide a little more detail than that site provides.  This is my highest-rated answer (sad, I know, compared to some posters on the site):

Determine a table’s primary key using TSQL

I’d like to determine the primary key of a table using TSQL (stored procedure or system table is fine). Is there such a mechanism in SQL Server (2005 or 2008)?

My answer:

This should get you started:

SELECT *
    FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
        JOIN INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE ccu ON tc.CONSTRAINT_NAME = ccu.Constraint_name
    WHERE tc.CONSTRAINT_TYPE = 'Primary Key'

 

Besides the obvious faux pas of using SELECT * in a query, you’ll note that I used the INFORMATION_SCHEMA views; I try to use these views wherever possible because of the portability factor.  In theory, I should be able to apply this exact same query to a MySQL or Oracle database, and get similar results from the system schema.

The added benefit of this query is that it allows you to discover other things about your PRIMARY KEYs, like looking for system named keys:

SELECT  tc.TABLE_CATALOG, tc.TABLE_SCHEMA, tc.TABLE_NAME, ccu.CONSTRAINT_NAME, ccu.COLUMN_NAME
FROM    INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
        JOIN INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE ccu ON tc.CONSTRAINT_NAME = ccu.Constraint_name
WHERE   tc.CONSTRAINT_TYPE = 'Primary Key'
    AND ccu.CONSTRAINT_NAME LIKE 'PK%/_/_%' ESCAPE '/'

or finding PRIMARY KEYs which have more than one column defined:

SELECT  tc.TABLE_CATALOG, tc.TABLE_SCHEMA, tc.TABLE_NAME, ccu.CONSTRAINT_NAME
FROM    INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
        JOIN INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE ccu ON tc.CONSTRAINT_NAME = ccu.Constraint_name
WHERE   tc.CONSTRAINT_TYPE = 'Primary Key'
GROUP BY tc.TABLE_CATALOG, tc.TABLE_SCHEMA, tc.TABLE_NAME, ccu.CONSTRAINT_NAME
HAVING COUNT(*) > 1

Error 574 Upgrading SQL Server 2012 to SP1

This blog post is way overdue (check the dates in the errror log below), but I promised our sysadmin that I would write it, so here it is.  Hopefully, it’ll help some of you with this aggravating issue.  During an upgrade of our SQL cluster, we ran into the following error as we attempted to upgrade one of the instances:

2014-04-15 23:50:14.45 spid14s     Error: 574, Severity: 16, State: 0.

2014-04-15 23:50:14.45 spid14s     CONFIG statement cannot be used inside a user transaction.

2014-04-15 23:50:14.45 spid14s     Error: 912, Severity: 21, State: 2.

2014-04-15 23:50:14.45 spid14s     Script level upgrade for database ‘master’ failed because upgrade step ‘msdb110_upgrade.sql’ encountered error 574, state 0, severity 16. This is a serious error condition which might interfere with regular operation and the database will be taken offline. If the error happened during upgrade of the ‘master’ database, it will prevent the entire SQL Server instance from starting. Examine the previous errorlog entries for errors, take the appropriate corrective actions and re-start the database so that the script upgrade steps run to completion.

2014-04-15 23:50:14.45 spid14s     Error: 3417, Severity: 21, State: 3.

2014-04-15 23:50:14.45 spid14s     Cannot recover the master database. SQL Server is unable to run. Restore master from a full backup, repair it, or rebuild it. For more information about how to rebuild the master database, see SQL Server Books Online.

Google wasn’t helpful; there’s apparently lots of potential fixes for this, none of which helped.  The closest match we found was that we had some orphaned users in a few databases (not system databases), which we corrected; the upgrade still failed.  We eventually had to contact Microsoft support, and work our way up the second level technician.  Before I reveal the fix, let me give a little more background on how we got orphaned users.

You see, shortly after we upgraded to SQL 2012 (about a year ago), we did what many companies do; we phased out a service offering.  That service offering that we phased out required several database components, including a SQL login associated with users in the database, and several maintenance jobs that were run by SQL Agent.  When we phased out the service, those jobs were disabled, but not deleted.  Our security policy tracks the last time a login was used; if a login isn’t used within 60 days, it’s disabled.  30 days after that (if no one notices), the login is deleted.  Unfortunately, our implementation of this process missed two key steps:

  1. The associated user in each database was not dropped with the login (leaving an orphan), and
  2. any job that was owned by that login was also not dropped or transferred to a sysadmin.

The latter was the key to our particular situation; the upgrade detected an orphaned job even though that job was disabled, and blocked the upgrade from going forward.  Using trace flag –T902, we were able to start the server instance and delete the disabled job.  We then restarted the server without the trace flag, and the upgrade finished successfully.

 

Resources:

Find and fix all orphaned users for all databases.

Brent Ozar Unlimited’s sp_blitz will find jobs that are owned by users other than sa.

A tiny step in the right direction #SQLPASS

Ah, summertime; time for the annual “community crisis” for the Professional Association for SQL Server.  I’ve tried to stay clear of controversies for the last couple of years, but it’s very hard to be a member of such a passionate group of professionals and not have an opinion of the latest subject d’jour.   The short form of the crisis is that there’s questions about how and why sessions get selected to present at the highly competitive Summit this year (disclaimer: I got selected to present this year).  For more details, here’s a few blog posts on the subject:

The point of my post is not to rehash the issue or sway your opinion, dear reader, but rather to focus on a single tiny step in the right direction that I’ve decided to make.  One of the big issues that struck me about the whole controversy is the lack of a repeatable objective tool for speaker evaluations.  As a presenter, I don’t always get feedback, and when I do, the feedback form varies from event to event, meeting to meeting.  Selection committees are forced to rely on my abstract-writing skills and/or my reputation as a presenter; you can obfuscate my identity on the abstract, but it’s tough to factor in reputation if do that.

While I agree that there are questions about the process that should be asked and ultimately answered, there’s very little that I can do to make a difference in the way sessions get selected.  However, as a presenter, and a chapter leader for one of the largest chapters in the US, I can do a little something.

  1. I am personally committing to listing every presentation I make on SpeakerRate.com, and soliciting feedback on every presentation.  To quote Bleachers, “I wanna get better”.
  2. I will personally encourage every presenter at AtlantaMDF to set up a profile and evaluation at SpeakerRate for all presentations going forward.
  3. We will find ways to make feedback electronic and immediate at the upcoming Atlanta SQLSaturday so that presenters can use that information going forward.
  4. I will champion the evaluation process with my chapter members and speakers, and continue to seek out methods to improve and standardize the feedback process.

Do I have all of the right answers? No.  For example, SpeakerRate.com seems to be barely holding on to life; no mobile interface, and a lack of commitment from its members seems to indicate that the site is dying a slow death.  However, I haven’t found an alternative to provide a standard, uniform measure of presentation performance.

Do I think this will provide a major change to the PASS Summit selection?  Nope.  But I do think that a sea change has to start somewhere, and if enough local chapters get interested in a building a culture of feedback and evaluation, that could begin to flow up to the national level.

Speaking at CodeStock 2014

So, this announcement’s way overdue; in about 2 weeks (July 11-12,2014), I’ll be presenting a couple of sessions at CodeStock 2014 in lovely Knoxville, TN.   I haven’t been to CodeStock since 2009, so it’ll be interesting to see how it’s grown.

The Elephant in the Room; A DBA’s Guide to Hadoop & Big Data by Stuart Ainsworth

DATE & TIME

Jul 11th at 9:55 AM until 11:05 AM

TRACK

LOCATION

400b

RATING (0 VOTES)

0

Speaker(s): Stuart Ainsworth
The term "Big Data" has risen to popularity in the last few years, and encompasses data platforms outside of the traditional RDBMS (like SQL Server). The purpose of this session is to introduce SQL Server DBA’s to Hadoop, and to promote understanding of how schema-less data can be collected and accessed in coordination with the traditional SQL models. We’ll cover the basic vocabulary of the Hadoop platform, Microsoft’s integration efforts, and demonstrate how to get started with "big data".

 

Managing a Technical Team: Lessons Learned by Stuart Ainsworth

DATE & TIME

Jul 12th at 11:10 AM until 12:20 PM

TRACK

LOCATION

400b

RATING (0 VOTES)

0

Speaker(s): Stuart Ainsworth
I got promoted to management a couple of years ago, and despite what I previously believed, there were no fluffy pillows and bottles of champagne awaiting me. My team liked me, but they didn’t exactly stoop and bow when I entered the room. I’ve spent the last year relearning everything I thought I knew about management, and what it means to be a manager of a technical team. This session is intended for new managers, especially if you’ve come from a database (or other technical) background.

Speaking at the #SQLPASS #Summit14

I know I’m a day late with this announcement, but I haven’t blogged in months, so what’s the rush?  I am very excited, however, about presenting a full session and a lightning talk at the PASS Summit in Seattle in November.

THE ELEPHANT IN THE ROOM: A DBA’S GUIDE TO HADOOP AND BIG DATA

Speaker(s)Stuart Ainsworth

Duration: 75 minutes

Track: BI Platform Architecture, Development & Administration

You’re a SQL Server DBA working at Contoso and your boss calls you out of your cubicle one day and tells you that the development team is interested in implementing a Hadoop-based solution to your customers. She wants you to help plan for the implementation and ongoing administration. Where do you begin?

This session will cover the foundations of Hadoop and how it fundamentally differs from the relational approach. The goal is to provide a map between your current skill set and "big data.” Although we’ll talk about basic techniques for querying data, the focus is on basic understanding how Hadoop works, how to plan for growth, and what you need to do to start maintaining a Hadoop cluster.

You won’t walk out of this session a Hadoop administrator, but you’ll understand what questions to ask and where to start looking for answers.

 

TEN-MINUTE KANBAN

Speaker(s)Stuart Ainsworth

Duration: 10 minutes

Track: Professional Development

The goal of this Lightning Talk is to cover the basic principles of the Lean IT movement, and demonstrate how Kanban can be used by Administrators as well as developers. Speaker Stuart Ainsworth will cover the basic concepts of Kanban, where to begin, and how it works.

Kanban boards can be used to highlight bottlenecks in resource and task management, as well as identify priorities and communicate expectations. All this can be done by using some basic tools that can be purchased at an office supply store (or done for free online).

Steel City SQL Users Group–March 18, 2014– @SteelCitySQL

Next Tuesday, I’m loading up Big Blue, and driving over to Birmingham to present at the Steel City SQL Users Group.  I’ll be talking about the Agile DBA.  Should be fun!

http://www.steelcitysql.org/

Featured Presentation

The Agile DBA: Managing your To-Do List

Speaker: Stuart Ainsworth

Summary: Agile development is all the rage, but how do the principles apply to database administrators? This presentation will introduce the basics of the Agile Manifesto, and explain how they can be applied to non-development IT work, such as database administration, maintenance, and support. We’ll cover scrum (one of the most popular development methodologies) and kanban, and identify some of the common struggles with implementing them in an organization. This is an interactive discussion; please bring your tales of success and your horror stories.

About Stuart: Stuart Ainsworth (MA, MEd) is a manager working in the realm of financial information security. Over the past 15 years, he’s worked as a research analyst, a report writer, a DBA, a programmer, and a public speaking professor. In his current role, he’s responsible for the maintenance of a data analysis operation that processes several hundred million rows of data per day.

SQL Server XQuery: MORE deleting nodes using .modify()

So after my last post, my developer friend came back to me and noted that I hadn’t really demonstrated the situation we had discussed; our work was a little more challenging than the sample script I had provided.  In contrast to what I previously posted, the challenge was to delete nodes where a sub-node contained an attribute of interest.  Let me repost the same sample code as an illustration:

DECLARE @X XML = 
'<root>
  <class teacher="Smith" grade="5">
    <student name="Ainsworth" />
    <student name="Miller" />
  </class>
  <class teacher="Jones" grade="5">
    <student name="Davis" />
    <student name="Mark" />
  </class>
  <class teacher="Smith" grade="4">
    <student name="Baker" />
    <student name="Smith" />
  </class>
</root>'

SELECT  @x

If I wanted to delete the class nodes which contain a student node with a name of “Miller”, there are a couple of ways to do it; the first method involves two passes:

SET @X.modify('delete /root/class//.[@name = "Miller"]/../*')
SET @X.modify('delete /root/class[not (node())]')
SELECT @x

In this case, we walk the axis and find a node test of class (/root/class); we then apply a predicate to look for an attribute of name with a value of Miller ([@name=”Miller”]) in any node below the node of class (//.).  We then walk back up a node (/..), and delete all subnodes (/*).

That leaves us with an XML document that has three nodes for class, one of which is empty (the first one).  We then have to do a second pass through the XML document to delete any class node that does not have nodes below it (/root/class[not (node())]).

The second method accomplishes the same thing in a single pass:

SET @x.modify('delete /root/class[student/@name="Miller"]')
SELECT @x

In this case, walk the axis to class (/root/class), and then apply a predicate that looks for a node of student with an attribute of name with a value of Miller ([student/@name=”Miller”); the difference in this syntax is that the pointer for the context of the delete statement is left at the specific class as opposed to stepping down a node, and then back up.