So, I've been trying to find more technical fodder for blog posts lately in order to get into the habit of blogging regularly, so I thought I'd revisit a few of my higher-ranked answers on StackOverflow and provide a little more detail than that site allows. This is my highest-rated answer (sad, I know, compared to some posters on the site):
I’d like to determine the primary key of a table using TSQL (stored procedure or system table is fine). Is there such a mechanism in SQL Server (2005 or 2008)?
This should get you started:
SELECT *
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
JOIN INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE ccu
    ON tc.CONSTRAINT_NAME = ccu.CONSTRAINT_NAME
    AND tc.CONSTRAINT_SCHEMA = ccu.CONSTRAINT_SCHEMA
WHERE tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
Besides the obvious faux pas of using SELECT * in a query, you'll note that I used the INFORMATION_SCHEMA views; I try to use these views wherever possible because of their portability. In theory, I should be able to run this same query against a MySQL or Oracle database and get similar results from the system schema.
The added benefit of this query is that it lets you discover other things about your PRIMARY KEYs, like finding system-named keys:
SELECT tc.TABLE_CATALOG, tc.TABLE_SCHEMA, tc.TABLE_NAME,
       ccu.CONSTRAINT_NAME, ccu.COLUMN_NAME
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
JOIN INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE ccu
    ON tc.CONSTRAINT_NAME = ccu.CONSTRAINT_NAME
    AND tc.CONSTRAINT_SCHEMA = ccu.CONSTRAINT_SCHEMA
WHERE tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
  AND ccu.CONSTRAINT_NAME LIKE 'PK%/_/_%' ESCAPE '/'
or finding PRIMARY KEYs which have more than one column defined:
SELECT tc.TABLE_CATALOG, tc.TABLE_SCHEMA, tc.TABLE_NAME,
       ccu.CONSTRAINT_NAME
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
JOIN INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE ccu
    ON tc.CONSTRAINT_NAME = ccu.CONSTRAINT_NAME
    AND tc.CONSTRAINT_SCHEMA = ccu.CONSTRAINT_SCHEMA
WHERE tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
GROUP BY tc.TABLE_CATALOG, tc.TABLE_SCHEMA, tc.TABLE_NAME, ccu.CONSTRAINT_NAME
HAVING COUNT(*) > 1
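One more variation on the same pattern that I find handy (a sketch, not something from the original answer): since INFORMATION_SCHEMA.TABLES lists every table, a LEFT JOIN against TABLE_CONSTRAINTS can flag base tables that have no PRIMARY KEY at all:

```sql
-- Sketch: find base tables with no PRIMARY KEY constraint defined
SELECT t.TABLE_CATALOG, t.TABLE_SCHEMA, t.TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES t
LEFT JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
    ON  tc.TABLE_SCHEMA    = t.TABLE_SCHEMA
    AND tc.TABLE_NAME      = t.TABLE_NAME
    AND tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
WHERE t.TABLE_TYPE = 'BASE TABLE'
  AND tc.CONSTRAINT_NAME IS NULL
```

Because it sticks to the INFORMATION_SCHEMA views, the same shape should port to other platforms with only minor tweaks.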
This blog post is way overdue (check the dates in the error log below), but I promised our sysadmin that I would write it, so here it is. Hopefully, it'll help some of you with this aggravating issue. During an upgrade of our SQL cluster, we ran into the following error while attempting to upgrade one of the instances:
2014-04-15 23:50:14.45 spid14s Error: 574, Severity: 16, State: 0.
2014-04-15 23:50:14.45 spid14s CONFIG statement cannot be used inside a user transaction.
2014-04-15 23:50:14.45 spid14s Error: 912, Severity: 21, State: 2.
2014-04-15 23:50:14.45 spid14s Script level upgrade for database 'master' failed because upgrade step 'msdb110_upgrade.sql' encountered error 574, state 0, severity 16. This is a serious error condition which might interfere with regular operation and the database will be taken offline. If the error happened during upgrade of the 'master' database, it will prevent the entire SQL Server instance from starting. Examine the previous errorlog entries for errors, take the appropriate corrective actions and re-start the database so that the script upgrade steps run to completion.
2014-04-15 23:50:14.45 spid14s Error: 3417, Severity: 21, State: 3.
2014-04-15 23:50:14.45 spid14s Cannot recover the master database. SQL Server is unable to run. Restore master from a full backup, repair it, or rebuild it. For more information about how to rebuild the master database, see SQL Server Books Online.
Google wasn't helpful; there are apparently lots of potential fixes for this error, none of which worked for us. The closest match we found suggested orphaned users in a few databases (not system databases), which we corrected; the upgrade still failed. We eventually had to contact Microsoft support and work our way up to a second-level technician. Before I reveal the fix, let me give a little more background on how we got orphaned users.
You see, shortly after we upgraded to SQL 2012 (about a year ago), we did what many companies do; we phased out a service offering. That service offering required several database components, including a SQL login associated with users in the database, and several maintenance jobs that were run by SQL Agent. When we phased out the service, those jobs were disabled, but not deleted. Our security policy tracks the last time a login was used; if a login isn't used within 60 days, it's disabled. 30 days after that (if no one notices), the login is deleted. Unfortunately, our implementation of this process missed two key steps:
- The associated user in each database was not dropped with the login (leaving an orphan), and
- Any job that was owned by that login was not dropped or transferred to a sysadmin.
The latter was the key to our particular situation; the upgrade detected an orphaned job even though that job was disabled, and blocked the upgrade from going forward. Using trace flag -T902, we were able to start the server instance and delete the disabled job. We then restarted the server without the trace flag, and the upgrade finished successfully.
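For anyone hitting the same wall, here's a hedged sketch of the cleanup (the job name below is made up; yours will differ). With the instance started under trace flag -T902, jobs whose owner login no longer exists can be identified and dropped:

```sql
-- Find SQL Agent jobs whose owner SID no longer maps to a server login
SELECT j.name, j.enabled
FROM msdb.dbo.sysjobs j
LEFT JOIN sys.server_principals p
    ON p.sid = j.owner_sid
WHERE p.sid IS NULL;

-- Drop the orphaned job (hypothetical name), then restart without -T902
EXEC msdb.dbo.sp_delete_job @job_name = N'RetiredServiceMaintenance';
```

Transferring ownership to sa via msdb.dbo.sp_update_job would also have worked, had we wanted to keep the job around.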
Ah, summertime; time for the annual "community crisis" for the Professional Association for SQL Server. I've tried to steer clear of controversies for the last couple of years, but it's very hard to be a member of such a passionate group of professionals and not have an opinion on the latest subject du jour. The short form of the crisis is that there are questions about how and why sessions get selected to present at the highly competitive Summit this year (disclaimer: I got selected to present this year). For more details, here are a few blog posts on the subject:
The point of my post is not to rehash the issue or sway your opinion, dear reader, but rather to focus on a single tiny step in the right direction that I've decided to make. One of the big issues that struck me about the whole controversy is the lack of a repeatable, objective tool for speaker evaluations. As a presenter, I don't always get feedback, and when I do, the feedback form varies from event to event, meeting to meeting. Selection committees are forced to rely on my abstract-writing skills and/or my reputation as a presenter; you can obfuscate my identity on the abstract, but it's tough to factor in reputation if you do that.
While I agree that there are questions about the process that should be asked and ultimately answered, there's very little that I can do to make a difference in the way sessions get selected. However, as a presenter, and a chapter leader for one of the largest chapters in the US, I can do a little something.
- I am personally committing to listing every presentation I make on SpeakerRate.com, and soliciting feedback on every presentation. To quote Bleachers, "I wanna get better".
- I will personally encourage every presenter at AtlantaMDF to set up a profile and evaluation at SpeakerRate for all presentations going forward.
- We will find ways to make feedback electronic and immediate at the upcoming Atlanta SQLSaturday so that presenters can use that information going forward.
- I will champion the evaluation process with my chapter members and speakers, and continue to seek out methods to improve and standardize the feedback process.
Do I have all of the right answers? No. For example, SpeakerRate.com seems to be barely holding on to life; there's no mobile interface, and the lack of commitment from its members suggests the site is dying a slow death. However, I haven't found an alternative that provides a standard, uniform measure of presentation performance.
Do I think this will provide a major change to the PASS Summit selection? Nope. But I do think that a sea change has to start somewhere, and if enough local chapters get interested in building a culture of feedback and evaluation, that could begin to flow up to the national level.
So, this announcement's way overdue; in about 2 weeks (July 11-12, 2014), I'll be presenting a couple of sessions at CodeStock 2014 in lovely Knoxville, TN. I haven't been to CodeStock since 2009, so it'll be interesting to see how it's grown.
The Elephant in the Room; A DBA’s Guide to Hadoop & Big Data by Stuart Ainsworth
DATE & TIME
Jul 11th at 9:55 AM until 11:05 AM
Speaker(s): Stuart Ainsworth
The term "Big Data" has risen to popularity in the last few years, and encompasses data platforms outside of the traditional RDBMS (like SQL Server). The purpose of this session is to introduce SQL Server DBAs to Hadoop, and to promote understanding of how schema-less data can be collected and accessed in coordination with the traditional SQL models. We'll cover the basic vocabulary of the Hadoop platform, Microsoft's integration efforts, and demonstrate how to get started with "big data".
Managing a Technical Team: Lessons Learned by Stuart Ainsworth
DATE & TIME
Jul 12th at 11:10 AM until 12:20 PM
Speaker(s): Stuart Ainsworth
I got promoted to management a couple of years ago, and despite what I previously believed, there were no fluffy pillows and bottles of champagne awaiting me. My team liked me, but they didn’t exactly stoop and bow when I entered the room. I’ve spent the last year relearning everything I thought I knew about management, and what it means to be a manager of a technical team. This session is intended for new managers, especially if you’ve come from a database (or other technical) background.
I know I'm a day late with this announcement, but I haven't blogged in months, so what's the rush? I am very excited, however, about presenting a full session and a lightning talk at the PASS Summit in Seattle in November.
THE ELEPHANT IN THE ROOM: A DBA’S GUIDE TO HADOOP AND BIG DATA
Speaker(s): Stuart Ainsworth
Duration: 75 minutes
Track: BI Platform Architecture, Development & Administration
You're a SQL Server DBA working at Contoso, and your boss calls you out of your cubicle one day and tells you that the development team is interested in implementing a Hadoop-based solution for your customers. She wants you to help plan for the implementation and ongoing administration. Where do you begin?
This session will cover the foundations of Hadoop and how it fundamentally differs from the relational approach. The goal is to provide a map between your current skill set and "big data." Although we'll talk about basic techniques for querying data, the focus is on a basic understanding of how Hadoop works, how to plan for growth, and what you need to do to start maintaining a Hadoop cluster.
You won’t walk out of this session a Hadoop administrator, but you’ll understand what questions to ask and where to start looking for answers.
Speaker(s): Stuart Ainsworth
Duration: 10 minutes
Track: Professional Development
The goal of this Lightning Talk is to cover the basic principles of the Lean IT movement, and demonstrate how Kanban can be used by Administrators as well as developers. Speaker Stuart Ainsworth will cover the basic concepts of Kanban, where to begin, and how it works.
Kanban boards can be used to highlight bottlenecks in resource and task management, as well as identify priorities and communicate expectations. All this can be done by using some basic tools that can be purchased at an office supply store (or done for free online).
Next Tuesday, I'm loading up Big Blue and driving over to Birmingham to present at the Steel City SQL Users Group. I'll be talking about the Agile DBA. Should be fun!
The Agile DBA: Managing your To-Do List
Speaker: Stuart Ainsworth
Summary: Agile development is all the rage, but how do the principles apply to database administrators? This presentation will introduce the basics of the Agile Manifesto, and explain how they can be applied to non-development IT work, such as database administration, maintenance, and support. We'll cover scrum (one of the most popular development methodologies) and kanban, and identify some of the common struggles with implementing them in an organization. This is an interactive discussion; please bring your tales of success and your horror stories.
About Stuart: Stuart Ainsworth (MA, MEd) is a manager working in the realm of financial information security. Over the past 15 years, he’s worked as a research analyst, a report writer, a DBA, a programmer, and a public speaking professor. In his current role, he’s responsible for the maintenance of a data analysis operation that processes several hundred million rows of data per day.
So after my last post, my developer friend came back to me and noted that I hadn't really demonstrated the situation we had discussed; our work was a little more challenging than the sample script I had provided. In contrast to what I previously posted, the challenge was to delete nodes where a sub-node contained an attribute of interest. Let me repost the same sample code as an illustration:
DECLARE @X XML = '<root>
    <class teacher="Smith" grade="5">
        <student name="Ainsworth" />
        <student name="Miller" />
    </class>
    <class teacher="Jones" grade="5">
        <student name="Davis" />
        <student name="Mark" />
    </class>
    <class teacher="Smith" grade="4">
        <student name="Baker" />
        <student name="Smith" />
    </class>
</root>'

SELECT @x
If I wanted to delete the class nodes which contain a student node with a name of "Miller", there are a couple of ways to do it; the first method involves two passes:
SET @X.modify('delete /root/class//.[@name = "Miller"]/../*')
SET @X.modify('delete /root/class[not (node())]')
SELECT @x
In this case, we walk the axis and find a node test of class (/root/class); we then apply a predicate to look for a name attribute with a value of Miller ([@name = "Miller"]) in any node below the class node (//.). We then walk back up a node (/..) and delete all subnodes (/*).
That leaves us with an XML document that has three nodes for class, one of which is empty (the first one). We then have to do a second pass through the XML document to delete any class node that does not have nodes below it (/root/class[not (node())]).
The second method accomplishes the same thing in a single pass:
SET @x.modify('delete /root/class[student/@name="Miller"]')
SELECT @x
In this case, we walk the axis to class (/root/class) and then apply a predicate that looks for a student node with a name attribute whose value is Miller ([student/@name="Miller"]). The difference in this syntax is that the context of the delete statement stays at the class node itself, rather than stepping down a node and then back up.
Quick blog post; got asked today by a developer friend of mine about how to delete nodes in an XML fragment using the .modify() method. After some head-scratching and some fumbling around (it's been a few months since I've done any work with XML), we came up with a version of the following script:
DECLARE @X XML = '<root>
    <class teacher="Smith" grade="5">
        <student name="Ainsworth" />
        <student name="Miller" />
    </class>
    <class teacher="Jones" grade="5">
        <student name="Davis" />
        <student name="Mark" />
    </class>
    <class teacher="Smith" grade="4">
        <student name="Baker" />
        <student name="Smith" />
    </class>
</root>'

SELECT @x

--delete the classes that belong to teacher Smith
SET @X.modify('delete /root/class/.[@teacher="Smith"]')
SELECT @X
Now, let me try to explain it:
- Given a simple document that has a root with classes, and students in each class, we want to delete all classes that are being taught by a teacher named "Smith".
- First, we delete the class nodes that belong to Smith (their student subnodes go with them).
- Using XPath, we walk the axis and use a node test to restrict to /root/class/. (the self axis, i.e., the class node itself).
- We then apply a predicate looking for a teacher attribute with a value of "Smith".
- The .modify() method applies the delete command to the @X variable, and updates the XML.
So a few weeks ago, I mentioned that I was starting to diversify my data interests in hopes of steering my career path a bit; I've built a home-brewed server and downloaded a copy of the Hortonworks Sandbox for Hadoop. I've started working through a few tutorials, and thought I would share my experiences so far.
I don't have a lot of free cash to set up a super-duper learning environment, but I wanted to do something on-premises. I know that Microsoft has HDInsight, the cloud-based version of Hortonworks, but I'm trying to understand the administrative side of Hadoop as well as the general interface. I opted to upgrade my old fileserver to a newer rig; costs ran about $600 for the following:
ASUS|M5A97 R2.0 970 AM3+ Motherboard
AMD|8-CORE FX-8350 4.0G 8M CPU
8Gx4|GSKILL F3-1600C9Q-32GSR Memory
DVD BURN SAMSUNG | SH-224DB/BEBE DVD Burner
I already had a case, power supply, and a couple of SATA drives (sadly, my IDE drives no longer work; also the reason for purchasing a DVD burner). I also had a licensed copy of Windows 7 64-bit, as well as a few development copies of Microsoft applications from a few years ago (oh, how I wish I were an MVP…).
As a sidebar, I will NEVER purchase computer equipment from Newegg again; their customer service was horrible. A few pins were bent on the CPU, and it took nearly 30 days to get a replacement, and most of that time was spent with little or no notification.
I downloaded and installed the Hortonworks Sandbox using the VirtualBox version. Of course, I had to reinstall after a few tutorials because I had skipped a few steps; after going back and following the instructions, everything is just peachy. One of the nice benefits of the VirtualBox setup is that once I fire up the Hortonworks VM on my server, I can use a web browser on my laptop pointed at the server's IP address with the appropriate port added (e.g., xxx.xxx.xxx.xxx:8888), and bam, I'm up and running.
Working my way through a few tutorials
First, I have to say, I really like the way the Sandbox is organized; it's basically two frames: the tutorials on the left, and the actual interface into a working version of Hadoop on the right. It makes it very easy to go through the steps of the tutorial.
The Sandbox has lots of links and video clips to help augment the experience, but it's pretty easy to get up and running on Hadoop; after only a half-hour or so of clicking through the first couple of tutorials, I got some of the basics down for understanding what Hadoop is (and is not). Below is a summary of my initial thoughts (WARNING: these may change as I learn more).
- Hadoop comprises several different data access components, each with its own history. Unlike a single tool such as SQL Server Management Studio, the experience may vary depending on which tool you are using at a given time. The tools include (but are not limited to):
- Beeswax (Hive UI): Hive is a SQL-like language, so its UI is probably the most familiar to those of us with RDBMS experience. It's a query editor.
- Pig is a procedural language that abstracts data manipulation away from MapReduce (the underlying engine of Hadoop). Pig and Hive have some overlapping capabilities, but there are differences (many of which I'm still learning).
- HCatalog is a relational abstraction of data across HDFS (Hadoop Distributed File System); think of it like the DDL of SQL. It defines databases and tables from the files where your actual data is stored; Hive and Pig are like DML, interacting with the defined tables.
- A single-node Hadoop cluster isn't particularly interesting; the fun part will come later when I set up additional nodes.
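To give a flavor of the Beeswax experience, here's a hypothetical HiveQL snippet (the table and columns are made up for illustration) that should look comfortingly familiar to a SQL Server DBA:

```sql
-- HiveQL sketch: define a table over tab-delimited files in HDFS,
-- then run a familiar aggregate against it
CREATE TABLE web_logs (ip STRING, url STRING, hit_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

SELECT url, COUNT(*) AS hits
FROM web_logs
GROUP BY url
ORDER BY hits DESC
LIMIT 10;
```

The syntax is close enough to T-SQL to be readable at a glance; the difference is what happens underneath, where the query compiles down to MapReduce jobs rather than a relational execution plan.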
Recently, there have been a couple of great posts about the Death of the Database Administrator, including a response by Steve Jones and several reactions from the staff of SQL Server Pro; the central premise behind the supposed demise revolves around this one major thought:
The evil cloud has reduced the need for internal systems infrastructure, including database administration. It's a storm of demand for faster development (agility) and the rise of hosted services; who needs a database server when you can rent space on Azure? Please note that I'm not specifically anti-cloud, but I'm casting it as the villain when careers are on the line.
Furthermore, in shops where the cloud is banned (e.g., financial services), developers are using tools like Entity Framework to write SQL for them. Tuning SQL thus becomes an application change rather than a stored procedure change; DBAs who do performance tuning are left to focus on index maintenance and hardware acquisition. Code tuning is now part of the development domain, and the career of the pure SQL developer is gasping by comparison.
Like all great controversial statements, there's an element of truth; the cloud, agile approaches, and new technologies are reducing the need for traditional database administrators, but I think we're a long way away from pulling the plug. However, I will say that over the next decade, these trends will probably continue to grow, eating away at the availability of jobs that do strict database administration (and the SQL developer will probably expire altogether). But not yet.
What this does mean is that if you are intending to be employed 10 years from now, and you're a database administrator, you've got two choices to make today:
- Master a specialty. If you're planning on consulting for a living, this is a great choice. Get so intimate with the database product of your choice that you become the go-to person for problem-solving. Companies that have large installations of SQL Server will need secondary support as the product becomes easier to maintain (and big problems get obfuscated by GUIs).
- Expand your horizon. Instead of focusing on super in-depth mastery of your database platform, broaden your perspective; if you're a SQL Server guy like me, start learning a little bit about SSRS, SSAS, and SSIS (if you don't already know them). Spread out into Hadoop and NoSQL; dabble in MySQL and SQLite. Understand what the cloud can do, and where it makes sense to use it.
So go deep or go broad, but go. I wouldn't start quaking in my boots just yet about the demise of your career, but change is coming; those who adapt, survive.
For me? I'm going broad. I've built a home-brewed server, and downloaded a copy of the Hortonworks Hadoop Sandbox. Stay tuned for my adventures with Hadoop.