
Speaking at CodeStock 2014

So, this announcement’s way overdue; in about 2 weeks (July 11–12, 2014), I’ll be presenting a couple of sessions at CodeStock 2014 in lovely Knoxville, TN. I haven’t been to CodeStock since 2009, so it’ll be interesting to see how it’s grown.

The Elephant in the Room: A DBA’s Guide to Hadoop & Big Data by Stuart Ainsworth

Date & Time: Jul 11th, 9:55 AM – 11:05 AM

Location: 400b

Speaker(s): Stuart Ainsworth
The term "Big Data" has risen to popularity in the last few years, and encompasses data platforms outside of the traditional RDBMS (like SQL Server). The purpose of this session is to introduce SQL Server DBAs to Hadoop, and to promote understanding of how schema-less data can be collected and accessed in coordination with the traditional SQL models. We’ll cover the basic vocabulary of the Hadoop platform, Microsoft’s integration efforts, and demonstrate how to get started with "big data".

 

Managing a Technical Team: Lessons Learned by Stuart Ainsworth

Date & Time: Jul 12th, 11:10 AM – 12:20 PM

Location: 400b

Speaker(s): Stuart Ainsworth
I got promoted to management a couple of years ago, and despite what I previously believed, there were no fluffy pillows and bottles of champagne awaiting me. My team liked me, but they didn’t exactly stoop and bow when I entered the room. I’ve spent the last year relearning everything I thought I knew about management, and what it means to be a manager of a technical team. This session is intended for new managers, especially if you’ve come from a database (or other technical) background.

Speaking at the #SQLPASS #Summit14

I know I’m a day late with this announcement, but I haven’t blogged in months, so what’s the rush?  I am very excited, however, about presenting a full session and a lightning talk at the PASS Summit in Seattle in November.

THE ELEPHANT IN THE ROOM: A DBA’S GUIDE TO HADOOP AND BIG DATA

Speaker(s): Stuart Ainsworth

Duration: 75 minutes

Track: BI Platform Architecture, Development & Administration

You’re a SQL Server DBA working at Contoso, and your boss calls you out of your cubicle one day and tells you that the development team is interested in implementing a Hadoop-based solution for your customers. She wants you to help plan for the implementation and ongoing administration. Where do you begin?

This session will cover the foundations of Hadoop and how it fundamentally differs from the relational approach. The goal is to provide a map between your current skill set and “big data.” Although we’ll talk about basic techniques for querying data, the focus is on a basic understanding of how Hadoop works, how to plan for growth, and what you need to do to start maintaining a Hadoop cluster.

You won’t walk out of this session a Hadoop administrator, but you’ll understand what questions to ask and where to start looking for answers.

 

TEN-MINUTE KANBAN

Speaker(s): Stuart Ainsworth

Duration: 10 minutes

Track: Professional Development

The goal of this Lightning Talk is to cover the basic principles of the Lean IT movement, and demonstrate how Kanban can be used by administrators as well as developers. Speaker Stuart Ainsworth will cover the basic concepts of Kanban, where to begin, and how it works.

Kanban boards can be used to highlight bottlenecks in resource and task management, as well as identify priorities and communicate expectations. All this can be done by using some basic tools that can be purchased at an office supply store (or done for free online).

Steel City SQL Users Group–March 18, 2014– @SteelCitySQL

Next Tuesday, I’m loading up Big Blue, and driving over to Birmingham to present at the Steel City SQL Users Group.  I’ll be talking about the Agile DBA.  Should be fun!

http://www.steelcitysql.org/

Featured Presentation

The Agile DBA: Managing your To-Do List

Speaker: Stuart Ainsworth

Summary: Agile development is all the rage, but how do the principles apply to database administrators? This presentation will introduce the basics of the Agile Manifesto, and explain how they can be applied to non-development IT work, such as database administration, maintenance, and support. We’ll cover scrum (one of the most popular development methodologies) and kanban, and identify some of the common struggles with implementing them in an organization. This is an interactive discussion; please bring your tales of success and your horror stories.

About Stuart: Stuart Ainsworth (MA, MEd) is a manager working in the realm of financial information security. Over the past 15 years, he’s worked as a research analyst, a report writer, a DBA, a programmer, and a public speaking professor. In his current role, he’s responsible for the maintenance of a data analysis operation that processes several hundred million rows of data per day.

SQL Server XQuery: MORE deleting nodes using .modify()

So after my last post, my developer friend came back to me and noted that I hadn’t really demonstrated the situation we had discussed; our work was a little more challenging than the sample script I had provided.  In contrast to what I previously posted, the challenge was to delete nodes where a sub-node contained an attribute of interest.  Let me repost the same sample code as an illustration:

DECLARE @X XML = 
'<root>
  <class teacher="Smith" grade="5">
    <student name="Ainsworth" />
    <student name="Miller" />
  </class>
  <class teacher="Jones" grade="5">
    <student name="Davis" />
    <student name="Mark" />
  </class>
  <class teacher="Smith" grade="4">
    <student name="Baker" />
    <student name="Smith" />
  </class>
</root>'

SELECT  @x

If I wanted to delete the class nodes which contain a student node with a name of “Miller”, there are a couple of ways to do it; the first method involves two passes:

SET @X.modify('delete /root/class//.[@name = "Miller"]/../*') --pass 1: empty out any class containing a Miller
SET @X.modify('delete /root/class[not (node())]')             --pass 2: remove any class left with no child nodes
SELECT @x

In this case, we walk the axis with a node test of class (/root/class); we then apply a predicate to look for an attribute of name with a value of Miller ([@name=”Miller”]) in any node below class (//.). We then walk back up a node (/..), and delete all subnodes (/*).

That leaves us with an XML document that has three nodes for class, one of which is empty (the first one).  We then have to do a second pass through the XML document to delete any class node that does not have nodes below it (/root/class[not (node())]).
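
For reference, here’s what the document looks like after the first pass (before the second pass removes the now-empty class):

<root>
  <class teacher="Smith" grade="5" />
  <class teacher="Jones" grade="5">
    <student name="Davis" />
    <student name="Mark" />
  </class>
  <class teacher="Smith" grade="4">
    <student name="Baker" />
    <student name="Smith" />
  </class>
</root>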

The second method accomplishes the same thing in a single pass:

SET @x.modify('delete /root/class[student/@name="Miller"]')
SELECT @x

In this case, we walk the axis to class (/root/class), and then apply a predicate that looks for a node of student with an attribute of name with a value of Miller ([student/@name=”Miller”]); the difference in this syntax is that the context of the delete statement stays at the matching class, as opposed to stepping down a node and then back up.
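
If you want to sanity-check the result, the .exist() method works nicely; this query should return 0 once the Miller class is gone (and 1 if you run it before the delete):

SELECT @x.exist('/root/class[student/@name="Miller"]') --1 = node still present, 0 = deleted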

SQL Server XQuery: deleting nodes using .modify()

Quick blog post; got asked today by a developer friend of mine about how to delete nodes in an XML fragment using the .modify() method. After some head-scratching and some fumbling around (it’s been a few months since I’ve done any work with XML), we came up with a version of the following script:

DECLARE @X XML = 
'<root>
  <class teacher="Smith" grade="5">
    <student name="Ainsworth" />
    <student name="Miller" />
  </class>
  <class teacher="Jones" grade="5">
    <student name="Davis" />
    <student name="Mark" />
  </class>
  <class teacher="Smith" grade="4">
    <student name="Baker" />
    <student name="Smith" />
  </class>
</root>'

SELECT  @x

--delete the classes that belong to teacher Smith
SET @X.modify('delete /root/class/.[@teacher="Smith"]')
SELECT @X 

Now, let me try to explain it:

  1. Given a simple document that has a root with classes, and students in each class, we want to delete all classes that are being taught by a teacher named “Smith”.
  2. We delete the class nodes (students and all) that belong to Smith:
    1. Using XPath, we walk the axis and use a node test to restrict to /root/class/. (the current node under class).
    2. We then apply a predicate looking for a teacher attribute with a value of “Smith”.
    3. The .modify() method applies the delete command to the @X variable, and updates the XML (a shorter, equivalent form is sketched below).
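
As a side note, the /. step appears to be optional; as far as I can tell, this shorter form is equivalent, applying the predicate directly to the class node:

SET @X.modify('delete /root/class[@teacher="Smith"]')
SELECT @X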

First few bites of the elephant: working with Hortonworks Hadoop

So a few weeks ago, I mentioned that I was starting to diversify my data interests in hopes of steering my career path a bit; I’ve built a home-brewed server, and downloaded a copy of the Hortonworks Sandbox for Hadoop. I’ve started working through a few tutorials, and thought I would share my experiences so far.

My setup….

I don’t have a lot of free cash to set up a super-duper learning environment, but I wanted to do something on-premises. I know that Microsoft has HDInsight, the cloud-based version of Hortonworks, but I’m trying to understand the administrative side of Hadoop as well as the general interface. I opted to upgrade my old fileserver to a newer rig; costs ran about $600 for the following:

ASUS|M5A97 R2.0 970 AM3+ Motherboard   
AMD|8-CORE FX-8350 4.0G 8M CPU   
8Gx4|GSKILL F3-1600C9Q-32GSR Memory   
DVD BURN SAMSUNG | SH-224DB/BEBE  DVD Burner

I already had a case, power supply, and a couple of SATA drives (sadly, my IDE drives no longer work; also the reason for purchasing a DVD burner). I also had a licensed copy of Windows 7 64-bit, as well as a few development copies of Microsoft applications from a few years ago (oh, how I wish I were an MVP….).

As a sidebar, I will NEVER purchase computer equipment from Newegg again; their customer service was horrible.  A few pins were bent on the CPU, and it took nearly 30 days to get a replacement, and most of that time was spent with little or no notification.

I downloaded and installed the Hortonworks Sandbox using the VirtualBox version. Of course, I had to reinstall after a few tutorials because I had skipped a few steps; after going back and following the instructions, everything is just peachy. One of the nice benefits of the VirtualBox setup is that once I fire up the Hortonworks VM on my server, I can use a web browser on my laptop pointed to the server’s IP address with the appropriate port added (e.g., xxx.xxx.xxx.xxx:8888), and bam, I’m up and running.

Working my way through a few tutorials

First, I have to say, I really like the way the Sandbox is organized; it’s basically two frames: the tutorials on the left, and the actual interface into a working version of Hadoop on the right.  It makes it very easy to go through the steps of the tutorial.

[Screenshot: the Sandbox’s two-frame layout, with tutorials on the left and the working Hadoop interface on the right]

The Sandbox has lots of links and video clips to help augment the experience, but it’s pretty easy to get up and running on Hadoop; after only a half-hour or so of clicking through the first couple of tutorials, I got some of the basics down for understanding what Hadoop is (and is not); below is a summary of my initial thoughts (WARNING: these may change as I learn more).

Summary:

  • Hadoop is composed of several different data access components, all of which have their own history.  Unlike a tool like SQL Server Management Studio, the experience may vary depending on what tool you are using at a given time.  The tools include (but are not limited to):
    • Beeswax (Hive UI): Hive is a SQL-like language, and so the UI is probably the most familiar to those of us with RDBMS experience.  It’s a query editor.
    • Pig is a procedural language that abstracts the data manipulation away from MapReduce (the underlying engine of Hadoop).  Pig and Hive have some overlapping capabilities, but there are differences (many of which I’m still learning); see the sketch after this list.
    • HCatalog is a relational abstraction of data across HDFS (Hadoop Distributed File System); think of it like the DDL of SQL.  It defines databases and tables from the files where your actual data is stored; Hive and Pig are like DML, interacting with the defined tables.
  • A single-node Hadoop cluster isn’t particularly interesting; the fun part will come later when I set up additional nodes.
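
To make the Hive/Pig overlap concrete, here’s a minimal sketch of the same aggregation in both (the weblogs data and its columns are hypothetical, just for illustration; this isn’t from any particular tutorial). The Hive version reads a lot like T-SQL:

-- HiveQL: count hits per page from a hypothetical weblogs table
SELECT page, COUNT(*) AS hits
FROM weblogs
GROUP BY page
ORDER BY hits DESC;

The rough Pig Latin equivalent builds the same result as a series of named steps instead of one declarative statement:

-- Pig Latin: each line is a named transformation over the previous one
logs    = LOAD '/data/weblogs.txt' USING PigStorage('\t') AS (page:chararray, ts:chararray);
by_page = GROUP logs BY page;
counts  = FOREACH by_page GENERATE group AS page, COUNT(logs) AS hits;
ordered = ORDER counts BY hits DESC;
DUMP ordered;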

The Evolution of the DBA

Recently, there have been a couple of great posts about the Death of the Database Administrator, including a response by Steve Jones and several reactions by the staff of SQL Server Pro; the central premise behind the supposed demise revolves around this one major thought:

 

The evil cloud has reduced the need for internal systems infrastructure, including database administration. It’s a storm of needs for faster development (agility) and the rise of hosted services; who needs a database server when you can rent space on Azure? Please note that I’m not specifically anti-cloud, but I’m casting it as the villain when careers are on the line.

Furthermore, in shops where the cloud is banned (e.g., financial services), developers are using tools like Entity Framework to write SQL for them. Tuning SQL thus becomes an application change as opposed to a stored procedure change; DBAs who do performance tuning have to focus on index maintenance and hardware acquisition. Code tuning is now part of the development domain, and the career of the pure SQL developer is gasping in comparison.

Like all great controversial statements, there’s an element of truth; the cloud, agile approaches, and new technologies are reducing the need for traditional database administrators, but I think we’re a long way away from pulling the plug.  However, I will say that over the next decade, these trends will probably continue to grow, eating away at the availability of jobs that do strict database administration (and the SQL developer will probably expire altogether).  But not yet.

What this does mean is that if you are intending to be employed 10 years from now, and you’re a database administrator, you’ve got two choices to make today:

  1. Master a specialty.  If you’re planning on consulting for a living, this is a great choice.  Get so intimate with the database product of your choice that you become the go-to person for problem-solving.  Companies that have large installations of SQL Server will need secondary support as the product becomes easier to maintain (and big problems get obfuscated by GUIs).
  2. Expand your horizons.  Instead of focusing on super in-depth mastery of your database platform, broaden your perspective; if you’re a SQL Server guy like me, start learning a little bit about SSRS, SSAS, and SSIS (if you don’t already know them).  Spread out into Hadoop and NoSQL; dabble in MySQL and SQLite.  Understand what the cloud can do, and where it makes sense to use it.

So go deep or go broad, but go.  I wouldn’t start quaking in my boots just yet about the demise of your career, but change is coming; those who adapt, survive.

For me? I’m going broad.  I’ve built a home-brewed server, and downloaded a copy of the Hortonworks Hadoop Sandbox.  Stay tuned for my adventures with Hadoop.

MaTT: The MARS framework for DBAs

So, as usual, I’m struggling to sit down and write; I know I should (it’s good for the soul), but frankly, I’m struggling to put words to web page.  As a method of jumpstarting my brain, I thought I would write about something simple and relevant to what I’m doing these days in my primary capacity: management.

When I took over the reins of a newly formed department of DBAs, I knew that I needed to do something quickly to demonstrate the value of our department to our recently reorganized company. We’re a small division in our company, but we manage some relatively large databases (17 TB of data, with a relatively high daily change rate of approximately 10 TB). My team was composed of senior DBAs who were inundated with support requests (from “I need to know this information” to “I need help cleaning up this client’s data”); while they had monitoring structures in place, it wasn’t uncommon for things to go unnoticed (unplanned database growth, poor-performing queries, etc.). One of my first acts as a manager was to put in a system of classification I called MARS; all of our work as DBAs needed to be categorized into one of four broad groups.

Maintenance, Architecture, Research, and Support

The premise is simple; by categorizing efforts, we could measure where the focus of our department was, and begin to allocate resources into the proper arenas. I defined each of the four areas of work as follows:

  • Maintenance – the efforts needed to keep the system performing well; backups, security, pro-active query tuning, and general monitoring are examples.
  • Architecture – work associated with the deployment of new features, functionality, or hardware; data sizing estimates, upgrades to SQL 2012, installation of Analysis services are examples.  To be honest, Infrastructure may have been a better term, but MIRS sounded stupid.
  • Research – the efforts to understand and improve employee skills; I’m a former teacher, and I put a pretty high value on lifelong learning.  I want my team to be recognized as experts, and the only way that can happen is if the expectation is there for them to learn.  
  • Support – the 800 lb gorilla in our shop; support efforts focus on incident management (to use ITIL terms) and problem resolution.  Support is usually instigated by some other group; for example, sales may request a contact list from our CRM that they can’t get through the interface, or we may get asked to explain why a ticket didn’t get generated for a customer.

After about a year of data gathering, I went back and thought about the categories a bit more, and realized that I could associate some descriptive adjectives with the workload to demonstrate where the heart of our efforts lies. I took my cues from the Johari window, and came up with two axes: Actions (Proactive – Reactive) and Results (Delayed – Immediate). I then arranged my four categories along those lines, like so:

[Quadrant diagram: the four MARS categories plotted on the Actions (Proactive – Reactive) and Results (Delayed – Immediate) axes]

In other words, Maintenance was Proactive, but had Delayed results (you need to monitor your system for a while before you grasp the full impact of changes).  Research was more Reactive, because we tend to research issues that are spawned by some stimulus (“what’s the best way to implement Analysis Services in a clustered environment?” came up as a Research topic because we have a pending BI project). 

Immediate results came from Architectural changes; adding more spindles to our SAN changed our performance quickly, but there was Proactive planning involved before we made the change.  Support is Reactive, but has Immediate results; the expectation is that support issues get prioritized, so we try to resolve those quickly as part of our Operational Level Agreements with other departments. 

After a couple of months looking at our workload (using Kanban), I see that we still spend a lot of time in Support, but that effort is trending downward; I continue to push Maintenance, Architecture, and Research over Support, and we’re becoming much more proactive in our approaches. I’m not sure if this quadrant approach is the best way to represent workload, but it does give me a general rule of thumb in helping guide our efforts.

SQLSaturday 285 (#sqlsatatl) pre-cons are now live!

SQL Saturday #285 is offering 3 preconference sessions on Friday, May 2 at the GSU campus in Alpharetta, the site of the main event on Saturday, May 3:
Kalen Delaney: What the Hekaton!? A Whole New Way to Think About Data Management

SQL Server Hekaton, Microsoft’s new In-Memory table technology being shipped as part of SQL Server 2014, will completely change the way you think about data management. As a DBA, you’ll need to analyze your memory and storage needs completely differently. All Hekaton data is always stored in memory, and the data stored on disk is basically just a REDO log used to regenerate the contents of your memory-optimized tables. In this full-day seminar, Kalen Delaney (a SQL Server MVP for over 20 years) will show you the in-memory architecture for your Hekaton data and indexes, and discuss what gets written to disk during checkpoints, as well as what gets logged. She will explain how the recovery process recreates your Hekaton tables. Finally, she’ll go into detail on just what it is that makes Hekaton so much FASTER!

 

Denny Cherry: SQL Performance Tuning & Optimization

In this session you will learn about SQL Server 2008 R2 and SQL Server 2012 performance tuning and optimization. Industry Expert Denny Cherry will guide you through tools and best practices for tuning queries and improving performance within Microsoft SQL Server.  This session will guide you through real life performance problems which have been gathered and tuned using industry standard best practices and real world skills.

 

 

Teo Lachev: Deep Dive into the Microsoft BI Semantic Model (BISM)

The chances are that your organization has a centralized data repository, such as ODS or a data warehouse, but you might not use it to the fullest. Join this insightful full-day event to understand the importance of having a semantic layer that bridges users and data. In the Microsoft BI world, BISM consists of Power Pivot, Tabular, and Multidimensional. 

All 3 presenters are published authors and Microsoft MVPs many times over. These sessions are a huge value: a full day with an acknowledged SQL Server thought leader.

Early registration is only $129 until March 1, when the price of any remaining seats will go up to $149. Follow @AtlantaMDF on Twitter and get a promo code for $10 off the early registration price (for Kalen Delaney or Denny Cherry)! We’ll tweet the promo code at 9am Thursday (Jan 23) – it’s only good for 10 uses (for each session), so be sure to check your Twitter feed tomorrow morning!

Back on the trail…. #sqlsatnash

I realize that I should probably be blogging about my New Year’s resolutions, but meh… I’ve been super busy surviving the holidays.  So busy in fact that I’ve failed to mention that I’ll be presenting at the SQLSaturday in Nashville on January 18, 2014.  I actually got selected to present TWO topics, which is HUGE for me.  Hoping that I can refine a presentation, and get ready for our own SQLSaturday in Atlanta.

Working with “Biggish Data”

Most database professionals know (from firsthand experience) that there continues to be a “data explosion”, and there’s been a lot of focus lately on “big data”. But what do you do when your data’s just kind of “biggish”? You’re managing terabytes, not petabytes, and you’re trying to squeeze as much performance out of your aging servers as possible. The focus of this session is to identify some key guidelines for the design, management, and ongoing optimization of “larger-than-average” databases. Special attention will be paid to the following areas:

  • query design
  • logical and physical data structures
  • maintenance & backup strategies

Managing a Technical Team: Lessons Learned

I got promoted to management a year ago, and despite what I previously believed, there were no fluffy pillows and bottles of champagne awaiting me. My team liked me, but they didn’t exactly stoop and bow when I entered the room. I’ve spent the last year relearning everything I thought I knew about management, and what it means to be a manager of a technical team. This session is intended for new managers, especially if you’ve come from a database (or other technical) background; topics we’ll cover include:

  • How to let go of your own solutions.
  • Why you aren’t the model you think you are, and
  • Why Venn diagrams are an effective tool for management.