Where does my index live? YouTube edition–#SQLServer

I was recently contacted by Webucator, an online training services provider, and asked if they could turn a recent post of mine (#SQLServer – Where does my index live?) into a video.  They are promoting their SQL Server classes by doing a free series called SQL Server Solutions from the Web using (with permission) different blog posts from around the web.   Wish I’d thought of this sooner; enjoy!

December 9, 2014 · stuart · No Comments
Posted in: SQL, SQLServerPedia Syndication

#SQLPASS – Fluffy Bunnies

I didn’t really have a good name for this post, so I thought I’d just try to pick something as non-offensive as possible.  Everybody likes bunnies, right?  Anyway, the last series of posts that I’ve made regarding the Professional Association for SQL Server has raised a number of questions that I thought I’d strive to answer; since I’m still mulling over my next post, I figured this was as good a time as any.

What’s your motivation in writing this series?

Believe it or not, I’m trying to help.  For several years, it seems like there’s been one controversy after another within the Professional Association for SQL Server, and those controversies dissipate and re-emerge.  My goal is to document what I perceive to be the root causes for some of these issues so that we can work toward a solution.

 

Why are you bashing PASS?

I’m trying very hard to NOT “bash” the Professional Association for SQL Server; I’ve been a member for several years now, and I’ve seen controversies get personal.  I really don’t feel like I’m doing that; I still believe that the Board of Directors is a great bunch of people who are making decisions based on their circumstances at the time.  I’m trying to wrap a schema around those circumstances, so I can understand those decisions.

I’m trying to articulate my own perspective on what I think is wrong, so that I can be better prepared to have an honest discussion about those perceptions.  Relationships aren’t about ignoring issues; the first step in addressing an issue is to identify the issue.

You mentioned “transparency” as an issue with the BoD before; what do you mean by that?

That’s more complicated to answer than I thought it would be.  At first, I thought it was more information; as an active community member, I felt like I needed to be more involved in the decision-making process.  However, I can’t really name a specific reform that I would make for the BoD to be more transparent.  I don’t want to read meeting minutes, and most of the form letter emails I get from the Association go straight to the trash; I’m too busy for them to be transparent.

On further consideration, what I’ve been calling transparency is more about leadership and up-front communication than it is about revealing information.   As an example, I help run one of the largest SQL Server chapters in the world; if I had realized that the Professional Association for SQL Server was planning a major re-branding exercise, I feel like I could have contributed some “notes from the field” on what that would mean, and how to best prepare our membership for it.  Instead of appearing defensive about a controversy, the Board would appear to be very proactive.

It’s that feeling of being left out of the conversation that bothers me, I think.  I realize that there are some details that can’t be shared, but I feel like the onus is on the BoD to find better ways to communicate with members (and to me, chapters are an underused resource).  The information flow is very unidirectional; Summit keynotes and email blasts are not an effective way to discuss an issue.   Maybe those conversations are happening with other people, but if so, few people have stepped forward to discuss them.

It’s also an issue of taking action when an issue is raised; for example, the recent password security controversy was documented by Brent Ozar.  He states that he and several other security-oriented members had several private conversations with board members, and yet no comprehensive action was taken until it became a public issue.  The perception is that the only way to get the BoD to act is to publicly shame them.   That’s not healthy in the long run.

What’s your vision for the Association?

That’s the subject of my next post, so I’m going to hold off on that one.  I do think that the unidirectional flow of communication is not just an issue with the organization, but that it’s a tone set by our relationship with Microsoft.  As I pointed out in my last post, if the Association knew more about the needs and desires of the membership, that knowledge would become a very valuable resource.   Instead of being a fan club for a product, we could become strong partners in the development of features for that product.

How do you feel about the name change?

Frankly, I feel a little sad about the change in direction, but I understand it.  I predict that SQL Server as an on-premises platform is going to become a niche product, regardless of the number of installations.  It just makes sense for Microsoft to broaden their data platform offerings.  I do think that we (the Association) need to do a better job of preparing membership for this new frontier, and it’s going to take effort to transform administration skills into analytic skills. Without providing that guidance, it seems like we’re abandoning the people who helped build this organization.

Where is this series headed?  What’s the final destination?

To be honest, I don’t know.  When I first started writing these posts, I was sure that I could describe exactly what was wrong, and suggest a few fixes.  The more I write, the easier it is to articulate my observations; I’m surprised to find that my observations are not what I thought they would be.  Thanks for joining me for the ride; hopefully, it leads to some interesting conversations at Summit.

October 23, 2014 · stuart · 8 Comments
Posted in: Blogging is FUN!, PASS, SQL Server, SQLServerPedia Syndication

#SQLPASS–Data Professionals?

So, in my last post, I described the financial pressures of community building; two companies benefit from building a community organization.  I’ve tried to stay away from assumptions, but I am assuming that their influence must factor into the Board of Directors’ decision-making process (Microsoft has a seat on the board; C&C is dependent on the decisions that the BoD makes).  The metric that matters most to Microsoft is the breadth of people interested in their product line, not the depth of knowledge attained by those people.

Influence isn’t a bad thing per se, but in my mind, it does explain why good people continue to make bad decisions, regardless of who gets elected to the board.   What do I mean by a bad decision?  In general, the Professional Association for SQL Server BoD remains a non-committal and opaque organization.  Board members have personally promised me that “they would look into something”, and yet the follow-through never materialized; the opacity of the decision-making process is documented by other bloggers in posts like the following:

http://www.sqlservercentral.com/blogs/steve_jones/2010/06/30/pass_2C00_-don_1920_t-waste-my-time/

https://ozar.me/2014/09/bigger-passvotes-problem-password-shared/

http://sqlblog.com/blogs/andy_leonard/archive/2014/06/27/pass-and-summit-2014-session-selections.aspx

SIDEBAR: I will say that the Board continues to work on the transparency problem; Jen Stirrup and Tom LaRock have both stepped forward to explain decisions made.  However, such explanations are usually given after a controversy has occurred.

For a specific example, I want to focus on the branding decision (the decision to remove SQL Server from marketing material for the Professional Association for SQL Server and to be known simply as PASS); the decision to move the organization away from its lingua franca of SQL Server to a new common language of “all things (Microsoft) data” is not in and of itself a bad thing.  Recent marketing trends from Microsoft indicate that the traditional role of the DBA is continuing to evolve; as individuals, we need to evolve as well.

However, as database professionals (or data professionals), we should be inclined to make decisions based on data.  As Jen Stirrup herself says:

I think it’s important to have a data-based, fact based look at the business analytics sphere generally. What does the industry say about where the industry is going? What does the data say? We can then look at how PASS fits in with this direction.

Jen’s post goes on to state some great statistics about the nature of the industry as a whole, but then uses some less concrete measures (growth of the BA/BI Virtual Chapters) to identify support within the organization.  I generally agree with her conclusions, but I’m concerned about several unanswered questions, most of them stemming from two numbers:

  • Association marketing materials claim we have reached over 100,000 professionals, and
  • 11,305 members were eligible to vote (a poor measure of involvement, but it does indicate recent interaction).

I look at those two numbers and wonder why that gap is there; just for simplicity’s sake, let’s say that 90% of “members” have not updated their profile.  Why?  What could the Association have done to reach those members?  Who are those members?   What are their interests?  What’s a better metric for gauging active membership?

Of course, once I start asking questions, I begin to ask more questions: How many members don’t fit into Microsoft’s vision of cloud-based computing?   How many members use multiple technologies from the Microsoft data analysis stack? What skills should they be taught?  What skills do they have?  What features do they want?  The short answer: we don’t know.

As far as I know, there has been no large-scale data collection effort by the Board of Directors to help guide their decisions; in the absence of data, good managers make a decision based on experience, but then strive to collect data to help with future decisions.  Continuing to rely on experience and marketing materials without investing in understanding member concerns, desires, and input is, simply put, a bad decision.

Shifting an organization that shared a common love for a particular technology to an organization that is more generally interested in data is a huge undertaking; overlooking the role that the community should have in determining the path of that transition is an oversight.   I don’t think the Professional Association for SQL Server is going to revert back to a technology-specific focus; that would be inconsistent with the changing nature of our profession.  However, the Board needs to continue to understand who the membership is, and how the organization can help a huge number of SQL Server professionals transition to “data professionals”.   Building a bigger umbrella may help the organization grow; investing in existing community members will help the organization succeed.

October 21, 2014 · stuart · 7 Comments
Posted in: Blogging is FUN!, PASS, SQLServerPedia Syndication, The Social Web

#SQLPASS–Who’s Making It Rain?

 

As promised in my previous post (#SQLPASS–Good people, bad behavior…), I’d like to start diving into some of the controversies that have cropped up in the last year and critically analyze what I consider to be “bad decisions”.  This first one is complex, so let me try to sum up the players involved first (with yet another post to follow about the actual decision).  Please note that I am NOT a fan of conspiracy theories (no evil masterminds plotting to rule the SQL Server community), so I’m trying to avoid inferring too much about motive, and instead focusing on observable events.

A lot of the hubbub over the last couple of weeks about the Professional Association for SQL Server wasn’t just about the election or the password controversy, but about the decision to become simply PASS in all marketing materials (gonna need a new hashtag for twitter). So much controversy, in fact, that Tom LaRock, current Board President, wrote an excellent blog post about building a bigger umbrella for Mike.  I applaud Tom for doing this; it’s a vision, and that’s a great thing to have.  However, I wanted to take this metaphor, and turn it on its side; if we need umbrellas, then who’s making it rain?  Let’s take a look at the pieces of the puzzle.

 

Community as Commodity

To figure out the rainmakers, we need to define what the value of the Professional Association for SQL Server is.  If you’re reading this post, I bet you can look in a mirror and figure it out.  It’s you.  Your passion, your excitement, your interest in connecting and learning about SQL Server is the commodity provided by the organization.  We (the community) have reached a certain maturity in our growth as a commodity; we recruit new members through our enthusiasm, and we contribute a lot of free material to the knowledge base for SQL Server.  At this point, it’s far easier to grow our ranks than it would be to start over.   

However, the question I would ask is: what do YOU get out of membership?  For most of us, it’s low-to-no cost training (most of which is provided by other community members).   The association provides a conduit to connect us.   The value to you increases when you grow. Exposure to new ideas, new topics, a deeper understanding of the technology you use; all of these are fuel for growth.  In short, as individuals, community members profit most from DEPTH of knowledge.

The more active you are in the community, the more likely you’ll be able to forage out valuable insight; how many of you are active in the Professional Association for SQL Server?   According to this tweet from the official Twitter account, 11,305 people have active profiles with the organization.  While that’s not a great metric for monitoring knowledge seekers, it does provide some baseline of measure for people who care enough to change their profiles when prompted.

 

Microsoft Needs A New Storm

The Professional Association for SQL Server was founded to build a community of database professionals with an interest in learning more about Microsoft SQL Server; the founding members of the organization were Microsoft and Computer Associates, who obviously saw the commodity in building a community of people excited about SQL Server.  The more knowledge about SQL Server in the wild, the more likely it is that sales of software licenses and training will increase.  Giving away training and knowledge for a low cost yields great dividends in the end.

This is not a bad thing at all; it’s exciting to have a vendor that gives away free stuff like training.  However, it appears that Microsoft is making a slight shift away from a focus on SQL Server.  What makes me think this?

  • It’s getting cloudy (boy, I could stretch this rain metaphor): software as a service (including SQL as a service) is a lot more profitable in the long run than software licensing.  By focusing more on cloud services (Azure), Microsoft is positioning itself as a low-to-no administration provider.  
  • Electricity (Power BI/Query): Microsoft is focusing pretty heavily on the presentation layer of traditional business intelligence, and touting how simple it is to access and analyze data from anywhere in Excel “databases”.  Who needs SQL Server when your data is drag-and-drop?
  • The rebranding of SQL Server Parallel Data Warehouse: Data warehouse sounds like a database; Analytics Platform System sounds sexier, implying that your data structures are irrelevant.  Focus on what you want to do, not how to do it.

The challenge that Microsoft faces is that it has access to a commodity of SQL Server enthusiasts who don’t exactly fit the model of software-as-a-service; those of us that are comfortable with SQL Server on premises haven’t exactly made the leap to the cloud.  Also, many DBAs dabble in Excel; they’re not Analytics practitioners.  In short, Microsoft has Joe DBA, but is looking for Mike Rosoft (see what I did there?), the Business Analyst.  Mike uses Microsoft tools to do things with data, not necessarily databases.  The problem?  Mike doesn’t have a home.   In order to maximize profits, Microsoft needs to invest in the growth of a larger and more diverse commodity.  In short, Microsoft wants a BROADER audience, but they want them to be excited and passionate about their technology.

Rain Dancing With C&C

The Professional Association for SQL Server has been managed by Christianson & Company since 2007.  While the Professional Association for SQL Server Board of Directors is made up of community volunteers, C&C is a growing corporation with the traditional goal of any good for-profit company: to make money.  How does C&C make money? They grow and sell a commodity.  If the Professional Association for SQL Server grows as an organization, C&C’s management of a larger commodity increases in value.   As far as I can tell, the Professional Association for SQL Server is C&C’s only client that is managed in this way.

The community gets free/low-cost training; C&C helps manage that training while diverting the cost to other players (i.e., Microsoft and other sponsors).  If Microsoft is looking for a broader commodity, C&C will be most successful if they can serve that BROADER audience.   The Professional Association for SQL Server’s website claims to serve a membership of 100,000+; that number includes every email address that has ever been used to register for any form of training from the association, including SQLSaturdays, 24HOP, and Summit.  Bigger numbers mean increased value when trying to build a bigger umbrella.

Yet, this 100,000+ membership is rarely reflected in anything other than marketing material.  Only 11,305 of them are eligible to vote; fewer (1,570) actually voted in the last election.  5,000 members are estimated to attend Summit 2014.  Perhaps the biggest measure of activity is the number of attendees at SQLSaturdays (18,362).  Any way you slice it, it seems to me that the number of people actively seeking DEEPER interactions is far smaller than the BROAD spectrum presented as members.  Furthermore, it would seem that reaching more than 100,000 members is challenging; if only 11,000 members are active in the community, and they’re the ones recruiting new members, how do you keep growing?  You reach out to a different audience.

 

Summary

I feel like it’s important to understand the commercial aspect of community building.  In short:

  • Microsoft needs to reach a broader audience by shifting focus from databases to simply data;
  • Christianson & Company will be able to grow as a company if they can help the Professional Association for SQL Server grow as a commodity;
  • The community has reached critical mass; it’s far easier to add to our community than it would be to build a new one.
  • The association has reached many members of the community (100,000+); far fewer of them are active (11,305).

Where am I going with this?  That’s coming up in my next post.  While I don’t deny the altruism in the decision by the Board of Directors to reach out to a broader audience, I also think we (the commodity) should understand the financial benefits of building a bigger umbrella.

October 20, 2014 · stuart · 12 Comments
Posted in: Blogging is FUN!, PASS, SQLServerPedia Syndication

#SQLPASS–Good people, bad behavior…

I’ve written and rewritten this post in my mind 100 times over the last couple of weeks, and I still don’t think it’s right.  However, I feel the need to speak up on the recent controversies brewing with the Professional Association for SQL Server’s BoD.  Frankly, as I’ve read most of the comments and discussions regarding the recent controversies (over the change in name, the election communication issues, and the password issues), my mind keeps wandering back to my time on the NomCom.

In 2010, when I served on the NomCom, I was excited to contribute to the electoral process; that excitement turned to panic and self-justification when I took a stance on the defensive side of a very unpopular decision.  I’m not trying to drag up a dead horse (mixed metaphor, I know), but I started out standing in a spot that I still believe is right:

The volunteers for the Professional Association for SQL Server serve with integrity.

Our volunteers act with best intentions, even when the outcomes of their decisions don’t sit well with the community at large.  However, we humans are often flawed in our fundamental attributions. When WE make a mistake, it’s because of the situation we are in; when somebody else makes a mistake, we tend to blame them.  We need to move past that, and start questioning decisions while empathizing with the people making those decisions.

In my case, feeling defensive as I read comments about “the NomCom’s lack of integrity” and conspiracy theories about the BoD influencing our decision, I moved from defending good people to defending a bad decision.  This is probably the first time that I’ve publicly admitted this, but I believe that we in the NomCom made a mistake; I think that Steve Jones would have probably made a good Director.  Our intention was good, but something was flawed in our process.

However, this blog post is NOT about 2010; it’s about now.  I’ve watched as the Board of Directors continues to make bad decisions (IMO; separate blog forthcoming about decisions I think are bad ones), and some people have questioned their professionalism.  Others have expressed anger, while some suggest that we should put it all behind us and come together.  All of these responses are healthy as long as they separate the decisions made from the people making them, and as long as we figure out ways to make positive changes.  Good people make mistakes; good people talk about behaviors, and work to address them.

So, how do we work to address them?  The first step is admitting that there’s a problem, and it’s not the people.  Why am I convinced that it’s not the people?  Because every year we elect new people to the board, and every year there’s some fresh controversy brewing.  Changing who gets elected to the board doesn’t seem to stimulate transparency or proactive communication with the community (two of the biggest issues with the Professional Association for SQL Server’s BoD).  In short, the system is not malleable enough to be influenced by good people.

I don’t really have a way to sum this post up; I wish I did.  All I know is that I’m inspired by the people who want change, and it saddens me that change seems to be stunted regardless of who gets elected.  Something’s gotta give at some point.

*************
Addendum: you may have noticed that I use the organization’s full legal name when referring to the Professional Association for SQL Server.  Think of it as my own (admittedly petty) response to the “we’re changing the name, but keeping our focus” decision.

October 2, 2014 · stuart · 8 Comments
Posted in: PASS, Professional Development, SQLServerPedia Syndication, The Social Web

What I Wished I’d Known Sooner As A DBA

Mike Walsh has put together an excellent topic for discussion, and I’ve read several responses to this; all of them are great, but most of them are a little optimistic for my experiences.  I like to think that most of my friends and peers in the #sqlfamily are happy people, so it’s understandable that their guidance is gentle and well-intentioned.  Me? I’m happy.  I’m also pretty suspicious of other people, so without further ado, the 4 dark truths I wished I’d known sooner.

  1. Some people are out to get you.  Call it insecurity or good intentions gone bad, but some of your coworkers can’t take responsibility for their own actions and look to blame others.  If you’re moderately successful at your job, there’s probably at least one person that is jealous of your success and is looking for ways to bring you down.  Most people aren’t like this, and the trick is learning who’s a friend and who’s not.  Spend time with friends, and defend yourself against enemies.
  2. The best solution isn’t always the best solution.  Engineers love to SOLVE problems; we don’t just like to get close to an answer.  We want to beat it down.  Unfortunately, in the real world, perfect is the enemy of the good; solutions that are exhaustive and comprehensive on paper are usually time-sucks to implement. Don’t get so wed to a solution that you overlook the cost of implementation.
  3. At some point, someone will judge your work and it won’t pass the bar.  It’s really easy to pick apart bad code from a vendor (or perhaps from your enemy from point 1 above); it’s hard to make that same sort of critical judgment about your own code.  However, if you’re not making mistakes today, then you’ve got nowhere to grow tomorrow.  Take it easy on other people, point out the flaws in a constructive fashion, and hope that somebody does the same to you someday.
  4. The customer isn’t always right, but they always think they are.  I’ve had customers argue with me for days about something that I could demonstrate was 100% wrong; it doesn’t matter, and at the end of the day the relationship with them as a customer was irreconcilable because of the argument rather than the initial facts.  I’m not saying that you should capitulate to every whim of the customer; however, it’s less important to be right than it is to build a relationship.  Relationships are built on truth and giving a little.  Compromise and move on (or decide that it is better to move apart).

August 28, 2014 · stuart · One Comment
Posted in: Blogging is FUN!, SQLServerPedia Syndication

Managing a Technical Team: Building Better

Heard a great podcast the other day from the team at Manager Tools, entitled “THE Development Question”.  I’m sorry to say that I can’t find it on their website, but it did show up in Podcast Addict for me, so hopefully you can pick it up and give it a listen.  I’ll sum up the gist here, but it’s really intended to be a starting point for this blog post.  In essence, Manager Tools says that when a direct approaches you (the manager) with a question, one of the best responses you can offer is another question:

“What do you think we should do?”

Their point is not that management is a game of Questions Only, but that leaders want to develop others and development comes through actions; employees have lots of reasons for asking questions, but a good manager should realize that employees need to be empowered and able to take action for most situations.  If an employee is constantly waiting on approval from the manager, then the manager becomes the bottleneck.

Mulling on this a couple days made me realize that there’s a potential hazard for most new technical managers related to the issue of employee development; are we doing enough to make our employees better engineers than we were?  Let me walk you through my thinking:

  1. Most new technical managers were promoted to their position from within their company, and it was usually because they were the best operator (i.e., someone who was skilled at their job as an engineer).
  2. Most new technical managers have a tough time separating themselves from their prior responsibilities, particularly if those prior responsibilities were very hands-on with a product\service\effort that’s still in use today (e.g., as a developer, John wrote most of the code for the current application; as a manager, John still finds himself supporting that code).
  3. If you were the best at what you did, that means that the people you now manage weren’t.  Actual skill level is debatable, but most of us take a lot of pride in what we do.  Pride can overemphasize our own accomplishments, while downplaying the accomplishments of others.

This is a problem for technical management, because the goal of a good manager is NOT to solve problems, but rather to increase efficiency.   Efficiency is best achieved by distribution; in other words, you as a technical manager could learn how to improve your own technical skills by 10%, but if your employees don’t grow, your team’s not really making progress.  On the other hand, if you invest in your directs’ growth and each of them improves their technical skills by 10%, it’s a bigger bang for your buck (unless you only have one employee; if that’s the case, polish your resume).

Here’s the kicker: Sacrificing your technical skills while building the skills of your employees will pay off more in the long run than continuing to build your own technical knowledge alone.  You WANT your employees to be better engineers than you were, because the benefit of those increased skills is distributed across, and magnified by, the number of employees you have.   I’m not saying that you should completely give up your passion for technology; it’s helpful for managers to understand the challenges their employees face without necessarily being an expert (that’s a fundamental principle of Lean Thinking; “go see” management).  However, you should strive to be the least technical person on your team by encouraging the growth of the rest of your team.

So let me ask you: “What are you doing to develop your employees today?”

August 18, 2014 · stuart · No Comments
Posted in: Professional Development, SQLServerPedia Syndication

#SQLServer – Where does my index live?

Today, I got asked by one of my DBAs about a recently deployed database that seemed to have a lot of filegroups with only a few tables.  He wanted to verify that one of the tables was correctly partition-aligned, as well as learn where all of the indexes for these tables were stored.  After a quick search of the Internets, I was able to fashion the following script to help.  The script below will find every index on every user table in a database, and then determine if it’s partitioned or not.  If it’s partitioned, the scheme name is returned; if not, the filegroup name.  The final column provides an XML list of filegroups (because schemes can span multiple filegroups) and file locations (because filegroups can span multiple files).

WITH C AS (
    -- Partition schemes: one row per scheme, destination filegroup, and data file
    SELECT ps.data_space_id
         , f.name
         , d.physical_name
    FROM sys.filegroups f
    JOIN sys.database_files d ON d.data_space_id = f.data_space_id
    JOIN sys.destination_data_spaces dds ON dds.data_space_id = f.data_space_id
    JOIN sys.partition_schemes ps ON ps.data_space_id = dds.partition_scheme_id
    UNION
    -- Plain filegroups: one row per filegroup and data file
    SELECT f.data_space_id
         , f.name
         , d.physical_name
    FROM sys.filegroups f
    JOIN sys.database_files d ON d.data_space_id = f.data_space_id
)
SELECT [ObjectName] = OBJECT_NAME(i.[object_id])
     , [IndexID] = i.[index_id]
     , [IndexName] = i.[name]   -- NULL for heaps
     , [IndexType] = i.[type_desc]
     , [Partitioned] = CASE WHEN ps.data_space_id IS NULL THEN 'No'
                            ELSE 'Yes'
                       END
     -- Scheme name when partitioned, otherwise the filegroup name
     , [StorageName] = ISNULL(ps.name, f.name)
     -- XML list of every filegroup and physical file behind this index
     , [FileGroupPaths] = CAST(( SELECT name AS "FileGroup"
                                      , physical_name AS "DatabaseFile"
                                 FROM C
                                 -- alias matches the CTE name exactly, so this also works under case-sensitive collations
                                 WHERE i.data_space_id = C.data_space_id
                                 FOR XML PATH('')
                               ) AS XML)
FROM [sys].[indexes] i
LEFT JOIN sys.partition_schemes ps ON ps.data_space_id = i.data_space_id
LEFT JOIN sys.filegroups f ON f.data_space_id = i.data_space_id
WHERE OBJECTPROPERTY(i.[object_id], 'IsUserTable') = 1
ORDER BY [ObjectName], [IndexName]
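
If you need to check alignment partition by partition, a companion query along these lines may help.  This is a rough sketch rather than part of the original script: it assumes that destination_id in sys.destination_data_spaces lines up with partition_number in sys.partitions (which is how the catalog views are documented), and the column aliases are just names I picked for illustration.  It only returns partitioned indexes, since it inner joins to sys.partition_schemes.

```sql
-- Sketch: list each partition of every partitioned index on user tables,
-- along with the filegroup that partition is mapped to.
-- Column aliases (PartitionScheme, PartitionNumber, etc.) are illustrative only.
SELECT [ObjectName]      = OBJECT_SCHEMA_NAME(i.[object_id]) + '.' + OBJECT_NAME(i.[object_id])
     , [IndexName]       = i.[name]
     , [PartitionScheme] = ps.name
     , [PartitionNumber] = p.partition_number
     , [FileGroup]       = fg.name
     , [RowCount]        = p.[rows]
FROM sys.indexes i
JOIN sys.partitions p ON p.[object_id] = i.[object_id]
                     AND p.index_id = i.index_id
JOIN sys.partition_schemes ps ON ps.data_space_id = i.data_space_id
JOIN sys.destination_data_spaces dds ON dds.partition_scheme_id = ps.data_space_id
                                    AND dds.destination_id = p.partition_number
JOIN sys.filegroups fg ON fg.data_space_id = dds.data_space_id
WHERE OBJECTPROPERTY(i.[object_id], 'IsUserTable') = 1
ORDER BY [ObjectName], [IndexName], [PartitionNumber]
```

Comparing the PartitionScheme value across all of the indexes on a table is a quick way to spot a non-aligned index: if the clustered index and a nonclustered index report different schemes (or the nonclustered index doesn’t show up here at all), it’s worth a closer look.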

August 14, 2014 · stuart · No Comments
Posted in: Code, SQL, SQL Server, SQLServerPedia Syndication, XML

Hadoop for the SQL Server DBA – Initial Challenges

I’ve been intrigued by the whole concept of Big Data lately, and have started actually presenting a couple of different sessions on it (one of which was accepted for PASS Summit 2014).  Seems only right that I should actually *gasp* blog about some of the concepts in order to firm up some of my explanations.  Getting started with Hadoop can be quite daunting, especially if you’re used to relational databases (especially the commercial ones); I hope that this series of posts can help clear up some of the mystery for the administrative side of the house.  Before we dive in, I think it’s only fair to lay out some of the initial challenges with discussing Big Data in general, and Hadoop specifically.  Depending on your background, some of these may be more challenging than others.

Rapid Evolution

Welcome to the wild, wild west.  If you come from a commercial database background (like SQL Server), you’re probably accustomed to a mature product.  For Microsoft SQL Server, a new version gets released on what appears to be a 2-4 year schedule (SQL 2005 -> 2008 -> 2012 -> 2014); of course, there’s always the debate as to what constitutes a major release (2008 R2?), but in general, the core product gets shipped with new functionality, and there’s some time before additional new functionality is released.

Hadoop’s approach to the release cycle is much looser; in 2014 alone, there have been two “major” releases with new features and functionality included.  Development for the Hadoop engine is distributed, so the release and packaging of new functions may vary within the ecosystem (more on that in a bit).  For developers, this is exciting; for admins, this is scary.   Depending on how acceptable change is within your operational department, the concept of rolling out an upgraded database engine every 3-4 months may be daunting.

Ecosystems, not products

Hadoop is an open-source product, so if you’re experienced with other open-source products like Linux, you probably already understand what that means; open-source licensing means that vendors can package the core product into their own product, as long as they follow the terms of the license.  For Hadoop, that’s the Apache License 2.0, which is permissive: it doesn’t force vendors to open up the final package, whereas copyleft licenses like the GPL (used by the Linux kernel) do require modified versions to stay open when they’re distributed.  In practice, commercial providers will either bundle an open-source product with their own proprietary side-by-side software (“we interface with MySQL” or “we run on Linux”), or they release their modified version of the software in a completely open fashion and earn revenue from a support contract (e.g., Red Hat).  In either case, it’s an ecosystem, not a canned product.

Hadoop technically consists of four modules:

  • Hadoop Common: The common utilities that support the other Hadoop modules.
  • Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
  • Hadoop YARN: A framework for job scheduling and cluster resource management.
  • Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

However, take a look at the following framework from Hortonworks (the Microsoft partner for Hadoop):

[Image: hortonworks]

Lots of stuff in there that’s being developed, but isn’t officially Hadoop.  It could become part of the official stack at some point, or it may not.  Other vendors may adopt it, or they may not.   Each of these components has its own update schedule (again, change!), which gives you some flexibility (you can upgrade individual components on their own), but it does make the road map complex compared to traditional database platforms.

Big Data doesn’t always mean Big Data.

Perhaps the hardest thing to embrace about Big Data in general (not just Hadoop) is that the nomenclature doesn’t necessarily line up with the driving factors; a Big Data approach may be the best approach for smaller data sets as well.   In essence, data can be described in terms of the 4 V’s:

  1. Volume – The amount of data held
  2. Velocity – The speed at which the data should be processed
  3. Variety – The variable sources, processing mechanisms and destinations required
  4. Value – The amount of data that is viewed as non-redundant, unique, and actionable

A distributed approach (like Hadoop) is usually appropriate when tackling more than one of these four V’s; if your data’s just large, but low in velocity, variety, and value, a single installation of SQL Server (with a lot of disk space) may be appropriate.  However, if your data has a lot of variety and a lot of velocity, even if it’s small, a Big Data approach may yield considerable efficiency.  The point is that big data alone is not necessarily the impetus for using Hadoop at all.
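
As a rough illustration of that rule of thumb, here’s a minimal T-SQL sketch that counts how many of the four V’s a workload stresses.  The workload names and flag values are hypothetical, and the “more than one” threshold is only the guideline described above, not a hard rule.

```sql
-- Hypothetical workloads: the names and flags are made up for illustration.
DECLARE @Workloads TABLE
    (
      WorkloadName VARCHAR(50) NOT NULL
    , HighVolume BIT NOT NULL
    , HighVelocity BIT NOT NULL
    , HighVariety BIT NOT NULL
    , HighValue BIT NOT NULL
    );

INSERT  INTO @Workloads
        ( WorkloadName, HighVolume, HighVelocity, HighVariety, HighValue )
VALUES  ( 'Large, slow archive', 1, 0, 0, 0 )
      , ( 'Small, fast, varied feed', 0, 1, 1, 0 );

-- Count the V's each workload stresses (BIT columns must be cast before adding).
-- More than one suggests a distributed approach is worth a look.
SELECT  w.WorkloadName
      , [VCount] = v.VCount
      , [Suggestion] = CASE WHEN v.VCount > 1 THEN 'Consider a Big Data approach'
                            ELSE 'A single SQL Server instance may do'
                       END
FROM    @Workloads w
CROSS APPLY ( SELECT  CAST(w.HighVolume AS INT) + CAST(w.HighVelocity AS INT)
                    + CAST(w.HighVariety AS INT) + CAST(w.HighValue AS INT) AS VCount
            ) v;
```

The archive workload only stresses volume, so it stays on the SQL Server side of the line; the small feed stresses both velocity and variety, so it lands on the Big Data side despite its size.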

Summary

Big Data & Hadoop are complex topics, and they’re difficult to understand if you approach them from a traditional RDBMS mentality.  However, understanding the fundamentals of how Big Data approaches are evolving, disparate, and generally applicable to more than just volume can lay a foundation for tackling the platforms.

August 13, 2014 · stuart · No Comments
Posted in: Hadoop, SQL, SQLServerPedia Syndication

Managing a Technical Team: Act Like a Good Developer

This is one of my favorite pieces of advice from my Managing a Technical Team presentation that I’ve been doing at several SQLSaturdays and other conferences: act like a good developer, with a different focus.  Most new managers, especially if they’ve been promoted from within (the Best Operator model), don’t know how to improve their management skills.  However, if you were to ask managers what makes a good developer, you’ll probably get a series of answers that are similar to the following broad categories:

Good Developers have:

  • a desire to learn,
  • a desire to collaborate, and
  • a desire for efficiency.

I could probably say that this is true for all good employees, but as a former developer, I know that the culture in software development places a lot of focus on these traits; system administrators usually have different focus points.  However, all technical managers SHOULD emulate these three traits in order to be effective.  Let me explain.

Desire to Learn

Let’s imagine Stacy, a C# developer in your company; by most accounts, she’s successful at her job.  She always seems to be up on the latest technology, has great ideas, and always seems to have a new tool in her toolkit.  If you ask her how she got started programming, she’d tell you that she picked it up as a hobby while in college, and then figured out how to make a career out of it.  She’s an active member of her user group, and frequently spends her weekends reading and polishing her craft; while not a workaholic, she does spend a great deal of her personal time improving her skills.  She’s on a fast track to managing a team, in part because of her desire to learn.

One day, she gets promoted, and is now managing the development team; she struggles with the corporate culture, the paperwork, laying out a vision, and can’t seem to figure out how to motivate her team to the same level of success that she was achieving as a developer.  The problem is that her desire to learn no longer syncs up with her career objectives;  Stacy needs to invest her educational energies into learning about management.

Ask a new IT manager what books they’re reading, and typically the response will be either none at all, or a book on the latest technology.  We tend to cling to that which is familiar, and if you’ve got a technical background, it’s easy and interesting to try and keep focusing on that background.  However, if you’re serious about being a manager, you need to commit to applying the same desire to learn that you had as an employee to learning more about management.  Sure, pick up a book on Big Data, but balance it out with a book on Relationship Development.  Podcasts?  There are management ones out there that are just as fun as the development ones.  Webinars? Boom.

Desire to Collaborate

Bob’s a data architect.  Everybody loves Bob, because he really listens to your concerns, and tries to design solutions that meet those concerns; if he’s wrong about something, he’s quick to own up to the mistake, and moves on.  He works well with others, acknowledging their contributions and adapting to them.  In short, Bob is NOT a jerk; nobody wants to work with a jerk.

Bob gets promoted to a management position, and he too struggles; he’s still hanging out with his former teammates, and is still going to the same conferences.  Everybody still likes Bob, but he’s having trouble guiding his team in an effective manner.  He hasn’t really built relationships with his new peers (other managers that report to his director), and hasn’t found ways to manage more effectively.  He’s collaborating, but with the wrong people.

As a new manager, you should continue to maintain relationships with your directs, but you need to build a relationship with your new team of peers.  Understand their visions, and find ways to make your team valuable resources to them. Reach out to other managers at user groups and conferences; build a buddy system of people based on your management path, not just your technical one.

Desire for Efficiency

If you sat down with any effective, results-producing development team and asked them about their methodology, it wouldn’t be long before they started talking about frameworks.  Efficiency in development is derived from reusable patterns and approaches to problems; they’re tough to implement at first, but the long-term gain is enormous.

As you’ve probably guessed, there are management frameworks that can be very effective in a technical environment; investing time in implementing them can yield great efficiencies when faced with making decisions.  In my current environment, I use three:

  1. MARS – my own self-rolled approach to system operations; it’s not perfect, but it helps focus efforts.
  2. Kanban – allows me to see what our WIP (Work In Progress) is, and helps queue up items for work.
  3. ITIL – we’re just starting to adopt this, but we’re working on isolating Incident Management from root cause analysis, as well as implementing robust change control processes.

The challenge with management frameworks is similar to that of development frameworks: bloat.  It’s too easy to get bound up in process and procedures when lighter touches can be used, but in most cases, the efficiency gained by having a repeatable approach to decisions allows you to respond quickly to a changing environment.

Summary

Management is tough, but it’s especially tough if you continue to focus on your technical chops as opposed to your leadership abilities.  Act like a good developer, and apply those same basic principles to your team.

August 4, 2014 · stuart · No Comments
Posted in: Professional Development, SQLServerPedia Syndication