Development

SQLSaturday 41 Status Update

I know many people have been worried about me because of the recent personal issues I’ve been dealing with, but things have finally started to stabilize.  I know I’ve promised that before, so I’m making no promises about returning to blogging or getting more involved in the community, but I’m starting to climb out of the pit (and hey, I have light bulbs)!

Anyway, despite me, SQLSaturday 41 on April 24, 2010 is plugging along, thanks to a group of dedicated volunteers who have really pushed me to keep this on track.  Thankfully, I’ve been able to give them the information they need, and they’re quite capable of making this happen.  We’re a little more than a month out; our speaker list is almost full, and we’re sitting at nearly 60% registration.  Seats ARE filling up, so if you haven’t registered, now’s the time to do so.

Here’s a short list of topics so far (in no particular order):

A Lap Around SQL Server 2008 Master Data Services
Whitney Weaver
Beginner

Advanced Parameters in SQL Server Reporting Services
Mike Davis
Intermediate

Can you control your reports?
Ryan Duclos
Intermediate

Common Table Expressions
Ryan Duclos
Beginner

Data Warehouse Assessments – What, Why, and How
Noah Subrin
Beginner

Database Design Fundamentals
Louis Davidson
Intermediate

Database Design Patterns
Louis Davidson
Intermediate

De-mystifying Execution Plan Analysis
Dave Turpin
Intermediate

Dynamically Configuring Packages
Mike Davis
Intermediate

Full Text Searching – A Guide for DBAs & Devs
Robert Cain
Beginner

Introduction to Data Warehousing / BI
Robert Cain
Beginner

Introduction to Performance Tuning
Mike Femenella
Beginner

Introduction to Transactional Replication
Troy Gallant
Beginner

Loading Data In Real Time
Mike Femenella
Intermediate

Off and Running with PowerPivot for Excel 2010
Robert Cain
Beginner

PowerShell for the Data Professional
Aaron Nelson
Intermediate

RESTful Data
Chris Eargle
Beginner

Slowly Changing Dimensions–Done Well.
Julie Smith
Beginner

Solving Real World Problems With DMVs
Whitney Weaver
Intermediate

SQL Server 2008 R2 Overview – Session 1
David Rodriguez
Beginner

SQL Server 2008 R2- BI Drill Down Session 2
David Rodriguez
Beginner

SQL Server 2008 R2- DBA Drill Down Session 3
David Rodriguez
Beginner

SS2008 Data Mining with Excel 2010 and PowerPivot
Mark Tabladillo
Intermediate

Survey of Windows Azure Platform Storage Options
Glen Gordon
Intermediate

The Art and Science of Data Modeling
Audrey Hammonds
Beginner

Tuna Helper for SQL Server DBA’s
Janis Griffin
Intermediate

Using Event Handlers in SSIS for Auditing and Notification
Mike Davis
Intermediate

Virtualize This!
Aaron Nelson
Beginner

Wait-Time Based SQL Server Performance Management
Janis Griffin
Intermediate

When GEO meets SQL: Hotwiring Data to Locations
Michael Clifford
Beginner

#TSQL2sDay 003: Maslow and relational design

Rob Farley is hosting the third installment of T-SQL Tuesday, and it’s a fun one: relationships (in honor of Valentine’s Day).  While I’m not currently in much of a mood to opine on the virtues of love and databases, I did want to post something a bit more esoteric this time.  Not many of you may know that I don’t have a formal background in Information Technology (some of my more sarcastic friends just held their tongues at that one); I actually have a Master of Arts in Communication and a Master of Education in Instructional Technology.  I tripped into IT when I failed my comprehensive exams for the doctoral program in Speech Communication at the University of Georgia.  It was an awful time, but ultimately one of the best things that ever happened to me.

Anyway, why is this relevant?  Because the goal of this post is to attempt to extend one of the more famous models of social psychology and communication to database design; bear with me (I’m assuming that many of you either have no background in social psych or slept through it), but I’m hoping that this extension to the metaphor will benefit you in terms of your application design.

Maslow: the crash course.

The following is a BRIEF introduction to the theory; if you want more details, Wikipedia is your friend.  In a nutshell, Abraham Maslow proposed that humans, as social animals, are driven to fulfill certain basic needs in a quest for self-actualization or enlightenment.  He proposed a pyramidal model of five sets (or stages) of needs, with the four lower stages needing to be fulfilled before the fifth can be attempted; few people ever attain the fifth level, but the quest to reach it is part of our collective experience.  I’ve defined the five stages below:

Physiological:

The basic requirements for human existence; food, water, etc.

Safety:

This often translates into security, but it’s different from the term we use in information technology careers; safety is the ability to acquire and maintain goods for ongoing existence.  The Physiological elements are immediate needs; Safety elements are the ability to fulfill those immediate needs at a future date.

Social:

Where do we belong?  How do we interact with others who need us (and we need)?  What is our role, and how does that affect our definition of the self?

Esteem:

Esteem stems from the social need; once our relationship with others has been established, we can truly begin to define ourselves and the virtue of our importance in the world.

Self-Actualization:

Self-actualization is the ultimate fulfillment of one’s potential; to be what one is, without need for constant reinforcement from other beings, yet able to exist in harmony with purpose.  Few people have ever attained this stage, but as stated before, the quest to reach the top of the pyramid drives human development.

So what does this mean to the database designer?

Why is all of this important?  This is not a perfect analogy, but if we extend Maslow’s model to the area of database design, some interesting questions arise (particularly in the third and fourth stages, which is why I felt this post would be relevant to the T-SQL Tuesday challenge of relationships).  Let’s take each stage and step through them again.

Physiological:

While applications don’t have physiological needs, they DO require certain basic elements for long term survival.  Questions to consider at this stage are things like: How much space will I need?  What are the server requirements?  Can my database live in cloud or a mobile device?   What sort of I/O concerns do I have?

Safety:

Recall that safety is NOT security (in terms of who has access to the data), but it is security in terms of long-term survival of the application.  Is the database you’re designing intended for a long-term project, or is it “throw-away” code?  Have you designed it in such a way so that it’s easy to replace without impacting the dependent application?

Social:

Speaking of dependent applications (and herein lies the relationship aspect of this post), is your database application designed so that it is loosely related and decoupled from the application?  Does the database fulfill the needed role within the relationship (data storage), without treading too far into business logic?  Can the database handle multiple relationships with various applications (UI, reporting, business services)?

Esteem:

Closely related to the social nature of the database within the application stack is the need for self-esteem within the database; can the database meet the needs of the dependent applications WHILE retaining enough information to establish new relationships?  A classic example of this is the lookup table; a database with low self-esteem will only store the enumerated values provided to it by some other application.

Without the enabling application, the database lacks sufficient internal definition to validate meaning; in practical terms, this means that the database is not decoupled from the application enough to enable the development of alternate accessing applications.  For example, my day job is to reverse engineer vendor databases; few things in the world are more disturbing than a table full of numbers without any sort of category associated with that number.  The application designer decided to store that enumeration in the application; security through obfuscation IS a method of securing your database, but not a very effective one.

A high self-esteem database will store all of the appropriate lookup values (complete with constraints) in order to provide complete validity within the structure.  The database can then be reused by several different applications, without requiring a complete set of business rules to determine those relationships.  The data layer is definitional; the business layer should be procedural.
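To make that concrete, here’s a minimal sketch of what I mean (all of the table and column names here are invented for illustration); the lookup values live in the database, constrained, so any consuming application can resolve the codes without help:

```sql
-- A "high self-esteem" design: the database owns its lookup values,
-- so any application reading OrderStatusID can resolve the meaning
-- without consulting application code.
CREATE TABLE dbo.OrderStatus
    (
    OrderStatusID tinyint NOT NULL
        CONSTRAINT PK_OrderStatus PRIMARY KEY,
    StatusName varchar(50) NOT NULL
        CONSTRAINT UQ_OrderStatus_StatusName UNIQUE
    );

INSERT INTO dbo.OrderStatus (OrderStatusID, StatusName)
VALUES (1, 'Pending'), (2, 'Shipped'), (3, 'Cancelled');

-- The FOREIGN KEY guarantees that no mystery numbers show up
-- in the fact table; the meaning is enforced, not implied.
CREATE TABLE dbo.CustomerOrder
    (
    OrderID int NOT NULL IDENTITY(1,1)
        CONSTRAINT PK_CustomerOrder PRIMARY KEY,
    OrderStatusID tinyint NOT NULL
        CONSTRAINT FK_CustomerOrder_OrderStatus
            FOREIGN KEY REFERENCES dbo.OrderStatus (OrderStatusID)
    );
```

A reverse engineer (or a new reporting application) can now read those status codes directly from the schema, which is exactly the decoupling described above.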

Self-Actualization:

I have to admit that discussing self-actualization in regards to application design makes me think of HAL.  “I’m sorry, Dave….”

To try and stay on track with this metaphor, self-actualization is the basic premise of BI; when your database can start providing you knowledge instead of just data, it has attained the highest level of potential.  Few apps make it that far without requiring substantial redesign, but the ones that do are invaluable to the enterprise they support.

So where are we?

Dunno.  I hope this little exercise made your brain hurt just a bit, and opened up a new metaphor for understanding database design issues within the application stack.   If you have more questions than answers, that’s a good place to be.

SQL Saturday 41 is official!

It’s live; we’re limited to 250 seats, so register now for a great day of training in Atlanta on April 24.  We’re also looking for speakers and sponsors, so please feel free to spread the word.  I’ll continue working on the stub in order to finish the site out, so check back often.

This will be a new challenge for me, since we’re planning on running this by committee (which has both benefits and challenges); I’ll keep you posted as to how that’s working out.

Stay tuned!

Shhhh! SQLSaturday Atlanta 2010 request has been submitted.

We’re looking at April 24 at the Microsoft facility in Alpharetta, GA.  I just submitted the request on the website tonight, so it probably won’t be official for a few days, but I a) needed something to blog about tonight, and b) wanted to get the word out to start some buzz.

Like last year, there will be a waiting list; we’re limited on space, and it will probably book quickly, so keep an eye out on this website for the official announcement.  We’re hoping to have several tracks again, as well as a mixture of experienced speakers and newcomers.

Watch and wait 🙂

#TSQL2sDay 001: Date/Time Issues and Data Modeling

This is my first post for T-SQL Tuesday (thanks, Adam, for the idea!); I’m hoping this will be a continuing block party, and hopefully connect a lot of us SQL bloggers together.  There’s a lot of great content out there, and sometimes it’s just tough to find.

Just a few notes before I get started on this; for this particular post, I’m pulling from my experiences as a data architect for the last few years.  I spend a lot of time reverse-engineering other databases in order to centralize and analyze data from a variety of sources.  Most of my efforts pre-date SQL 2008, so I don’t have a lot of experience with the newer datetime data types introduced in that version; however, I’ll point you in that direction if I think it might address an issue that I raise.  Also, there’s no particular order to my list of notes; think of it as a random collection of date-time issues.  Some of them may be specific to T-SQL and SQL Server, and others may just be procedural notes about data integration (and thus may be applicable to other platforms).

Ready?  Let’s get started.

1.  If you are a database professional who works with SQL Server, strike the word “timestamp” from your vocabulary.

Way back when SQL Server was in its infancy, some developer decided that timestamp would be a good name for a binary column that changes whenever the row is updated.  I don’t know the origins of the timestamp datatype; all I know is that it has nothing to do with dates or times.  So why am I including it in a post about dates and times?  Because for most speakers of the English language, a timestamp is a marker that indicates when an event happened.  Many junior SQL Server developers (and even senior developers on other platforms) are often confused by the timestamp datatype, and wonder what’s going on with that column they just added.

My recommendation?  Avoid using the word timestamp in design discussions.  When someone says “we need to add a timestamp to that row”, make sure you understand what they mean.  Usually they want you to add a column capturing the datetime when the event happened, NOT a marker that changes whenever the row does.  Note that I’m suggesting you avoid the WORD timestamp; obviously, if a database requires a timestamp column, use it.
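To illustrate the two very different things people mean, here’s a hypothetical table (the names are mine) with both a real event datetime and a timestamp column side by side:

```sql
-- Two columns people might casually call a "timestamp":
CREATE TABLE dbo.AuditedEvent
    (
    EventID int NOT NULL IDENTITY(1,1)
        CONSTRAINT PK_AuditedEvent PRIMARY KEY,
    -- when the event happened: what most English speakers mean
    EventDate datetime NOT NULL
        CONSTRAINT DF_AuditedEvent_EventDate DEFAULT (GETDATE()),
    -- the SQL Server TIMESTAMP datatype: an auto-updating binary
    -- row-version marker with no date/time meaning at all
    RowVersionStamp timestamp NOT NULL
    );
```

In SQL 2008, the synonym rowversion is available for the datatype, which makes the intent much clearer; if you can use it, do.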

2.  Does anybody really know what time it is?

One of the benefits of a distributed architecture is that your database server doesn’t have to live on the same box as your application server; in fact, your database server may not be in the same timezone as your application server.  And, if you’re using a webserver as an application host, your user may not be in the same timezone as either your application or your database.

I mention this because as a database designer, you may need to capture multiple points-in-time (or you may not); you cannot assume that the application is consistently providing the same time reference for all users.  If you’re using a truly distributed design, you may have one host in New York and one in Los Angeles, and your users may connect to either one to enter in data; what date do you use?

For this reason, I usually advocate two things: 1) always include a database-centric referential date of event (in other words, include a column with a default value of the database time for important data), and 2) always store times in UTC.  That second statement is important enough to warrant a note of its own.
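As a sketch of both recommendations combined (table and column names are invented for illustration), the database stamps the row itself, in UTC, regardless of which application host or timezone the insert came from:

```sql
-- The database, not the application, records when the row arrived,
-- and does so in UTC so hosts in different timezones agree.
CREATE TABLE dbo.Shipment
    (
    ShipmentID int NOT NULL IDENTITY(1,1)
        CONSTRAINT PK_Shipment PRIMARY KEY,
    RecordedDateUTC datetime NOT NULL
        CONSTRAINT DF_Shipment_RecordedDateUTC DEFAULT (GETUTCDATE())
    );

-- Callers never supply the date; the default fills it in consistently.
INSERT INTO dbo.Shipment DEFAULT VALUES;
```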

3.  When in doubt, use UTC.

Time is relative to geography; the application you build today in New York may eventually have clients in New Delhi.  If your processes could ever have the potential of crossing multiple time zones (for example, you want to track shipping from Atlanta to Oregon), you’d get the most benefit out of storing your data as UTC in order to understand the relationship between event A (which occurred in EST) and event B (in PST).  It’s a lot easier to change the display of events to a local time from UTC than it is to convert from local time to UTC, do the math, and then convert back.

One problem I’ve encountered with recording data in UTC is that if you have centralized data but distributed users who want to see their data in local time, building OLAP cubes is a pain.  If you want to analyze data for Atlanta and data for Oregon, Atlanta’s day begins 3 hours earlier than Oregon’s.  I have not found an easy solution to this yet, but I am starting to investigate the new datetimeoffset data type in SQL 2008.
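For what it’s worth, here’s a small sketch of what datetimeoffset buys you (the values are made up): it stores the local offset alongside the instant, so you can recover either the UTC moment or the local day without a separate timezone table.

```sql
-- The same wall-clock time in two cities is NOT the same instant;
-- datetimeoffset preserves both pieces of information.
DECLARE @atlanta datetimeoffset = '2009-11-01 08:00:00 -05:00';
DECLARE @oregon  datetimeoffset = '2009-11-01 08:00:00 -08:00';

-- Normalize to UTC for comparing events across sites:
SELECT SWITCHOFFSET(@atlanta, '+00:00') AS AtlantaUTC,
       SWITCHOFFSET(@oregon,  '+00:00') AS OregonUTC;
```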

4.  Use a standard notation for calculating differences between dates.

Obviously, the new date and time datatypes in SQL 2008 address the development community’s desire for a way to store ONLY the date or time of an event; but if your application still stores the full datetime of an event and you need to round the data to the nearest day, hour, week, or whatever unit of time, then use a standard method for doing so.

(I am assuming that you are using date-based arithmetic rather than converting to a varchar and back).

SELECT DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0) -- strips the time portion, returning midnight of the current day

is the preferred method; you can read why at my previous post.
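If it helps, the same pattern generalizes to other units; just swap the datepart (one caution: don’t try this with second, since DATEDIFF(second, 0, GETDATE()) overflows an int):

```sql
-- Count whole units since the zero date (1900-01-01),
-- then add them back to get the start of the current unit.
SELECT DATEADD(hour,  DATEDIFF(hour,  0, GETDATE()), 0) AS StartOfHour,
       DATEADD(month, DATEDIFF(month, 0, GETDATE()), 0) AS StartOfMonth;
```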

5.  Use a standard notation for referring to dates.

Americans are weird about dates; we’re one of the few countries that puts the middle container before the smallest container, followed by the largest container.  Clueless?  When people ask what your birthdate is, how do you answer?  Most people rattle off Month-Day-Year.  An international notation would be Year, Month, Day.

Again, if you’re designing databases and queries that might cross international borders, you’ll save yourself some debugging time in the future if you can get into the habit of using an international standard notation for referring to datetimes.   A query that asks for data WHERE DateOfEvent > ‘11/1/2009’ may work fine in the US (returning data newer than November 1, 2009), but may blow up in the UK (returning data greater than January 11, 2009).

How do you know if you’re a DBA that’s been bitten by this?  You start dating everything (including personal checks) using a notation like ‘2009-11-01’.  I  get odd looks all the time (probably not just because of the way I date my checks).
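As a sketch of the difference, here’s the ambiguous literal next to a format-independent one; the ‘YYYYMMDD’ form is interpreted the same way regardless of SET DATEFORMAT or SET LANGUAGE, while the slashed form is not:

```sql
SET DATEFORMAT dmy;  -- simulate a British connection

SELECT CAST('20091101'  AS datetime) AS Unambiguous, -- always November 1, 2009
       CAST('11/1/2009' AS datetime) AS Ambiguous;   -- January 11, 2009 under dmy!
```

Note that even the dashed ‘2009-11-01’ form can be reinterpreted for the datetime type under some DATEFORMAT settings, which is why the compact ‘YYYYMMDD’ form is the safest habit for literals in code.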

6.  Finally, get used to the fact that there are lots of methods for storing datetimes.

If you’re doing any sort of database integration, you’re probably familiar with many of the issues I’ve laid out.  There are others that I haven’t covered, but the important thing to remember is that while I think these are good habits to get into, others have different opinions.  Some DB pros prefer to use epoch time rather than a standard datetime; I not only have to be familiar with my own methods (and be prepared to defend them), but I also have to understand other methodologies so that I can do the conversion.
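For example, here’s one conversion I run into when integrating with epoch-based systems (the epoch value below is a made-up sample): Unix epoch time is just the number of seconds since January 1, 1970 (UTC), so DATEADD does the work.

```sql
-- Convert a Unix epoch value (seconds since 1970-01-01 UTC) to a datetime.
DECLARE @epoch int;
SET @epoch = 1257033600;  -- sample value for illustration

SELECT DATEADD(second, @epoch, '19700101') AS EventDateUTC;
```

Going the other direction is just DATEDIFF(second, '19700101', @someUTCdate); either way, remember the result is UTC, which ties back to point 3 above.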

PASS Summit Day 2: the brief wrapup #sqlpass

I’m writing this post quickly, because I have a 6:30 SQLSaturday alumni meeting at a local donut stop.  Please don’t let the brevity of the post throw you off; lots of goals accomplished yesterday.  Unfortunately, going to sessions was not one of them 🙂

  • I spent a lot of time networking, and discussing Agile techniques for database developers with a lot of different people, as well as looking at some of the latest and greatest tools for SQL Server.
  • Speaking of vendors, I shipped home two huge boxes of free stuff, most of it from Quest, Idera, and Red Gate.  My shipping costs were $80, so you can imagine the stuff I had.
  • Finally, and I’ll probably return to this in a later post, I met with the Board of Directors of PASS in an intimate session on governance.  It was fascinating, and I’m surprised that more chapter leaders were not there, because we discussed A LOT of issues related to the local chapters, and how PASS runs itself.  I don’t think that any particular goals were met, but I do believe that the following is true:
    • Change is coming for the local chapters; the PASS board is going to try to do more to connect with the local chapter, but will want more from the local chapter as well.  Think of it as having more concrete discussions about how the local community uses SQL Server, and what niche can PASS fill.
    • Steve Jones asked a series of great questions, including: “What have you done for me?”  He wanted the Board to be able to be more transparent, and expose their accomplishments beyond their individual audiences (for example, Greg Low’s work with the chapters should be available to those who are not chapter leaders).
    • More to come.
  • The day finally wrapped with an afterparty at a local arcade, hosted by Microsoft.  Lots of fun, but I was exhausted.

Gotta run; day 3 will be a blast.

PASS Summit Day 1.5

So this is the first official day of Summit, but since I got here in the middle of the day yesterday, I’m cramming a brief post in combining yesterday and today.  I could sum up the experience in one word:

Awesome.

I have met SO many people face to face that I normally follow on twitter and other social tools, and it’s been amazing to sit and chat with other SQL geeks.  I ended up buying the DVD set today because I had too many sessions to attend, and too many people to chat with.  It’s truly inspiring to hang out with a bunch of intelligent people who are really passionate about what they do.

Highlights:

  • Pinal Dave stole my pen.  Well, he actually borrowed it to fill out his sqlbingo card, and then forgot it.  It’s OK, because I know if I make him feel guilty, he’ll fly back to the States from India to bring it to me.  And I could make him speak at our newsgroup.
  • SQLBingo has not taken off as well as I had hoped, but it’s going OK.  They’re drawing the first two winners tomorrow morning, and I hope to pitch it harder tomorrow.  I forget that just because I’m a social person, not everyone else is.
  • I learned some new stuff at Michelle Ufford’s presentation on VLDB’s.  Need to check things when I get back in the office to see if they’ll help some performance issues.
  • Met Patrick Leblanc, Tim Mitchell, Lori Edwards, Jeff Rush, etc., etc.,etc…
  • Met Remus Rusanu today after Brent Ozar’s session, and was told by him that he liked some of my responses on StackOverflow.  That really made my day, because he gives some great advice on that website.
  • Kevin Kline gave me AND the user group autographed copies of his latest book.
  • MidnightDBA gave me free stuff.

LetDowns:

  • I didn’t win any blogging awards.  More incentive to keep working at it.
  • I realized I should have submitted something to present.  Paul Waters and I made a pact to come up with 3 submission ideas for next year.

OK, I’m officially out of steam.  More to do tomorrow.

Packing my bag for PASS Summit 2009

Flying out on Monday, and I’m so excited.  Here’s the stuff I’m throwing in my bag (as if any of you care).

  1. My Archos 5 IMT – 6+ hours and an extended battery.  Gotta load it up with all of the TV shows I’ve missed over the last few months on my SageTV rig.  Since I can’t find my boosteroo audio amp, I’ll need to go see if I can get a good set of headphones for the plane.
  2. Pad and pens.
  3. My laptop computer, complete with the Nokia softphone software.  I’ll still have to touch base with the office occasionally, and this way I can have portable VOIP.
  4. I’m writing a review for The Art of Unit Testing for the AtlantaMDF Reviews site.  Need to pack the book for the plane.
  5. Airborne Vitamin C.  I know it’s not likely to prevent germs, but boosting my vitamin C intake before I travel is almost a superstitious rite for me.
  6. My #sqlbingo cards and tag.
  7. My PASS Summit itinerary;  I was going to load it into Outlook, but I remember that the last time I did that, everything got screwed up because of the time change. 
  8. My phone, and an extra battery (for all of the twittering I plan on doing).
  9. Business cards.

And I still gotta pack clothing and essentials.  I’m hoping I can do it all with the carryon+laptop limitation, particularly since I have a layover on my way back.

SQLBingo: a few more things to consider…

SQLBingo is still on at PASS Summit 2009; we’re less than a week away, and this should be a lot of fun.  To sum up, a bunch of us SQL Tweeters are going to be wandering around Summit; if you want to play Bingo, you have to match the real person with their avatar on the card, meet the person, and get their codeword.  Should be a lot of fun, but I have a few more things to add.

If you haven’t read the following posts, you may want to start with them first:

SQL PASS Twitter Bingo: The rules so far…

A basic summary of the rules for squares (with some general advice for players).  I’d like to add some things to the “how to play” list, but we’ll get to that in a minute.

#SQLBingo: Meet nice people and learn their safewords.

Brent (as usual) writes a very succinct post on how to play if you’re not a square.  What’s important about Brent’s list is that he lays out some guidelines for things to do BEFORE you go to Summit.

Wanna play?  It takes some prep work…

It should be noted that this is a low-budget game at best; I think one of the squares called it “guerrilla networking”, which I think is an awesome term for it.  What it means is that we’re trying to keep the rules and the play very simple, but unfortunately, that also means that the players have to provide some of their own resources for playing.

If you are playing, follow the steps at Brent’s blog:

  1. Print your cards ahead of time from http://sqlserverpedia.com/bingo. (Be sure to refresh in between)
  2. Find the tweeps.
  3. Turn in one card per day: a filled row on Tuesday, two rows on Wednesday, and a blackout on Thursday.

You may also want to set up a Twitter account between now and then.  If you’d like, you can check out sqlbingo’s profile; this is a dummy account set up by Aaron Nelson and myself to follow all of the squares.  The whole point of this account is to simply give you a quick list of all of the Twitter accounts that are on the cards.  You may also want to set up a search for the term #sqlbingo; remember, the squares will be broadcasting their location periodically throughout the day, and using this term will help find them.

Already a square? It takes some prep work…

The same rules for players apply to the squares; however, we’ve decided that squares are not eligible for prizes.  Sorry.  If you’d like to make it easy for your adoring fans to find you, you may want to print out a simple badge with your avatar and username on it.  Just a 2×4 scrap of paper, taped to your standard Summit badge.  If you forget, c’est la vie!  It’ll just be tougher for people to find you.

Looking forward to meeting all of you face to face!

Stu

Good Habits To Adopt: Enforcing the natural primary key

I’ve been reading Aaron Bertrand’s great series of blog posts on bad habits to kick, and have been thinking to myself: what are some good habits that SQL Server developers should implement?    I spend most of my day griping about bad design from vendors, yet I hardly ever take the time to document what should be done instead.  This post is my first attempt to do so, and it’s based on the following assumptions:

  • Good habits are going to be a lot more controversial than bad habits, and
  • SQL Server doesn’t enforce many of these good habits for you.

The first point refers to the fact that some of the choices that I make are not necessarily the best way to do things, and they may not satisfy the need of every application.  I’m a firm believer that there is an exception to every rule, but my goal is to at least define what the rules are (and again, these rules are my own creation and someone may have better rules).  The second point refers to the fact that SQL Server enforces the rules of SQL, but leaves some of that enforcement open to interpretation.  For example, the relational model defined by SQL assumes that tables are related, but SQL Server doesn’t require that you define a FOREIGN KEY (or even a PRIMARY KEY).

So here’s my first good habit:

When defining a surrogate primary key for a table, you should enforce the natural primary key with the use of a UNIQUE constraint.

To really understand this, you have to start with defining what a surrogate primary key is versus a natural primary key.  You can search for a variety of definitions, but I’ll use the following:

  • Primary Key: a non-nullable attribute (or combination of attributes) that can be used to uniquely identify a specific instance of an entity.  When used within SQL, a primary key can be mapped to a column (or columns) in a table, and the value of the key uniquely identifies a row.
  • Natural Primary Key: a primary key that is not auto-generated by the database or application.  The key is composed of attributes associated with an entity, and the value of those attributes is defined by some authority beyond the scope of the database or application.  For example, a Social Security number is an “arbitrarily” assigned number that belongs to a specific citizen of the United States; most databases that use the Social Security number do not create the number, but rather use it as a reference to a particular US citizen.
  • Surrogate Primary Key: a primary key that is auto-generated by the database or application specifically to identify the row in the table representing the collection of entities.  Surrogate keys have no meaning outside of the database and no relationship to the other attributes in the table.  An ID of 1 simply identifies a row in a table; a row representing a person, a squid, or an automobile may each have an ID of 1, depending on which table the data lives in.

Sidebar: as I was writing this, Pinal Dave posted the following to his blog: http://blog.sqlauthority.com/2009/10/22/sql-server-difference-between-candidate-keys-and-primary-key-2/

Most novices recognize that every table needs a primary key, and surrogate keys offer some benefits that natural keys do not, including:

  • Immutability: the ability of a key to stay constant over time.  A natural primary key (such as a person’s name) may change, but a surrogate key does not.
  • Simplicity of relational JOINS: surrogate keys can remain as a singular column for each table they represent.  For example, a complete invoice may need to be represented by a ClientID, an InvoiceID, and the LineID’s for the lines on that invoice.  Joining on the natural keys may require the Client Name and Address, the Invoice Number, and the Line Number. 

However, surrogate keys have one major weakness: they do NOT enforce the unique validity of each row.  If you use an IDENTITY function in SQL Server to auto-generate your surrogate PRIMARY KEY, and you insert Stuart Ainsworth into your table of Employees, and then accidentally run your INSERT script again, you’ve just double-inserted Stuart Ainsworth.  While there are certainly multiple people with my name, I’m the only one at my company; my application, however, never noticed.

Using a UNIQUE CONSTRAINT on the columns holding the natural key information avoids this problem; you get the benefits of a surrogate key AND the unique validation of a natural primary key.   The hard part is, of course, identifying the appropriate natural primary key to enforce.  However, this exercise should NOT be overlooked when designing a database.
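Here’s a sketch of the habit in action (the column choices are illustrative; identifying the right natural key for your own data is, as noted, the hard part):

```sql
-- Surrogate key for joins, UNIQUE constraint on the natural key
-- so duplicate rows are rejected instead of silently accepted.
CREATE TABLE dbo.Employee
    (
    EmployeeID int NOT NULL IDENTITY(1,1)
        CONSTRAINT PK_Employee PRIMARY KEY,
    FirstName varchar(50) NOT NULL,
    LastName varchar(50) NOT NULL,
    CompanyEmail varchar(255) NOT NULL
        CONSTRAINT UQ_Employee_CompanyEmail UNIQUE
    );

-- The first insert succeeds; running the same script a second time
-- fails with a constraint violation rather than double-inserting.
INSERT INTO dbo.Employee (FirstName, LastName, CompanyEmail)
VALUES ('Stuart', 'Ainsworth', 'stuart@example.com');
```

Here I’ve assumed the company email address is the natural key; your entity may call for a different attribute (or combination of attributes), but the pattern stays the same.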