Auto generated SQL Server keys – uniqueidentifier or IDENTITY – a rebuttal

I found this article by Armando Prato from facility9’ssite; for the most part, the article explains the differences between using a uniqueidentifier and integer-based identity values to auto-generate (surrogate) primary key values. However, the conclusion to the article has a logical flaw; can you spot it?:

It’s evident that using IDENTITY to auto-generate key values offers a few advantages over the GUID approaches:

  • IDENTITY generated values are configurable, easy to read, and easier to work with
  • Fewer database pages are required to satisfy query requests
  • In comparison to NEWID(), page splitting (and its associated overhead) is eliminated
  • Database size is minimized
  • System functions exist to capture the last generated IDENTITY value (i.e. SCOPE_IDENTITY(), etc)
  • Some system functions – such as MIN() and MAX(), for instance – cannot be used on uniqueidentifier columns
  •  

    Here’s the hint: page-splitting has less to do with data-types than it does with relationship of the clustered index to the natural order of insertion for the data. the author assumes that the PRIMARY KEY is located on the clustered index, and that’s one of my pet peeves. A PRIMARY KEY constraint is a physical implementation of a logical constraint; it’s used to enforce data validation rules, and has nothing to do with data storage or performance concerns. A clustered index, on the other hand, is one of the basic tools for enhancing database performance. Although SQL Server automatically creates a unique clustered index when building a PRIMARY KEY constraint (if no clustered index exists), a good database design never assumes that the clustered index should automatically sit on the same columns as the PRIMARY KEY constraint.

    Granted, if you use integer-based identity values for your surrogate keys, this argument has less power than if you are using uniqueidentifiers; a good rule of thumb to use when choosing columns for clustered indexes is that they should be relatively unique, and they should increase monotonically (i.e, the order of values is such that 1<=1<=2, etc). Identity’s are always unique, and the value of the next identity is always greater than the last.

    But there are times when you cannot use an identity column (partitioned views. for example), and the choice of using a uniqueidentifier should not be dismissed for fear of page splits; rather, you should choose a different column for your clustered index. I modified the code from the original article to prove my point:

    SET NOCOUNT ON
    GO
    USE MASTER
    GO
    CREATE DATABASE CodeGumbo
    GO
    USE CodeGumbo
    GO
    -- Start at 1 and increment by 1
    CREATE TABLE IDENTITY_TEST1
    (
    ID INT IDENTITY(1,1) PRIMARY KEY,
    TESTCOLUMN CHAR(2000) DEFAULT REPLICATE('X',2000)
    )
    GO
    --uniqueidentifier, clustered by default
    CREATE TABLE NEWID_TEST
    (
    ID UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY,
    TESTCOLUMN CHAR(2000) DEFAULT REPLICATE('X',2000)
    )
    GO
    --uniqueidentifier, clustered on a different column
    CREATE TABLE NEWID_TEST2
    (
    ID UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY NONCLUSTERED,
    TESTCOLUMN CHAR(2000) DEFAULT REPLICATE('X',2000),
    UTCDATESTORED datetime DEFAULT GETUTCDATE()
    )
    GO
    CREATE CLUSTERED INDEX clidx_NewID_Test_UtcDateStored ON NEWID_TEST2(UTCDateStored)
    GO
    -- INSERT 1000 ROWS INTO EACH TEST TABLE
    DECLARE @COUNTER INT
    SET @COUNTER = 1
    WHILE (@COUNTER <= 1000)
    BEGIN
    INSERT INTO IDENTITY_TEST1 DEFAULT VALUES
    INSERT INTO NEWID_TEST DEFAULT VALUES
    INSERT INTO NEWID_TEST2 DEFAULT VALUES
    SET @COUNTER = @COUNTER + 1
    END
    GO
    SELECT OBJECT_NAME([OBJECT_ID]) as tablename, avg_fragmentation_in_percent, fragment_count, page_count
    FROM sys.dm_db_index_physical_stats (DB_ID(), null, null, null, null)
    WHERE index_id = 1
    ORDER BY tablename
    GO
    USE MASTER;
    CROP DATABASE CodeGumbo;
    GO

    As you can see from the results below, the clustered index (and thus the table) has minimal fragmentation when the clustered index is on a different column than the uniqueidentifier (NewID_Test2).

    To be fair, the index on the primary key is pretty shredded; however, it’s been my experience that having a fragmented nonclustered index is less damaging than a fragmented clustered index. Both will require maintenance over time, but the longer you can keep the clustered index healthy, the better off your database will be. The trick is determining where the clustered index should live, and it’s not always the same column(s) as the primary key.

    Weekly Link Roundup 2008-12-24

    As promised during my last roundup, I’m going to start doing these on Wednesday’s.  And since Christmas is tomorrow, the season is on my mind:

    10 things for non-coders to do with free software over Christmas: Interesting list of stuff to do; I particularly like the final point (even though it’s tough for me to do).

    Give Camps: Now that I’ve gotten SQL Saturday rolling, I really want to keep my eye on the next iteration of user group camps: a give camp.  Although the main site appears to be down, here’s a couple of links to give camps in the past.

    • We Are Microsoft:  check out their charity listing page, and look at the projects they want to do in a weekend.
    • Ann Arbor Give Camp: This is from their 2008 event.  The cool thing is that they had a satellite camp in Knoxville, TN.  (I’m beginnng to think very highly of that dev community there).
    • Twin Cities Give Camp: If you’re near Minneapolis/St. Paul area around April 26 & 27th, here’s a camp for you!

    And finally, a couple of different versions of the Programmer’s Christmas:

    Merry Christmas, everybody!

    happiness is…

    watching my Roomba with its new cleaning head pick up dirt.  I know that’s materialistic, but with two cats, two kids, a dog, and a yard full of dormant Bermuda grass, I have really missed having clean floors (and yes, I know, I could have vacuumed myself, but what’s the point when I have a robot to do it for me?).  It took Roomba support nearly a month after my initial call to send me the parts.

    Ahhh.

    SQL Saturday #13 Call for Speakers NOW OPEN!

    We just opened up the call for speakers for the first Atlanta SQL Saturday. This is becoming more real by the minute 🙂

    If you’re in the Atlanta Area, and would be willing to help out, please check out our event page for details. Nutshell: it’s being held at the Microsoft offices in Alpharetta on April 25, 2009. All details are subject to change, but that’s where we’re at for now.

    Weekly Link Roundup 2008-12-21

    I’m going to have to start posting these on Wednesdays; trying to post them at the end of the week just causes too much stress. It seems like time just flies on Fridays, and the weekend is wrapped up with house stuff (I finally got Christmas lights up, BTW). Anyway, here’s the links of interest this week:

    SQL Batman’s post on Web Persona: As a budding blogger, I wish that I wrote as eloquently as the Batman. I also wish I had been creative enough to choose a moniker like his, but if I keep building the brand, I think codegumbo will eventually have some of the same power.

    The Code Camp Manifesto: I’m trying to put together SQL Saturday #13 in Atlanta, so I’ve been cruising the web looking for interesting reads on code camps around the country. This is a very succinct synopsis of the code camp philosophy. You may also be interested in the following:

    Not a lot this week, I know. Been spending a lot of time offline with the wife and kids.

    argh….

    K, as the new year approaches, I am deciding (yet again) that I AM GOING TO GET PHYSICALLY FIT! The first step in this process, however, is that I need to get off caffeine. I usually feel much better once I’m free of the sweet, sweet, succubal call of Diet Coke. Day 1 has gone by, and I’m just now starting to get the shakes. A few more days of this, and I’ll be ready to start working out in earnest.

    So long, artificially-sweetened cans of go-juice. I’ll miss you.

    SQL Quiz: Toughest Challenges

    Blast those pingbacks; I lament feeling left out of Chris Shaw’s quiz, and the next thing you know, he tags me.

    Here’s the questions:

    What are the largest challenges that you have faced in your career and how did you overcome those?

    First answer, a technical one:

    One of my biggest challenges was designing a database that would handle millions of rows of syslog data; it was a SQL 2000 box, and the budget for hardware was tight. We started off with a dual-core machine, with only 4GIGs of RAM, and yet we had to analyze 100,000 rows per minute, looking for patterns. We also needed to report trends to the customer on a monthly basis, so I did some reading on data warehousing (Kimball, obviously), and started building fact tables. It became easily visible that we were going to need to partition the data so that the server wouldn’t need to slog through 90 days worth of data when looking for a minute’s worth; so I started playing with partitioned views.

    One problem: partitioned views wouldn’t use parameters to activate partition exclusioning. I searched high and low for the answer, got into several arguments over execution plans with people on the newsgroups about why they thought it was working (they were clearly wrong), and I was just about to give up when I realized that dynamic SQL could save the day. I rewrote all of my procs to use dynamic SQL to activate the partitions, and I as off to the races. I ended up using something that most DBA’s would agree could easily be misused, and it solved my problem. It really drove home the fact that a good database developer will have a toolbox full of stuff; you never know when a left-handed screwdriver will come in handy 🙂

    My second answer is ethical in nature; as most of you are aware, DBA’s are guardians of some of the most precious assets in a company: their data. I had just started working for a company as their all around DBA/report writer/developer/printer support person when they had a change in senior management. Shortly after the change, this company was being audited by their parent company; the auditor came on site, and at one point, asked me to run some numbers for him from our Enterprise Resource Planning (ERP) system. I agreed, but asked for more time, since I was arm deep in a printer at that point (I was serious about the printer support above).

    Between the time that I finished repairing the printer and could get back to my desk, I was approached by the head accountant who asked me to “run the numbers for the auditor, give them to the Chief Operating Officer, make my computer look like it was doing something and go home for the rest of the day”. In other words, hand off the data and get the heck out of Dodge; the accountant would “make sure” the auditor saw it.

    I wish I could say I took the high road, and say that I told the accountant “no, I’ll deliver it myself”, but I didn’t. I went back to my desk generated the data file, saved it to a disk, started a long-running query on my machine, handed the file to the accountant and left. I went home, and started working on my resume. The next day, the auditor was gone, the COO was trumpeting our success, and I told my manager about how icky I felt. Two months later, I was working at a new job, and I got a phone call from my former manager; apparently, the company was being audited again, and he invited me to share my experience with the new auditor. I did so, and shortly after that, most of the upper management (COO, CEO, CFO, and the accountant) were gone.

    Is there a lesson in this? I’m not sure; I clearly didn’t do the right thing the first time, but things still worked out OK in the end. The only apparent victim was my sense of morality, but still, I walked away wishing I had handled the situation differently.

    think I’ll tag SQL Sister and Tim Benninghoff now.

    Weekly Link Roundup 2008-12-13

    OK, I realize this post is late, but my site has been having all sorts of issues (excuses, excuses, I know).  For some reason, this post didn’t get released on schedule. 

    DevGuru’s CSS2 Manual: I’ve been spending a lot of time tweaking the site this week, so I’ve been hanging out here a lot.  It’s very easy to appreciate DevGuru’s approach to all things scripting.

    Chris Shaw’s SQL Quiz: Interesting reads; feeling a little left out that nobody tagged me.  Could be that I call myself a developer these days, instead of a DBA (one of these days I need to write a post on that topic).   Anyway, you may also want to check out the following responses:

    Andy Novick’s site: I met Andy at PASS Summit this year, and really enjoyed a conversation with him.  His website his chock full of presentations, and there’s lots of great learning opportunities in there.

    Vyas’ SQL Snippets:   I found this site, and he’s got some great little code snippets in there.

    CSS, max-height

    I wish I had something of substance to add today, but I’ve been overwhelmed by personal stuff.  My oldest daughter has been sick with strep for the last week, so I’ve been having to rearrange a lot of my schedule so I can hang out with her when needed.  Consequently, I’m just wiped in the evenings. 

    Anyway, you may have noticed that I’ve been playing around with my website, especially with my new blogroll page.  I”ve been adding some blogs to it using RSS feeds, and one thing has become clear: different RSS feeds behave differently when asked to return an item.  For example, some blog postings auto truncate themselves after a few phrases; for example, Brent Ozar:

    Brent Ozar

    [feedsnap, 1]http://feeds.feedburner.com/BrentOzar-SqlServerDba[/feedsnap]

    Contrast this with Andy Warren’s feed:

    It Depends (Andy Warren)

    [feedsnap, 1]http://feeds.feedburner.com/itdepends[/feedsnap]

    Now, while I like reading Andy’s stuff, I don’t necessarily want a bunch of lengthy articles on my blogroll page; I tried to figure out a way to limit the amount of text returned, when I tripped across this nifty CSS property: max-height. From DevGuru: The max-height property is used to set the maximum height of an element. Other properties can be used to set the height, width, maximum width, minimum height and the minimum width. This property is not inherited. (note that Devguru also specifies that this property is not currently supported by any browser; it works in IE7. Go figure).

    Anyway, I simply add the max-height to my div container wrapped around the feedsnap plugin, and suddenly the RSS feed truncates after a certain pixel height.

    <div style=”overflow:hidden;max-height:100px”></div>

    It Depends (Andy Warren)

    [feedsnap, 1]http://feeds.feedburner.com/itdepends[/feedsnap]

     

    Anyway, I hope this puts some ideas in your head.