Auto generated SQL Server keys – uniqueidentifier or IDENTITY – a rebuttal

I found this article by Armando Prato from facility9’ssite; for the most part, the article explains the differences between using a uniqueidentifier and integer-based identity values to auto-generate (surrogate) primary key values. However, the conclusion to the article has a logical flaw; can you spot it?:

It’s evident that using IDENTITY to auto-generate key values offers a few advantages over the GUID approaches:

  • IDENTITY generated values are configurable, easy to read, and easier to work with
  • Fewer database pages are required to satisfy query requests
  • In comparison to NEWID(), page splitting (and its associated overhead) is eliminated
  • Database size is minimized
  • System functions exist to capture the last generated IDENTITY value (i.e. SCOPE_IDENTITY(), etc)
  • Some system functions – such as MIN() and MAX(), for instance – cannot be used on uniqueidentifier columns
  •  

    Here’s the hint: page-splitting has less to do with data-types than it does with relationship of the clustered index to the natural order of insertion for the data. the author assumes that the PRIMARY KEY is located on the clustered index, and that’s one of my pet peeves. A PRIMARY KEY constraint is a physical implementation of a logical constraint; it’s used to enforce data validation rules, and has nothing to do with data storage or performance concerns. A clustered index, on the other hand, is one of the basic tools for enhancing database performance. Although SQL Server automatically creates a unique clustered index when building a PRIMARY KEY constraint (if no clustered index exists), a good database design never assumes that the clustered index should automatically sit on the same columns as the PRIMARY KEY constraint.

    Granted, if you use integer-based identity values for your surrogate keys, this argument has less power than if you are using uniqueidentifiers; a good rule of thumb to use when choosing columns for clustered indexes is that they should be relatively unique, and they should increase monotonically (i.e, the order of values is such that 1<=1<=2, etc). Identity’s are always unique, and the value of the next identity is always greater than the last.

    But there are times when you cannot use an identity column (partitioned views. for example), and the choice of using a uniqueidentifier should not be dismissed for fear of page splits; rather, you should choose a different column for your clustered index. I modified the code from the original article to prove my point:

    SET NOCOUNT ON
    GO
    USE MASTER
    GO
    CREATE DATABASE CodeGumbo
    GO
    USE CodeGumbo
    GO
    -- Start at 1 and increment by 1
    CREATE TABLE IDENTITY_TEST1
    (
    ID INT IDENTITY(1,1) PRIMARY KEY,
    TESTCOLUMN CHAR(2000) DEFAULT REPLICATE('X',2000)
    )
    GO
    --uniqueidentifier, clustered by default
    CREATE TABLE NEWID_TEST
    (
    ID UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY,
    TESTCOLUMN CHAR(2000) DEFAULT REPLICATE('X',2000)
    )
    GO
    --uniqueidentifier, clustered on a different column
    CREATE TABLE NEWID_TEST2
    (
    ID UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY NONCLUSTERED,
    TESTCOLUMN CHAR(2000) DEFAULT REPLICATE('X',2000),
    UTCDATESTORED datetime DEFAULT GETUTCDATE()
    )
    GO
    CREATE CLUSTERED INDEX clidx_NewID_Test_UtcDateStored ON NEWID_TEST2(UTCDateStored)
    GO
    -- INSERT 1000 ROWS INTO EACH TEST TABLE
    DECLARE @COUNTER INT
    SET @COUNTER = 1
    WHILE (@COUNTER <= 1000)
    BEGIN
    INSERT INTO IDENTITY_TEST1 DEFAULT VALUES
    INSERT INTO NEWID_TEST DEFAULT VALUES
    INSERT INTO NEWID_TEST2 DEFAULT VALUES
    SET @COUNTER = @COUNTER + 1
    END
    GO
    SELECT OBJECT_NAME([OBJECT_ID]) as tablename, avg_fragmentation_in_percent, fragment_count, page_count
    FROM sys.dm_db_index_physical_stats (DB_ID(), null, null, null, null)
    WHERE index_id = 1
    ORDER BY tablename
    GO
    USE MASTER;
    CROP DATABASE CodeGumbo;
    GO

    As you can see from the results below, the clustered index (and thus the table) has minimal fragmentation when the clustered index is on a different column than the uniqueidentifier (NewID_Test2).

    To be fair, the index on the primary key is pretty shredded; however, it’s been my experience that having a fragmented nonclustered index is less damaging than a fragmented clustered index. Both will require maintenance over time, but the longer you can keep the clustered index healthy, the better off your database will be. The trick is determining where the clustered index should live, and it’s not always the same column(s) as the primary key.

    Share