#TSQL2sday: Emulating a FIRST aggregation

tsql2sday

Jes Borland is hosting this month’s T-SQL Tuesday, and it’s all about aggregations.  Here’s an old coding trick of mine to emulate a FIRST aggregation in T-SQL.  Say we have a table that has three columns:

  • ID, a uniqueidentifier
  • Name, a varchar that represents something, and
  • DateStored, a datetime that is set when the row is written to the table

And we populate that table like so:

CREATE TABLE TSQL2sDay_FirstAgg
   
(
     
ID UNIQUEIDENTIFIER
   
, NAME VARCHAR(20)
    ,
DateStored DATETIME DEFAULT GETUTCDATE()
    )
    
INSERT  INTO TSQL2sDay_FirstAgg
       
( ID, NAME )
VALUES  ( NEWID(), 'Peanut' )

WAITFOR DELAY '00:00:01'

INSERT  INTO TSQL2sDay_FirstAgg
       
( ID, NAME )
VALUES  ( NEWID(), 'Peanut' )

WAITFOR DELAY '00:00:01'

INSERT  INTO TSQL2sDay_FirstAgg
       
( ID, NAME )
VALUES  ( NEWID(), 'Orange' )

 

It’s easy to figure out the number of rows associated with each name:

-- SELECT data to verify order of DateStored
SELECT  ID
     
, NAME
     
, DateStored
FROM    TSQL2sDay_FirstAgg      

-- Basic Row Count by Name
SELECT  NAME
     
, RowCnt = COUNT(*)
FROM    TSQL2sDay_FirstAgg
GROUP BY NAME

 

but how do we figure out what the first ID was for each name along with the number of rows?  You could work something out using the HAVING clause of the SELECT statement, or you could do something like the following:

--SELECT first ID and count of rows by Name
SELECT  FirstID = CONVERT(UNIQUEIDENTIFIER, RIGHT(MIN(CONVERT(VARCHAR(24), DateStored, 121) + CONVERT(VARCHAR(36), ID)),
                                                 
36))
      ,
NAME
     
, RowCnt = COUNT(*)
FROM    TSQL2sDay_FirstAgg
GROUP BY NAME

 

It looks complicated, but it’s not; let’s step through it.

  1. We have to know some basic information about our data; in this case, we know that the datetime value associated with each row with a common name is different.  In other words, there are no two Peanuts with the same DateStored value.  This is important, because in order for there to be a first value, there must be some method of ALWAYS determining which one WAS first.  If two Peanuts showed up at the same time, the model is broken.
  2. The first thing we do is to CONVERT the DateStored value to a varchar; this allows us to concatenate it with other values.  The format of that varchar string is important; it must be precise, and it must sort in an ascending order.  The ODBC canonical format (with milliseconds) is a good candidate for this.
  3. We then CONVERT the uniqueidentifer to a varchar, and append it to the DateStored varchar value.  This gives us a lengthy string which can be sorted by the first 24 characters.
  4. We find the MIN of the string we constructed; this MIN value is determined by the optimizer based on the sorting value of the numbers in the DateStored value.
  5. We then take the RIGHT-most 36 characters (the length of a uniqueidentifier), and convert it back to a uniqueidentifier (so that we have our type back).

There are probably better solutions for this, but this is a simple trick that works under certain circumstances and is portable to several flavors of SQL.

Share