July 2009

Fun with NEWSEQUENTIALID()

We’re thinking about converting the default constraint on an existing column in a table with billions of rows of data.  The column is a uniqueidentifier serving as a nonclustered primary key; we’re planning on replacing the NEWID() default with NEWSEQUENTIALID().  What’s challenging me is my concern over primary key collisions: will the new default values eventually run into an existing primary key value?

I used the following script to test, and I was a little surprised at the results:

CREATE TABLE TestNewSeqID(ID uniqueidentifier CONSTRAINT DF1 DEFAULT newID(),
                            Marker integer Identity(1,1))
GO

DECLARE @x int
SET @x= 1
WHILE @x <=10
BEGIN
    INSERT INTO TestNewSeqID DEFAULT VALUES
    SET @x = @x +1
END


SELECT *
FROM TestNewSeqID
ORDER BY ID DESC


-- swap the default on ID from NEWID() to NEWSEQUENTIALID()
ALTER TABLE TestNewSeqID
DROP CONSTRAINT DF1;

ALTER TABLE TestNewSeqID
ADD CONSTRAINT DF1 DEFAULT newsequentialid() FOR ID;

SET @x= 1
WHILE @x <=10
BEGIN
    INSERT INTO TestNewSeqID DEFAULT VALUES
    SET @x = @x +1
END

SELECT *
FROM TestNewSeqID
ORDER BY ID DESC


My results looked something like the following:

ID    Marker
F664E06A-51BF-4FC1-960E-C9E5E854ADAD    9
0746DBAC-8B14-463C-8C4E-9CC789794963    7
24217231-1865-4E2A-B8EF-944DEC9AC6A2    1
323C15B4-59DA-4B73-94E5-54A1A0860F3D    8
B0C1E7EE-3701-4916-A3E2-465FE8462965    2
BFE8C2AB-4788-4596-867E-221F75BAB338    10
C0BC2263-F5E6-41F1-83F6-14BD9D50A6DD    3
9FA2A683-FF02-458E-8703-10D1DCDEFA97    4
75445937-6B46-4B88-9525-0DD809A7BE4B    6
E93284BE-E93B-4009-B206-04CB2B107293    5
4390AD7A-CD7D-DE11-98B8-00196665566A    20
4290AD7A-CD7D-DE11-98B8-00196665566A    19
4190AD7A-CD7D-DE11-98B8-00196665566A    18
4090AD7A-CD7D-DE11-98B8-00196665566A    17
3F90AD7A-CD7D-DE11-98B8-00196665566A    16
3E90AD7A-CD7D-DE11-98B8-00196665566A    15
3D90AD7A-CD7D-DE11-98B8-00196665566A    14
3C90AD7A-CD7D-DE11-98B8-00196665566A    13
3B90AD7A-CD7D-DE11-98B8-00196665566A    12
3A90AD7A-CD7D-DE11-98B8-00196665566A    11

If I looped through several more times, the seed changed and the values got progressively higher, but they were still sorting lower than the original data set.  I’d appreciate it if others would run this script and see if they got the same results.
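
If you share my collision worry, a quick sanity check against the test table (just a throwaway sketch, and obviously not proof that a collision can never happen on a real table) is to look for any duplicated IDs, and then clean up:

SELECT ID, COUNT(*) AS Occurrences
FROM TestNewSeqID
GROUP BY ID
HAVING COUNT(*) > 1;   -- no rows returned means no collisions so far

DROP TABLE TestNewSeqID;   -- clean up the test table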

SQLSaturday #25 (Gainesville, GA) – No good topic turned away!

I think that in my rush to emphasize the hybrid nature of this SQLSaturday, with its focus on Geographic Information Systems and Environmental Policy, I inadvertently downplayed the fact that this is STILL a SQLSaturday.  If you’re passionate about SQL Server, and you want to share that passion with others, this is a great way to connect with the community.

So, if you’ve got a topic to share (even if it’s NOT one of our special topics), please sign up to do so.  I’m sorry for the confusion.

SQLSaturday #25
Gainesville, GA
October 10, 2009

You CAN teach an old dog….

This is more of a sticky note kind of post, to remind me to use a better syntax in the future.  Maybe it’ll help some of you as well.

A long time ago, when I was starting to write T-SQL, and I needed to extract only the date from a DATETIME column, I used to do something like this:

SELECT CONVERT(varchar(10), GETDATE(), 101)


and it worked OK.  That is, until someone pointed out that I needed to return a datetime instead of a varchar to my calling applications, so I wrapped the whole thing in another CONVERT function, like so:

SELECT CONVERT(datetime, CONVERT(varchar(10), GETDATE(), 101))

At least I now had the correct data type.  But then I read on the newsgroups that there may be a (marginal) performance benefit to using the DATEADD and DATEDIFF functions to return the date portion of a datetime, and the following method was suggested:

SELECT DATEADD(day, 0, DATEDIFF(day, 0, GETDATE()))

and that works great! I’ve been using that for years now.  However, it doesn’t work so well when trying to find the first date of a month or year or week.   I figured out that if I flip the parts of the DATEADD statement, then I could easily extract those values:

SELECT DATEADD(month, DATEDIFF(month, 0, GETDATE()),0) --returns 1st day of month
SELECT DATEADD(year, DATEDIFF(year, 0, GETDATE()),0) --returns 1st day of year
SELECT DATEADD(hour, DATEDIFF(hour, 0, GETDATE()),0) --returns the current date and hour (minutes and seconds zeroed)
SELECT DATEADD(quarter, DATEDIFF(quarter, 0, GETDATE()),0) --returns the 1st day of the quarter

And while that’s good, I still revert to the original positions out of habit when converting a datetime to a date (e.g., 20090724 23:59:00 to 20090724).  This post is more of a reminder to me to STOP DOING THAT, and use a consistent syntax for date conversions.

SELECT DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0)
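
For completeness, the same pattern handles weeks, and if you’re on SQL Server 2008 and your calling applications can live with the new date type, a plain CAST is even simpler.  A quick sketch (I still have to return datetime to my apps, so the DATEADD form remains my default):

SELECT DATEADD(week, DATEDIFF(week, 0, GETDATE()), 0) --returns Monday of the current week (the 1900-01-01 anchor is a Monday; on a Sunday this jumps ahead to the next day)
SELECT CAST(GETDATE() AS date) --SQL Server 2008 and later: strips the time portion, but returns a date, not a datetime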

Partitioning computed columns

For a project at work, I’ve been asked to help optimize our ETL process.  Part of that involves taking an existing table, and partitioning it along two existing columns.  Since the partitioning process only supports partitioning on one column, the obvious choice is to use a computed column.  I ran into some issues along the way, and thought I would share them here.

First, when partitioning a table on a computed column, you must persist that column.  Persisted computed columns were introduced in SQL Server 2005 (and are still available in 2008).  If you’re familiar with computed columns in SQL 2000, the basic concept is the same; however, a persisted computed column is physically stored in the database (the engine updates its value whenever the source columns change) rather than being calculated at query time.

A persisted computed column is defined like so:

CREATE TABLE Sample (DateID int, TimeID int);

INSERT INTO Sample (DateID, TimeID)
VALUES (1,1);

ALTER TABLE SAMPLE
ADD DateIDTimeID AS DATEADD(minute, TimeID, DATEADD(day, DateID, '20090101'))
PERSISTED;

Of course, if you just ran that sample, you got the following error message:

(1 row(s) affected)
Msg 4936, Level 16, State 1, Line 6
Computed column ‘DateIDTimeID’ in table ‘SAMPLE’ cannot be persisted because the column is non-deterministic.

This was the first challenge: although the DATEADD function itself is always deterministic, the problem lies with the character representation of the date (e.g., ‘20090101’), whose conversion to datetime is not.  The behavior also differs between a SQL 2005 server running under SQL 2000 compatibility mode (80) and one running at level 90 or above; frankly, I was surprised the code even compiled under SQL 2000 compatibility mode, since persisted computed columns weren’t introduced until SQL 2005.

I tracked the problem down to the character representation by checking Microsoft Connect (http://connect.microsoft.com), which listed the following issue:

Persistent Computed column using DATEADD | Microsoft Connect

From the response, let me quote:

Hi
The reason for the behavior you observe is because the conversion from string to datetime is not deterministic – it will depend on date format settings.
A work around may be to specify the date constant as a bigint (convert your date to bigint).
The behavior with the string constant is by design.
– Christian Kleinerman

Following Christian’s advice, I set out to do what he suggested, only the programmer in me couldn’t distinguish between CONVERT (the T-SQL function) and convert (the English verb); in other words, I tried this:

ALTER TABLE SAMPLE
ADD DateIDTimeID AS DATEADD(minute, TimeID, DATEADD(day, DateID, CONVERT(bigint, '20090101')))
PERSISTED;

Which, of course, gives a whole different error message:

Msg 8115, Level 16, State 2, Line 2
Arithmetic overflow error converting expression to data type datetime.

The statement has been terminated.
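
In hindsight, the overflow makes sense; a quick check (my own illustration, not part of the original troubleshooting) shows what the engine was being asked to do:

SELECT CONVERT(bigint, '20090101')
-- returns 20090101, i.e., over 20 million days; datetime tops out at 9999-12-31,
-- roughly 2,958,463 days past the 1900-01-01 base date, hence the overflow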

Stepping out of programming mode for a minute, I realized that Christian meant that I should replace the string constant I was using to represent January 1, 2009 with the equivalent bigint (the number of days since January 1, 1900, SQL Server’s datetime base date).  I ran the following:

SELECT CONVERT(bigint, CONVERT(datetime, '20090101'))

Which gave me the result of 39812, which I then plugged into my ALTER TABLE statement, like so:

ALTER TABLE SAMPLE
ADD DateIDTimeID AS DATEADD(minute, TimeID, DATEADD(day, DateID, 39812))
PERSISTED;

SELECT *
FROM Sample 

And the results were much closer to my liking:

DateID TimeID DateIDTimeID
1 1 2009-01-02 00:01:00.000


Of course, I now had to deal with the fact that when I designed my date and time lookup tables, I used a seed of 1 instead of 0, so my results were off by a day and a minute; simple math could adjust for that (future design tip: always use 0-based keys, OR use a smart value for the date key, like 20090101).  Anyway, partitioning now works, and I can alter the table as needed.
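
As a side note, another workaround I’ve seen mentioned, but haven’t pushed through our ETL process, is to keep the readable string constant and give CONVERT an explicit style; the documentation says string-to-datetime conversions with a style above 100 (such as 112) are deterministic, and COLUMNPROPERTY will tell you whether the engine agrees.  Treat this as a sketch (the DateIDTimeID2 column name is just for illustration):

ALTER TABLE Sample
ADD DateIDTimeID2 AS DATEADD(minute, TimeID, DATEADD(day, DateID, CONVERT(datetime, '20090101', 112)))
PERSISTED;

SELECT COLUMNPROPERTY(OBJECT_ID('Sample'), 'DateIDTimeID2', 'IsDeterministic');  -- 1 means the engine considers it deterministic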

SQLSaturday #24 & #25 – Oct 9-10, 2009 Gainesville, GA

I’m pushing the envelope again.  I just agreed to help promote another SQLSaturday.

AtlantaMDF and Gainesville State College’s Institute for Environmental and Spatial Analysis are hosting a SQLSaturday.  Well, not just any SQLSaturday; at GSC, Saturday starts on Friday.  Let me try to explain.

On Friday, Oct 9, IESA is hosting a student conference to highlight academic work in Environmental and Spatial Analysis Computing; they’d like to introduce those efforts to the professional community, as well as expose their students to relevant issues faced by IT professionals, so on Saturday, Oct 10, IESA is hosting (in conjunction with AtlantaMDF) a traditional SQLSaturday.  Friday is mostly for the students; Saturday for the professional geeks.  However, students are being encouraged to attend both days, and we’re asking for volunteers from the professional community to listen and respond to student presentations on Friday.

Confused?  I’m probably going to be nuts by the time this is over.  However, if you’re interested, hop on over to http://www.sqlsaturday.com and check out events #24 & #25.  If you’re a SQL professional and you’d like to register, sign up for event #25; if you’d like to submit a presentation, please feel free to do that as well.

Questions?  I’m listening.

I closed my Facebook account yesterday….

To my friends who have FB accounts, I’m sorry for the abrupt disappearance.  You probably didn’t even notice, given the wads of information you’re processing.

BTW, did you see this ad?  I saw it on Facebook. 


Sorry for that brief interruption; my brain gets like that lately.  Scattered, random bits of information keep popping in.  I’ve found it difficult to stay on track, and stay focused.  I’ve become so hung up on maintaining relationships with people from 20 years ago that I’ve started to slip at some of my present goals.  For example, I’ve been working on a book chapter prospectus for the last month, and I’ve had a devil of a time trying to keep it on track.  I’ve also agreed to help host another SQL Saturday in October, and I’ve done nothing for it.   Not to mention the random checking of my account during the work day.

This doesn’t really apply to me, but it’s funny, and I saw it on Facebook.

Facebook has also taken some toll on me personally; sure, it’s 15 minutes here or there, but those frequent 15-minute interruptions add up, and they’re time away from my wife and kids.  I’m already in front of a PC 8 hours a day for my job, plus time for my career goals; my family shouldn’t feel squeezed in between my FB time as well.

Facebook time?

Anyway, when does being the Social DBA become too social?  For me, it’s apparently when I use Facebook.  I’m afraid I’ll have to walk away from the applications, the social experiences, and the advertisements.  I’m not saying that others can’t manage their online experiences and their real-world time with aplomb and grace; I just can’t. 

Farewell, Facebook, I shall miss thee!

Quick Tip: TempDB on a multi-core SQL Server box

I just learned this today from one of the Microsoft support reps reviewing our SQL Server 2008 box for performance optimizations: if you are running SQL Server on a multi-core box, you should set up tempdb with as many data files as there are physical CPUs (up to a reasonable limit).  These files should also be equally sized.  In other words, if you need a 6GB tempdb and you have 2 cores, you’d want two 3GB files.  If you have 16 cores, you can probably cap it at 8 files of 750MB each.  On a 32-core box you may benefit from additional files, but you’ll want to test to see.

The reason for this (as it was explained to me) is that on a high-transaction server, the file I/O associated with the creation and destruction of temporary objects may cause blocking across parallel CPU operations.  For example, CPU0 may run a stored procedure that uses temporary tables, and CPU1 may run that same procedure simultaneously.  If there is a single tempdb data file, the allocation work for the temp objects created by the procedure on CPU0 may block the creation of the temp objects for CPU1, even though the tables themselves are locally scoped.

Sizing the files equally also helps, as the engine uses the free space in each file when deciding where to place new objects (proportional fill).  If one file is much larger than the others, most of your objects will be created in the larger file, which could also contribute to blocking concerns.  In short:

  • Create the same number of tempdb data files as there are physical processors on the server (up to 8), and
  • Make sure that the data files are equally sized.
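
To make that concrete, here’s a minimal sketch of the commands involved, assuming the 16-core / 6GB example above, the default tempdev logical file name, and placeholder paths (check your current layout first, and note that MODIFY FILE will only grow a file, not shrink it):

-- check the current tempdb data file layout (size is reported in 8 KB pages)
SELECT name, size/128 AS size_mb
FROM tempdb.sys.database_files
WHERE type_desc = 'ROWS';

-- grow the existing primary data file to the target size
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 750MB);

-- add the remaining equally sized data files (file names and paths are placeholders)
ALTER DATABASE tempdb ADD FILE (NAME = tempdev2, FILENAME = 'T:\tempdb\tempdb2.ndf', SIZE = 750MB);
ALTER DATABASE tempdb ADD FILE (NAME = tempdev3, FILENAME = 'T:\tempdb\tempdb3.ndf', SIZE = 750MB);
-- ...and so on through tempdev8 for this 16-core example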

NOTE: Immediately after posting this, I read the following from Kevin Kline.  Please note that I am not talking about multiple log files, but rather multiple data files.  http://sqlblog.com/blogs/kevin_kline/archive/2009/06/23/old-performance-tuning-recommendations-die-hard.aspx

NOTE2: @kbriankelley tweeted me shortly after this post, and recommended the following post: http://www.sqlskills.com/BLOGS/PAUL/post/Search-Engine-QA-12-Should-you-create-multiple-files-for-a-user-DB-on-a-multi-core-box.aspx

Apparently, there may be some disagreement about the need for a 1:1 ratio of data files to cores.  Paul Randal indicates that a 1:4 or 1:2 ratio may be enough.  I don’t know; I had a Microsoftie in the office telling our DBA team that we needed at least 8-10 data files for our 16-core box (and he did specify the 1:1 ratio).  As a database developer, I don’t think about file I/O enough to argue with him.  I do think this is a tip worth investigating and testing, however, which is the reason I posted it.