Weekly Link Roundup 2009-02-04

Wow, busy week this week.

How I Get By Without Sysadmin:  Jeremiah Peschka has a great article on tuning queries from a developer’s perspective.  I’ve been a big advocate of some of his methods for a while now, and it’s nice to see other people come to some of the same conclusions that I have.

Index Clean-Up Scripts:  If it seems like I’ve been reading a lot about performance tuning and maintenance issues lately, you’re right.  I’m studying for the upgrade from my MCDBA to the MCITP, and a lot of my maintenance skills are rusty.  Posts like these are very interesting and help keep those skills sharp.

To be honest, I haven’t read a lot of technical sites this week; too worried about what’s going on with this joke of a stimulus package.  I’ll do better next week, I promise 🙂

Weekly Link Roundup 2009-01-28

Stuff I’ve been reading:

Lots and lots of stuff on the digital tv transition:  This is fascinating to me; it’s a perfect example of the stupidity of the federal government.  They sold the bandwidth, so everyone needs to upgrade.  They set a date.  We have a change in executive administration (but not in legislative), and suddenly the date’s no good anymore.  I wonder how many lawsuits will be filed by the companies that will technically own the airwaves past the transition date if it gets delayed?

I’ve gone back to the newsgroups:  I used to post quite frequently in microsoft.public.sqlserver.programming (and a few others), but got tired of the same old questions, and many of the same old responses.  However, since SQL Server 2008 has been released, there’s some fresh meat in there.

Notes from SQLSaturday #10 in Tampa:  I’ll admit it; I’m starting to get nervous about our own upcoming SQLSaturday.  Andy Warren’s writeup of the recent SQLSaturday just goes to remind me of how much we’ve got left to do.  Speaking of Andy, he’s also posted his PASS Updates: 1, 2, 3

Happy coding.

Building ranges using a dynamically generated numbers table

We use a queueing metaphor for data loads for the majority of our imports, and it’s been a challenge to figure out how to generate dynamic ranges to add to the queue. In other words, we want to tell our system to go and pick data from a source that has an id value >x and <=y, and then do it again in a few minutes with a new x and y value. I should say that it’s been a challenge because I’m still not used to SQL 2005 features; I recently came across a bit of code using a CTE and recursion to generate a numbers table (thanks to Stefan Keir Gordon). Here’s a slightly modified version of his code:

/*thanks to Stefan Keir Gordon for the inspiration
 http://www.keirgordon.com*/

SET STATISTICS IO ON

;WITH Numbers(n) AS
   (SELECT 1 AS n
	UNION ALL
	SELECT (n + 1) AS n
	FROM Numbers
	WHERE n <= 32767)
SELECT n
FROM Numbers
WHERE n<=32767
OPTION(MAXRECURSION 32767)

In case you’re wondering, 32767 is the maximum number of recursions supported by SQL Server in a CTE; Stefan’s code neatly generates a single-column table with values from 1 to 32767. Looking at that, I realized I could easily get a simple range by adding a second column to the final SELECT, like so:

;WITH Ranges(MinValue) AS
  (SELECT 1 AS MinValue
   UNION ALL
   SELECT (MinValue + 1) AS MinValue
   FROM Ranges WHERE MinValue <= 32767)
SELECT MinValue, MinValue + 1 AS MaxValue
FROM Ranges
WHERE MinValue <= 32767
OPTION(MAXRECURSION 32767)

I end up with a two-column table where MaxValue > MinValue, and the ranges increase by 1. But I had some issues to address:

  • I needed to be able to pass in Start and End values to this code.
  • I needed chunks bigger than 1 value in size; I also had a minimum size I wanted to use.
  • In some cases 32767 ranges would be appropriate; in others, I needed fewer.

Here’s the final code; setting the Start and End values for the range of ranges is easy to understand, but the other two values may be a bit confusing. The first is @MaxRanges, which simply says “I want no more than this number of ranges to get from the Start to the End”. The maximum value for this number is 32767. @MinChunk refers to the minimum chunk size, or the space between MinValue and MaxValue for a given row.

It may be easier to just play with the values and see; you should never have more rows than @MaxRanges, and the difference between MinValue and MaxValue for a given row should always be greater than or equal to @MinChunk:

/*A dynamically generated table of ranges.
2009-01-25 Stuart R Ainsworth
	http://codegumbo.com/

Thanks to Stefan Keir Gordon for the inspiration
 http://www.keirgordon.com */

SET STATISTICS IO ON

DECLARE @MaxRanges int, @MinChunk int, @Start int, @End int
SELECT @Start=10, @End=200000

/*@MaxRanges must be <= 32767;
  if the range between Start and End is large,
  and @MaxRanges > 32767, the End point won't be reached.*/
SELECT @MaxRanges=1000, @MinChunk=205

IF (SELECT CEILING((@End-@Start)/(@MaxRanges*1.00)))>@MinChunk
	SET @MinChunk = CEILING((@End-@Start)/(@MaxRanges*1.00))

;WITH Ranges(MinValue) AS
  (SELECT @Start AS MinValue
   UNION ALL
   SELECT (MinValue + @MinChunk) AS MinValue
   FROM Ranges WHERE MinValue < @End)
SELECT MinValue, MaxValue = CASE WHEN MinValue + @MinChunk < @End
			     THEN MinValue + @MinChunk
			     ELSE @End END
FROM Ranges
WHERE MinValue < @End
OPTION(MAXRECURSION 32767)
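
To make that concrete, here’s the output I’d expect for the sample values above (my arithmetic, not results copied from a live run): CEILING((200000-10)/1000.00) = 200, which is less than @MinChunk, so the chunk size stays at 205:

/* Expected results for @Start=10, @End=200000, @MaxRanges=1000, @MinChunk=205:

   MinValue    MaxValue
   ----------  ----------
   10          215
   215         420
   ...
   199885      200000   --the CASE expression caps the final MaxValue at @End

   (976 rows) */

Each row then feeds the queue as one import pass, i.e., “WHERE id > MinValue AND id <= MaxValue”.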

Good luck; happy coding!

Let’s all go to the movies….

I just got back from watching Gran Torino with the wife; if you haven’t seen this film yet, it’s frikkin’ awesome!  Warning: this movie definitely warrants the R rating it carries.  It’s both funny and sad, and speaks a lot about vengeance, growth, and personal retribution.  It’s kind of a mature Karate Kid.  The only sad thing is that I can’t repeat any of the fiercely comedic lines BECAUSE THEY ARE SO AMAZINGLY OFFENSIVE!  Seriously, drop one of these lines in the wrong crowd, and you could lose some teeth.

Anyway, if you’ve got a strong stomach and love to see old men with attitude, go watch this film.  4 out of 5 stars.

Weekly Link Roundup 2009-01-21

Stuff I’ve been reading:

Causing a stir in Tok Alaska: I first started reading this blog after reading about the -78 degree weather in Alaska on the Drudge Report. The writer is quite talented, and it’s a lot of fun to read.

Wordle:  Technically, this ain’t reading.  I found out about Wordle from Brent’s blog, and thought it was a cool exercise in computer-generated art.  Below is the wordle for my site, generated today.

Joe Celko’s Temporal Data Techniques in SQL:  I’ve said some pretty harsh things about Joe Celko in the past; his online persona is, frankly, obnoxious, and I’ve disagreed with him on many of his approaches to programming in SQL Server.  However, he’s very intelligent, and it’s hard to argue with him when it comes to pure design.

Tony Davis’s Bar-Poo for DBA’s & Developers: Tony writes some esoteric stuff, and this particular post is no exception.  But hey, I won a prize!

Happy coding!

catching up…

I’m starting to fall back into a bad habit of not posting regularly; I’ve already missed a couple of weekly updates, and I haven’t shown off much coding lately.  I wish I could blame it on a lack of time, but the truth is that I’m lazy.  Even though I’ve been keeping busy, I still have plenty of free time; I’ve just been frittering it away on television.  However, I’ve decided to take some advice from a fellow blogger: if you can’t keep a commitment to two posts a week, commit to one.  Henceforth, I’ll be publishing at least one post a week, on Wednesday nights.

Now that I’ve got that out of the way, here’s the breakdown of what’s been up with me since my last post:

  • Judged a speech contest at the local community college,
  • Started our latest sprint in the scrum method at work,
  • Finished the sponsor letter for SQLSaturday,
  • Made several appeals for volunteers and speakers for SQLSaturday,
  • Finally took down my Christmas lights, and
  • Cleaned my home office.

I’m pooped just making the list.  🙂

AtlantaMDF meeting 2009-01-12

Wanted to drop a quick post wrapping up tonight’s user group meeting for AtlantaMDF.  Tonight was the first chance I’ve had to pitch SQLSaturday #13 to a larger audience, and I got a lot of positive feedback.  I had several people volunteer to help, and even had two or three people volunteer to present (not including those who were interested in presenting in my two super-short sessions).  I’m hoping this is going to be killer; I’ve got a lot of legwork to do to get sponsors lined up, but we’re on our way!

There were a lot of new faces tonight; the cynical side of me wants to point to the economy.  A lot of people are either looking for work, or want to make sure that they still have work in the next year.  User groups provide both free education and free contacts, and you can’t fault someone for doing what they can to improve themselves and make sure that their opportunities are lined up.  However, I think a lot of people are finally starting to realize that SQL Server is a great platform for database development, and the best way they can learn about the technology is to participate in the user group.

Tonight, Eric Melcher of Intellinet gave a great presentation on End-To-End BI.  If you weren’t there, you may want to download it from the AtlantaMDF site when he publishes it; I think the title was a little misleading, but the content was great.  Eric really focused on the client tools available for analysis, including everything from Excel to Performance Point, touching on Excel Web Services, Proclarity, and Reporting Services along the way.  If you’re like me, and you feel overwhelmed by what BI is, and what options are available to display data, this was a great presentation to get your feet wet.

Happy New Year!

So, sitting up and watching Double Impact and cruising the web, I read Andy Warren’s Thoughts for 2009, and I thought I should probably contribute my own ideas as well.  I definitely want to spend more time blogging and contributing to the user groups at large.  Obviously, I’m going to be tied up with SQL Saturday for the first part of the year (speaking of which, I need to get my sponsor letter organized), but I still want to do more. 

Professionally, I need to find more leadership opportunities in my company.  This year’s a bit of a challenge because we’re understaffed with a lot of projects coming down the pipe, and there’s not a lot of openings in management positions, BUT I need to find a way to make my name known.  I want to be the SQL guy for not just my division, but I also want to expand beyond that to the company level as well.

I want to complete my MCITP certs for database administration and development this year.  One thing I learned at PASS Summit was that I need to continue to find ways to expand my knowledge.  I feel very confident in my ability to code T-SQL and design elegant database solutions, but I have 0 experience in Microsoft’s BI solution, and my DBA skills are a bit rusty.

I like Andy’s suggestion to play chess; I’m going to open that up and say I want to play more games with my kids.  They’re getting older, and soon they’ll be too old to hang out with me.  I want to find ways to encourage them to think, and I want to spend time just having fun with them.

Finally, I’m going to get healthy.  As soon as this shoulder recuperates from surgery, I’m back on the workout routine.  I am going to find 30 minutes a day to work out, and I’m going to stick to it.

How about you?

Weekly Link Roundup 2008-12-31

Happy New Year! Another short list this week; I had shoulder surgery yesterday (bone spur near my rotator cuff), so I’m just now coherent. Hydrocodone is potent stuff. I’m hoping that next week will get things back to normal.

Eyespi20: I always enjoy it when someone comments on my site, because it gives me the opportunity to find new sites to read. Margaret’s site is one of those links (I need to add her to my BlogRoll page). I particularly enjoyed this post, because it turned me on to TwitterTools. She’s apparently also into remodeling, a subject that once was near and dear to my heart.

It’s All About the Wireless Access:  Andy Warren’s brief musing on a new netbook; I’m currently lusting after an Asus EEE 900HD (or 1000HD; can’t decide).  Google led me to Scott Hanselman’s Dell Mini 9 – Practical Developer’s Review, which in turn led me to Omar Shahine’s article:  Welcome to a new era of computing… netbook and Atom

Obviously, I’ve been keeping up with facility9 (who appears to be sharing free code with Brent Ozar), since his last link round up spurred my Monday post: Auto generated SQL Server keys – uniqueidentifier or IDENTITY – a rebuttal.  Is it shameless self-promotion to list your own stuff in your list of links?

Finally, this last one is kind of specialized; I’m a fan of SageTV (software for Home Theatre PCs), although not necessarily a power user.  A recent thread in their user forums has opened up about form vs. function, which is interesting to anybody from a developer background.  At what point do you allocate resources to alter the design as opposed to adding new features?

Happy coding!

Auto generated SQL Server keys – uniqueidentifier or IDENTITY – a rebuttal

I found this article by Armando Prato on facility9’s site; for the most part, the article explains the differences between using uniqueidentifier and integer-based identity values to auto-generate (surrogate) primary key values. However, the conclusion of the article has a logical flaw; can you spot it?

It’s evident that using IDENTITY to auto-generate key values offers a few advantages over the GUID approaches:

  • IDENTITY generated values are configurable, easy to read, and easier to work with
  • Fewer database pages are required to satisfy query requests
  • In comparison to NEWID(), page splitting (and its associated overhead) is eliminated
  • Database size is minimized
  • System functions exist to capture the last generated IDENTITY value (i.e. SCOPE_IDENTITY(), etc)
  • Some system functions – such as MIN() and MAX(), for instance – cannot be used on uniqueidentifier columns

Here’s the hint: page splitting has less to do with data types than it does with the relationship of the clustered index to the natural order of insertion for the data. The author assumes that the PRIMARY KEY is located on the clustered index, and that’s one of my pet peeves. A PRIMARY KEY constraint is a physical implementation of a logical constraint; it’s used to enforce data validation rules, and has nothing to do with data storage or performance concerns. A clustered index, on the other hand, is one of the basic tools for enhancing database performance. Although SQL Server automatically creates a unique clustered index when building a PRIMARY KEY constraint (if no clustered index exists), a good database design never assumes that the clustered index should automatically sit on the same columns as the PRIMARY KEY constraint.

Granted, if you use integer-based identity values for your surrogate keys, this argument has less power than if you are using uniqueidentifiers; a good rule of thumb when choosing columns for clustered indexes is that they should be relatively unique, and they should increase monotonically (i.e., each new value is greater than or equal to the one before it: 1 <= 1 <= 2, etc.). IDENTITY values are always unique, and the value of the next identity is always greater than the last.

But there are times when you cannot use an identity column (partitioned views, for example), and the choice of using a uniqueidentifier should not be dismissed for fear of page splits; rather, you should choose a different column for your clustered index. I modified the code from the original article to prove my point:

SET NOCOUNT ON
GO
USE MASTER
GO
CREATE DATABASE CodeGumbo
GO
USE CodeGumbo
GO
-- Start at 1 and increment by 1
CREATE TABLE IDENTITY_TEST1
(
ID INT IDENTITY(1,1) PRIMARY KEY,
TESTCOLUMN CHAR(2000) DEFAULT REPLICATE('X',2000)
)
GO
--uniqueidentifier, clustered by default
CREATE TABLE NEWID_TEST
(
ID UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY,
TESTCOLUMN CHAR(2000) DEFAULT REPLICATE('X',2000)
)
GO
--uniqueidentifier, clustered on a different column
CREATE TABLE NEWID_TEST2
(
ID UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY NONCLUSTERED,
TESTCOLUMN CHAR(2000) DEFAULT REPLICATE('X',2000),
UTCDATESTORED datetime DEFAULT GETUTCDATE()
)
GO
CREATE CLUSTERED INDEX clidx_NewID_Test_UtcDateStored ON NEWID_TEST2(UTCDateStored)
GO
-- INSERT 1000 ROWS INTO EACH TEST TABLE
DECLARE @COUNTER INT
SET @COUNTER = 1
WHILE (@COUNTER <= 1000)
BEGIN
INSERT INTO IDENTITY_TEST1 DEFAULT VALUES
INSERT INTO NEWID_TEST DEFAULT VALUES
INSERT INTO NEWID_TEST2 DEFAULT VALUES
SET @COUNTER = @COUNTER + 1
END
GO
SELECT OBJECT_NAME([OBJECT_ID]) as tablename, avg_fragmentation_in_percent, fragment_count, page_count
FROM sys.dm_db_index_physical_stats (DB_ID(), null, null, null, null)
WHERE index_id = 1
ORDER BY tablename
GO
USE MASTER;
DROP DATABASE CodeGumbo;
GO

As you can see from the results below, the clustered index (and thus the table) has minimal fragmentation when the clustered index is on a different column than the uniqueidentifier (NewID_Test2).

To be fair, the index on the primary key is pretty shredded; however, it’s been my experience that having a fragmented nonclustered index is less damaging than a fragmented clustered index. Both will require maintenance over time, but the longer you can keep the clustered index healthy, the better off your database will be. The trick is determining where the clustered index should live, and it’s not always the same column(s) as the primary key.
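
Since index maintenance came up, here’s a minimal sketch of the kind of periodic check I have in mind; the 10% and 30% thresholds are my own illustrative defaults, not numbers from the original article:

--Generate REORGANIZE/REBUILD statements for fragmented indexes in the
--current database; the thresholds (10/30) are illustrative only.
SELECT 'ALTER INDEX [' + i.name + '] ON ['
       + OBJECT_NAME(ips.[object_id]) + '] '
       + CASE WHEN ips.avg_fragmentation_in_percent > 30
              THEN 'REBUILD' ELSE 'REORGANIZE' END AS maintenance_command
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, 'LIMITED') ips
JOIN sys.indexes i
  ON ips.[object_id] = i.[object_id] AND ips.index_id = i.index_id
WHERE ips.avg_fragmentation_in_percent > 10
  AND i.name IS NOT NULL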