SQL Server

#sqlsat35 looking ahead to the weekend

This weekend I’ll be traveling to the DFW metropolitan area to attend SQLSaturday #35; I’m very excited about it.  I didn’t know if I’d be able to attend this weekend (had to trade kid time with the ex-wife), so I missed the call for speakers.  I am looking forward to actually attending sessions and bumping into some friends.   If you’re there, look me up; I’ll be wearing my SQLSaturday #41 t-shirt (see this link for a sample).

I’ll be packing a couple of presentations (just in case they have an opening): the Social DBA and my latest discussion on XML in SQL Server 2008.  I was planning on submitting both of them to PASS Summit this year, but I feel a little guilty about the Social DBA one given that I’ve completely slacked off over the last few months.  I keep thinking I’m going to get back on the wagon, but life has been flying by much too fast these days.

Speaking of friends, Dallas is kind of like an old home to me; I grew up in Louisiana, and many of my high school and college buddies wound up in the big D after graduation.  I’m looking forward to crashing on a few couches, having a few beers, and hearing what happened over the last 20 years or so.

Speaking today: PASS AppDev Virtual Chapter

I know it’s short notice, but to be honest, I totally forgot about this until a couple of weeks ago.  I’ll be presenting today at noon eastern on a LiveMeeting for the Application Developers Virtual Chapter of PASS.  Deets below:

“You Got XML In My Database? What’s Up With That?”
May 11th 12:00 PM EDT (GMT -4)
Presenter: Stuart Ainsworth

A brief presentation exploring the marriage of XML and relational databases, including when it works and when it doesn’t. Coverage will include various use case scenarios, and some tips on how to improve performance using design techniques.

Stuart Ainsworth

Stuart R Ainsworth, MA, MEd is a Database Architect working in the realm of Financial Information Security; over the last 15 years, he’s worked as a Research Analyst, a report writer, a DBA, a programmer, and a public speaking professor. He’s one of the chapter leaders for AtlantaMDF, the Atlanta chapter of PASS. A master of air guitar, he has yet to understand the point of Rock Band (“You push buttons? What’s that all about?”).

How do I view the presentation?
Attendee URL:  Live Meeting link

YAY! Somebody likes me…

I just got the email today; Red Gate friended me.  Or, rather, I am now a Friend of Red Gate.  What does this mean?  I try to think of it kind of like a NASCAR patch; I now have a perpetual sponsor for all of my various presentations in the SQL community.   I don’t know what else to say other than “THANK YOU!”

Seriously, I love Red Gate’s products, and have been a big fan of theirs for a long time.  Their SQL Compare product just works, and it works well (unlike the sometimes-challenging implementation in Visual Studio’s Database Developer).   I bought SQL Prompt with my own money because I saw it as a major time saver.

I’m very jazzed about this.

Building ranges using numbers

A while back, I posted a solution to a problem I was facing where I needed to generate a table of ranges.  While I haven't run into any issues with that solution, a couple of things have recently occurred that caused me to go back, take another look at the problem, and see if I could come up with a more elegant solution.

The first was a comment by Jeff Moden on the original post; Jeff suggested that recursion may not be the best choice, given the relatively large number of logical reads compared to his solution and/or a permanent Numbers table.  Jeff posted a solution, but it only addressed the first part of the article (building a dynamic numbers table) and NOT the second part (coming up with a table of ranges).  The problem I was struggling with when using a table of numbers was the dynamic span between rows; in other words, I couldn't figure out how to force row 1 to start with 10 and row 2 to start with 215.

The second driving factor was the FizzBuzz solution I proffered yesterday; while I was reviewing that code, I had a glimmer of an idea about how to handle the dynamic spanning using MODULO.  I reworked the solution using Jeff’s suggestion, as well as using a permanent numbers table.  My examples are below:

Original code:

/*A dynamically generated table of ranges.
2009-01-25 Stuart R Ainsworth
    http://codegumbo.com/

Thanks to Stefan Keir Gordon for the inspiration
 http://www.keirgordon.com */


DECLARE @MaxRanges int, @MinChunk int, @Start int, @End int
SELECT @Start=10, @End=200000

/*@MaxRanges must be <= 32767;
  if the range between Start and End is large,
  and @MaxRanges > 32767, the End point won't be reached.*/


SELECT @MaxRanges=100, @MinChunk=2

IF (SELECT CEILING((@End-@Start)/(@MaxRanges*1.00)))>@MinChunk
    SET @MinChunk = CEILING((@End-@Start)/(@MaxRanges*1.00))

;WITH Ranges(MinValue) AS
  (SELECT @Start AS MinValue
   UNION ALL
   SELECT (Minvalue + @MinChunk) AS MinValue
   FROM Ranges WHERE MinValue < @End)
SELECT MinValue, MaxValue = CASE WHEN MinValue + @MinChunk < @End
                 THEN MinValue + @MinChunk
                 ELSE @End End
FROM Ranges
WHERE MinValue < @End
OPTION(MAXRECURSION 32767)

Modification based on Jeff’s suggestion:

/*A dynamically generated table of ranges. part 2
2009-09-27 Stuart R Ainsworth
    http://codegumbo.com/

Thanks to Jeff Moden for the inspiration*/

DECLARE @MaxRanges int, @MinChunk int, @Start int, @End int
SELECT @Start=10, @End=200000
SELECT @MaxRanges=100, @MinChunk=2

IF (SELECT CEILING((@End-@Start)/(@MaxRanges*1.00)))>@MinChunk
    SET @MinChunk = CEILING((@End-@Start)/(@MaxRanges*1.00))


SELECT MinValue, MaxValue = CASE WHEN MinValue + @MinChunk < @End
                 THEN MinValue + @MinChunk
                 ELSE @End End
FROM (    SELECT 
            ROW_NUMBER() OVER (ORDER BY (ac1.Object_ID)) -1 + @Start AS MinValue
        FROM Master.sys.columns ac1
        CROSS JOIN Master.sys.columns ac2) x
WHERE (MinValue = @Start
    OR (Minvalue-@Start) % @MinChunk = 0)
    AND MinValue + @MinChunk <= @End + @MinChunk

I also tested a permanent table of numbers (per Jeff's suggestion); given how similar it is to Jeff's modification, I'll only sketch it here — simply replace the subquery with a numbers table (you'll need to monkey with the math to get the dynamic starting point).
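Something along these lines does the trick (a sketch only; it assumes a permanent dbo.Numbers table with a single int column N, seeded from 1 up past @End - @Start — swap in your own numbers table):

DECLARE @MaxRanges int, @MinChunk int, @Start int, @End int
SELECT @Start=10, @End=200000
SELECT @MaxRanges=100, @MinChunk=2

IF (SELECT CEILING((@End-@Start)/(@MaxRanges*1.00)))>@MinChunk
    SET @MinChunk = CEILING((@End-@Start)/(@MaxRanges*1.00))

SELECT MinValue, MaxValue = CASE WHEN MinValue + @MinChunk < @End
                 THEN MinValue + @MinChunk
                 ELSE @End END
FROM (    SELECT N - 1 + @Start AS MinValue   -- shift the number line so it begins at @Start
        FROM dbo.Numbers) x                   -- dbo.Numbers is an assumption; use your own table
WHERE (MinValue = @Start
    OR (MinValue-@Start) % @MinChunk = 0)
    AND MinValue + @MinChunk <= @End + @MinChunk

What was interesting to me was the output for each of the three solutions when running with SET STATISTICS IO and TIME ON: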

Original code:

Table 'Worktable'. Scan count 2, logical reads 607, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:   CPU time = 16 ms,  elapsed time = 88 ms.

Modification (dynamic table):

Table 'syscolpars'. Scan count 2, logical reads 19, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:   CPU time = 187 ms,  elapsed time = 203 ms.

Modification (permanent table):

Table 'NUMBERS'. Scan count 1, logical reads 424, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:   CPU time = 125 ms,  elapsed time = 156 ms.

At first glance, it would appear that Jeff's assertion is correct; the number of logical reads is greatly reduced by his suggested modification (more so by the dynamic table than the permanent table, interestingly enough).  However, the CPU time increases.  And, while I know that execution plans are no guarantee of performance, in every run SQL Server estimated the original code at 0% of the combined cost of the three plans (the dynamic modification at 65%, and the permanent numbers table at 35%).

So now, I’m stuck.  I can see Jeff’s point, but I’m not sure that a solution based on his suggestion is that much better than my original solution; do I worry more about I/O or CPU?

FizzBuzz

Stumbled across Mike Hillwig’s post in response to a question posed by Brent Ozar.  Depending on how literally you want to use the word “print” (do you mean the actual command, or do you just want output?), I think I have a slightly more elegant solution than what Mike proposed.

SELECT CASE WHEN n%3=0 AND n%5=0 THEN 'FizzBuzz' 
            WHEN n % 3 = 0 THEN 'Fizz'
            WHEN n % 5 = 0 THEN 'Buzz'
            ELSE CONVERT(varchar(8), n) END as "N"
FROM (    SELECT TOP 100 ROW_NUMBER() OVER (ORDER BY (ac1.Object_ID)) AS "N"
    FROM Master.sys.All_Columns ac1) x

Basically, it avoids the WHILE loop; however, if you must use a PRINT command, then Mike’s suggestion is probably as elegant as I can imagine.  What’s interesting is that this may also provide me a better solution to my need to build a dynamic range that I posted a while back.  I’ll have to think about it a bit more, but I think it’s possible.
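For comparison, if you really do need PRINT, a loop-based version in the spirit of what Mike describes might look something like this (a sketch, not his actual code):

DECLARE @n int
SET @n = 1
WHILE @n <= 100
BEGIN
    PRINT CASE WHEN @n % 15 = 0 THEN 'FizzBuzz'   -- divisible by both 3 and 5
               WHEN @n % 3 = 0 THEN 'Fizz'
               WHEN @n % 5 = 0 THEN 'Buzz'
               ELSE CONVERT(varchar(8), @n) END
    SET @n = @n + 1
END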

Coming to PASS Summit 2009: Twitter BINGO!

So, last year at PASS Summit, I had my first real experience with a heavily-twittered event.  What was interesting was that I met a lot of people in real-life that I had just recently begun following on Twitter; I had made the comment (in passing) to someone that we should have had BINGO cards to keep track of all of the face-to-face interaction.  This year (since I just recently found out that I was going to Summit 2009 – yay!), I decided to do what I could to make the idea come to fruition.  Since I know Brent Ozar is big into social networking and he has some pull with Quest, I approached him with the idea.

We’re still working out the details, but it looks like Twitter BINGO is on for Summit 2009 (note: this is NOT TwitterINGO).  I do know that we’re going to need at least 25 tweeps (is there another name for people who use Twitter?) to volunteer as game board icons.  As an icon, you’ll be sought out by game players seeking to fill in their game cards.  If you’re interested in volunteering, please leave a comment with your twitter handle below (or DM me, @stuarta); to be eligible you must:

        • Use Twitter semi-regularly;
        • Be planning to attend PASS Summit 2009, and;
        • Be willing to talk to lots of people you don’t know now face-to-face.

More details will follow as we start working the kinks out, but I’m hoping this will help the online twitter community intersect with meatspace.  Looking forward to hearing from volunteers!

EDIT: Since it sounds like Quest and/or SQLServerPedia might be putting up prizes for this, we’re going to have to exclude vendor twitter accounts from playing.  Personal accounts are fine, so if you work for Quest, and they have a booth at PASS, you can still play as yourself.  Kapiche?

Another tale of the unwilling DBA – tempdb and the LUN

As many of you know, I’m a database DEVELOPER; I’ve been a DBA in the past, but my real specialty lies in getting different database platforms to exchange information with SQL Server.  I haven’t done a lot of hands-on administration (backups, maintenance plans, etc) in a while, but that’s changed recently.  Our primary production DBA has been out of the office for an extended period of time, and I’ve had to step back into a support role.

One of the projects he was working on before he left was performance tuning our databases on the SAN; he had recently segregated the tempdb into its own LUN, and everything looked OK, but during some of our overnight jobs we started seeing messages like:

Exception Information: System.Data.SqlClient.SqlException: The transaction log for database ‘tempdb’ is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases

Uh-oh.  I had no clue where to begin; I knew that the database was on its own LUN, and I knew that that LUN was full, but how big should the tempdb LUN be?  I tried searching the Internet for guidance, but got nowhere; there doesn’t seem to be a rule-of-thumb for sizing tempdb based on other databases.
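The error message does point somewhere, though: checking the column it references (while the problem is actually happening) at least tells you why the log space can't be reused.  A minimal check is just:

SELECT name, log_reuse_wait_desc   -- why can't tempdb's log be reused right now?
FROM sys.databases
WHERE name = 'tempdb'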

The LUN was currently sized for 10GB; we have 1.5TB of data.  I knew that seemed small, but we had been running for a couple of weeks without issues, so I suggested to our production DBA team that we should double it, and see what happens.   All was well until we got to the month-end processes, and BAM! 20GB of space sucked up by tempdb.  Now what?
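If nothing else, a breakdown along these lines (a rough sketch built from standard commands) shows whether it's user objects, internal objects, the version store, or the log doing the damage:

-- How full is each database's log (tempdb included)?
DBCC SQLPERF(LOGSPACE)

-- What's consuming tempdb's data files?
SELECT SUM(user_object_reserved_page_count) * 8 / 1024     AS UserObjectsMB,
       SUM(internal_object_reserved_page_count) * 8 / 1024 AS InternalObjectsMB,
       SUM(version_store_reserved_page_count) * 8 / 1024   AS VersionStoreMB,
       SUM(unallocated_extent_page_count) * 8 / 1024       AS FreeMB
FROM tempdb.sys.dm_db_file_space_usage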

I turned to Twitter, and got several helpful (and some not-so-helpful) responses; as many people pointed out, tempdb needs as much free space available as possible, and its size depends more on activity than on the amount of data.  Both excellent points, but neither one addresses the need to pick a specific LUN size on a SAN.  Space ain't free.

@venzann came up with the best guide: “Size of indexes + estimate on largest transaction size + wiggleroom.”
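For the "size of indexes" piece, a query along these lines gets you in the ballpark (a sketch; it totals used pages across every index in the current database at 8KB per page — adjust the filter if you don't want to count clustered indexes, since those are really the table data):

SELECT CAST(SUM(ps.used_page_count) * 8.0 / 1024 / 1024 AS decimal(10,2)) AS IndexSpaceGB
FROM sys.dm_db_partition_stats ps
JOIN sys.indexes i
    ON i.object_id = ps.object_id
   AND i.index_id = ps.index_id
WHERE i.index_id > 0   -- 0 = heap; 1 = clustered; > 1 = nonclustered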

Using his suggestion on our 1.5TB server, I came up with an estimate of 430GB in indexes.   I stopped there, knowing that my SAN admin would shoot me if I tried to run that past them; besides, our server had run OK for a couple of weeks on 10GB; if we needed 430GB, we would have choked long before that (in retrospect, our production DBA should have recorded the size of tempdb BEFORE he moved it to a LUN; c’est la vie).   I decided to pick a smaller number: 100 GB.  Why?

First, it made logical sense using @venzann's formula.  We use a partitioning scheme for most of our data, so the sum of the active indexes was really only about 75GB.  Other indexes may occasionally be used by queries, but 90% of our searches only look at the last 2 days' worth of data.

Second, it's a nice round number; 100GB works out to roughly 1/15th of 1.5TB.  Round numbers are good for rules-of-thumb.  You never hear anyone say "that ____ should be 1/14th of ____; just a rule-of-thumb".

Finally, it was the maximum size I could request from our SAN admin without them ordering new drives.  Hey, if someone says you can have $100, you don’t say “all I need is 50” (unless of course, you’re Porter).

Anyway, it seems to be working; we’re humming along now without errors; in another 20 days or so, we’ll be doing month-end again, and we’ll see what happens.  I can’t wait for the production DBA to get back, so I can quit thinking about this stuff, and go back to my safe zone.

Wrapup: Columbus, GA SQL Server User Group, August 25, 2009

Tuesday, I drove to Columbus, GA to speak at the Columbus GA SQL Server User Group; I had first met Ken Simmons at the Birmingham SQLSaturday, and had promised him that I would drive down there to present.  Ken's done a great job of organizing a PASS chapter in a small-town setting; Columbus doesn't have a lot of technical employers, so founding a specialized user group requires a lot of work and input from the few large employers in town.  There were about 14 people in attendance, and 10 of them worked for the same company (although in different divisions).

Just as a side note; the group is meeting at the Columbus Public Library, which is a BEAUTIFUL facility.  The staff was very nice and accommodating.  The only limitation to meeting there is that they have to shut down by 8 PM.

My talk went OK; I'm always critical of my own work, and I realized that it's a bit esoteric for some people.  One thing that I have learned (but haven't really applied) is that many people who attend user group sessions and SQLSaturdays don't always have the same experiences that I do.  For example, I briefly discussed indexes in this presentation (Confessions of a Data Integrator; look for the upcoming blog posts), and I was a little surprised that there were so many questions about indexing strategies.  I was glad I was able to discuss it, but if I'd known that was still a hot topic, I would have prepared more.

So here's the lesson learned: if you're a beginning speaker, or even an advanced one, don't assume that fundamental topics are overdone.  There's still a great demand for people who can take an important concept (like index strategies, or disk I/O) and explain it in a simple fashion.  Even people who have been DBAs for years may still have a question about something that you assume is too basic to focus on.

Speaking of basics, just a quick reminder that the Gainesville GA SQLSaturday (#25) is coming up in October; we’re still looking for speakers and attendees.  If you’re in the Atlanta area, Gainesville is only a half-hour away up I-85 and I-985.

Columbus, GA SQL Server Users Group – August 25th

Just confirmed today with Ken Simmons that I'll be presenting at the Columbus GA SQL Server Users Group on August 25th at 6 PM at the Columbus Public Library; if you're in mid-west Georgia (and a data geek), come join us, and stop by and say "hi!"

I’ll be re-hashing a talk I gave at the Birmingham SQLSaturday regarding dealing with “somebody else’s bad data”; this is always a fun subject, and it’s great for interactive discussions (because every database professional deals with somebody else’s problem).

See you there!

Fun with NEWSEQUENTIALID()

We're thinking about changing the default constraint on an existing column in a table with billions of rows of data.  The column is a uniqueidentifier serving as a nonclustered primary key; we're planning on replacing the NEWID() default with NEWSEQUENTIALID().  What's challenging me is my concern over primary key collisions: will the new default values eventually run into an existing primary key value?

I used the following script to test, and I was a little surprised at the results:

-- Create a test table whose ID column defaults to NEWID() (random GUIDs)
CREATE TABLE TestNewSeqID(ID uniqueidentifier CONSTRAINT DF1 DEFAULT newID(),
                            Marker integer Identity(1,1))
GO

-- Insert 10 rows using the NEWID() default
DECLARE @x int
SET @x= 1
WHILE @x <=10
BEGIN
    INSERT INTO TestNewSeqID DEFAULT VALUES
    SET @x = @x +1
END


SELECT *
FROM TestNewSeqID
ORDER BY ID DESC


-- Swap the default from NEWID() to NEWSEQUENTIALID()
ALTER TABLE TestNewSeqID
DROP CONSTRAINT DF1;

ALTER TABLE TestNewSeqID
ADD CONSTRAINT DF1 DEFAULT newsequentialid() for ID;

-- Insert 10 more rows, now using the NEWSEQUENTIALID() default
SET @x= 1
WHILE @x <=10
BEGIN
    INSERT INTO TestNewSeqID DEFAULT VALUES
    SET @x = @x +1
END

SELECT *
FROM TestNewSeqID
ORDER BY ID DESC


My results looked something like the following:

ID    Marker
F664E06A-51BF-4FC1-960E-C9E5E854ADAD    9
0746DBAC-8B14-463C-8C4E-9CC789794963    7
24217231-1865-4E2A-B8EF-944DEC9AC6A2    1
323C15B4-59DA-4B73-94E5-54A1A0860F3D    8
B0C1E7EE-3701-4916-A3E2-465FE8462965    2
BFE8C2AB-4788-4596-867E-221F75BAB338    10
C0BC2263-F5E6-41F1-83F6-14BD9D50A6DD    3
9FA2A683-FF02-458E-8703-10D1DCDEFA97    4
75445937-6B46-4B88-9525-0DD809A7BE4B    6
E93284BE-E93B-4009-B206-04CB2B107293    5
4390AD7A-CD7D-DE11-98B8-00196665566A    20
4290AD7A-CD7D-DE11-98B8-00196665566A    19
4190AD7A-CD7D-DE11-98B8-00196665566A    18
4090AD7A-CD7D-DE11-98B8-00196665566A    17
3F90AD7A-CD7D-DE11-98B8-00196665566A    16
3E90AD7A-CD7D-DE11-98B8-00196665566A    15
3D90AD7A-CD7D-DE11-98B8-00196665566A    14
3C90AD7A-CD7D-DE11-98B8-00196665566A    13
3B90AD7A-CD7D-DE11-98B8-00196665566A    12
3A90AD7A-CD7D-DE11-98B8-00196665566A    11

If I looped through several more times, the seed changed and the values got progressively higher, but they were still sorting lower than the original data set.  I’d appreciate it if others would run this script and see if they got the same results.