Development

Building ranges using numbers

A while back, I posted a solution to a problem I was facing where I needed to generate a table of ranges.  While I haven’t experienced any issues with that solution, a couple of things have recently occurred that caused me to go back, take another look at the problem, and see if I can come up with a more elegant solution.

The first was a comment by Jeff Moden on the original post; Jeff suggested that recursion may not be the best choice for a solution, given the relatively large number of logical reads compared to his solution and/or a permanent Numbers table.  Jeff posted a solution, but it only addressed the first part of the article (building a dynamic numbers table) and NOT the second part (coming up with a table of ranges).  The problem I was struggling with when using a table of numbers was the dynamic span between rows; in other words, I couldn’t figure out how to force row 1 to start with 10 and row 2 to start with 215.

The second driving factor was the FizzBuzz solution I proffered yesterday; while reviewing that code, I had a glimmer of an idea about how to handle the dynamic spanning using the modulo operator.  I reworked the solution using Jeff’s suggestion, as well as a version using a permanent numbers table.  My examples are below:

Original code:

/*A dynamically generated table of ranges.
2009-01-25 Stuart R Ainsworth
    http://codegumbo.com/

Thanks to Stefan Keir Gordon for the inspiration
 http://www.keirgordon.com */


DECLARE @MaxRanges int, @MinChunk int, @Start int, @End int
SELECT @Start=10, @End=200000

/*@MaxRanges must be <= 32767;
  if the range between Start and End is large,
  and @MaxRanges > 32767, the End point won't be reached.*/


SELECT @MaxRanges=100, @MinChunk=2

IF (SELECT CEILING((@End-@Start)/(@MaxRanges*1.00)))>@MinChunk
    SET @MinChunk = CEILING((@End-@Start)/(@MaxRanges*1.00))

;WITH Ranges(MinValue) AS
  (SELECT @Start AS MinValue
   UNION ALL
   SELECT (Minvalue + @MinChunk) AS MinValue
   FROM Ranges WHERE MinValue < @End)
SELECT MinValue, MaxValue = CASE WHEN MinValue + @MinChunk < @End
                 THEN MinValue + @MinChunk
                 ELSE @End End
FROM Ranges
WHERE MinValue < @End
OPTION(MAXRECURSION 32767)

Modification based on Jeff’s suggestion:

/*A dynamically generated table of ranges. part 2
2009-09-27 Stuart R Ainsworth
    http://codegumbo.com/

Thanks to Jeff Moden for the inspiration*/

DECLARE @MaxRanges int, @MinChunk int, @Start int, @End int
SELECT @Start=10, @End=200000
SELECT @MaxRanges=100, @MinChunk=2

IF (SELECT CEILING((@End-@Start)/(@MaxRanges*1.00)))>@MinChunk
    SET @MinChunk = CEILING((@End-@Start)/(@MaxRanges*1.00))


SELECT MinValue, MaxValue = CASE WHEN MinValue + @MinChunk < @End
                 THEN MinValue + @MinChunk
                 ELSE @End End
FROM (    SELECT 
            ROW_NUMBER() OVER (ORDER BY (ac1.Object_ID)) -1 + @Start AS MinValue
        FROM Master.sys.columns ac1
        CROSS JOIN Master.sys.columns ac2) x
WHERE (MinValue = @Start
    OR (Minvalue-@Start) % @MinChunk = 0)
    AND MinValue + @MinChunk <= @End + @MinChunk

I also used a permanent table of numbers (per Jeff’s suggestion); given the similarity to Jeff’s modification, I won’t post the exact code I benchmarked.  Simply replace the subquery with a numbers table (you’ll need to monkey with the math to get the dynamic starting point); a rough sketch of that variant is below.  What was interesting to me was the output for each of the three solutions when using SET STATISTICS IO and TIME, shown after the sketch.
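This is a minimal sketch only, assuming a helper table named NUMBERS (as in the statistics output below) with a single int column N seeded from 1 and holding enough rows to cover the span; the column name and schema are placeholders, not the exact code I ran:

/*Permanent-numbers-table variant (sketch only, not the exact code I benchmarked).
  Assumes dbo.NUMBERS(N int) seeded 1, 2, 3, ... with at least (@End - @Start + 1) rows.*/
DECLARE @MaxRanges int, @MinChunk int, @Start int, @End int
SELECT @Start=10, @End=200000
SELECT @MaxRanges=100, @MinChunk=2

IF (SELECT CEILING((@End-@Start)/(@MaxRanges*1.00)))>@MinChunk
    SET @MinChunk = CEILING((@End-@Start)/(@MaxRanges*1.00))

SELECT MinValue, MaxValue = CASE WHEN MinValue + @MinChunk < @End
                 THEN MinValue + @MinChunk
                 ELSE @End END
FROM (    SELECT N - 1 + @Start AS MinValue    -- shift the numbers so the first row lands on @Start
        FROM dbo.NUMBERS) x
WHERE (MinValue = @Start
    OR (MinValue-@Start) % @MinChunk = 0)
    AND MinValue + @MinChunk <= @End + @MinChunk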

Original code:

Table 'Worktable'. Scan count 2, logical reads 607, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:   CPU time = 16 ms,  elapsed time = 88 ms.

Modification (dynamic table):

Table 'syscolpars'. Scan count 2, logical reads 19, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:   CPU time = 187 ms,  elapsed time = 203 ms.

Modification (permanent table):

Table 'NUMBERS'. Scan count 1, logical reads 424, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:   CPU time = 125 ms,  elapsed time = 156 ms.

At first glance, it would appear that Jeff’s assertion is correct; the number of logical reads is greatly reduced by his suggested modification (more so by the dynamic table than the permanent table, interestingly enough).  However, the CPU time is increased.  And, while I know that execution plans are no guarantee of performance, in every run SQL Server estimated the original code at 0% of the combined cost of the three plans (the dynamic modification at 65%, and the permanent numbers table at 35%).

So now, I’m stuck.  I can see Jeff’s point, but I’m not sure that a solution based on his suggestion is that much better than my original solution; do I worry more about I/O or CPU?

FizzBuzz

Stumbled across Mike Hillwig’s post in response to a question posed by Brent Ozar.  Depending on how literally you want to use the word “print” (do you mean the actual command, or do you just want output?), I think I have a slightly more elegant solution than what Mike proposed.

SELECT CASE WHEN n%3=0 AND n%5=0 THEN 'FizzBuzz' 
            WHEN n % 3 = 0 THEN 'Fizz'
            WHEN n % 5 = 0 THEN 'Buzz'
            ELSE CONVERT(varchar(8), n) END as "N"
FROM (    SELECT TOP 100 ROW_NUMBER() OVER (ORDER BY (ac1.Object_ID)) AS "N"
    FROM Master.sys.All_Columns ac1) x

Basically, it avoids the WHILE loop; however, if you must use a PRINT command, then Mike’s suggestion is probably as elegant as I can imagine.  What’s interesting is that this may also provide a better solution to the dynamic-range problem I posted about a while back.  I’ll have to think about it a bit more, but I think it’s possible.

Coming to PASS Summit 2009: Twitter BINGO!

So, last year at PASS Summit, I had my first real experience with a heavily-twittered event.  What was interesting was that I met a lot of people in real-life that I had just recently begun following on Twitter; I had made the comment (in passing) to someone that we should have had BINGO cards to keep track of all of the face-to-face interaction.  This year (since I just recently found out that I was going to Summit 2009 – yay!), I decided to do what I could to make the idea come to fruition.  Since I know Brent Ozar is big into social networking and he has some pull with Quest, I approached him with the idea.

We’re still working out the details, but it looks like Twitter BINGO is on for Summit 2009 (note: this is NOT TwitterINGO).  I do know that we’re going to need at least 25 tweeps (is there another name for people who use Twitter?) to volunteer as game board icons.  As an icon, you’ll be sought out by game players seeking to fill in their game cards.  If you’re interested in volunteering, please leave a comment with your twitter handle below (or DM me, @stuarta); to be eligible you must:

        • Use Twitter semi-regularly;
        • Be planning to attend PASS Summit 2009, and;
        • Be willing to talk to lots of people you don’t know now face-to-face.

More details will follow as we start working the kinks out, but I’m hoping this will help the online twitter community intersect with meatspace.  Looking forward to hearing from volunteers!

EDIT: Since it sounds like Quest and/or SQLServerPedia might be putting up prizes for this, we’re going to have to exclude vendor twitter accounts from playing.  Personal accounts are fine, so if you work for Quest, and they have a booth at PASS, you can still play as yourself.  Kapiche?

SQLSaturday #25 – big news!

Just a quick post to announce a couple of cool developments that are happening for SQLSaturday #25 in Gainesville, GA, Oct 10.

  • We’ve got 75 attendees registered!  I was really worried about low turnout, but I’m feeling a lot more comfortable now.  One of the struggles we’re facing is that Gainesville, GA is 55 miles from Atlanta; many metro-Atlanta residents seem to think that Gainesville is a long way from the tech corridors.
  • We’ve got 15 presentations scheduled; that’s almost 3 full tracks, and I’m working to get those last 3 slots filled. 
  • Finally, WE HAVE A SPONSOR WITH CASH!  I’ve been very successful at getting the book publishers (Apress, Pearson, & O’Reilly) to donate stuff to give away, but I’ve had a hard time getting actual cash collected.  Luckily, this is a small event, so our costs will be small, and we’re charging a lunch fee, but still… we need money 🙂   Anyway, thanks to Red Gate for stepping up and helping out; I love this company!

More news to come; we’re still working out the kinks with the student conference, but I’m excited about the opportunity. 

Another tale of the unwilling DBA – tempdb and the LUN

As many of you know, I’m a database DEVELOPER; I’ve been a DBA in the past, but my real specialty lies in getting different database platforms to exchange information with SQL Server.  I haven’t done a lot of hands-on administration (backups, maintenance plans, etc) in a while, but that’s changed recently.  Our primary production DBA has been out of the office for an extended period of time, and I’ve had to step back into a support role.

One of the projects he was working on before he left was performance tuning our databases on the SAN; he had recently segregated tempdb onto its own LUN, and everything looked OK, but during some of our overnight jobs we started seeing messages like:

Exception Information: System.Data.SqlClient.SqlException: The transaction log for database ‘tempdb’ is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases

Uh-oh.  I had no clue where to begin; I knew that the database was on its own LUN, and I knew that that LUN was full, but how big should the tempdb LUN be?  I tried searching the Internet for guidance, but got nowhere; there doesn’t seem to be a rule-of-thumb for sizing tempdb based on other databases.
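As a side note, the error message itself tells you where to start looking; something as simple as this will at least show why tempdb’s log space isn’t being reused, and how full the log actually is:

-- Why can't tempdb's log space be reused right now?
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = 'tempdb';

-- How full is each database's log (tempdb included)?
DBCC SQLPERF(LOGSPACE);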

The LUN was sized at 10GB; we have 1.5TB of data.  I knew that seemed small, but we had been running for a couple of weeks without issues, so I suggested to our production DBA team that we double it and see what happens.  All was well until we got to the month-end processes, and BAM! 20GB of space sucked up by tempdb.  Now what?
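In hindsight, a quick look at sys.dm_db_file_space_usage during those month-end jobs would have shown what was chewing up tempdb’s data files (user objects, internal objects, or the version store); this is just a sketch, not something we actually had in place at the time:

-- Rough breakdown of tempdb data-file usage (page counts * 8KB, reported in MB)
SELECT SUM(user_object_reserved_page_count)     * 8 / 1024 AS UserObjectsMB,
       SUM(internal_object_reserved_page_count) * 8 / 1024 AS InternalObjectsMB,
       SUM(version_store_reserved_page_count)   * 8 / 1024 AS VersionStoreMB,
       SUM(unallocated_extent_page_count)       * 8 / 1024 AS FreeMB
FROM tempdb.sys.dm_db_file_space_usage;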

I turned to Twitter, and got several helpful (and some not-so-helpful) responses; as many people pointed out, tempdb needs as much free space available as possible, and its size depends more on activity than on the amount of data.  Both are excellent points, but neither one addresses the need to pick a specific LUN size on a SAN.  Space ain’t free.

@venzann came up with the best guide: “Size of indexes + estimate on largest transaction size + wiggleroom.”
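The "size of indexes" piece is the only part of that formula you can query directly; something like this gives a ballpark (a sketch using sys.dm_db_partition_stats, not necessarily the exact query I ran):

-- Ballpark the total size of nonclustered indexes in the current database
-- (used pages * 8KB, converted to GB)
SELECT SUM(ps.used_page_count) * 8.0 / 1024 / 1024 AS IndexSizeGB
FROM sys.dm_db_partition_stats ps
JOIN sys.indexes i
    ON i.object_id = ps.object_id
   AND i.index_id = ps.index_id
WHERE i.index_id > 1;   -- index_id 0 = heap, 1 = clustered (the table data itself)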

Using his suggestion on our 1.5TB server, I came up with an estimate of 430GB in indexes.   I stopped there, knowing that my SAN admin would shoot me if I tried to run that past them; besides, our server had run OK for a couple of weeks on 10GB; if we needed 430GB, we would have choked long before that (in retrospect, our production DBA should have recorded the size of tempdb BEFORE he moved it to a LUN; c’est la vie).   I decided to pick a smaller number: 100 GB.  Why?

First, it made logical sense using @venzann’s formula.  We use a partitioning schema for most of our data, so the sum of the active indexes was really only about 75GB.  Other indexes may be occasionally used by queries, but 90% of our searches only look at the last 2 days worth of data.

Second, it’s a nice round number.  100GB works out to 1/15 of our 1.5TB of data.  Round numbers are good for rules-of-thumb; you never hear anyone say “that ____ should be 6.67% of ____; just a rule-of-thumb”.

Finally, it was the maximum size I could request from our SAN admin without them ordering new drives.  Hey, if someone says you can have $100, you don’t say “all I need is 50” (unless of course, you’re Porter).

Anyway, it seems to be working; we’re humming along now without errors; in another 20 days or so, we’ll be doing month-end again, and we’ll see what happens.  I can’t wait for the production DBA to get back, so I can quit thinking about this stuff, and go back to my safe zone.

SQLSaturday #25 Gainesville GA Oct 10, 2009

Just a quick post to note that a) I’m still alive and posting, and b) I’ve been super busy trying to get this little conference off the ground.  I say little, because it looks like the Gainesville event is going to be a lot more intimate than the Atlanta event in April.  That’s not necessarily a bad thing, because I’m hoping that it will stimulate the growth of the North GA development community.

We’re still looking for sponsors and speakers, so if you’re interested in either, feel free to let me know (http://www.sqlsaturday.com/eventhome.aspx?eventid=26).  The biggest obstacle that I’m running into is the fact that many sponsors are concerned that the event is too close to the Atlanta event; on the other hand, many of the typical attendees are concerned that the event is too far from Atlanta.  It’s an odd catch-22.

Nevertheless, we have managed to garner some impressive presentations, and we’re continuing to slowly add attendees.  If we don’t have a lot of sponsors, we’ll be OK; as long as we can build community, we should be fine.

Speaking of building community, if you haven’t read Alan Stevens’ recent blog postings about his experiences at the Jacksonville Code Camp, you should. 

http://netcave.org/AnEyeOpeningExperience.aspx

http://netcave.org/AnEyeOpeningApology.aspx

I’m not saying that I necessarily agree with everything Alan posted, but his experience should be considered when planning one of these events; although we want to keep SQLSaturdays (and other code camps; note the little “c”) community-focused, we need to be sure that expectations between speakers, attendees, and organizers are well-managed.  I appreciate the fact that Alan apologized for his statements; I also think that as an organizer, I need to learn from his experience to avoid similar situations at the events I help with.

Gotta run.

Wrapup: Columbus, GA SQL Server User Group, August 25, 2009

Tuesday, I drove to Columbus, GA to speak at the Columbus GA SQL Server User Group; I first met Ken Simmons at the Birmingham SQLSaturday and had promised him that I would drive down to present.  Ken’s done a great job of organizing a PASS chapter in a small-town setting; Columbus doesn’t have a lot of technical employers, so founding a specialized user group requires a lot of work and input from the few large employers in town.  There were about 14 people in attendance, and 10 of them worked for the same company (although in different divisions).

Just as a side note: the group meets at the Columbus Public Library, which is a BEAUTIFUL facility.  The staff was very nice and accommodating.  The only limitation to meeting there is that they have to shut down by 8 PM.

My talk went OK; I’m always critical of my own work, and I realized that it’s a bit esoteric for some people.  One thing that I have learned (but haven’t really applied) is that many people who attend user group sessions and SQLSaturdays don’t always have the same experiences that I do.  For example, I briefly discussed indexes in this presentation (Confessions of a Data Integrator; look for the upcoming blog posts), and I was a little surprised that there were so many questions about indexing strategies.  I was glad I was able to discuss it, but if I’d known that was still a hot topic, I would have prepared more.

So here’s the lesson learned: if you’re a beginning speaker, or even an advanced speaker, don’t assume that fundamental topics are overdone.  There’s still a great demand for people who can take an important concept (like index strategies, or disk I/O) and explain it in a simple fashion.  Even people who have been DBAs for years may still have a question about something that you assume is too basic to focus on.

Speaking of basics, just a quick reminder that the Gainesville GA SQLSaturday (#25) is coming up in October; we’re still looking for speakers and attendees.  If you’re in the Atlanta area, Gainesville is only a half-hour away up I-85 and I-985.

Columbus, GA SQL Server Users Group – August 25th

Just confirmed today with Ken Simmons that I’ll be presenting at the Columbus GA SQL Server Users Group on August 25th at 6 pm at the Columbus Public Library; if you’re in mid-west Georgia (and a data geek), come join us and say “hi!”

I’ll be re-hashing a talk I gave at the Birmingham SQLSaturday regarding dealing with “somebody else’s bad data”; this is always a fun subject, and it’s great for interactive discussions (because every database professional deals with somebody else’s problem).

See you there!

Fun with NEWSEQUENTIALID()

We’re thinking about converting the constraint on an existing column in a table with billions of rows of data.  The column is a uniqueidentifier serving as a nonclustered primary key; we’re planning on replacing the NEWID() default constraint with NEWSEQUENTIALID().  What is challenging me is my concern over primary key collisions: will the new default values eventually run into an existing primary key value?

I used the following script to test, and I was a little surprised at the results:

-- Test table: a uniqueidentifier defaulting to NEWID(), plus an identity marker
-- to record insert order
CREATE TABLE TestNewSeqID(ID uniqueidentifier CONSTRAINT DF1 DEFAULT newID(),
                            Marker integer Identity(1,1))
GO

-- Insert 10 rows using the NEWID() default
DECLARE @x int
SET @x= 1
WHILE @x <=10
BEGIN
    INSERT INTO TestNewSeqID DEFAULT VALUES
    SET @x = @x +1
END


SELECT *
FROM TestNewSeqID
ORDER BY ID DESC


-- Swap the default from NEWID() to NEWSEQUENTIALID()
ALTER TABLE TestNewSeqID
DROP CONSTRAINT DF1;

ALTER TABLE TestNewSeqID
ADD CONSTRAINT DF1 DEFAULT newsequentialid() for ID;

-- Insert 10 more rows using the new sequential default
SET @x= 1
WHILE @x <=10
BEGIN
    INSERT INTO TestNewSeqID DEFAULT VALUES
    SET @x = @x +1
END

-- Compare: do the sequential values sort above or below the random ones?
SELECT *
FROM TestNewSeqID
ORDER BY ID DESC


My results looked something like the following:

ID    Marker
F664E06A-51BF-4FC1-960E-C9E5E854ADAD    9
0746DBAC-8B14-463C-8C4E-9CC789794963    7
24217231-1865-4E2A-B8EF-944DEC9AC6A2    1
323C15B4-59DA-4B73-94E5-54A1A0860F3D    8
B0C1E7EE-3701-4916-A3E2-465FE8462965    2
BFE8C2AB-4788-4596-867E-221F75BAB338    10
C0BC2263-F5E6-41F1-83F6-14BD9D50A6DD    3
9FA2A683-FF02-458E-8703-10D1DCDEFA97    4
75445937-6B46-4B88-9525-0DD809A7BE4B    6
E93284BE-E93B-4009-B206-04CB2B107293    5
4390AD7A-CD7D-DE11-98B8-00196665566A    20
4290AD7A-CD7D-DE11-98B8-00196665566A    19
4190AD7A-CD7D-DE11-98B8-00196665566A    18
4090AD7A-CD7D-DE11-98B8-00196665566A    17
3F90AD7A-CD7D-DE11-98B8-00196665566A    16
3E90AD7A-CD7D-DE11-98B8-00196665566A    15
3D90AD7A-CD7D-DE11-98B8-00196665566A    14
3C90AD7A-CD7D-DE11-98B8-00196665566A    13
3B90AD7A-CD7D-DE11-98B8-00196665566A    12
3A90AD7A-CD7D-DE11-98B8-00196665566A    11

If I looped through several more times, the seed changed and the values got progressively higher, but they were still sorting lower than the original data set.  I’d appreciate it if others would run this script and see if they got the same results.
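If you do run it, a quick duplicate check like this (trivial, against the same test table) will also confirm whether any of the sequential values collided with the earlier random ones:

-- Any duplicate IDs after both rounds of inserts? (zero rows expected)
SELECT ID, COUNT(*) AS Occurrences
FROM TestNewSeqID
GROUP BY ID
HAVING COUNT(*) > 1;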

SQLSaturday #25 (Gainesville, GA) – No good topic turned away!

I think in my rush to emphasize the hybrid nature of this SQLSaturday, with its focus on Geographic Information Systems and Environmental Policy, I inadvertently downplayed the fact that this is STILL a SQLSaturday.  If you’re passionate about SQL Server, and you want to share that passion with others, this is a great way to interact with others in the community.

So, if you’ve got a topic to share (even if it’s NOT one of our special topics), please sign up to do so.  I’m sorry for the confusion.

SQLSaturday #25
Gainesville, GA
October 10, 2009