
Querying XML in SQL Server 2005+: namespaces

I recently helped a friend solve an XML problem, and thought I would post the solution here.  Although there are lots of notes out there on how to use XQuery in SQL Server 2005+, this was a real-world scenario that was trickier than I expected.  The friend works for an insurance brokerage, and in one of their applications, accident questionnaires (and their answers) are stored in XML.  This allows them to treat all questionnaires the same, regardless of their origin, as long as the QuestionCodes are common across vendors.

Below is the sample data that he was asking me about; he needed to get one question and answer per row into a data set:


DECLARE @T TABLE ( RowID INT, Fragment XML )

INSERT  INTO @T ( RowID, Fragment )
VALUES  ( 1, '<Questionnaire xmlns="http://www.xxx.com/schemas/xxxXXXXXxxx.0">
<Question>
<QuestionCode>74</QuestionCode>
<Question>Why did you wreck your car?</Question>
<Answer>I was drunk</Answer>
<Explanation />
</Question>
<Question>
<QuestionCode>75</QuestionCode>
<Question>Why is the rum all gone?</Question>
<Answer>Because I drank it.</Answer>
<Explanation />
</Question>
</Questionnaire>'
)
,       ( 2, '<Questionnaire xmlns="http://www.xxx.com/schemas/xxxXXXXXxxx.0">
<Question>
<QuestionCode>74</QuestionCode>
<Question>Why did you wreck your car?</Question>
<Answer>Stuart was drunk</Answer>
<Explanation />
</Question>
<Question>
<QuestionCode>75</QuestionCode>
<Question>Why is the rum all gone?</Question>
<Answer>Because I made mojitos.</Answer>
<Explanation />
</Question>
</Questionnaire>'
)

I thought it would be a simple query: just use the .nodes() method to rip each question and its corresponding answer into its own row.  But when I ran the following, I got interesting results:

SELECT  t.RowID
      , QuestionCode = t1.frag.value('(QuestionCode)[1]', 'int')
      , Question = t1.frag.value('(Question)[1]', 'varchar(max)')
      , Answer = t1.frag.value('(Answer)[1]', 'varchar(max)')
      , Explanation = t1.frag.value('(Explanation)[1]', 'varchar(max)')
FROM    @T t
CROSS APPLY Fragment.nodes('//Questionnaire/Question') AS t1 ( frag )

RowID  QuestionCode    Question    Answer  Explanation

That’s right, nothing.  Strange, considering I’ve done variations of this query for a couple of years now to parse out firewall data fragments.  I looked closer, compared the XML fragment from this example to mine, and the difference was clear: a namespace declaration.  Most of the data I deal with is not true XML, but rather fragments I convert to XML in order to facilitate easy transformations.  To test, I stripped the namespace declaration (xmlns="http://www.xxx.com/schemas/xxxXXXXXxxx.0") out, and voila!  Results!

RowID QuestionCode Question Answer Explanation
1 74 Why did you wreck your car? I was drunk
1 75 Why is the rum all gone? Because I drank it.
2 74 Why did you wreck your car? Stuart was drunk
2 75 Why is the rum all gone? Because I made mojitos.

Well, that was great, because it showed me where the problem was, but how do I fix it?  I stumbled upon a solution, but to be honest, I’m not sure it’s the best one.  If I modify my query to refer to any namespace (the old wildcard: *) like so:

SELECT  t.RowID
      , QuestionCode = t1.frag.value('(*:QuestionCode)[1]', 'int')
      , Question = t1.frag.value('(*:Question)[1]', 'varchar(max)')
      , Answer = t1.frag.value('(*:Answer)[1]', 'varchar(max)')
      , Explanation = t1.frag.value('(*:Explanation)[1]', 'varchar(max)')
FROM    @T t
CROSS APPLY Fragment.nodes('//*:Questionnaire/*:Question') AS t1 ( frag )

I get the correct results.

RowID QuestionCode Question Answer Explanation
1 74 Why did you wreck your car? I was drunk
1 75 Why is the rum all gone? Because I drank it.
2 74 Why did you wreck your car? Stuart was drunk
2 75 Why is the rum all gone? Because I made mojitos.

Here’s the question for any XML guru who stumbles along this way: is there a better way to do this?
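
One alternative I’m aware of (a sketch, untested against the data above) is the WITH XMLNAMESPACES clause, which binds the default namespace for the whole statement so the XQuery expressions don’t need the wildcard:

;WITH XMLNAMESPACES ( DEFAULT 'http://www.xxx.com/schemas/xxxXXXXXxxx.0' )
SELECT  t.RowID
      , QuestionCode = t1.frag.value('(QuestionCode)[1]', 'int')
      , Question = t1.frag.value('(Question)[1]', 'varchar(max)')
      , Answer = t1.frag.value('(Answer)[1]', 'varchar(max)')
      , Explanation = t1.frag.value('(Explanation)[1]', 'varchar(max)')
FROM    @T t
CROSS APPLY Fragment.nodes('//Questionnaire/Question') AS t1 ( frag )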

T-SQL tip: Use a role for EXECUTE permissions

If you are in a high-security environment, this may not be the best tip for you, but if you’re in a situation like mine, where you have a SQL user that connects to a database solely to EXECUTE stored procs, and you know that user needs to be able to EXECUTE every proc in that database, you can save a little time by creating a role for that:


CREATE ROLE usp_execute
GO

GRANT EXECUTE ON SCHEMA::dbo TO usp_execute
GO

This means that the user will be able to execute EVERY stored proc belonging to the schema dbo from this point forward; again, be cautious when using this.  Security models should not be taken lightly.
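
To put the role to use, you still have to add your SQL user to it; a minimal sketch (the user name below is hypothetical):

-- Add the connecting user to the role (user name is hypothetical)
EXEC sp_addrolemember 'usp_execute', 'MyAppUser'
GO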

Speaking today: PASS AppDev Virtual Chapter

I know it’s short notice, but to be honest, I totally forgot about this until a couple of weeks ago.  I’ll be presenting today at noon eastern on a LiveMeeting for the Application Developers Virtual Chapter of PASS.  Deets below:

“You Got XML In My Database? What’s Up With That?”
May 11th 12:00 PM EDT (GMT -4)
Presenter: Stuart Ainsworth

A brief presentation exploring the marriage of XML and relational databases, including when it works and when it doesn’t. Coverage will include various use case scenarios, and some tips on how to improve performance using design techniques.

Stuart Ainsworth

Stuart R Ainsworth, MA, MEd is a Database Architect working in the realm of Financial Information Security; over the last 15 years, he’s worked as a Research Analyst, a report writer, a DBA, a programmer, and a public speaking professor. He’s one of the chapter leaders for AtlantaMDF, the Atlanta chapter of PASS. A master of air guitar, he has yet to understand the point of Rock Band (“You push buttons? What’s that all about?”).

How do I view the presentation?
Attendee URL:  Live Meeting link

#TSQL2sDay 003: Maslow and relational design

Rob Farley is hosting the third installment of TSQL Tuesday, and it’s a fun one: relationships (in honor of Valentine’s Day).   While I’m not currently in much of a mood to opine on the virtues of love and databases, I did want to post something a bit more esoteric this time.  Not many of you may know that I don’t have a formal background in Information Technology (some of my more sarcastic friends just held their tongues at that one); I actually have a Master of Arts in Communication and a Master of Education in Instructional Technology.  I tripped into IT when I failed my comprehensive exams for the doctoral program in Speech Communication at the University of Georgia.  Awful time, but ultimately one of the best things to ever happen to me.

Anyway, why is this relevant?  Because the goal of this post is to attempt to extend one of the more famous models of social psychology and communication to database design; bear with me (I’m assuming that many of you either have no background in social psych or slept through it), but I’m hoping that this extension to the metaphor will benefit you in terms of your application design.

Maslow: the crash course.

The following is a BRIEF introduction to the theory; if you want more details, Wikipedia is your friend. In a nutshell, Abraham Maslow proposed that humans, as social animals, are driven to fulfill certain basic needs in a quest for self-actualization or enlightenment.  He proposed a pyramidal model of five sets (or stages) of these needs, with the four lower stages needing to be fulfilled before the fifth can be attempted; few people ever attain the fifth level, but the quest to reach it is part of our collective experience.  I’ve defined the five stages below:

Physiological:

The basic requirements for human existence; food, water, etc.

Safety:

This often translates into security, but it’s different from the term we use in information technology careers; safety is the ability to acquire and maintain goods for ongoing existence.  The Physiological elements are immediate needs; Safety elements are the ability to fulfill those immediate needs at a future date.

Social:

Where do we belong?  How do we interact with others who need us (and whom we need)?  What is our role, and how does that affect our definition of the self?

Esteem:

Esteem stems from the social need; once our relationships with others have been established, we can truly begin to define ourselves and the virtue of our importance in the world.

Self-Actualization:

Self-actualization is the ultimate fulfillment of one’s potential; to be what one is, without need for constant reinforcement from other beings, yet able to exist in harmony with purpose.  Few people have ever attained this stage, but as stated before, the quest to reach the top of the pyramid drives human development.

So what does this mean to the database designer?

Why is all of this important?  This is not a perfect analogy, but if we extend Maslow’s model to the area of database design, some interesting questions arise (particularly in the third and fourth stages, which is why I felt this post would be relevant to the TSQL Tuesday challenge of relationships).  Let’s take each stage, and step through them again.

Physiological:

While applications don’t have physiological needs, they DO require certain basic elements for long-term survival.  Questions to consider at this stage are things like: How much space will I need?  What are the server requirements?  Can my database live in the cloud or on a mobile device?   What sort of I/O concerns do I have?

Safety:

Recall that safety is NOT security (in terms of who has access to the data), but rather security in terms of the long-term survival of the application.  Is the database you’re designing intended for a long-term project, or is it “throw-away” code?  Have you designed it in such a way that it’s easy to replace without impacting the dependent application?

Social:

Speaking of dependent applications (and herein lies the relationship aspect of this post), is your database designed so that it is loosely related to, and decoupled from, the application?  Does the database fulfill the needed role within the relationship (data storage) without treading too far into business logic?  Can the database handle multiple relationships with various applications (UI/reporting/business services)?

Esteem:

Closely related to the social nature of the database within the application stack is the need for self-esteem within the database; can the database meet the needs of the dependent applications WHILE retaining enough information to establish new relationships?  A classic example of this is the lookup table; a database with low self-esteem will only store the enumerated values provided to it by some other application.

Without the enabling application, the database lacks sufficient internal definition to validate meaning; in practical terms, this means that the database is not decoupled from the application enough to enable the development of alternate accessing applications.  For example, my day job is to reverse-engineer vendor databases; few things in the world are more disturbing than a table full of numbers without any sort of category associated with those numbers.  The application designer decided to store that enumeration in the application; security through obfuscation IS a method of securing your database, but not a very effective one.

A high-self-esteem database will store all of the appropriate lookup values (complete with constraints) in order to provide complete validity within the structure.  The database can then be reused by several different applications, without requiring a complete set of business rules to determine those relationships.  The data layer is definitional; the business layer should be procedural.
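
A minimal sketch of what I mean, with hypothetical table and column names; the lookup values and the constraint live in the database, so any application that connects can discover what the numbers mean:

CREATE TABLE dbo.ClaimStatus
    ( ClaimStatusID int NOT NULL PRIMARY KEY
    , StatusName varchar(50) NOT NULL UNIQUE )  -- the definition lives here, not in the app

CREATE TABLE dbo.Claims
    ( ClaimID int NOT NULL PRIMARY KEY
    , ClaimStatusID int NOT NULL
          CONSTRAINT FK_Claims_ClaimStatus
          REFERENCES dbo.ClaimStatus ( ClaimStatusID ) )  -- only valid statuses allowed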

Self-Actualization:

I have to admit that discussing self-actualization in regards to application design makes me think of HAL.  “I’m sorry, Dave….”

To try and stay on track with this metaphor, self-actualization is the basic premise of BI; when your database can start providing you knowledge instead of just data, it has attained the highest level of potential.  Few apps make it that far without requiring substantial redesign, but the ones that do are invaluable to the enterprise they support.

So where are we?

Dunno.  I hope this little exercise made your brain hurt just a bit, and opened up a new metaphor for understanding database design issues within the application stack.   If you have more questions than answers, that’s a good place to be.

#TSQL2sDay 001: Date/Time Issues and Data Modeling

This is my first post for T-SQL Tuesday (thanks, Adam, for the idea!); I’m hoping this will be a continuing block party, and hopefully connect a lot of us SQL bloggers together.  There’s a lot of great content out there, and sometimes it’s just tough to find.

Just a few notes before I get started on this; for this particular post, I’m pulling from my experiences as a data architect for the last few years.  I spend a lot of time reverse-engineering other databases in order to centralize and analyze data from a variety of sources.  Most of my efforts pre-date SQL 2008, so I don’t have a lot of experience with the newer datetime data types introduced in that version; however, I’ll point you in that direction if I think it might address an issue that I raise.  Also, there’s no particular order to my list of notes; think of it as a random collection of date-time issues.  Some of them may be specific to T-SQL and SQL Server, and others may just be procedural notes about data integration (and thus may be applicable to other platforms).

Ready?  Let’s get started.

1.  If you are a database professional who works with SQL Server, strike the word “timestamp” from your vocabulary.

Way back when SQL Server was in its infancy, some developer decided that timestamp would be a good name for a binary column that changes whenever the row is updated.  I don’t know the origins of the timestamp datatype; all I know is that it has nothing to do with dates or times.  So why am I including it in a post about dates and times?  Because for most speakers of the English language, a timestamp is a marker that indicates when an event happens.  Many junior SQL Server developers, and even senior developers on other platforms, will often be confused by the timestamp datatype and wonder what’s going on with that column they just added.

My recommendation?  Avoid using the word timestamp in design discussions.   When someone mentions “we need to add a timestamp to that row”, make sure that you understand what they mean.  Usually they want you to add a column to capture the datetime at which the event happened, NOT an indicator of when the row will change.  Note that I’m suggesting that you avoid the WORD timestamp; obviously, if a database requires a timestamp column, use it.
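
A quick sketch of the distinction (the table and column names are hypothetical):

CREATE TABLE dbo.Events
    ( EventID int IDENTITY(1,1) PRIMARY KEY
    , EventNote varchar(100) NOT NULL
    , RowVer timestamp      -- binary row-version marker; changes on every UPDATE
                            -- and has nothing to do with dates or times
    , EventDate datetime )  -- an actual point in time, in the everyday sense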

2.  Does anybody really know what time it is?

One of the benefits of a distributed architecture is that your database server doesn’t have to live on the same box as your application server; in fact, your database server may not be in the same timezone as your application server.  And, if you’re using a webserver as an application host, your user may not be in the same timezone as either your application or your database.

I mention this because as a database designer, you may need to capture multiple points-in-time (or you may not); you cannot assume that the application is consistently providing the same time reference for all users.  If you’re using a truly distributed design, you may have one host in New York and one in Los Angeles, and your users may connect to either one to enter in data; what date do you use?

For this reason, I usually advocate two things: 1) always include a database-centric referential date of event (in other words, include a column with a default value of the database time for important data), and 2) always use UTC for saving time.  That second statement’s important enough to warrant a note of its own.
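
For the first recommendation, a minimal sketch (hypothetical names again); the database supplies the date of event, in UTC, so no application gets to argue about it:

CREATE TABLE dbo.AuditLog
    ( AuditID int IDENTITY(1,1) PRIMARY KEY
    , AuditNote varchar(100) NOT NULL
    , RecordedAt datetime NOT NULL
          DEFAULT GETUTCDATE() )  -- database-centric time, stored as UTC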

3.  When in doubt, use UTC.

Time is relative to geography; the application you build today in New York may eventually have clients in New Delhi.  If your processes could ever have the potential of crossing multiple time zones (for example, you want to track shipping from Atlanta to Oregon), you’d get the most benefit out of storing your data as UTC in order to understand the relationship between event A (which occurred in EST) and event B (in PST).  It’s a lot easier to change the display of events to a local time from UTC than it is to convert from local time to UTC, do the math, and then convert back.

One problem I’ve encountered with recording data in UTC is that if you have centralized data but distributed users who want to see their data in local time, building OLAP cubes is a pain.  If you want to analyze data for Atlanta and data for Oregon, Atlanta’s day begins 3 hours earlier than Oregon’s.  I have not found an easy solution to this yet, but am starting to investigate the new datetimeoffset data type in SQL 2008.
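
For the curious, here’s a rough sketch of what datetimeoffset looks like in SQL 2008 (the offsets below assume standard time, not daylight saving):

DECLARE @utc datetimeoffset = '2009-11-01 14:30:00 +00:00'

SELECT  @utc AS UtcTime
      , SWITCHOFFSET(@utc, '-05:00') AS AtlantaTime  -- EST
      , SWITCHOFFSET(@utc, '-08:00') AS OregonTime   -- PST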

4.  Use a standard notation for calculating differences between dates.

Obviously the new date and time datatypes in SQL 2008 address a desire of many in the development community for a method of storing ONLY the date or time of an event, but if you’re using an application that still stores the datetime of an event and you need to round the data to the nearest day, hour, week, or whatever unit of time, then use a standard method for doing so.

(I am assuming that you are using date-based arithmetic rather than converting to a varchar and back).

SELECT DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0)

is the preferred method; you can read why at my previous post.
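
The same pattern generalizes to other units; a quick sketch:

SELECT DATEADD(hour, DATEDIFF(hour, 0, GETDATE()), 0) AS StartOfHour
     , DATEADD(week, DATEDIFF(week, 0, GETDATE()), 0) AS StartOfWeek
     , DATEADD(month, DATEDIFF(month, 0, GETDATE()), 0) AS StartOfMonth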

5.  Use a standard notation for referring to dates.

Americans are weird about dates; we’re one of the few countries that puts the middle container before the smallest container, followed by the largest container.  Clueless?  When people ask you for your birthdate, how do you answer?  Most people rattle off Month-Day-Year.   An international notation would be Year, Month, Day.

Again, if you’re designing databases and queries that might cross international borders, you’ll save yourself some debugging time in the future if you can get into the habit of using an international standard notation for referring to datetimes.   A query that asks for data WHERE DateOfEvent > ‘11/1/2009’ may work fine in the US (returning data newer than November 1, 2009), but may blow up in the UK (returning data greater than January 11, 2009).
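
You can watch this happen yourself; a short sketch of the ambiguity (and of the unambiguous ISO format that avoids it):

SET LANGUAGE us_english
SELECT CONVERT(datetime, '11/1/2009')  -- November 1, 2009

SET LANGUAGE British
SELECT CONVERT(datetime, '11/1/2009')  -- January 11, 2009

SELECT CONVERT(datetime, '20091101')   -- November 1, 2009, regardless of language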

How do you know if you’re a DBA that’s been bitten by this?  You start dating everything (including personal checks) using a notation like ‘2009-11-01’.  I get odd looks all the time (probably not just because of the way I date my checks).

6.  Finally, get used to the fact that there are lots of methods to store datetimes.

If you’re doing any sort of database integration, you’re probably familiar with many of the same issues I laid out.  There are others that I haven’t covered, but the important thing I want to focus on is that just because I think these are good habits to get into, others have different opinions.  Some db pros prefer to use epoch time rather than a standard datetime; I not only have to be familiar with my own methods of doing something (and be prepared to defend those methods), but I also have to be willing to understand other methodologies so that I can do the conversion.
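
The epoch conversion itself is straightforward; a sketch (assuming epoch seconds, not milliseconds):

DECLARE @epoch int
SET @epoch = 1257033600

SELECT DATEADD(second, @epoch, '19700101') AS UtcDateTime  -- 2009-11-01 00:00:00 UTC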

Good Habits To Adopt: Enforcing the natural primary key

I’ve been reading Aaron Bertrand’s great series of blog posts on bad habits to kick, and have been thinking to myself: what are some good habits that SQL Server developers should implement?    I spend most of my day griping about bad design from vendors, yet I hardly ever take the time to document what should be done instead.  This post is my first attempt to do so, and it’s based on the following assumptions:

  • Good habits are going to be a lot more controversial than bad habits, and
  • SQL Server doesn’t enforce many of these good habits for you.

The first point refers to the fact that some of the choices that I make are not necessarily the best way to do things, and they may not satisfy the needs of every application.  I’m a firm believer that there is an exception to every rule, but my goal is to at least define what the rules are (and again, these rules are my own creation, and someone may have better ones).  The second point refers to the fact that SQL Server enforces the rules of SQL, but leaves some of that enforcement open to interpretation.  For example, the relational model defined by SQL assumes that tables are related, but SQL Server doesn’t require that you define a FOREIGN KEY (or even a PRIMARY KEY).

So here’s my first good habit:

When defining a surrogate primary key for a table, you should enforce the natural primary key with the use of a UNIQUE constraint.

To really understand this, you have to start with defining what a surrogate primary key is versus a natural primary key.  You can search for a variety of definitions, but I’ll use the following:

  • Primary Key: a non-nullable attribute (or combination of attributes) that can be used to uniquely identify a specific instance of an entity.  When used within SQL, a primary key can be mapped to a column (or columns) in a table, and the value of the key uniquely identifies a row.
  • Natural Primary Key: a primary key that is not auto-generated by the database or application.  The key is comprised of attributes that are associated with an entity, and the value of those attributes is defined by some authority beyond the scope of the database or application.  For example, a Social Security number is an “arbitrarily” assigned number that belongs to a specific citizen of the United States; most databases that use the Social Security number do not create the number, but rather use it as a reference to a particular US citizen.
  • Surrogate Primary Key: a primary key that is auto-generated by the database or application to specifically identify the row in the table representing the collection of entities.  Surrogate keys have no meaning outside of the database and have no relationship to the other attributes in the table.  An ID of 1 simply identifies a row in a table; a row representing a person, a squid, or an automobile may each have an ID of 1, depending on which table the data lives in.

Sidebar: as I was writing this, Pinal Dave posted the following to his blog: http://blog.sqlauthority.com/2009/10/22/sql-server-difference-between-candidate-keys-and-primary-key-2/

Most novices recognize that every table needs a primary key, and surrogate keys offer some benefits that natural keys do not, including:

  • Immutability: the ability of a key to stay constant over time.  A natural primary key (such as a person’s name) may change, but a surrogate key does not.
  • Simplicity of relational JOINs: surrogate keys can remain as a singular column for each table they represent.  For example, a complete invoice may need to be represented by a ClientID, an InvoiceID, and the LineIDs for the lines on that invoice.  Joining on the natural keys may require the Client Name and Address, the Invoice Number, and the Line Number.

However, surrogate keys have one major weakness: they do NOT enforce the unique validity of each row.  If you use the IDENTITY property in SQL Server to auto-generate your surrogate PRIMARY KEY, and you insert Stuart Ainsworth into your table of Employees, and you accidentally run your INSERT script again, you’ve just double-inserted Stuart Ainsworth.  While there are certainly multiple people with my name, I’m the only one at my company; my application, however, never noticed it.

Using a UNIQUE CONSTRAINT on the columns holding the natural key information avoids this problem; you get the benefits of a surrogate key AND the unique validation of a natural primary key.   The hard part is, of course, identifying the appropriate natural primary key to enforce.  However, this exercise should NOT be overlooked when designing a database.
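
A minimal sketch of the habit in action (the table, columns, and choice of SSN as the natural key are hypothetical):

CREATE TABLE dbo.Employees
    ( EmployeeID int IDENTITY(1,1) NOT NULL
    , FirstName varchar(50) NOT NULL
    , LastName varchar(50) NOT NULL
    , SSN char(11) NOT NULL
    , CONSTRAINT PK_Employees PRIMARY KEY ( EmployeeID )  -- surrogate key
    , CONSTRAINT UQ_Employees_SSN UNIQUE ( SSN ) )        -- enforces the natural key

Run the same INSERT twice against this table, and the second attempt fails on the UNIQUE constraint instead of silently creating a duplicate.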

<facedesk>

So, I’ve been answering questions on StackOverflow for two days now, and I still like the experience.  However, I have noticed a disturbing trend in several of the questions being asked that involve SQL Server.  Here’s an example of one; I’ll give you the question, and my answer, and then tell you what’s bugging me.

TSQL Generate 5 character length string, all digits [0-9] that doesn’t already exist in database

What’s the best way to do this?

I need to generate a 5 digit length string where all the characters are numeric. However, I need to be able to do this ‘x’ amount of times (user variable) and store this random strings in a database. Furthermore, I can’t generate the same string twice. Old strings will be removed after 6 months.

Pseudo-code

DECLARE @intIterator INT,
        @intMax INT

SET @intIterator = 1
SET @intMax = 5 -- number of strings to generate

WHILE @intIterator <= @intMax
BEGIN

  -- GENERATE RANDOM STRING OF 5 NUMERIC DIGITS
   ???

  -- INSERT INTO DB IF DOESN'T ALREADY EXIST
  INSERT INTO TSTRINGS
  SELECT @RANDOMSTRING

  IF @@ERROR = 0
    SET @intIterator = @intIterator + 1

END

I know this probably isn’t the best way to do it, so advice is appreciated. But really looking for ideas on how to generate the numeric 5 length strings.


I think this is a great question; the guy’s obviously looking for a list of randomly generated five-character strings comprised entirely of digits; this means there are only 100,000 possibilities.  This question is tagged as SQL and TSQL, which means that he’s looking for a database-driven solution (I’m aware that there are several ways to skin this cat; I’m just showing you that he’s specifically looking for a SQL solution).  Note, however, that he’s using a loop to create, validate, and insert 1 row at a time. 

Here’s where a lot of my recent reading on numbers tables comes into play; I think about this for a bit, and post the following solution:

Here’s a set-based approach, using SQL 2005 syntax (would be a little easier with SQL 2008, but you didn’t specify). Also, if you had a numbers table, a large chunk of it can be cut out.

No looping, no duplicates, and should be nearly instantaneous (assuming the target column is indexed).

DECLARE @intMax INTEGER
SET @intMax = 5
INSERT  INTO TSTRINGS
        SELECT  q.nString
        FROM    ( SELECT    ROW_NUMBER() OVER ( ORDER BY ( NEWID() ) ) AS N2 ,
                            RIGHT(REPLICATE('0', 5) + CONVERT(VARCHAR(5), N), 5) AS nString
                  FROM      -- the subquery below could be replaced by a numbers table
                            ( SELECT TOP 100000
                                        ROW_NUMBER() OVER ( ORDER BY ( ac1.Object_ID ) ) - 1 AS N
                              FROM      Master.sys.columns ac1
                                        CROSS JOIN Master.sys.columns ac2
                                        CROSS JOIN Master.sys.columns ac3
                            ) numbers
                  WHERE     RIGHT(REPLICATE('0', 5) + CONVERT(VARCHAR(5), N), 5) NOT IN ( SELECT    nString
                                                                                          FROM      TSTRINGS ) -- check to see if reused
                ) q
        WHERE   q.N2 <= @intMax

A nice, tidy, set-based solution that I’ll probably keep in mind for generating passwords in the future (with a little work).  So, what’s bugging me?  Is it the fact that I have 0 votes for this solution?  No, that’s not really it, because votes are kind of whimsical; sometimes you get them, sometimes you don’t.

No, what bugs me is that out of the 6 answers posted, 3 of them involve CURSORS.  For a SQL question.   I know.  Shock and horror.

Seriously, though; why do people still use cursors?  I know that there are times when cursors are an appropriate tool, but you shouldn’t use them every day.  It’s kind of like buying those specialized drill bits for removing a stripped screw; for that specific application, they work great.  But if you leave them in your drill and try to use them in an everyday situation, guess what?  You’ve stripped out a screw.

Let me be as clear as I can:

CURSORS SUCK, PEOPLE.  HEAR ME?  THEY SUCK.

There you go; I feel better now.

Building ranges using numbers

A while back, I posted a solution to a problem I was facing where I needed to generate a table of ranges.  While I haven’t experienced any sort of issues with the solution, a couple of things have recently occurred which have caused me to go back, take a look at the problem, and see if I can come up with a more elegant solution.

The first was a comment by Jeff Moden on the original post; Jeff suggested that recursion may not be the best choice for a solution, given the relatively large number of logical reads compared to his solution and/or a permanent Numbers table.  Jeff posted a solution, but his solution only addressed the first part of the article (building a dynamic numbers table) and NOT the second part (coming up with a table of ranges).  The problem I was struggling with when using a table of numbers was the dynamic span between rows; in other words, I couldn’t figure out how to force row 1 to start with 10, and row 2 to start with 215.

The second driving factor was the FizzBuzz solution I proffered yesterday; while I was reviewing that code, I had a glimmer of an idea about how to handle the dynamic spanning using modulo (%).  I reworked the solution using Jeff’s suggestion, as well as using a permanent numbers table.  My examples are below:

Original code:

/*A dynamically generated table of ranges.
2009-01-25 Stuart R Ainsworth
    http://codegumbo.com/

Thanks to Stefan Keir Gordon for the inspiration
 http://www.keirgordon.com */


DECLARE @MaxRanges int, @MinChunk int, @Start int, @End int
SELECT @Start=10, @End=200000

/*@MaxRanges must be <= 32767;
  if the range between Start and End is large,
  and @MaxRanges > 32767, the End point won't be reached.*/


SELECT @MaxRanges=100, @MinChunk=2

IF (SELECT CEILING((@End-@Start)/(@MaxRanges*1.00)))>@MinChunk
    SET @MinChunk = CEILING((@End-@Start)/(@MaxRanges*1.00))

;WITH Ranges(MinValue) AS
  (SELECT @Start AS MinValue
   UNION ALL
   SELECT (Minvalue + @MinChunk) AS MinValue
   FROM Ranges WHERE MinValue < @End)
SELECT MinValue, MaxValue = CASE WHEN MinValue + @MinChunk < @End
                 THEN MinValue + @MinChunk
                 ELSE @End End
FROM Ranges
WHERE MinValue < @End
OPTION(MAXRECURSION 32767)

Modification based on Jeff’s suggestion:

/*A dynamically generated table of ranges. part 2
2009-09-27 Stuart R Ainsworth
    http://codegumbo.com/

Thanks to Jeff Moden for the inspiration*/

DECLARE @MaxRanges int, @MinChunk int, @Start int, @End int
SELECT @Start=10, @End=200000
SELECT @MaxRanges=100, @MinChunk=2

IF (SELECT CEILING((@End-@Start)/(@MaxRanges*1.00)))>@MinChunk
    SET @MinChunk = CEILING((@End-@Start)/(@MaxRanges*1.00))


SELECT MinValue, MaxValue = CASE WHEN MinValue + @MinChunk < @End
                 THEN MinValue + @MinChunk
                 ELSE @End End
FROM (    SELECT 
            ROW_NUMBER() OVER (ORDER BY (ac1.Object_ID)) -1 + @Start AS MinValue
        FROM Master.sys.columns ac1
        CROSS JOIN Master.sys.columns ac2) x
WHERE (MinValue = @Start
    OR (Minvalue-@Start) % @MinChunk = 0)
    AND MinValue + @MinChunk <= @End + @MinChunk

I also used a permanent table of numbers (per Jeff’s suggestion), but given the similarity to Jeff’s modification, I won’t post the exact code here.  Simply replace the subquery with a numbers table (you’ll need to monkey with the math to get the dynamic starting point); a rough sketch of what that variant might look like is below (assuming a permanent dbo.Numbers table with a column N holding 0, 1, 2, …, and reusing the variables declared above):
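
SELECT MinValue, MaxValue = CASE WHEN MinValue + @MinChunk < @End
                 THEN MinValue + @MinChunk
                 ELSE @End END
FROM (    SELECT N + @Start AS MinValue
        FROM dbo.Numbers
        WHERE N <= @End - @Start) x
WHERE (MinValue = @Start
    OR (MinValue-@Start) % @MinChunk = 0)
    AND MinValue + @MinChunk <= @End + @MinChunk

What was interesting to me was the output for each of the three solutions when using SET STATISTICS IO and TIME: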

Original code:

Table 'Worktable'. Scan count 2, logical reads 607, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:   CPU time = 16 ms,  elapsed time = 88 ms.

Modification (dynamic table):

Table 'syscolpars'. Scan count 2, logical reads 19, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:   CPU time = 187 ms,  elapsed time = 203 ms.

Modification (permanent table):

Table 'NUMBERS'. Scan count 1, logical reads 424, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:   CPU time = 125 ms,  elapsed time = 156 ms.

At first glance, it would appear that Jeff’s assertion is correct; the number of logical reads is greatly reduced by his suggested modification (more so by the dynamic table than the permanent table, interestingly enough).  However, the CPU time is increased.  And, while I know that execution plans are no guarantee of performance, in every case SQL Server estimated the cost of the original code at 0% of the three plans combined (the dynamic modification at 65%, and the permanent numbers table at 35%).

So now, I’m stuck.  I can see Jeff’s point, but I’m not sure that a solution based on his suggestion is that much better than my original solution; do I worry more about I/O or CPU?

FizzBuzz

Stumbled across Mike Hillwig’s post in response to a question posed by Brent Ozar.  Depending on how literally you want to use the word “print” (do you mean the actual command, or do you just want output?), I think I have a slightly more elegant solution than what Mike proposed.

SELECT CASE WHEN n%3=0 AND n%5=0 THEN 'FizzBuzz' 
            WHEN n % 3 = 0 THEN 'Fizz'
            WHEN n % 5 = 0 THEN 'Buzz'
            ELSE CONVERT(varchar(8), n) END as "N"
FROM (    SELECT TOP 100 ROW_NUMBER() OVER (ORDER BY (ac1.Object_ID)) AS "N"
    FROM Master.sys.All_Columns ac1) x

Basically, it avoids the WHILE loop; however, if you must use a PRINT command, then Mike’s suggestion is probably as elegant as I can imagine.  What’s interesting is that this may also provide a better solution to the dynamic range problem I posted about a while back.  I’ll have to think about it a bit more, but I think it’s possible.