We use a queueing metaphor for dataloads for the majority of our imports, and it’s been a challenge to figure out how to generate dynamic ranges to add to the queue. In other words, we want to tell our system to go and pcik data from a source that has an id value >x and <=y, and then do it again in a few minutes with a new x and y value. I should say that it’s been a challenge because I’m still not used to SQL 2005 features; I came across a bit of code using a CTE and recursion to generate a numbers table (thanks to Stefan Keir Gordon). Here’s a slightly modified version of his code:
/*thanks to Stefan Keir Gordon for the inspiration
http://www.keirgordon.com*/
SET STATISTICS IO ON
;WITH Numbers(n) AS
(SELECT 1 AS n
UNION ALL
SELECT (n + 1) AS n
FROM Numbers
WHERE n <= 32767)
SELECT n
FROM Numbers
WHERE n<=32767
OPTION(MAXRECURSION 32767)
In case you’re wondering, 32767 is the maximum number of recursions supported by SQL Server in a CTE; Stefan’s code neatly generates a single column table with values from 1 to 32767. Looking at that, I realized I could easily get a simple range by adding a second column to the final SELECT, like so:
;WITH Ranges(MinValue) AS
(SELECT 1 AS MinValue
UNION ALL
SELECT (Minvalue + 1) AS MinValue
FROM Ranges WHERE MinValue <= 32767)
SELECT MinValue, MinValue + 1 as MaxValue
FROM Ranges
WHERE MinValue <= 32767
OPTION(MAXRECURSION 32767)
I end up with a two-column table with a MaxValue > MinValue, and the ranges increase by 1. But I had some issues to address:
- I needed to be able to pass in Start and End values to this code.
- I needed chunks bigger than 1 value in size; I also had a minimum size I wanted to use.
- In some cases 32767 ranges would be appropriate; in some cases, I needed less.
Here’s the final code; setting the Start and End valeus for the range of ranges is easy to understand, but the other two values may be a bit confusing. The first is @MaxRanges, which simply says “I want no more than this number of ranges to get from the Start to the End”. The maximum value for this number is 32767. @MinChunk refers to the minumum chunk size, or the space between MinValue and MaxValue for a given row.
It may be easier to just play with the values and see; you should never have more rows than @MaxValue, and the difference between MinValue and MaxValue for a given row should always be greater than or equal to @MinChunk:
/*A dynamically generated table of ranges.
2009-01-25 Stuart R Ainsworth
http://codegumbo.com/
Thanks to Stefan Keir Gordon for the inspiration
http://www.keirgordon.com */
SET STATISTICS IO ON
DECLARE @MaxRanges int, @MinChunk int, @Start int, @End int
SELECT @Start=10, @End=200000
/*@MaxRanges must be <= 32767;
if the range between Start and End is large,
and @MaxRanges > 32767, the End point won't be reached.*/
SELECT @MaxRanges=1000, @MinChunk=205
IF (SELECT CEILING((@End-@Start)/(@MaxRanges*1.00)))>@MinChunk
SET @MinChunk = CEILING((@End-@Start)/(@MaxRanges*1.00))
;WITH Ranges(MinValue) AS
(SELECT @Start AS MinValue
UNION ALL
SELECT (Minvalue + @MinChunk) AS MinValue
FROM Ranges WHERE MinValue < @End)
SELECT MinValue, MaxValue = CASE WHEN MinValue + @MinChunk < @End
THEN MinValue + @MinChunk
ELSE @End End
FROM Ranges
WHERE MinValue < @End
OPTION(MAXRECURSION 32767)
Good luck; happy coding!