How many licks DOES it take…?

So, it’s been a while since I’ve posted, but I’ve finally managed to eke out some time after Summit 2010, and wanted to follow up on a conversation that Tim Mitchell, Arnie Rowland, and I had at the Friends of Red Gate dinner.  We were doing a SQL Server oriented trivia contest which asked the following question:

How many nonclustered indexes does SQL Server 2008 support on a single table?

And the answer is: 999, which is enough to cause most DBA’s to grumble under their breath abut the consequences of setting up that many indexes and what they would do to if they ever found that many indexes on a table.  Of course, being the amateur math geek that I am, I casually asked Tim and Arnie:

What’s the smallest table (in terms of amount of columns) that would support 999 indexes?

After knocking it around for a bit, we came up a estimate of 6, which actually isn’t too far off; however, our method of getting there was mostly intuitive, and I wanted to figure out the actual formula for calculating that number.  I knew it had to with factorials, but I wasn’t exactly sure how to get there.  After searching the internet, I finally figured out the following principles:

  • Column order matters when building indexes, so when choosing pairs from a set of columns, a set of ab <> ba.
  • The more columns on the table, the wider the indexes could be; choosing columns from a wider set would require iteration.  In other words, if you have 3 columns on a table, you would have 3 single-column indexes, 6 double-column indexes, and 6 triple-column indexes.

The formula that represents this is SUM(n!/(n-k)!), where n represents the number of columns in the table and k represents the number of columns in the index.  Plugging this into an spreadsheet, you get the following matrix:

    Number of Columns in Index (k)
  1 2 3 4 5 6 SUM
Number of Possible Columns (n) 1 1           1
2 2 2         4
3 3 6 6       15
4 4 12 24 24     64
5 5 20 60 120 120   325
6 6 30 120 360 720 720 1956

 

At first glance, we’re done; it looks like 6 was the right answer, because with only 6 columns in a table, you have a whopping 1,956 possible indexes to choose from.  However, there’s more to the story: SQL Server 2005 introduced the INCLUDE option to indexes, which throws a kink in the above formula. 

At first, I thought it was relatively simple; you had two subsets for each n, where the elements in each subset couldn’t be in the other one, but it’s a little more deceptive.  Here’s the principles for generating it:

  • For a set (n) of possible columns, there are two mutually exclusive subsets: the base (k) and the included columns (l).  The number of elements in the two subsets must be less than or equal to the number of elements in the master set.
  • Column order matters in the base columns, but not the included columns, so the formula above can work for a base set of columns, but iterating through the included columns requires only the unique set of elements.

And here’s the part where my brain exploded; I couldn’t figure out a way to mathematically demonstrate the two relationships, so I built a SQL script, iterating through a set of 5 columns; all in all I ended up with a listing of 845 possible combinations, which means that 6 still stands as the minimum number of columns on a table needed to generate the maximum number of nonclustered indexes.

The point to this story?  None, really.  Just a silly geek exercise.  However, I think it does point out that index strategy is a complex problem, and there are multiple ways to index any given table.  Choosing the right one is more difficult than it looks.


DECLARE @c TABLE ( NAME VARCHAR(100) ) ; INSERT  INTO @c
       
( NAME )
VALUES  ( 'a' ),
        (
'b' ),
        (
'c' ),
        (
'd' ),
        (
'e' )
      
SELECT  n = 1
     
, k = 1
     
, l = 0
     
, NAME
     
, INCLUDE = NULL
INTO    #tmp
FROM    @c
UNION ALL
SELECT  n = 2
     
, k = 2
     
, l = 0
     
, NAME = c1.NAME + ',' + c2.NAME
     
, INCLUDE = NULL
FROM    @c c1
       
CROSS JOIN @c c2
WHERE   c1.name <> c2.name
UNION ALL
SELECT  n = 2
     
, k = 1
     
, l = 1
     
, NAME = c1.NAME
     
, INCLUDE = c2.NAME
FROM    @c c1
       
CROSS JOIN @c c2
WHERE   c1.name <> c2.name
UNION ALL
SELECT  n = 3
     
, k = 3
     
, l = 0
     
, NAME = c1.NAME + ',' + c2.NAME + ',' + c3.NAME
     
, INCLUDE = NULL
FROM    @c c1
       
CROSS JOIN @c c2
       
CROSS JOIN @c c3
WHERE   c1.name <> c2.name
       
AND c2.name <> c3.name
       
AND c1.name <> c3.name
UNION ALL
SELECT  n = 3
     
, k = 2
     
, l = 1
     
, NAME = c1.NAME + ',' + c2.name
     
, INCLUDE = c3.name
FROM    @c c1
       
CROSS JOIN @c c2
       
CROSS JOIN @c c3
WHERE   c1.name <> c2.name
       
AND c2.name <> c3.name
       
AND c1.name <> c3.name
UNION ALL
SELECT  n = 3
     
, k = 1
     
, l = 2
     
, NAME = c1.NAME
     
, INCLUDE = c2.NAME + ',' + c3.name
FROM    @c c1
       
CROSS JOIN @c c2
       
CROSS JOIN @c c3
WHERE   c1.name <> c2.name
       
AND c1.name <> c3.name
       
AND c2.name < c3.name
UNION ALL
SELECT  n = 4
     
, k = 4
     
, l = 0
     
, NAME = c1.NAME + ',' + c2.NAME + ',' + c3.NAME + ',' + c4.name
     
, INCLUDE = NULL
FROM    @c c1
       
CROSS JOIN @c c2
       
CROSS JOIN @c c3
       
CROSS JOIN @c c4
WHERE   c1.name <> c2.name
       
AND c2.name <> c3.name
       
AND c1.name <> c3.name
       
AND c1.name <> c4.NAME
       
AND c2.name <> c4.NAME
       
AND c3.name <> c4.name
UNION ALL
SELECT  n = 4
     
, k = 3
     
, l = 1
     
, NAME = c1.NAME + ',' + c2.NAME + ',' + c3.NAME
     
, INCLUDE = c4.name
FROM    @c c1
       
CROSS JOIN @c c2
       
CROSS JOIN @c c3
       
CROSS JOIN @c c4
WHERE   c1.name <> c2.name
       
AND c1.name <> c3.name
       
AND c1.name <> c4.NAME
       
AND c2.name <> c3.name
       
AND c2.name <> c4.NAME
       
AND c3.name <> c4.name
UNION ALL
SELECT  n = 4
     
, k = 2
     
, l = 2
     
, NAME = c1.NAME + ',' + c2.NAME
     
, INCLUDE = c3.name + ',' + c4.NAME
FROM    @c c1
       
CROSS JOIN @c c2
       
CROSS JOIN @c c3
       
CROSS JOIN @c c4
WHERE   c1.name <> c2.name
       
AND c1.name <> c3.name
       
AND c1.name <> c4.NAME
       
AND c2.name <> c3.name
       
AND c2.name <> c4.NAME
       
AND c3.name < c4.name
UNION ALL
SELECT  n = 4
     
, k = 1
     
, l = 3
     
, NAME = c1.NAME
     
, INCLUDE = c2.name + ',' + c3.NAME + ',' + c4.NAME
FROM    @c c1
       
CROSS JOIN @c c2
       
CROSS JOIN @c c3
       
CROSS JOIN @c c4
WHERE   c1.name <> c2.name
       
AND c1.name <> c3.name
       
AND c1.name <> c4.NAME
       
AND c2.name < c3.name
       
AND c2.name < c4.NAME
       
AND c3.name < c4.name
UNION ALL
SELECT  n = 5
     
, k = 5
     
, l = 0
     
, NAME = c1.NAME + ',' + c2.name + ',' + c3.NAME + ',' + c4.NAME + ',' + c5.NAME
     
, INCLUDE = NULL
FROM    @c c1
       
CROSS JOIN @c c2
       
CROSS JOIN @c c3
       
CROSS JOIN @c c4
       
CROSS JOIN @c c5
WHERE   c1.name <> c2.name
       
AND c2.name <> c3.name
       
AND c1.name <> c3.name
       
AND c1.name <> c4.NAME
       
AND c2.name <> c4.NAME
       
AND c3.name <> c4.name
       
AND c1.name <> c5.NAME
       
AND c2.name <> c5.NAME
       
AND c3.name <> c5.name
       
AND c4.name <> c5.name
UNION ALL
SELECT  n = 5
     
, k = 4
     
, l = 1
     
, NAME = c1.NAME + ',' + c4.name + ',' + c3.NAME + ',' + c2.NAME
     
, INCLUDE = c5.NAME
FROM    @c c1
       
CROSS JOIN @c c2
       
CROSS JOIN @c c3
       
CROSS JOIN @c c4
       
CROSS JOIN @c c5
WHERE   c1.name <> c2.name
       
AND c2.name <> c3.name
       
AND c1.name <> c3.name
       
AND c1.name <> c4.NAME
       
AND c2.name <> c4.NAME
       
AND c3.name <> c4.name
       
AND c1.name <> c5.NAME
       
AND c2.name <> c5.NAME
       
AND c3.name <> c5.name
       
AND c4.name <> c5.name
UNION ALL
SELECT  n = 5
     
, k = 3
     
, l = 2
     
, NAME = c1.NAME + ',' + c2.name + ',' + c3.NAME
     
, INCLUDE = c4.NAME + ',' + c5.NAME
FROM    @c c1
       
CROSS JOIN @c c2
       
CROSS JOIN @c c3
       
CROSS JOIN @c c4
       
CROSS JOIN @c c5
WHERE   c1.name <> c2.name
       
AND c1.name <> c3.name
       
AND c1.name <> c4.NAME
       
AND c1.name <> c5.NAME
       
AND c2.name <> c3.name
       
AND c2.name <> c4.NAME
       
AND c2.name <> c5.NAME
       
AND c3.name <> c4.name
       
AND c3.name <> c5.name
       
AND c4.name < c5.name
UNION ALL
SELECT  n = 5
     
, k = 2
     
, l = 3
     
, NAME = c1.NAME + ',' + c2.name
     
, INCLUDE = c3.NAME + ',' + c4.NAME + ',' + c5.NAME
FROM    @c c1
       
CROSS JOIN @c c2
       
CROSS JOIN @c c3
       
CROSS JOIN @c c4
       
CROSS JOIN @c c5
WHERE   c1.name <> c2.name
       
AND c1.name <> c3.name
       
AND c1.name <> c4.NAME
       
AND c1.name <> c5.NAME
       
AND c2.name <> c3.name
       
AND c2.name <> c4.NAME
       
AND c2.name <> c5.NAME
       
AND c3.name < c4.name
       
AND c3.name < c5.name
       
AND c4.name < c5.name
UNION ALL
SELECT  n = 5
     
, k = 1
     
, l = 4
     
, NAME = c1.NAME
     
, INCLUDE = c2.name + ',' + c3.NAME + ',' + c4.NAME + ',' + c5.NAME
FROM    @c c1
       
CROSS JOIN @c c2
       
CROSS JOIN @c c3
       
CROSS JOIN @c c4
       
CROSS JOIN @c c5
WHERE   c1.name <> c2.name
       
AND c1.name <> c3.name
       
AND c1.name <> c4.NAME
       
AND c1.name <> c5.NAME
       
AND c2.name < c3.name
       
AND c2.name < c4.NAME
       
AND c2.name < c5.NAME
       
AND c3.name < c4.name
       
AND c3.name < c5.name
       
AND c4.name < c5.name SELECT n, COUNT(*)
FROM #tmp
GROUP BY n
ORDER BY n DROP TABLE #tmp   

Share