SQL Server

Partitioning computed columns

For a project at work, I’ve been asked to help optimize our ETL process.  Part of that involves taking an existing table and partitioning it along two existing columns.  Since table partitioning only supports a single partitioning column, the obvious choice is to use a computed column.  I ran into some issues along the way, and thought I would share them here.

First, when partitioning a table on a computed column, you must persist that column.  Persisted computed columns were introduced in SQL Server 2005 (and are still available in 2008).  If you’re familiar with the concept of a computed column from SQL 2000, the basic idea is the same; however, persisted computed columns are physically stored in the database (the engine updates the values whenever the source columns change), as opposed to virtual columns, which are calculated at query time.

A persisted computed column is defined like so:

CREATE TABLE Sample (DateID int, TimeID int);

INSERT INTO Sample (DateID, TimeID)
VALUES (1,1);

ALTER TABLE SAMPLE
ADD DateIDTimeID AS DATEADD(minute, TimeID, DATEADD(day, DateID, '20090101'))
PERSISTED;

Of course, if you just ran that sample, you got the following error message:

(1 row(s) affected)
Msg 4936, Level 16, State 1, Line 6
Computed column 'DateIDTimeID' in table 'SAMPLE' cannot be persisted because the column is non-deterministic.

This was the first challenge: although the DATEADD function itself is always deterministic, the problem lies with the character representation of the date (e.g., ‘20090101’); the implicit conversion from string to datetime is not.  This behavior is different when running the code on a SQL 2005 server under SQL 2000 compatibility mode (80) than when running it at level 90 or above; frankly, I was surprised that the code even compiled under SQL 2000 compatibility mode, since persisted computed columns were not introduced until SQL 2005.
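
As an aside, you can ask the engine directly whether it considers the expression deterministic: add the column without PERSISTED (which is allowed, since virtual computed columns may be non-deterministic) and check it with COLUMNPROPERTY.  Here’s a quick sketch against the Sample table above; the scratch column name is just for illustration:

-- Add the same expression as a virtual (non-persisted) computed column
ALTER TABLE Sample
ADD DateIDTimeIDScratch AS DATEADD(minute, TimeID, DATEADD(day, DateID, '20090101'));

-- Returns 0: the expression is non-deterministic, so the column can't be persisted
SELECT COLUMNPROPERTY(OBJECT_ID('Sample'), 'DateIDTimeIDScratch', 'IsDeterministic');

ALTER TABLE Sample DROP COLUMN DateIDTimeIDScratch;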

As for how I pinned the problem on the character representation: I checked Microsoft Connect (http://connect.microsoft.com), which listed the following issue:

Persistent Computed column using DATEADD | Microsoft Connect

From the response, let me quote:

Hi
The reason for the behavior you observe is because the conversion from string to datetime is not deterministic – it will depend on date format settings.
A work around may be to specify the date constant as a bigint (convert your date to bigint).
The behavior with the string constant is by design.
– Christian Kleinerman

Following Christian’s advice, I set out to do what he suggested, only the programmer in me couldn’t distinguish between CONVERT (the T-SQL function) and convert (the English verb); in other words, I tried this:

ALTER TABLE SAMPLE
ADD DateIDTimeID AS DATEADD(minute, TimeID, DATEADD(day, DateID, CONVERT(bigint, '20090101')))
PERSISTED;

Which, of course, gives a whole different error message:

Msg 8115, Level 16, State 2, Line 2
Arithmetic overflow error converting expression to data type datetime.

The statement has been terminated.

Stepping out of programming mode for a minute, I realized that Christian meant that I should replace the string constant I was using to represent the date of January 1, 2009 with its bigint equivalent (the number of days since January 1, 1900, which is day zero for the datetime type).  I ran the following:

SELECT CONVERT(bigint, CONVERT(datetime, '20090101'))

Which gave me the result of 39812, which I then plugged into my ALTER TABLE statement, like so:

ALTER TABLE SAMPLE
ADD DateIDTimeID AS DATEADD(minute, TimeID, DATEADD(day, DateID, 39812))
PERSISTED;

SELECT *
FROM Sample 

And the results were much closer to my liking:

DateID      TimeID      DateIDTimeID
1           1           2009-01-02 00:01:00.000

Of course, I now had to deal with the fact that when I designed my date and time lookup tables, I used a seed of 1 instead of 0, so my results were off by a day and a minute; simple math could adjust for that (future design tip: always use 0-based keys, OR use a smart value for the date, like 20090101).  Anyway, partitioning now works, and I can alter the table as needed.
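
For completeness, here’s a minimal sketch of the partitioning DDL that the persisted column makes possible; the function and scheme names, the boundary dates, and the single-filegroup mapping are illustrative rather than our production setup:

-- Monthly partition boundaries on the computed column's datetime values
CREATE PARTITION FUNCTION pfDateIDTimeID (datetime)
AS RANGE RIGHT FOR VALUES ('20090101', '20090201', '20090301');

-- Map every partition to PRIMARY just to keep the example simple
CREATE PARTITION SCHEME psDateIDTimeID
AS PARTITION pfDateIDTimeID ALL TO ([PRIMARY]);

-- Building a clustered index on the persisted computed column moves the
-- table onto the partition scheme
CREATE CLUSTERED INDEX IX_Sample_DateIDTimeID
ON Sample (DateIDTimeID)
ON psDateIDTimeID (DateIDTimeID);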

Quick Tip: TempDB on a multi-core SQL Server box

Just learned this today from one of the Microsoft support reps reviewing our SQL Server 2008 box for performance optimizations: if you are running SQL Server on a multi-core box, you should set up tempdb with as many data files as there are physical CPUs (up to a reasonable limit).  These files should also be equally sized.  In other words, if you need a 6 GB tempdb and you have 2 cores, you’d want two 3 GB files.  If you have 16 cores, you can probably limit it to 8 files of 750 MB each.  On a 32-core box, you may benefit from additional files, but you’ll want to test to be sure.

The reason for this (as it was explained to me) is that on a high-transaction server, the file I/O associated with the creation and destruction of temporary objects may cause blocking on parallel CPU operations. For example, CPU0 may run a stored procedure that uses temporary tables; CPU1 may run that same procedure simultaneously.  If there is a single tempdb data file, the I/O associated with creating the local temp objects for the procedure executing on CPU0 may block the creation of the objects for the copy executing on CPU1, even though the tables themselves are locally scoped.

Sizing the files equally also assists the optimizer, since space availability per file is considered when choosing where to store the local objects.  If one file is much larger than the others, most of your objects will be created in that larger file, which could also contribute to blocking.  In short:

  • Create the same number of tempdb data files as there are physical processors on the server (up to 8), and
  • Make sure that the data files are equally sized (a sketch of the configuration follows this list).
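
For illustration, here’s a minimal sketch of that configuration, assuming a 4-core box and a 6 GB tempdb; tempdev is the default logical name of tempdb’s primary data file, but the other file names, the sizes, and the T: drive path are hypothetical:

-- Resize the existing primary data file to its equal share (6 GB / 4 files)
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 1536MB, FILEGROWTH = 256MB);

-- Add the remaining equally sized data files (repeat for tempdev3 and tempdev4)
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2, FILENAME = 'T:\TempDB\tempdev2.ndf',
          SIZE = 1536MB, FILEGROWTH = 256MB);

-- Verify the data files and their sizes (size is reported in 8 KB pages)
SELECT name, size / 128 AS size_mb
FROM tempdb.sys.database_files
WHERE type_desc = 'ROWS';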

NOTE: Immediately after posting this, I read the following from Kevin Kline.  Please note that I am not talking about multiple log files, but rather multiple data files.  http://sqlblog.com/blogs/kevin_kline/archive/2009/06/23/old-performance-tuning-recommendations-die-hard.aspx

NOTE2: @kbriankelley tweeted me shortly after this post, and recommended the following post: http://www.sqlskills.com/BLOGS/PAUL/post/Search-Engine-QA-12-Should-you-create-multiple-files-for-a-user-DB-on-a-multi-core-box.aspx

Apparently, there may be some disagreement about the requirement for the 1:1 ratio of data files to cores.  Paul Randal indicates that it may be a 1:4 or 1:2 ratio. I don’t know; I had a Microsoftie in the office telling our DBA team that we needed at least 8-10 data files for our 16-core box (and he did specify the 1:1 ratio).  As a database developer, I don’t think about file I/O enough to argue with him.  I do think that this is a tip worth investigating and testing, however, which is the reason I posted it.

SQL Server 2008 Upgrade block: SQL Server Service Health Check

I’m posting this because I hope it helps someone else avoid losing several hours of work like I did yesterday.  Even though I’m a developer, I’m also the only DBA in our development group, and I was tasked with upgrading one of our dev boxes from SQL 2005 to SQL 2008. The install started OK, but got stuck during the Upgrade Rules section with the following error message:

“The SQL Server service cannot be restarted; or for a clustered instance, the SQL Server resource is not online.”

Googling the error message (or any combination of the message with “SQL Server upgrade” or “Health”) was useless.  I did find one persistent poster (BobC), who had posted several versions of this issue to multiple forums until he got a lead.  If you google the rule name (Engine_SqlEngineHealthCheck), you get the following page from Connect:

http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=460288

which states that

Thank you for your report. We believe the issue you are experiencing is a result of SQL Server Upgrade requiring full Admin rights on the server, including “Debug programs” security policy.
Check to see if the server has removed the local administrator from “Debug programs” security policy. If so, adding Administrators back to the Debug Programs Group Policy should allow the upgrade to succeed.

In my case, I was attempting to do the install as a local administrator on the box, but apparently a group policy (I’m unsure whether it’s the default or one of our company’s) allows ONLY Domain Administrators to have the “Debug programs” right.  I had to ask for help from our network admin, who was able to grant my login FULL administrative rights on the box, and the upgrade continued.

SCRUM, Source Control, and the SQL Server Developer (Part 2)

So my last post focused on the modifications my shop has made to our implementation of Scrum, without a lot of detail about how we manage our code changes.  This post is intended to explain how we set up source control to support the Scrum process.

Source control is one of those poorly defined practices; it’s something we know we should be doing as developers, but it’s not always done by DBAs.  In fact, I would guess that many shops where DBAs do most of the reporting and data integration tasks don’t practice source control at all; if they do, it’s typically in the form of scripting out the entire database or relying on differential backups.  Separating the data structure from the data itself is often difficult to do, so I am sure that many teams don’t do it at all.

We’re currently using Visual Studio Team System with Team Foundation Server as our source control repository; it’s been a steep learning curve for those of us who came from a Visual SourceSafe background, and applying that knowledge in the unfamiliar context of a new UI (VSTS:DB Pro) and a new concept (source control for databases) has been more than challenging, particularly since we’ve adopted a new development method (Scrum) as well.  It’s taken a lot of discussion (sometimes quite heated) to get where we are today, especially since there’s not a lot of “best practices” guidance out there for source-controlling SQL Server.

The biggest challenge has been the concept of branching and merging; TFS recognizes that you may have multiple development lines going on, especially among multiple developers.  When do you split code out to work on it?  When do you put it back together?  How does this affect deployment of new features versus the release of patches to fix the currently deployed line of code?

For us, we decided to address our database source control as follows:

  1. We set up the initial databases in source control in our Main branch.  The Main branch is supposed to represent the heart of our code; it’s the final resting place of changes before they go to QA. 
  2. From Main, we branch off a line of code for each PBI (Product Backlog Item; see my first post) that we’re working on.  If a PBI spans multiple databases, we include a copy of each database under that PBI.  Our cookie trail in TFS looks something like this:
    • Main…Database1
    • Main…Database2
    • Development…PBI 1…Database1
    • Development…PBI 1…Database2
    • Development…PBI 2…Database1
  3. Once a PBI is ready to be deployed, we merge the databases for that PBI back to their original sources; if another developer has already merged in changes that would conflict, those conflicts have to be resolved before we allow the final merge (this means that if Dev1, working on PBI 1, has changes that would affect Dev2, working on PBI 2, those changes are discovered at this second merge).
  4. We also have a line of code that represents Production deployment; at the beginning of each Sprint, we branch from the Main line into Production and then do a schema comparison with production to ensure that the Production line actually resembles Production.  In case of a needed patch, we patch the Production line, deploy it, and then merge those changes back into the Main line.  If there are conflicts, we have to find them and resolve them.

There are some drawbacks; merging a patch into Main doesn’t always go smoothly, since we have to track down who is responsible for which changes.  Documentation in the procs helps, but we need to do a better job (especially when the changes involve objects that are not easily commented, like tables).  Furthermore, when it comes time to deploy from the Main branch, if QA decides that a feature isn’t ripe yet, we have to do some legwork to back it out.  All in all, however, it works, despite the bumps along the way.

I’d be interested in hearing how others are doing source control for their database scripts; please feel free to comment below.

SQL Server & XML: links of interest

I’ve been working on a project recently that requires using XML in SQL Server, so I’ve been searching the web for references. Books Online is unusually obtuse on this subject (the samples are sub-par for such a complex topic), so I’ve been trying to find good examples of how it works. Here are some of the better ones:

Passing lists to SQL Server 2005 with XML Parameters: Jon Galloway gives a short and sweet example of shredding an XML variable into rows.
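
To give a flavor of the technique, here’s a minimal sketch of shredding an XML variable into rows with the nodes() and value() methods; the variable and element names here are mine, not Jon’s:

DECLARE @list xml = N'<ids><id>1</id><id>2</id><id>3</id></ids>';

-- Each /ids/id element becomes a row; value() extracts the typed value
SELECT x.id.value('.', 'int') AS ID
FROM @list.nodes('/ids/id') AS x(id);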

XQuery Inside SQL Server 2005: A longer set of examples using CROSS APPLY to associate XML data in a column with a value in a standard column. This is probably the most important query I needed, because we have a lot of XML values (config files) stored in a table; those config files are specific to particular clients.
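
Here’s a rough sketch of that CROSS APPLY pattern; the table, columns, and XPath below are hypothetical stand-ins for our actual client config tables:

-- Hypothetical config table: one XML config document per client
DECLARE @ClientConfig TABLE (ClientID int, ConfigXML xml);
INSERT @ClientConfig VALUES (1, N'<config><setting name="timeout" value="30" /></config>');
INSERT @ClientConfig VALUES (2, N'<config><setting name="timeout" value="60" /></config>');

-- One row per matching <setting> node, tied back to its client row
SELECT c.ClientID,
       s.cfg.value('@value', 'varchar(50)') AS TimeoutValue
FROM @ClientConfig AS c
CROSS APPLY c.ConfigXML.nodes('/config/setting[@name="timeout"]') AS s(cfg);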

SQL Server 2008 Books Online: Here are the basic data elements; again, the vocabulary is dense, but you at least need to know where it is.

Now, on to FLWOR (which I’m just beginning to study): Blooming FLWOR – An Introduction to the XQuery FLWOR Expression
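
As a starting point, here’s a toy FLWOR expression (for / where / order by / return) run through the query() method; the XML and the filter are made up, and note that the let clause isn’t supported until SQL Server 2008:

DECLARE @orders xml = N'<orders>
    <order id="2" total="20" />
    <order id="1" total="10" />
    <order id="3" total="3" />
</orders>';

-- Return the orders with a total over 5, sorted by id
SELECT @orders.query('
    for $o in /orders/order
    where $o/@total > 5
    order by number($o/@id)
    return $o');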

Rube Goldberg would be proud…

This post is a bit of a rant; I spent the whole day trying to resolve a curious chain of events.  Basically, I wasted a day of development time troubleshooting something stupid; hopefully, others will learn from my mistakes and avoid the pain and suffering I experienced.

It all started with SQL Server 2008; we’re looking at migrating to it sometime this year, and I finally begged the production guys into letting this developer get a copy of it installed on my machine.  Seems to make sense, since, I don’t know, I WRITE THE CODE THAT THEY RUN.  (Sorry, a bit tense.)  Anyhoo, I started the install.  All was going well, apart from an unusual hiccup with my Logitech webcam.  I eventually got it installed, started to play around with it, and then noticed that I only had 1 GB of disk space free.

Sidebar: remember when 1GB was a HUGE amount of space?  Unfortunately, it ain’t much now, and my laptop is starting to suffer from too-much-crapitis.

Realizing that I was running out of space, I made the extremely stupid decision to uninstall SQL Server 2005 from my hard drive to free up a few gigs.  SQL 2008 should be sufficient, right?  I do very little development on my local box (we have dev servers for that), so I thought I would be safe.  How wrong I was.  The uninstall took about 30 minutes; a reboot later, I got bugged by Dan about an issue with one of the database projects I had checked in using VSTS:Database Professional.  I fired up Data Dude, and basically my laptop shot me the bird.  Apparently, VSTS REQUIRES an installation of SQL Server 2005 to run; SQL 2008 won’t do.

Sidebar: I realize that there is a new version of VSTS:Database Professional that doesn’t have this requirement, but until it’s official, I can’t have it (company policy).  Make it official, Microsoft, and you won’t get rants like these.  I’m warning you!  (insert mental picture of an ant shaking his fist at an elephant).

Since I had limited space to install, I chose to install SQL Server 2005 Express Edition.  All went well, until the installer required a version of MSXML 6, and then it puked.  Why?  Because Windows XP Service Pack 3 installs a higher version of MSXML 6 than SQL Server 2005 does; SQL Server 2005 can’t write over the newer version, nor will it accept the newer version as valid.  This is the computing equivalent of saying “I like cherry pie and I like whipped cream, but I don’t like cherry pie with whipped cream.”  I couldn’t install SQL 2005 Express until I uninstalled MSXML 6, and I couldn’t uninstall MSXML 6 without uninstalling Windows XP SP3.

Sidebar: Insert mental picture of ant and elephant again, only this time the ant is banging his head on the desk and crying while the elephant leaves a large steaming deposit nearby; the elephant’s not cruel, just oblivious to the ant’s peril.

Luckily, I found this post at support.microsoft.com describing the use of the Windows Installer CleanUp utility to remove MSXML 6 without uninstalling XP SP3.  Dodged a bullet there, I thought to myself; instead, I stepped into a lawnmower blade.  I got SQL 2005 Express installed, fired up VSTS:DB Pro, and realized that I HAD THE WRONG VERSION (missing service packs; I had version 611, and VSTS:DB Pro needed 612).  No big deal, right?  I’d just run Windows Update and get the missing service packs (since I was now on an old version of MSXML 6 anyway).  Bad idea.

Apparently, there have been a lot of patches since XP SP3 was released, including a service pack for SQL 2008 (the cog that started this pinwheel): 900 MB of patches to download, which wouldn’t be such a big deal except for the fact that my company has my connection capped at 512K.

Sidebar: The ant has just realized that it’s not entirely the elephant’s fault.  The queen ant has also made some strange policy decision that must be followed.  However, it’s much easier to blame the elephant, because it’s lonely without a job or an ant hill.  Stupid elephant; wise queen ant.

Anyway, at 3 PM this afternoon (I clock out at 4 PM), I was able to answer Dan’s question.  Looking forward to another productive day tomorrow.