Development

You CAN teach an old dog….

This is more of a sticky note kind of post, to remind me to use a better syntax in the future.  Maybe it’ll help some of you as well.

A long time ago, when I was starting to write T-SQL and needed to extract only the date from a DATETIME column, I used to do something like this:

SELECT CONVERT(varchar(10), GETDATE(), 101)

 

and it worked OK.  That is, until someone pointed out that I needed to return a datetime instead of a varchar to my calling applications, so I wrapped the whole thing in another CONVERT function, like so:

SELECT CONVERT(datetime, CONVERT(varchar(10), GETDATE(), 101))

At least I now had the correct data type.  But then I read on the newsgroups that there may be a (marginal) performance benefit to using the DATEADD and DATEDIFF functions to return the date portion of a datetime, and the following method was suggested:

SELECT DATEADD(day, 0, DATEDIFF(day, 0, GETDATE()))

and that works great! I’ve been using that for years now.  However, it doesn’t work so well when trying to find the first day of a month, year, or week.  I figured out that if I flip the arguments of the DATEADD function, I could easily extract those values:

SELECT DATEADD(month, DATEDIFF(month, 0, GETDATE()), 0)     --returns 1st day of the month
SELECT DATEADD(year, DATEDIFF(year, 0, GETDATE()), 0)       --returns 1st day of the year
SELECT DATEADD(hour, DATEDIFF(hour, 0, GETDATE()), 0)       --returns the current date and hour (minutes and seconds zeroed)
SELECT DATEADD(quarter, DATEDIFF(quarter, 0, GETDATE()), 0) --returns the 1st day of the quarter

And while that’s good, I still revert to the original argument order out of habit when converting a datetime to a date (e.g., 20090724 23:59:00 to 20090724).  This post is more of a reminder to me to STOP DOING THAT, and use a consistent syntax for date conversions.

SELECT DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0)
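If it helps cement the habit, here’s a quick sanity check against a hard-coded value (the literal below is purely illustrative) showing that the consistent form still returns a datetime with the time portion zeroed, and extends naturally to the first of the month:

DECLARE @dt datetime;
SET @dt = '20090724 23:59:00';  --illustrative value only

SELECT
    DATEADD(day,   DATEDIFF(day,   0, @dt), 0) AS DateOnly,      -- 2009-07-24 00:00:00.000
    DATEADD(month, DATEDIFF(month, 0, @dt), 0) AS FirstOfMonth;  -- 2009-07-01 00:00:00.000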

Partitioning computed columns

For a project at work, I’ve been asked to help optimize our ETL process.  Part of that involves taking an existing table and partitioning it along two existing columns.  Since partitioning only supports a single partitioning column, the obvious choice is to use a computed column.  I ran into some issues along the way, and thought I would share them here.

First, when partitioning a table on a computed column, you must persist that column.  Persisted computed columns were introduced in SQL Server 2005 (and are still available in 2008).  If you’re familiar with computed columns from SQL 2000, the basic concept is the same; however, persisted computed columns are physically stored in the database (the engine updates the values whenever the source columns change), as opposed to virtual columns that are computed at query time.

A persisted computed column is defined like so:

CREATE TABLE Sample (DateID int, TimeID int);

INSERT INTO Sample (DateID, TimeID)
VALUES (1,1);

ALTER TABLE SAMPLE
ADD DateIDTimeID AS DATEADD(minute, TimeID, DATEADD(day, DateID, '20090101'))
PERSISTED;

Of course, if you just ran that sample, you got the following error message:

(1 row(s) affected)
Msg 4936, Level 16, State 1, Line 6
Computed column 'DateIDTimeID' in table 'SAMPLE' cannot be persisted because the column is non-deterministic.

This was the first challenge; although the DATEADD function is always deterministic, the problem lies with the implicit conversion of the character representation of the date (e.g., ‘20090101’) to datetime.  This behavior is different when running the code on a SQL 2005 server under SQL 2000 compatibility mode (80) than when running it at level 90 or above; frankly, I was surprised that the code even compiled under SQL 2000 compatibility mode, since persisted computed columns were not introduced until SQL 2005.

I confirmed that the character representation was the culprit by checking Microsoft Connect (http://connect.microsoft.com), which listed the following issue:

Persistent Computed column using DATEADD | Microsoft Connect

From the response, let me quote:

Hi
The reason for the behavior you observe is because the conversion from string to datetime is not deterministic – it will depend on date format settings.
A work around may be to specify the date constant as a bigint (convert your date to bigint).
The behavior with the string constant is by design.
– Christian Kleinerman
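As an aside, I believe another way to satisfy the determinism check is to keep the string constant but make the conversion explicit with a deterministic style such as 112 (yyyymmdd); that’s my own assumption based on the documented determinism rules for CONVERT, not something Christian suggested, so treat this sketch as untested:

--untested alternative (my assumption): an explicit CONVERT with deterministic style 112
ALTER TABLE SAMPLE
ADD DateIDTimeID AS DATEADD(minute, TimeID, DATEADD(day, DateID, CONVERT(datetime, '20090101', 112)))
PERSISTED;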

Following Christian’s advice, I sought to do what he suggested, only the programmer in me couldn’t distinguish between CONVERT (the T-SQL function) and convert (the plain English verb); in other words, I tried this:

ALTER TABLE SAMPLE
ADD DateIDTimeID AS DATEADD(minute, TimeID, DATEADD(day, DateID, CONVERT(bigint, '20090101')))
PERSISTED;

Which, of course, gives a whole different error message:

Msg 8115, Level 16, State 2, Line 2
Arithmetic overflow error converting expression to data type datetime.

The statement has been terminated.

Stepping out of programming mode for a minute, I realized that Christian meant that I should replace the string constant I was using to represent the date of January 1, 2009 with a bigint (the number of days since January 1, 1900, the base date for the datetime type).  I ran the following:

SELECT CONVERT(bigint, CONVERT(datetime, '20090101'))

Which gave me the result of 39812, which I then plugged into my ALTER TABLE statement, like so:

ALTER TABLE SAMPLE
ADD DateIDTimeID AS DATEADD(minute, TimeID, DATEADD(day, DateID, 39812))
PERSISTED;

SELECT *
FROM Sample 

And the results were much closer to my liking:

DateID      TimeID      DateIDTimeID
1           1           2009-01-02 00:01:00.000

 

Of course, I now had to deal with the fact that when I had designed my date and time lookup tables, I had used a seed of 1 instead of 0, so my results were off by a day and a minute; simple math in the computed column can adjust for that, as sketched below.  (Future design tip: always use 0-based keys, OR use a smart value for the seed date, like 20090101.)  Anyway, partitioning now works, and I can alter the table as needed.
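For reference, here’s a minimal sketch of what the seed adjustment and the partitioning objects might look like; the partition function and scheme names, boundary values, and index name are placeholders, not our production setup:

--sketch only: names and boundary values below are placeholders
ALTER TABLE Sample DROP COLUMN DateIDTimeID;

ALTER TABLE Sample
ADD DateIDTimeID AS DATEADD(minute, TimeID - 1, DATEADD(day, DateID - 1, 39812))
PERSISTED;

CREATE PARTITION FUNCTION pfDateIDTimeID (datetime)
AS RANGE RIGHT FOR VALUES ('20090201', '20090301', '20090401');

CREATE PARTITION SCHEME psDateIDTimeID
AS PARTITION pfDateIDTimeID ALL TO ([PRIMARY]);

--creating (or rebuilding) the clustered index on the scheme moves the table onto the partitions
CREATE CLUSTERED INDEX IX_Sample_DateIDTimeID
ON Sample (DateIDTimeID)
ON psDateIDTimeID (DateIDTimeID);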

SQLSaturday #24 & #25 – Oct 9-10, 2009 Gainesville, GA

I’m pushing the envelope again.  I just agreed to help promote another SQLSaturday.

AtlantaMDF and Gainesville State College’s Institute for Environmental and Spatial Analysis are hosting a SQLSaturday.  Well, not just any SQLSaturday; at GSC, Saturday starts on Friday.  Let me try to explain.

On Friday, Oct 9, IESA is hosting a student conference to highlight academic work in Environmental and Spatial Analysis Computing; they’d like to introduce those efforts to the professional community, as well as expose their students to relevant issues faced by IT professionals.  So on Saturday, Oct 10, IESA is hosting (in conjunction with AtlantaMDF) a traditional SQLSaturday.  Friday is mostly for the students; Saturday is for the professional geeks.  However, students are being encouraged to attend both days, and we’re asking for volunteers from the professional community to listen and respond to student presentations on Friday.

Confused?  I’m probably going to be nuts by the time this is over.  However, if you’re interested, hop on over to http://www.sqlsaturday.com and check out events #24 & #25.  If you’re a SQL professional and you’d like to register, sign up for event #25; if you’d like to submit a presentation, please feel free to do so as well.

Questions?  I’m listening.

Quick Tip: TempDB on a multi-core SQL Server box

Just learned this today from one of the Microsoft support reps reviewing our SQL Server 2008 box for performance optimizations: if you are running SQL Server on a multi-core box, you should set up tempdb to have as many data files as there are physical CPUs (up to a reasonable limit).  These files should also be equally sized.  In other words, if you need a 6 GB tempdb and you have 2 cores, you’d want two 3 GB files.  If you have 16 cores, you can probably limit it to 8 files of 750 MB each.  On a 32-core box, you may benefit from additional files, but you’ll probably want to test to see.

The reason for this (as it was explained to me) is that on a high-transaction server, the file I/O associated with the creation and destruction of temporary objects may cause blocking on parallel CPU operations.  For example, CPU0 may run a stored procedure that uses temporary tables; CPU1 may run that same procedure simultaneously.  If there is a single tempdb data file, the I/O associated with creating the temp objects for the procedure executing on CPU0 may block the creation of the objects for the procedure executing on CPU1, even though the tables themselves are locally scoped.

Sizing the files equally also helps, as space availability per file is considered when deciding where to place objects.  If you have one file that is much larger than the others, most of your objects will be created in the larger file, which could also contribute to blocking concerns.  In short:

  • Create the same number of tempdb data files as there are physical processors on the server (up to 8), and
  • Make sure that the data files are equivalently sized.
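
For reference, here’s a minimal sketch of the ALTER DATABASE commands that would implement the two bullets above for a 4-core box needing roughly 6 GB of tempdb; the logical file names, paths, sizes, and growth settings are placeholders to adjust for your own storage layout:

--placeholder file names, paths, and sizes; tempdev is the default primary tempdb data file
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 1536MB, FILEGROWTH = 256MB);

ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2, FILENAME = 'T:\TempDB\tempdb2.ndf', SIZE = 1536MB, FILEGROWTH = 256MB);

ALTER DATABASE tempdb
ADD FILE (NAME = tempdev3, FILENAME = 'T:\TempDB\tempdb3.ndf', SIZE = 1536MB, FILEGROWTH = 256MB);

ALTER DATABASE tempdb
ADD FILE (NAME = tempdev4, FILENAME = 'T:\TempDB\tempdb4.ndf', SIZE = 1536MB, FILEGROWTH = 256MB);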

NOTE: Immediately after posting this, I read the following from Kevin Kline.  Please note that I am not talking about multiple log files, but rather multiple data files.  http://sqlblog.com/blogs/kevin_kline/archive/2009/06/23/old-performance-tuning-recommendations-die-hard.aspx

NOTE2: @kbriankelley tweeted me shortly after this post, and recommended the following post: http://www.sqlskills.com/BLOGS/PAUL/post/Search-Engine-QA-12-Should-you-create-multiple-files-for-a-user-DB-on-a-multi-core-box.aspx

Apparently, there may be some disagreement about the requirement for the 1:1 ratio of data files to cores.  Paul Randal indicates that it may be more like a 1:4 or 1:2 ratio.  I don’t know; I had a Microsoftie in the office telling our DBA team that we needed at least 8-10 data files for our 16-core box (and he did specify the 1:1 ratio).  As a database developer, I don’t think about file I/O enough to argue with him.  I do think that this is a tip worth investigating and testing, however, which is the reason I posted it.

squeaky wheel… yada, yada

Well, I guess if you complain about something long and hard enough, things happen.  Sometimes, things happen that have nothing to do with your complaining, but the coincidental correlation is so strong that your mind draws associations for it (which sometimes leads to fodder for blog posts).

BTW, sorry for not posting much lately; my personal life has gotten rather complicated, so I’ve been avoiding the computer after hours.

Anyway, I’ve been struggling for some time now with Visual Studio Team System: Database Edition, particularly in a scrum environment, and it just seems that I’m not making much headway with it.  The usage model is foreign to me, and I’m still wrestling with the tool.  It’s kind of like learning to drive a car after driving a golf cart; the basic principles are the same, but there are a heckuva lot more options.

Shawn Wildermuth to the rescue; his company, Agilitrain, is hosting a series of workshops on Agile Database Techniques using Visual Studio 2008 Database Edition.  I told my boss about it, and even though training money is pretty tight right now, I got approval to go within minutes (that says a lot about my boss, or it may say something about his opinion of how badly I need training).  Here’s the course description:

Agile Database Techniques

This course provides students with the knowledge and skills to properly manage the SQL Server database development lifecycle in an agile environment. The student will learn how to manage changes to the database structure, ensuring quality through T-SQL unit testing, and how to automate the building and deploying of SQL Server databases. Intended for developers and administrators already comfortable with the concepts of designing, programming, testing, and deploying SQL Server databases, this course focuses on using Visual Studio Team System 2008 Database Edition to maximize productivity while minimizing mistakes.

Price: $1995.00 (USD)
Pricing may be different in locations outside the United States
Language(s): English
Length: 3 Days

Pre-requisites:

Before attending this course, the student should have experience developing and maintaining SQL Server 2000, 2005, or 2008 databases. It would also be beneficial if the student has worked in a team-based software development project and is familiar with their organization’s Software Development Life Cycle.

I’ve also been notified that Nob Hill Software has released the latest version of Randolph (their automatic versioning software for databases) with full support for TFS.  Unfortunately, I haven’t had a chance to install and test it yet (I’ve actually been working on an Analysis Services project; looks like I’m becoming a BI guy), but once I do, I’ll post a review here and at the AtlantaMDF Reviews site.

SQL Server 2008 Upgrade block: SQL Server Service Health Check

I’m posting this because I hope it helps someone else avoid losing several hours of work like I did yesterday.  Even though I’m a developer, I’m also the only DBA in our development group, and I was tasked with upgrading one of our dev boxes from SQL 2005 to SQL 2008.  The install started OK, but it got stuck during the Upgrade Rules check with the following error message:

“The SQL Server service cannot be restarted; or for a clustered instance, the SQL Server resource is not online.”

Googling the error message (or any combination of the message with "SQL Server upgrade" or "Health") was useless.  I did find one persistent poster (BobC), who posted several versions of this issue to multiple forums until he got a lead.  If you google the rule name (Engine_SqlEngineHealthCheck), you get the following page from Connect:

http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=460288

which states that

Thank you for your report. We believe the issue you are experiencing is a result of SQL Server Upgrade requiring full Admin rights on the server, including “Debug programs” security policy.
Check to see if the server has removed the local administrator from “Debug programs” security policy. If so, adding Administrators back to the Debug Programs Group Policy should allow the upgrade to succeed.

In my case, I was attempting to do the install as a local administrator on the box, but apparently a group policy (I’m unsure if it’s the default or one of our company’s) ONLY allows Domain Administrators to have the ability to “Debug Programs”.    I had to ask for help from our network admin, who was able to grant my login FULL administrative rights on the box, and the upgrade continued.

milestones…

Sorry for the delay in this post; my family and I are planning to get away for some much-needed vacation time, and like most Americans, I find that the week before and the week after a vacation are the busiest weeks of the year.  I’m trying to cram in as much stuff as I can before I leave, even though that’s SO NOT THE POINT of a vacation.

Anyway, I really have a lot to write about this week, but just don’t have the time.  Hopefully, I’ll cling to those ideas in the back of my skull somewhere over the next week.  There was one incident of note, however, that I’d really like to share.

I recently met another Atlanta developer by the name of Cheyenne Throckmorton; Cheyenne, like me, is interested in what we can do to build up the development communities in North Georgia, regardless of technological focus.  We’ve had a couple of email conversations, and at some point, we’re probably gonna grab a burger and talk.  However, my story is not about Cheyenne; it’s about Ben Forta.

Who is Ben Forta?  Ben is a ColdFusion Evangelist for Adobe.  He’s coming to Atlanta on June 8 (the same night that my boys from AtlantaMDF meet).  Why should I, a SQL Server developer, care about a ColdFusion developer?  Well, it’s because I didn’t know he was a ColdFusion developer.  I thought he was a SQL guy (remind me to tell you my Ray Charles stories).

A long time ago, I was a struggling doctoral student in the field of Health Communication.  I had completed my coursework and was already working in the field as a research analyst (analyzing data from large samples of cancer patients and people at risk for skin cancer) when an amazing thing happened: I failed my comprehensive exams.  Twice.  I remember sitting outside my advisor’s office for over an hour, planning my attack for the third attempt, when I realized two things: 1) she wasn’t coming (I never heard from her again), and 2) I didn’t really like academia.  I had started down a path and felt compelled to finish it, even though the desire had long since left me.  I decided that day I was going to make a career switch.

One of the tools we were using to manage data was Microsoft Access; I know, go ahead and laugh.  It’s not the most robust of databases; however, Access does provide some exposure to both Visual Basic (for Applications) and SQL.  I started looking around for Access programming jobs, and applied for the first one that sounded reasonable.  I aced the interview (those communication skills come in handy), but they also had a programming test.  I failed miserably.  I decided that I needed to do some studying (I was, after all, an ex-graduate student).  I picked up several books to learn about programming; Ben Forta’s Teach Yourself SQL in 10 Minutes (2nd Edition) was one of those books.  I landed a decent job shortly thereafter because of those efforts, and I feel like Forta’s guide was a large part of that (I’m not saying that I became a super programmer overnight; I just worked hard and had some great foundations).  Nearly 10 years later, I’m a Data Architect, and I like what I do.

Cheyenne suggested that I send him a copy of the book, and he would get Ben to sign it.  I’ve long since lost my copy, so I ran out and bought the third edition.  I’m hoping Ben will sign it (since I’ll be away on vacation and can’t ask him myself), because even though I no longer need the book, it’ll add to my collection of milestones along my path.

Gotta run to finish packing; see you in a week.

SQL Saturday #7 wrap-up

Finally back in my hotel room; I’ve forgotten how exhausting it is to actually present these days.  John Baldwin and the rest of the Steel City SQL user group did an awesome job pulling this together.  These guys are a lot of fun to be around, and they really went out of their way to make sure that speakers felt welcome.

Some highlights:

  • The facility was awesome.  The event was held in a new building on the Jefferson State Community College campus; every room was equipped with a projector and audio equipment, the seating was comfortable, etc.
  • There were a few familiar faces, but I also had the chance to meet other Southeastern SQL geeks.  I’m hoping that we can build on this relationship, and encourage new growth for all of our user groups.
  • I particularly enjoyed hearing Andy Warren’s session on statistics; even though I don’t do a lot of work in performance tuning anymore, it was useful for me as a developer to understand the potential causes of bottlenecks on our servers.
  • My own session (Confessions of a Data Integrator) went well; I need to trim it a bit and sharpen the focus, but I got a lot of positive feedback and comments from attendees.  I definitely think I can revamp it for SQL Saturday #17 (in Baton Rouge).

The only snafu I noticed was that the schedule had some interesting time slots; some sessions were 60 minutes long and some were 75 minutes.  On at least two occasions, a 75-minute session was cut short because the volunteer misread the length of that session.  Overall, it was a great day, and I’m looking forward to future events.

Announcing AtlantaMDF Reviews

Just a quick post to tout a recent sister site that I’ve started (with the help of several other AtlantaMDF members): AtlantaMDF Reviews.  Our goal is to provide a forum of book and tool reviews that would be of interest to our local PASS user group.

The site has just started, with only 2 brief reviews as of this date.  However, we’ve got several people working behind the scenes reading and reviewing new material so that we can begin posting several articles soon (BTW, if you’re an AtlantaMDF attendee and would like to contribute, please find me and let me know).

Anyway, if you’re a Google Reader user (or use any other feed reader), you can add the feed to your collection using the FeedBurner URL below.  Keep your eyes open for new material soon:


http://feeds2.feedburner.com/AtlantaMDFReviews