#TSQL2sDay 001: Date/Time Issues and Data Modeling

This is my first post for T-SQL Tuesday (thanks, Adam, for the idea!); I’m hoping this will be a continuing block party, and hopefully connect a lot of us SQL bloggers together.  There’s a lot of great content out there, and sometimes it’s just tough to find.

Just a few notes before I get started on this; for this particular post, I’m pulling from my experiences as a data architect for the last few years.  I spend a lot of time reverse-engineering other databases in order to centralize and analyze data from a variety of sources.  Most of my efforts pre-date SQL 2008, so I don’t have a lot of experience with the newer datetime data types introduced in that version; however, I’ll point you in that direction if I think it might address an issue that I raise.  Also, there’s no particular order to my list of notes; think of it as a random collection of date-time issues.  Some of them may be specific to T-SQL and SQL Server, and others may just be procedural notes about data integration (and thus may be applicable to other platforms).

Ready?  Let’s get started.

1.  If you are a database professional that works with SQL Server, strike the word “timestamp” from your vocabulary.

Way back when SQL Server was in its infancy, some developer decided that timestamp would be a good name for a binary column that changes whenever the row was updated.  I don’t know the origins of the timestamp datatype; all I know is that it has nothing to do with date or time.  So why am I including it in a post about Date and Times?  Because for most speakers of the English Language, a timestamp is a marker that indicates when an event happens.  Many junior SQL Server developers and even senior developers on other platforms will often be confused by the timestamp datatype, and wonder what’s going on with that column that they just added.

My recommendation?  Avoid using the word timestamp in design discussions.   When someone mentions “we need to add a timestamp to that row”, make sure that you understand what they mean.  Usually they want you to add a column to capture what datetime the event happened, NOT an indicator of when the row will change.  Note that I’m suggesting that you avoid the WORD timestamp; obviously, if a database requires a timestamp column, use it.

2.  Does anybody really know what time it is?

One of the benefits of a distributed architecture is that your database server doesn’t have to live on the same box as your application server; in fact, your database server may not be in the same timezone as your application server.  And, if you’re using a webserver as an application host, your user may not be in the same timezone as either your application or your database.

I mention this because as a database designer, you may need to capture multiple points-in-time (or you may not); you cannot assume that the application is consistently providing the same time reference for all users.  If you’re using a truly distributed design, you may have one host in New York and one in Los Angeles, and your users may connect to either one to enter in data; what date do you use?

For this reason, I usually advocate two things: 1) always include a database-centric referential date of event (in other words, include a column with a default value of the database time for important data), and 2) always use UTC for saving time.  That second statement’s important enough to warrant a note of its own.

3.  When in doubt, use UTC.

Time is relative to geography; the application you build today in New York may eventually have clients in New Delhi.  If your processes could ever have the potential of crossing multiple time zones (for example, you want to track shipping from Atlanta to Oregon), you’d get the most benefit out of storing your data as UTC in order to understand the relationship between event A (which occurred in EST) and event B (in PST).  It’s a lot easier to change the display of events to a local time from UTC than it is to convert from local time to UTC, do the math, and then convert back.

One problem with recording data with UTC times that I’ve encountered is that is you have centralized data, but distributed users that want to see their data in local time, building OLAP cubes is a pain.  If you’re wanting to analyze data for Atlanta and data for Oregon, Atlanta’s day begins 3 hours earlier than Oregon’s.  I have not found an easy solution to this yet, but am starting to investigate the new datetimeoffset data type in SQL 2008.

4.  Use a standard notation for calculating differences between dates.

Obviously the new date and time datatypes in SQL 2008 address a desire of many in the development community for a method of storing ONLY the date or time of an event, but if you’re using an application that still stores the datetime of an event and you need to round the data to the nearest day, hour, week, whatever unit of time, then use  a standard method for doing so.

(I am assuming that you are using date-based arithmetic rather than converting to a varchar and back).

SELECT DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0)

is the preferred method; you can read why at my previous post.

5.  Use a standard notation for referring to dates.

Americans are weird about dates; we’re one of the few countries that puts the middle container before the smallest container followed by the largest container.  Clueless?  When people ask you what’s your birthdate, how do you answer?  Most people rattle off Month-Day-Year.   An international notation would be Year, Month, Day.

Again, if you’re designing databases and queries that might cross international borders, you’ll save yourself some debugging time in the future if you can get into the habit of using an international standard notation for referring to datetimes.   A query that asks for data WHERE DateOfEvent > ‘11/1/2009’ may work fine in the US (returning data newer than November 1, 2009), but may blow up in the UK (returning data greater than January 11, 2009).

How do you know if you’re a DBA that’s been bitten by this?  You start dating everything (including personal checks) using a notation like ‘2009-11-01’.  I  get odd looks all the time (probably not just because of the way I date my checks).

6.  Finally, get used to the fact that there’s lots of methods to store datetimes.

If you’re doing any sort of database integration, you’re probably familiar with many of the same issues I laid out.  There are others that I haven’t covered, but the important thing I want to focus on is that just because I think these are good habits to get into, others have different opinions.  Some db pro’s prefer to use epoch time rather than a standard datetime; I not only have to be familiar with my own methods of doing something (and prepared to defend those methods), but I also have to be willing to understand other methodologies so that I can do the conversion.

December 8, 2009 · stuart · One Comment
Posted in: TSQL2sDay

One Response

  1. Adam Machanic : T-SQL Tuesday #001 (Date/Time Tricks): The Roundup - December 9, 2009

    [...] Stuart Ainsworth surprised me with his post. Why? Because it’s the only one that covered data modeling, one of the more interesting aspects of the date and time question. His very interesting post is titled “Date/Time Issues and Data Modeling”. [...]

Leave a Reply