#TSQL2sDay 001: Date/Time Issues and Data Modeling
This is my first post for T-SQL Tuesday (thanks, Adam, for the idea!); Iâm hoping this will be a continuing block party, and hopefully connect a lot of us SQL bloggers together. Thereâs a lot of great content out there, and sometimes itâs just tough to find.
Just a few notes before I get started on this; for this particular post, Iâm pulling from my experiences as a data architect for the last few years. I spend a lot of time reverse-engineering other databases in order to centralize and analyze data from a variety of sources. Most of my efforts pre-date SQL 2008, so I donât have a lot of experience with the newer datetime data types introduced in that version; however, Iâll point you in that direction if I think it might address an issue that I raise. Also, thereâs no particular order to my list of notes; think of it as a random collection of date-time issues. Some of them may be specific to T-SQL and SQL Server, and others may just be procedural notes about data integration (and thus may be applicable to other platforms).
Ready? Letâs get started.
1. If you are a database professional that works with SQL Server, strike the word âtimestampâ from your vocabulary.
Way back when SQL Server was in its infancy, some developer decided that timestamp would be a good name for a binary column that changes whenever the row was updated. I donât know the origins of the timestamp datatype; all I know is that it has nothing to do with date or time. So why am I including it in a post about Date and Times? Because for most speakers of the English Language, a timestamp is a marker that indicates when an event happens. Many junior SQL Server developers and even senior developers on other platforms will often be confused by the timestamp datatype, and wonder whatâs going on with that column that they just added.
My recommendation? Avoid using the word timestamp in design discussions. When someone mentions âwe need to add a timestamp to that rowâ, make sure that you understand what they mean. Usually they want you to add a column to capture what datetime the event happened, NOT an indicator of when the row will change. Note that Iâm suggesting that you avoid the WORD timestamp; obviously, if a database requires a timestamp column, use it.
2. Does anybody really know what time it is?
One of the benefits of a distributed architecture is that your database server doesnât have to live on the same box as your application server; in fact, your database server may not be in the same timezone as your application server. And, if youâre using a webserver as an application host, your user may not be in the same timezone as either your application or your database.
I mention this because as a database designer, you may need to capture multiple points-in-time (or you may not); you cannot assume that the application is consistently providing the same time reference for all users. If youâre using a truly distributed design, you may have one host in New York and one in Los Angeles, and your users may connect to either one to enter in data; what date do you use?
For this reason, I usually advocate two things: 1) always include a database-centric referential date of event (in other words, include a column with a default value of the database time for important data), and 2) always use UTC for saving time. That second statementâs important enough to warrant a note of its own.
3. When in doubt, use UTC.
Time is relative to geography; the application you build today in New York may eventually have clients in New Delhi. If your processes could ever have the potential of crossing multiple time zones (for example, you want to track shipping from Atlanta to Oregon), youâd get the most benefit out of storing your data as UTC in order to understand the relationship between event A (which occurred in EST) and event B (in PST). Itâs a lot easier to change the display of events to a local time from UTC than it is to convert from local time to UTC, do the math, and then convert back.
One problem with recording data with UTC times that Iâve encountered is that is you have centralized data, but distributed users that want to see their data in local time, building OLAP cubes is a pain. If youâre wanting to analyze data for Atlanta and data for Oregon, Atlantaâs day begins 3 hours earlier than Oregonâs. I have not found an easy solution to this yet, but am starting to investigate the new datetimeoffset data type in SQL 2008.
4. Use a standard notation for calculating differences between dates.
Obviously the new date and time datatypes in SQL 2008 address a desire of many in the development community for a method of storing ONLY the date or time of an event, but if youâre using an application that still stores the datetime of an event and you need to round the data to the nearest day, hour, week, whatever unit of time, then use a standard method for doing so.
(I am assuming that you are using date-based arithmetic rather than converting to a varchar and back).
SELECT DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0)
is the preferred method; you can read why at my previous post.
5. Use a standard notation for referring to dates.
Americans are weird about dates; weâre one of the few countries that puts the middle container before the smallest container followed by the largest container. Clueless? When people ask you whatâs your birthdate, how do you answer? Most people rattle off Month-Day-Year. An international notation would be Year, Month, Day.
Again, if youâre designing databases and queries that might cross international borders, youâll save yourself some debugging time in the future if you can get into the habit of using an international standard notation for referring to datetimes. A query that asks for data WHERE DateOfEvent > â11/1/2009â may work fine in the US (returning data newer than November 1, 2009), but may blow up in the UK (returning data greater than January 11, 2009).
How do you know if youâre a DBA thatâs been bitten by this? You start dating everything (including personal checks) using a notation like â2009-11-01â. I get odd looks all the time (probably not just because of the way I date my checks).
6. Finally, get used to the fact that thereâs lots of methods to store datetimes.
If youâre doing any sort of database integration, youâre probably familiar with many of the same issues I laid out. There are others that I havenât covered, but the important thing I want to focus on is that just because I think these are good habits to get into, others have different opinions. Some db proâs prefer to use epoch time rather than a standard datetime; I not only have to be familiar with my own methods of doing something (and prepared to defend those methods), but I also have to be willing to understand other methodologies so that I can do the conversion.