As many of you know, I’m a database DEVELOPER; I’ve been a DBA in the past, but my real specialty lies in getting different database platforms to exchange information with SQL Server. I haven’t done a lot of hands-on administration (backups, maintenance plans, etc.) in a while, but that changed recently. Our primary production DBA has been out of the office for an extended period, and I’ve had to step back into a support role.
One of the projects he was working on before he left was performance-tuning our databases on the SAN; he had recently segregated tempdb onto its own LUN, and everything looked OK, but during some of our overnight jobs we started seeing messages like:
Exception Information: System.Data.SqlClient.SqlException: The transaction log for database ‘tempdb’ is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases
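The error itself points at the first diagnostic step. A quick check of the column the message mentions looks something like this (a sketch; run it against the affected instance):

```sql
-- Ask SQL Server why tempdb's log space can't be reused,
-- as the error message suggests.
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = 'tempdb';
```

A value like ACTIVE_TRANSACTION would point at a long-running transaction holding up log truncation; in our case, the log simply had nowhere left to grow.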
Uh-oh. I had no clue where to begin; I knew that the database was on its own LUN, and I knew that LUN was now full, but how big should the tempdb LUN be? I tried searching the Internet for guidance, but got nowhere; there doesn’t seem to be a rule-of-thumb for sizing tempdb based on the size of your other databases.
The LUN was sized at 10GB; we have 1.5TB of data. I knew that seemed small, but we had been running for a couple of weeks without issues, so I suggested to our production DBA team that we double it and see what happened. All was well until we got to the month-end processes, and BAM! 20GB of space sucked up by tempdb. Now what?
I turned to Twitter, and got several helpful (and some not-so-helpful) responses; as many people pointed out, tempdb needs as much free space available as possible, and its size depends more on activity than on the amount of data. Both excellent points, but neither one addresses the need to pick a specific LUN size on a SAN. Space ain’t free.
@venzann came up with the best guide: “Size of indexes + estimate on largest transaction size + wiggleroom.”
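The “size of indexes” part of that estimate can be pulled per database from sys.dm_db_partition_stats; this is a rough sketch (page counts are in 8KB pages, and you’d run it in each database and sum the results):

```sql
-- Rough sketch: total reserved index space in the current database, in GB.
-- index_id 0 = heap, 1 = clustered index, >1 = nonclustered indexes;
-- filtering out heaps leaves only index structures.
SELECT SUM(ps.reserved_page_count) * 8.0 / 1024 / 1024 AS index_gb
FROM sys.dm_db_partition_stats AS ps
WHERE ps.index_id > 0;
```

Add your largest expected transaction and some wiggle room on top of that number, per the formula.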
Using his suggestion on our 1.5TB server, I came up with an estimate of 430GB in indexes. I stopped there, knowing that my SAN admin would shoot me if I tried to run that past them; besides, our server had run OK for a couple of weeks on 10GB, so if we really needed 430GB, we would have choked long before now (in retrospect, our production DBA should have recorded the size of tempdb BEFORE he moved it to its own LUN; c’est la vie). I decided to pick a smaller number: 100GB. Why?
First, it made logical sense using @venzann’s formula. We use a partitioning scheme for most of our data, so the sum of the actively-used indexes was really only about 75GB. Other indexes may occasionally be touched by queries, but 90% of our searches only look at the last 2 days’ worth of data.
Second, it’s a nice round number. Round numbers are good for rules-of-thumb. You never hear anyone say “that ____ should be 14% of ____; just a rule-of-thumb”.
Finally, it was the maximum size I could request from our SAN admin without them having to order new drives. Hey, if someone says you can have $100, you don’t say “all I need is 50” (unless, of course, you’re Porter).
Anyway, it seems to be working; we’re humming along without errors. In another 20 days or so we’ll be doing month-end again, and we’ll see what happens. I can’t wait for the production DBA to get back so I can quit thinking about this stuff and go back to my safe zone.