May 2011

Reminder: Data Architecture PASS VC presentation tomorrow

Just a quick note: I’ll be presenting from the wilds of Hoschton, GA via Live Meeting tomorrow night at 8PM EST.  Details cut and pasted below from http://dataarch.sqlpass.org/:

 

Subject:
From DBA to Data Architect: Changing Your Game

Start Time:
Thursday, May 19, 2011 8:00 PM US Eastern Time (May 20, 2011 1:00 AM GMT)

End Time:
Thursday, May 19, 2011 9:00 PM US Eastern Time (May 20, 2011 2:00 AM GMT)

Presenter:
Stuart Ainsworth (blog|@CodeGumbo)

Live Meeting Link:
https://www.livemeeting.com/cc/UserGroups/join?id=JQTC9F&role=attend&pw=h%3E%234n%212Mj

From DBA to Data Architect: Changing Your Game
The role of database administrator has been around for years, but as information collection and storage needs have skyrocketed, a new career opportunity has opened within enterprises: the data architect. This session is intended to give some guidance on the differences between database administrators (and their cousins, the database developer) and data architects, including specific advice on how to transition from one role into the next.

Stuart Ainsworth
Stuart R Ainsworth, MA, MEd is a Database Architect working in the realm of Financial Information Security; over the last 15 years, he’s worked as a Research Analyst, a report writer, a DBA, a programmer, and a public speaking professor. He’s one of the chapter leaders for AtlantaMDF, the Atlanta chapter of PASS. A master of air guitar, he has yet to understand the point of Rock Band ("You push buttons? What’s that all about?").

#msteched Columnstore indexes unveiled–DBI312

Live blogging again; hope you find my notes useful (scattered though they are).  I’ve been waiting on this session because it’s a very specific area of interest.  I work a lot with VLDB’s, and performance is always a concern; claims are that Denali’s columnstore may boost performance of certain queries hundred-fold.  Let’s see how they work, and I’m hoping I can convince my boss to set up a test bed to try this out.

Presenter is Eric N Hanson from Microsoft (Twitter). 

We start off with a story; I like story-time.  Actually, it’s a very effective way to break out user cases.

Buzzphrase for Columnstore: “Enabling interaction with data”.  Supposed to be super efficient, and get large amounts of data back from SQL Server Denali.  Internal project name is Apollo; columnstore is only part of the picture.

Area of focus is BI & DW: load large amounts of data, high-read, incremental loads.  Partitioning is mandatory for this feature.

Curious as to why the examples join tables in the WHERE clause, and not the more accepted syntax of JOIN.

K, here comes the magic: example uses a Fact Table with 100 million rows in it.  Clustered on a date column, and a columnstore index.  Clustered index is still B-TREE; columnstore indexes are nonclustered.

Running duplicate queries; using index hint to force optimizer to use the clustered index in one example.  Wow; 100,000,000 rows of data aggregated in a second on a two-year old laptop.  50x speedup on this particular hardware.  According to presenter: “this is the biggest enhancement to SQL Server since we bought the code from Sybase".

And here’s the meat and potatoes; how does this work?  Vertical partitioning stores each column in a separate page.  Columnstore is based on the same code as PowerPivot and the BI engine;  Vertipaq if you want to do more reading on this.  Columnstore data is highly compressed, so smaller footprint to read from disk and can be stored in main memory.

New query execution plan: batch processing.  “the edsel is the way of the future”.  Actually, the idea is that batches of vectors are stored in query plan; highly efficient data representation.  We can also scale to more cores: tests are showing linear acceleration up to 32 cores.

Instead of storing data as a page, data is stored as a column segment which represents about 1,000,000 rows.

Questions have begun; some questions are good, but this is a 300 level session, folks.  If you don’t understand basic SQL syntax (like how to create an index), this may not be the session for you.  Great question about the relevance of traditional indexes after this is unveiled, and Hanson’s response: in most Decision Support Applications, columnstore is the way to go particularly for scans.

Some index hints for choosing the columnstore or ignoring it:

WITH (index(index_name))

OPTION (ignore_nonclustered_columnstore_index) <—use for bad plan selection if necessary.

Same traditional rules for index hints: trust the optimizer first, rewrite second, and then use hints last.

A couple of new icons for query execution plans: columnstore scan, and batch hash table processing.  Each execution operator now operates in either batch mode or row mode; batch mode is what you want for speed. 

New term of interest: dictionary.  A dictionary is storage for unique values with a lookup so that a column can stores highly compressed information.

Most things just work with SQL Server; Backup and Restore, Mirroring, SSMS, etc.

Lots of datatypes don’t work with column store: long decimals, binary, BLOB, uniqueidentifier, long datetimes, CLR,  (n)varchar(max).

query performance restrictions: outer joins, Unions; Stick with Inner Joins, Star Joins (need to look this one up) Aggregation.  About to show a query which doesn’t benefit from batch processing.  Essence is below:

SELECT t.ID, COUNT(t2.ID)
FROM t LEFT JOIN t2 ON t.ID=t2.ID
GROUP BY t.ID

Left Join knocks it out of batch processing; need to rewrite as an INNER JOIN, but note that you lose the NULL values, so you have to use a CTE; need to get slides for his sample, but you do an INNER JOIN in the CTE, and then do an OUTER JOIN. 

WITH CTE( INNER JOIN)
SELECT blah
FROM t OUTER JOIN CTE ON t.ID yada yada.

Adding data to columnstore; basic methods:

1.  Drop and re-add the index before load.  Expensive, but works well with traditional daily builds

2.  Partition switching.  Sweet spot needs to be tested, but easy one is the hour.  NOLOCK queries pre-empt the ability to do paritioned queries.  Need to read up on this, but may be fixed in future version

3  trickle load can be done, but needs to be tested.

Very awesome; I cannot wait until this is actually released in CTP 3, so I can play around with it.

#msteched “Juneau” preview

Sitting in the Database Dev session at Tech Ed, taking a look at “Juneau”, the new development suite for SQL Server.  I can’t wait to see if this thing slices and dices data like a Ronco food processor.  Again, live blogging, so please excuse the scattered thoughts, poor spelling, and other bad habits below.

Database development is hard.  I’d love to use that sometimes when my boss asks me why something is running behind schedule. 

Great question!  Where does the definition of the database live if most of your code is built on ALTER scripts?

We should work declaratively, not scripted; devs write CREATE statement, and let the tools manage the change statements.

Demos forthcoming…

 

Connected development; working directly with the database server.  Much like SSMS, but it’s obviously the VS shell instead.  Operation is very similar to SSMS in that you build and execute queries inside of the shell; I wonder if execution plans work.

Table designer: uh-oh.  Not sure if this is a good idea.  However, the tools do drift detection before building an active script.

Side note to Microsoft presenters: sometimes the patter works, but only when it’s natural.  Usually, it feels scripted, which makes it clunky instead of funny.

 

Offline Development: Import database into a project, to build a model from the database.  Declarative code is the keyword.  Left-side versus right-side development; idea is that information is arranged depending on the perspective from which you are accustomed.  DBA’s will see the traditional database hierarchy; developers will see the namespaces.

Table designer preview again; nice thing is that it is generating scripts on the fly.

SSTD is deployed to a new lightweight test-and-debug single user instance on your desktop.  Cool stuff; no need to deploy to a test server; it’s there already.

Ughhhhh; live demos are painful.  I’m going to remember to NOT DO THAT EVER AGAIN when I present. 

Yay!!! Execution plans are there!!!!  SWEET.

Interesting way to handle connected and disconnected development.  Since you are building CREATE statements, you can’t run them in a connected sense; execute fails.  But what happens when you run it and are building a new object?  Will it execute?

OMG!!!!!!!! THEY ARE SHOWING A STORED PROCEDURE WITH A F’ING WHILE LOOP IN IT!!!!! I can’t believe that one slipped by.

Cooling off; I totally zoned out on the last view steps.  Lemme calm down and move on to the next one.

 

Publish to SQL Azure: More accurately, the project can be targeted to various editions, including Azure.

Nice feature to work online and offline.  Not much else to say; the tool changes the model to fit the deployment target.

 

SSMS is NOT going away, but the majority of the development functionality will be replicated in Visual Studio. Both will have Server Explorers and query windows, but SSMS will still ship as a part of the client tools for SQL Server.

Schema comparison tools still are lackluster compared to Red Gate, but they seem to have improved a little since I last remember them.  Not really much new here compared to the database projects of 2010 and 2008, but it looks like there has been some thought about best practices.

Overall, a good presentation, but I still think there’s a lot left to do.

#msteched skydiving into the cloud

Here’s a first for me; I’m live-blogging from a keynote.  Diving straight into it, so please excuse the sparseness of the notes.

Robert Wahbe trying to lay out the vision for the cloud from Microsoft; trying to establish that the future is the cloud.  I hate buzzwords, but I think the point of the metaphor is to draw together private vs public application development.  If you build your internal private apps as a private cloud, it should be easy to move to a pubic distributed architecture.

Factors driving the move to the cloud:

  • Extension of existing applications
  • large data sets
  • high perf computing
  • events
  • marketing campaigns (high spike traffic)

Talk about Travelocity’s use of the public cloud.  Focus on scalability, and new products to handle the traffic.  Capacity on demand is the buzzword that they use, and I wish I had a server room like they just showed. 

How easy is it to do this is the question they raise?  I’m still a skeptic.  Biggest problem is not a greenfield scenario, but migration from an existing infrastructure.  Let’s see if demo guy (I missed his name) can convince me. 

Ooh, a Contoso application; I’ve been stuck on AdventureWorks for too long.  Somebody bring back Northwinds.

Ooh, we all need AUTOMATION!  AUTOMATION is what we need! Really?

OK, the VMM service deployment looks cool.  

OK, ADHD just kicked in, and I’ve heard blah, blah, blah for the last couple of minutes.  Apparently, Systems Center will let you do all kinds of one-button stuff.    Now, can I get management to convert from an app that had no clue about that 10 years ago but is still critical to my business?

Now, SQL, and Sharepoint with Amir Netz.  Project Crescent looks interesting. Wow, I like the animation.  I think there will be a lot of future in Management of Business Analysts.  Too much information can cripple a company just as bad as too little; there’s going to have be some training in “what are good questions to ask?”  We’ve finally got the tools to tell us “42”, but do we understand our businesses well enough to understand how we got there?

Average number of devices is 4, so a lot of us are now carrying 6 or 8.  I have two on me right now (work lappy and iPhone), and two at home.  I feel geek-deprived.   Must buy more gear.

Windows Phone 7 interface is nice, but I thin Microsoft has a long way to go to win the smartphone wars.  I really don’t want to go back to carrying multiple devices (smartphone for personal, and smartphone for work).   

Just realized that my butt is starting to hurt; that’s a good indicator that a keynote is running long.  Time for the mid-keynote stretch. 

I’m wondering how many butterflies the Kinect guy has right now; the first few gestures slipped a little.

Second half of the keynote?  Yeesh….  I hope there’s a power plug in the building somewhere.

I chuckled a little when the guy said that “if you’re an ASP.NET developer, you can write Sharepoint Applications.”  I think of all of the swearing that my ASP.NET developer buddies do when you mention Sharepoint…

OK, I’ve done as much as I can for now; I’ll try to blog more today around the sessions. 

knowing when to walk away…

I don’t like to think of myself as a quitter; I especially don’t like to walk away from something I enjoy doing.  However, I’ve recently had to re-evaluate my workload (both personal and professional), and I realized that something had to give.  For me, that something was the Atlanta SQLSaturday 2011.

Now, before you panic, the project is still continuing without me; there’s still a team working very diligently to pull this together.  I’m just no longer working on it.

My reasons for stepping away were many:

  • My personal life is getting very complicated. I’m getting married this summer, and there’s a lot of details left to work out.  On top of that, I have two wonderful kids from my first marriage.  My family deserves no less than my best, and that means I need to put them first.
  • Volunteer work should be fun work.  If you’re leaving a volunteer project more stressed than when you went in, something’s not right.  In my case, I was too invested in the SQLSaturday project to really allow it to grow; I have certain ideas about what should be done, and it’s time to let others take it in a different direction. 
  • Volunteer work should enhance your professional and personal skills.  In my case, I’ve let some of my technical goals slip because I’m investing too much time in volunteering.  I need to finish my technical certifications, for example.
  • There’s other tasks to be done in the Atlanta SQL Server community.  I’ve spent the last few years working with SQLSaturday and AtlantaMDF; one of our original goals for SQLSaturday was to help feed the user group, and to be honest, that hasn’t happened.  Stepping back and looking objectively at the situation has made me realize that there’s some foundational work that needs to be done in the user group in order for it to truly benefit from community outreaches like SQLSaturday.

All of that being said, I’m still going to be doing volunteer work; I just need to make smarter investments of my time.  One of the harder lessons in life is knowing when to walk away from something in order to let it (and yourself) flourish.