Conferences

#SQLSat111 “Biggish Data”

Trying not to be too obnoxious about this, but I’m presenting on “Biggish Data” at SQL Saturday 111 in Atlanta tomorrow; if you’re there, swing by and see me. I presented this on Monday at AtlantaMDF, and I’m attempting to incorporate the feedback from that presentation into this one. Should be good, but I’m still working on the slides.

However, if you attend my session, you’ll get a free koozie!


A day with Andy (and Matt…) #SQLSAT111 #SQLPASS

I first met Andy Leonard at my first PASS Summit (2008); he was sitting at the back of the room in the Chapter Leaders meeting. At the time, there was this newfangled social technology called Twitter (perhaps you’ve heard of it), and everybody was live-tweeting during the meeting (shades of things to come). I had been following his Twitter stream for some time, and I saw a tweet of his go by. I responded with something clever like “I’m watching you tweet @AndyLeonard”. Immediately his head popped up, and he looked around the room. After the meeting, I walked over and introduced myself.

Talking to Andy is probably one of the easiest things to do in the world; there have been times throughout my association with the SQL community where I’ve bumped into him at some event or another, and he always seems to remember who I am and what’s going on with me. I’m nowhere near the prolific writer that he is, nor am I as plugged in to what’s going on, yet he always seems to slide into a conversation with me like an old friend. Andy is just a good guy to know, and I’m glad he’s a friend of mine.

Now, we haven’t always agreed on things; during the PASS elections of 2010, Andy and I looked at the process from two very different perspectives.  Some of our discussions got heated online, and going back and reading those posts today makes me a little sad.  But then I have to pause and think about the face-to-face conversations Andy and I had regarding the same subject, and it’s clear that he really does have a heart for the SQL community; even though I don’t always agree with him, I know that he’s trying to encourage people to do the right thing.  Again, he’s a good guy.

Why do I bring this up? Well, it’s because I’m on the team bringing SQL Saturday 111 to Atlanta on April 14. We’ve invited Andy Leonard to do a precon for us on April 13; $100 to spend a day with Andy (and Matt Masson). I know that Andy brings the same outlook to his teaching that he does to his community activities: with a certain amount of encouragement, people can do better. If you’re not already registered for his class, you should be. You’ll learn something about SSIS, and maybe a little bit about friendship.

#sqlsat89 XML 201 Slide decks uploaded

Just a quick note to say I uploaded my slides from last weekend’s SQL Saturday 89; the event was a blast, and I hope to have my wrap-up posted soon. Thanks to all of those who attended; I felt like it was a great class, and I’m hoping to continue to improve in the future.

SQL Server XML 201

The xml datatype in SQL Server expands the potential of the relational platform to store increasingly complex forms of data, but without the use of the appropriate query language, much of that potential can remain unused. This session will cover the basics of SQL Server XQuery and FLWOR; the assumption is that attendees will have some basic exposure to XML (including the use of XML in SQL Server). Learn how to “run queries within a query”, and how to extract XML data into a tabular format. Coverage will include the five basic XML methods (.exist(), .value(), .query(), .nodes(), and .modify()) and FLWOR (for, let, where, order by, and return).
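For a taste of what the session covers, here’s a minimal sketch of the five methods plus a FLWOR expression (the XML shape and variable are my own invention, not the session demo):

DECLARE @x xml = N'<orders>
  <order id="1"><item>Widget</item></order>
  <order id="2"><item>Gadget</item></order>
</orders>';

SELECT @x.exist('/orders/order[@id=1]');                          -- 1 if the path matches
SELECT @x.value('(/orders/order[@id=1]/item)[1]', 'varchar(50)'); -- extract a single scalar

-- .nodes() shreds the XML into rows; pair it with .value() for a tabular result
SELECT o.n.value('@id', 'int')               AS OrderID,
       o.n.value('(item)[1]', 'varchar(50)') AS Item
FROM @x.nodes('/orders/order') AS o(n);

-- .query() runs a FLWOR expression (a "query within a query")
SELECT @x.query('for $o in /orders/order
                 order by $o/@id descending
                 return $o/item');

SET @x.modify('delete /orders/order[@id=2]');                     -- .modify() changes the XML in place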

Upcoming presentations…

So, if you’ve been wondering where I’ve been, the answer is “too frikkin’ busy to write”. Settling in to a new marriage, changes at my job(s), and volunteer work have been a little more time-consuming than I originally planned. I’m hoping that’s gonna change in the near future, because I’ve got some great ideas for posts brewing in the background.

One idea that I’m really excited about is a change in the monthly meetings for AtlantaMDF (our SQL Server user group); like most user groups, we have a meet-and-greet followed by a presentation (or two). The presentations usually cover some mid-level to advanced topic, and usually assume that the attendees have some experience with SQL Server. We’re going to tackle that assumption.

Starting at our next meeting (Monday, September 12), we’re going to offer short presentations before the main one, targeted at new users of SQL Server; the goal is to a) build up our new members, and b) help grow our speaker pool. I’m excited to present the first topic, “Stuff in the FROM clause”, on Monday, so if you’re in the Alpharetta area Monday night, come by and see me.

Also coming up is our fourth SQL Saturday (#89); although I haven’t been as involved with the planning on this one as I have in years past, it’s been exciting to see it unfold.  I’m looking forward to presenting a session on XQuery, and hanging out by the user group booth. If you’re gonna be there, stop by and say Hi!

#sqlpass PreCON, Baby! #sqlsat89

Just ordered my tickets to my first-ever pre-con, and it’s for a SQLSaturday being hosted by my chapter. I’m excited to attend John Welch’s Data Warehousing with SSIS Deep Dive on September 16, 2011 (BTW, tickets are still available).

Why this precon at this time? Well, conference funds are tight for me this year. With my recent wedding, I’m really limited on both the time and the personal funds to attend a lot of conferences, so my own learning opportunities are being restricted. It was an easy sell to my employer to fund a $100 pre-con.

Second, my job is shifting, and I realized I needed to pick up some skills to finally make the transition. I have a lot of exposure to ETL processes, but not so much to SSIS. Combine that with the fact that I haven’t done a lot of BI (most of our ETL is clickstream analysis), and there’s a big old gap in my data knowledge.

Anyway, much longer post to follow on what I’ve been up to; just wanted to drop a quick note to say where I’m going to be before SQLSaturday 89.

#sqlpass–Last call for Community Choice

I’ve been out of pocket for the last couple of weeks (more on that later), but I was very excited to hear that two of my sessions were being considered for the Community Choice slots at PASS Summit this year.  If you haven’t voted yet, please consider voting for me; I’m looking for the opportunity to really grow this next year as a technical speaker, and I’d love to kick it off with a bang at Summit.  Voting closes TOMORROW (July 20, 2011), so vote now!

SQL Server XML 201 [Application and Database Development]
Stuart Ainsworth (Gladiator Technology Services)

Basic Guidelines for VLDB’s [Enterprise Database Administration and Deployment]
Stuart Ainsworth (Gladiator Technology Services)

VOTE HERE: http://www.sqlpass.org/summit/2011/UserLogin.aspx?returnurl=%2fsummit%2f2011%2fSummitContent%2fCommunityChoice.aspx

Just as an aside, I’d also like to plug a couple of other sessions by some AtlantaMDF members:

Bad SQL [Application and Database Development]
Geoff Hiten (Intellinet)

ETL Smackdown: PowerShell vs SSIS — with Aaron Nelson [BI Architecture, Development and Administration Topics]
Julie Smith (Key2 Consulting)
Aaron Nelson (@SQLvariant)

Four of the 20 sessions being considered for Community Choice are by speakers from my local chapter, which makes me proud. Thanks for your consideration, and more posts to come soon!

distractions and other news…

Sorry for the absence from blogging for a bit; a lot on my plate.  Next week, I’ll be turning 40 on July 5th, and then getting married on July 9th.  The latter is much bigger news than the former, but both are reasons to celebrate.

Another reason to celebrate?  I got accepted to speak at SQL Saturday 64 (Baton Rouge).  I’m excited to head back home for good food and good times.  I needed a little affirmation after the disappointment over Summit.

Anyway, I may not blog much in the next few weeks; please stay tuned, because good things are happening.

#msteched Columnstore indexes unveiled–DBI312

Live blogging again; hope you find my notes useful (scattered though they are). I’ve been waiting on this session because it’s a very specific area of interest: I work a lot with VLDBs, and performance is always a concern; the claim is that Denali’s columnstore may boost the performance of certain queries a hundredfold. Let’s see how they work; I’m hoping I can convince my boss to set up a test bed to try this out.

Presenter is Eric N Hanson from Microsoft (Twitter). 

We start off with a story; I like story-time. Actually, it’s a very effective way to break out use cases.

Buzzphrase for Columnstore: “Enabling interaction with data”.  Supposed to be super efficient, and get large amounts of data back from SQL Server Denali.  Internal project name is Apollo; columnstore is only part of the picture.

Area of focus is BI & DW: load large amounts of data, high-read, incremental loads.  Partitioning is mandatory for this feature.

Curious as to why the examples join tables in the WHERE clause, and not the more accepted syntax of JOIN.

K, here comes the magic: the example uses a fact table with 100 million rows in it, clustered on a date column, with a columnstore index on top. The clustered index is still a B-tree; columnstore indexes are nonclustered.
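For reference, the Denali syntax looks something like this (the fact table and columns are my own stand-ins, not the demo’s):

CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_FactSales
ON dbo.FactSales (DateKey, StoreKey, ProductKey, SalesAmount);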

Running duplicate queries, using an index hint to force the optimizer to use the clustered index in one example. Wow; 100,000,000 rows of data aggregated in a second on a two-year-old laptop. A 50x speedup on this particular hardware. According to the presenter: “this is the biggest enhancement to SQL Server since we bought the code from Sybase”.

And here’s the meat and potatoes; how does this work? Vertical partitioning stores each column on its own set of pages. Columnstore is based on the same code as PowerPivot and the BI engine; look up VertiPaq if you want to do more reading on this. Columnstore data is highly compressed, so it has a smaller footprint to read from disk and can be stored in main memory.

New query execution plan feature: batch processing. “The Edsel is the way of the future.” Actually, the idea is that batches of rows are represented as vectors within the query plan; it’s a highly efficient data representation. We can also scale to more cores: tests are showing linear acceleration up to 32 cores.

Instead of storing data as a page, data is stored as a column segment which represents about 1,000,000 rows.
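If I’m reading the documentation right, the segment metadata is exposed through a catalog view, so something like the following should show the per-segment row counts and sizes (a sketch; I haven’t had a build to test this against yet):

SELECT seg.column_id, seg.segment_id, seg.row_count, seg.on_disk_size
FROM sys.column_store_segments AS seg
ORDER BY seg.column_id, seg.segment_id;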

Questions have begun; some questions are good, but this is a 300-level session, folks. If you don’t understand basic SQL syntax (like how to create an index), this may not be the session for you. Great question about the relevance of traditional indexes after this is unveiled; Hanson’s response: in most decision support applications, columnstore is the way to go, particularly for scans.

Some index hints for choosing the columnstore or ignoring it:

WITH (index(index_name))

OPTION (ignore_nonclustered_columnstore_index) <— use for bad plan selection if necessary.
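Putting those two hints in context, a sketch against my hypothetical fact table from above:

SELECT DateKey, SUM(SalesAmount) AS Sales
FROM dbo.FactSales WITH (INDEX(ncci_FactSales))   -- force the columnstore index
GROUP BY DateKey;

SELECT DateKey, SUM(SalesAmount) AS Sales
FROM dbo.FactSales
GROUP BY DateKey
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);   -- fall back to the B-tree if the plan is bad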

Same traditional rules for index hints: trust the optimizer first, rewrite second, and then use hints last.

A couple of new icons for query execution plans: columnstore scan, and batch hash table processing.  Each execution operator now operates in either batch mode or row mode; batch mode is what you want for speed. 

New term of interest: dictionary. A dictionary is storage for unique values, with a lookup, so that a column can store highly compressed information.

Most things just work with SQL Server; Backup and Restore, Mirroring, SSMS, etc.

Lots of datatypes don’t work with columnstore: long decimals, binary, BLOB, uniqueidentifier, long datetimes, CLR, (n)varchar(max).

Query performance restrictions: outer joins and unions don’t benefit; stick with inner joins, star joins (need to look this one up), and aggregation. About to show a query which doesn’t benefit from batch processing. The essence is below:

SELECT t.ID, COUNT(t2.ID)
FROM t LEFT JOIN t2 ON t.ID = t2.ID
GROUP BY t.ID

The LEFT JOIN knocks it out of batch processing; you need to rewrite it as an INNER JOIN, but note that you lose the NULL values, so you have to use a CTE (I need to get the slides for his sample): you do the INNER JOIN in the CTE, and then do an OUTER JOIN against it.

WITH CTE AS (SELECT t.ID, COUNT(t2.ID) AS cnt
             FROM t INNER JOIN t2 ON t.ID = t2.ID
             GROUP BY t.ID)
SELECT t.ID, CTE.cnt  -- the outer join restores the rows with no matches (reconstructed from the description above)
FROM t LEFT OUTER JOIN CTE ON t.ID = CTE.ID;

Adding data to columnstore; basic methods (see the sketch after this list):

1. Drop and re-add the index before the load. Expensive, but works well with traditional daily builds.

2. Partition switching. The sweet spot needs to be tested, but an easy one is the hour. NOLOCK queries pre-empt the ability to do partitioned queries; need to read up on this, but it may be fixed in a future version.

3. Trickle load can be done, but needs to be tested.
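Here’s a rough sketch of method 1 against my hypothetical fact table (drop, load, rebuild):

DROP INDEX ncci_FactSales ON dbo.FactSales;   -- the columnstore makes the table read-only, so drop it first

-- ... run the daily bulk load here ...

CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_FactSales
ON dbo.FactSales (DateKey, StoreKey, ProductKey, SalesAmount);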

Very awesome; I cannot wait until this is actually released in CTP 3, so I can play around with it.

#msteched “Juneau” preview

Sitting in the Database Dev session at Tech Ed, taking a look at “Juneau”, the new development suite for SQL Server.  I can’t wait to see if this thing slices and dices data like a Ronco food processor.  Again, live blogging, so please excuse the scattered thoughts, poor spelling, and other bad habits below.

Database development is hard. I’d love to use that line sometime when my boss asks me why something is running behind schedule.

Great question!  Where does the definition of the database live if most of your code is built on ALTER scripts?

We should work declaratively, not with scripts; devs write CREATE statements, and the tools manage the change statements.
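My read on what that means in practice (my own toy example, not theirs): the project holds only the desired state, and the tool computes the diff.

-- what lives in the project: the desired state of the table
CREATE TABLE dbo.Customer (
    CustomerID int          NOT NULL PRIMARY KEY,
    Name       varchar(100) NOT NULL
);
-- add a column to the definition above, and the tool diffs the model
-- against the target database and emits the change script itself, e.g.:
-- ALTER TABLE dbo.Customer ADD Email varchar(255) NULL;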

Demos forthcoming…

Connected development: working directly with the database server. Much like SSMS, but it’s obviously the VS shell instead. Operation is very similar to SSMS in that you build and execute queries inside the shell; I wonder if execution plans work.

Table designer: uh-oh.  Not sure if this is a good idea.  However, the tools do drift detection before building an active script.

Side note to Microsoft presenters: sometimes the patter works, but only when it’s natural. Usually, it feels scripted, which makes it clunky instead of funny.

Offline development: import a database into a project to build a model from it. Declarative code is the keyword. Left-side versus right-side development; the idea is that information is arranged depending on the perspective you’re accustomed to. DBAs will see the traditional database hierarchy; developers will see the namespaces.

Table designer preview again; nice thing is that it is generating scripts on the fly.

SSDT deploys to a new lightweight, single-user test-and-debug instance on your desktop. Cool stuff; no need to deploy to a test server; it’s there already.

Ughhhhh; live demos are painful.  I’m going to remember to NOT DO THAT EVER AGAIN when I present. 

Yay!!! Execution plans are there!!!!  SWEET.

Interesting way to handle connected and disconnected development.  Since you are building CREATE statements, you can’t run them in a connected sense; execute fails.  But what happens when you run it and are building a new object?  Will it execute?

OMG!!!!!!!! THEY ARE SHOWING A STORED PROCEDURE WITH A F’ING WHILE LOOP IN IT!!!!! I can’t believe that one slipped by.

Cooling off; I totally zoned out on the last few steps. Lemme calm down and move on to the next one.

Publish to SQL Azure: More accurately, the project can be targeted to various editions, including Azure.

Nice feature to work online and offline. Not much else to say; the tool changes the model to fit the deployment target.

SSMS is NOT going away, but the majority of the development functionality will be replicated in Visual Studio. Both will have Server Explorers and query windows, but SSMS will still ship as a part of the client tools for SQL Server.

The schema comparison tools are still lackluster compared to Red Gate’s, but they seem to have improved a little since I last looked at them. Not really much new here compared to the database projects in Visual Studio 2008 and 2010, but it looks like there has been some thought about best practices.

Overall, a good presentation, but I still think there’s a lot left to do.