
Quick and Easy SQL Script

I haven’t been blogging enough lately, so I’m going to try something new to get past this mild case of writer’s block.  Instead of trying to do a long, in-depth post on some esoteric subject, I’m going to just put out a simple SQL script.  This one’s a bit of a hack, but you’ll see a lot of variations on it: coalescing rows using FOR XML.  For example, here are a few questions from StackOverflow:

Cursor in stored procedure

How to Comma separate multiple rows obtained from a SQL Query

Convert multiple rows into one with comma as separator

The basic challenge is to denormalize your resultset so that multiple rows get returned as a single delimited string; while there are several methods for solving this problem, the one that I like (as of SQL 2005) is to use FOR XML PATH, like so:

BEGIN TRAN

DECLARE @t TABLE ( fruit VARCHAR(10) )

INSERT  INTO @t ( fruit )
SELECT  'apple'
UNION ALL
SELECT  'banana'
UNION ALL
SELECT  'tomato'

SELECT  ',' + fruit
FROM    @t
FOR     XML PATH

SELECT  ',' + fruit
FROM    @t
FOR     XML PATH('')

SELECT  STUFF(( SELECT  ',' + fruit
                FROM    @t
                FOR     XML PATH('')
              ), 1, 1, '') AS fruits

ROLLBACK


The multiple result sets are there to illustrate each step in the process; for example, the first output shows what happens when you specify FOR XML PATH with no arguments: SQL Server automatically assigns each row in the resultset to a node named <row>:

<row>,apple</row>
<row>,banana</row>
<row>,tomato</row>

If you specify a tag name in the PATH function, each row will be wrapped in that tag; in the above example, we want an empty tag (‘’).  The result is an XML fragment with no tags:

,apple,banana,tomato

That’s almost what we want, except we need to get rid of the leading comma.  The STUFF function replaces part of the string supplied as the first parameter (our FOR XML PATH query), starting at a given position (1, the second parameter) and running for a given number of characters (1, the third parameter), with the value of the fourth parameter (an empty string).  The final output is a comma-delimited string:

apple,banana,tomato
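Most of the StackOverflow questions above actually want one delimited string per group rather than one for the whole table; here’s a minimal sketch of that variation (the FruitBasket table and its basket/fruit columns are hypothetical):

-- One comma-delimited fruit list per basket (hypothetical FruitBasket table)
SELECT  b.basket
      , fruits = STUFF(( SELECT  ',' + f.fruit
                         FROM    FruitBasket f
                         WHERE   f.basket = b.basket
                         FOR     XML PATH('')
                       ), 1, 1, '')
FROM    FruitBasket b
GROUP BY b.basket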

Straightforward; gotta return to writing.

Stuff in the FROM clause: Virtual Structures

Continuing with my SQL Server basics series, I’m focusing this post on Stuff in the FROM clause: Virtual Structures.  My last post talked about the only Base structure in a relational database, but I’m now moving on to a slightly more complicated concept.  Virtual structures are database objects that don’t hold data in a tabular structure, but can interface with data in a tabular format.  It’s probably best to start with the most common virtual structure, the VIEW:

VIEWs

According to Books Online, a VIEW creates “a virtual table whose contents (columns and rows) are defined by a query. Use this statement to create a view of the data in one or more tables in the database.”  A view encapsulates a SQL SELECT statement to return data, and can be used to provide an alternate representation of data from the original tables.  For example, the following VIEW returns columns from both the Person and Employee tables in the AdventureWorks sample database:


USE AdventureWorks2008R2 ;
GO
IF OBJECT_ID ('hiredate_view', 'V') IS NOT NULL
    DROP VIEW hiredate_view ;
GO
CREATE VIEW hiredate_view
AS
SELECT p.FirstName, p.LastName, e.BusinessEntityID, e.HireDate
FROM HumanResources.Employee AS e
JOIN Person.Person AS p ON e.BusinessEntityID = p.BusinessEntityID ;
GO

Views are not just read-only; in certain cases, VIEWs can be used to insert or update data in the underlying tables.  However, a discussion of how this works is beyond the intent of this introduction; refer to Books Online for more information.
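For the simplest case, a view over a single base table, an insert through the view lands directly in the base table; a minimal sketch (the dbo.Products table and ProductNames view here are hypothetical):

-- A view over a single base table (hypothetical names)...
CREATE VIEW dbo.ProductNames
AS
SELECT ProductID, Name FROM dbo.Products ;
GO
-- ...accepts INSERTs, which are written to dbo.Products
INSERT INTO dbo.ProductNames ( ProductID, Name )
VALUES ( 1, 'Widget' ) ;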

Table-Valued Functions (TVF)

Views are not the only virtual structures available in SQL Server; a subset of user-defined functions known as table-valued functions (TVFs) also provides an interface into the underlying base structures.  Books Online defines a table-valued function as “a Transact-SQL or common language runtime (CLR) routine that accepts parameters, performs an action, such as a complex calculation, and returns the result of that action as… a table.”  There are three types of TVFs (arranged below in order of increasing complexity): simple, complex, and Common Language Runtime (CLR).

Simple TVFs

A simple (or inline) table-valued function is probably the easiest virtual structure to grasp beyond a view; in fact, it’s easily compared to a view with parameters.  A simple TVF encapsulates a SQL statement (much like a view), but it allows for the use of parameters to filter the results returned.  For example, the following simple TVF pulls data from multiple tables for a specific StoreID:

USE AdventureWorks2008R2;
GO
IF OBJECT_ID (N'Sales.ufn_SalesByStore', N'IF') IS NOT NULL
    DROP FUNCTION Sales.ufn_SalesByStore;
GO
CREATE FUNCTION Sales.ufn_SalesByStore (@storeid INT)
RETURNS TABLE
AS
RETURN
(
    SELECT P.ProductID, P.Name, SUM(SD.LineTotal) AS 'Total'
    FROM Production.Product AS P
    JOIN Sales.SalesOrderDetail AS SD ON SD.ProductID = P.ProductID
    JOIN Sales.SalesOrderHeader AS SH ON SH.SalesOrderID = SD.SalesOrderID
    JOIN Sales.Customer AS C ON SH.CustomerID = C.CustomerID
    WHERE C.StoreID = @storeid
    GROUP BY P.ProductID, P.Name
);
GO

--use the new TVF in a query
SELECT * FROM Sales.ufn_SalesByStore (602);

Complex TVFs

Complex (or multi-statement) TVFs are much like their simple counterparts, but allow for greater flexibility in T-SQL coding by adopting a procedural approach to returning tabular data.  A complex TVF requires that a table variable (to be covered later) be defined within the function and then populated by a series of SQL statements.  The example below from Books Online shows how:

USE AdventureWorks2008R2;
GO
IF OBJECT_ID (N'dbo.ufn_FindReports', N'TF') IS NOT NULL
    DROP FUNCTION dbo.ufn_FindReports;
GO
CREATE FUNCTION dbo.ufn_FindReports (@InEmpID INTEGER)
RETURNS @retFindReports TABLE
(
    EmployeeID INT PRIMARY KEY NOT NULL,
    FirstName NVARCHAR(255) NOT NULL,
    LastName NVARCHAR(255) NOT NULL,
    JobTitle NVARCHAR(50) NOT NULL,
    RecursionLevel INT NOT NULL
)
--Returns a result set that lists all the employees who report to the
--specified employee directly or indirectly.
AS
BEGIN
    WITH EMP_cte(EmployeeID, OrganizationNode, FirstName, LastName, JobTitle, RecursionLevel) -- CTE name and columns
    AS (
        -- Get the initial list of employees for manager n (anchor member)
        SELECT e.BusinessEntityID, e.OrganizationNode, p.FirstName, p.LastName, e.JobTitle, 0
        FROM HumanResources.Employee e
        INNER JOIN Person.Person p ON p.BusinessEntityID = e.BusinessEntityID
        WHERE e.BusinessEntityID = @InEmpID
        UNION ALL
        -- Join recursive member to anchor
        SELECT e.BusinessEntityID, e.OrganizationNode, p.FirstName, p.LastName, e.JobTitle, RecursionLevel + 1
        FROM HumanResources.Employee e
        INNER JOIN EMP_cte ON e.OrganizationNode.GetAncestor(1) = EMP_cte.OrganizationNode
        INNER JOIN Person.Person p ON p.BusinessEntityID = e.BusinessEntityID
    )
    -- copy the required columns to the result of the function
    INSERT @retFindReports
    SELECT EmployeeID, FirstName, LastName, JobTitle, RecursionLevel
    FROM EMP_cte
    RETURN
END;
GO
-- Example invocation
SELECT EmployeeID, FirstName, LastName, JobTitle, RecursionLevel
FROM dbo.ufn_FindReports(1);

GO

Note that code in a multi-statement TVF can involve several steps before returning the final result set; use caution with complex TVFs, because the optimizer cannot determine the cost of these statements during plan estimation.  In short, the more complex the TVF, the more likely it is that the plan used by SQL Server will be inaccurate.

CLR TVFs

With the release of SQL Server 2005, Microsoft embedded a limited subset of the functionality of the .NET Framework into SQL Server itself, calling it the Common Language Runtime (CLR).  The CLR allows .NET developers to write and deploy database objects in a managed language (like VB.NET or C#), and allows for very flexible logic for handling data.  A CLR function is invoked in a fashion similar to other table-valued functions, but the deployment is much more involved (see Books Online for an example using the following steps; a minimal sketch of steps 3 and 4 follows the list):

  1. Code is written and compiled into an assembly (using a compiler like Visual Studio).
  2. The compiled assembly (.dll) is copied onto the SQL Server.
  3. An ASSEMBLY reference is created in T-SQL, which points to the deployed .dll file, and
  4. a FUNCTION is created which references the ASSEMBLY.
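A minimal sketch of steps 3 and 4 (the assembly name, file path, and class/method names below are all hypothetical):

-- Step 3: register the deployed .dll as an ASSEMBLY (hypothetical path)
CREATE ASSEMBLY SyslogParsers
FROM 'C:\assemblies\SyslogParsers.dll'
WITH PERMISSION_SET = SAFE ;
GO
-- Step 4: create a TVF that maps to a static method exposed by the assembly
CREATE FUNCTION dbo.ufn_ParseSyslog ( @message NVARCHAR(MAX) )
RETURNS TABLE ( attribute NVARCHAR(50), value NVARCHAR(4000) )
AS EXTERNAL NAME SyslogParsers.[Parsers].ParseSyslog ;
GO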

Next up…

A two-fer: Temporary and Semi-Temporary structures!

Stuff in the FROM clause: Base Structures

A few months ago, our user group started thinking about how we could get new members to start coming to our SQL Server user group meetings; we defined a new user as falling into one of two categories:

  1. new to databases, or
  2. new to SQL Server

In both cases, we realized that we needed to start having new content in our monthly sessions that was targeted specifically for them, so we decided to start adding a brief 10-minute fundamental session to the beginning of each meeting. I chose to do the first one, and selected a topic called Stuff in the FROM clause. I quickly realized that this could be a nice little blog series, so I’m attempting to expand a 10-minute presentation into a four-part series. Below is the mind map for the presentation:


[mind map: Stuff in the FROM clause]
Base Structures as a category name is a bit misleading, because there’s really only one base structure in any relational database: the table.  Tables are the core of any relational database; they provide storage for data in rows, arranged by columns.  There’s lots of background information on table design, so I won’t spend too much time on it (even that’s too basic for this discussion), but I will highlight some key features about tables:

  • From an entity design perspective, a table should represent a set of entities.
    • Rows are the un-ordered members of that set
    • Columns are the attributes of the entity.
  • Table attributes (columns) have defined data types.
  • Tables should have a primary key constraint on one or more columns, which is used to uniquely identify a member of that set.
  • Microsoft SQL Server supports a syntax for computed columns, where data for an attribute can be calculated from other attributes.

For Microsoft SQL Server, the CREATE TABLE syntax may be found at http://msdn.microsoft.com/en-us/library/ms174979.aspx.  The simplest table creation statement looks like the following:

CREATE TABLE table_name (column_name column_type)
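To make that concrete, here’s a small sketch (a hypothetical Fruit table) that touches each bullet above: typed columns, a primary key, and a computed column:

CREATE TABLE Fruit
    (
      FruitID INT NOT NULL PRIMARY KEY           -- uniquely identifies a member of the set
    , Name VARCHAR(50) NOT NULL                  -- typed attributes
    , UnitPrice MONEY NOT NULL
    , Quantity INT NOT NULL
    , TotalPrice AS ( UnitPrice * Quantity )     -- computed from other attributes
    )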

Next: Virtual Structures

#TSQL2sDay Roundup

So the launch was early, and the write-up is delayed.  Time has no meaning….  Truthfully, I’m sorry I didn’t manage to squeeze this summary post in a bit sooner; I have the typical DBA excuse: too much to do, and too little time.

One nice thing about this topic is that it seemed to resonate with several bloggers who had either strayed away from T-SQL Tuesday or had never participated; hopefully, the bug to write will stick (and I’m pointing the finger at myself for this one as well).

Beginning at the Beginning:

Pinal Dave in his gentle teaching style covers several different questions and answers about JOIN techniques. Great way to review the basics and get conversations started with beginners.

KenJ’s entry is a bit esoteric, but a great reminder of what resources there are to learn more about SQL Server.


JOINs In The Real World:

Tracy McKibben demonstrated a very simple tuning tip based on real production experience: the Nested JOIN


Rich Brown reminds us that good query construction often involves understanding how the optimizer handles JOINs under the covers.

Andy Galbraith (Yet Another SQL Andy!) reminds us to Respect Your JOINs. It’s funny how often we assume that database queries are logically well-written when we start looking for performance problems.


Bob Pusateri points out that WHERE clauses influence JOIN behaviors. You have to look at the entire SQL statement to make sure you’re getting what you’re supposed to be getting.

Matt Nelson reminds us that unnecessary JOINs are bad, and sometimes you need to dig a little deeper when performance tuning.


A Little Deeper:

Rob Farley toasted my brain a little on this post about Joins without JOINs.  It’s a well-written explanation of what an Anti-Semi JOIN is.


Brad Schulz finished the cooking job with a take on Semi-JOINs, among other things.  Is Brad really Rob in reverse?

Muthukkumaran Kaliyamoorthy covered the internal join mechanisms, and why the optimizer chooses merge, hash, or nested loop joins.

Robert Matthew Cook has a couple of great metaphors for explaining Merge, Hash, and Nested Loop JOINs.


Wayne Sheffield pointed out some interesting syntactical ways of writing JOINs. I’m not sure if I’m comfortable with some of them, but they may be of some use in certain edge scenarios. Or, if you just want to mess with the guy who reviews your code.

Richard Douglas offers up a short-but-sweet explanation of the relationship of Key Lookups to JOINs.


The Future is a Ticking Time Bomb…

Thomas Rushton points out that Denali may finally force us to clean up our code.


My Portuguese is Not Up To Par…

Finally, Ricardo Leka wrote a post that I had to use Google Translate to interpret.  I think I lost something in translation, because I’m not sure how the example relates to JOINs, but thought I would include it anyway.  http://leka.com.br/2011/10/04/t-sql-tuesday-23-joins/

#TSQL2sDay T-SQL Tuesday 23–Early edition



Time once again for another edition of T-SQL Tuesday!  What’s that, you say?  Early?  Why yes, indeed.  Due to a potential scheduling conflict with PASS’s Summit 2011, which falls on the second Tuesday of October (the normal date for T-SQL Tuesday), I’ve gotten special permission to bump it up a week.

To participate in this month’s T-SQL Tuesday, your post must go live between midnight UTC on Tuesday, October 4, 2011 and midnight UTC on Wednesday, October 5.


Your post needs to link back to this blog, and if you use the image, anchor it to this post.  Make sure you leave a comment or a trackback to this post (so I can find it later, when I write up the summary).

Topic du jour?  JOINs (I’m in a fundamentals mood lately).  Note that I also like creative and esoteric posts, so if you can find a way to apply SQL as a metaphorical language for community activity, I’ll read it and enjoy it.  If you just want to tell me in a simple fashion the difference between a HASH and a MERGE join, I’m cool with that, too.

Do me a favor, though, and please spread the word about the early date.  Looking forward to reading your submissions.

#msteched Columnstore indexes unveiled–DBI312

Live blogging again; hope you find my notes useful (scattered though they are).  I’ve been waiting on this session because it’s a very specific area of interest.  I work a lot with VLDBs, and performance is always a concern; claims are that Denali’s columnstore may boost performance of certain queries a hundred-fold.  Let’s see how they work; I’m hoping I can convince my boss to set up a test bed to try this out.

Presenter is Eric N Hanson from Microsoft (Twitter). 

We start off with a story; I like story-time.  Actually, it’s a very effective way to break out use cases.

Buzzphrase for Columnstore: “Enabling interaction with data”.  Supposed to be super efficient, and get large amounts of data back from SQL Server Denali.  Internal project name is Apollo; columnstore is only part of the picture.

Area of focus is BI & DW: load large amounts of data, high-read, incremental loads.  Partitioning is mandatory for this feature.

Curious as to why the examples join tables in the WHERE clause, and not the more accepted syntax of JOIN.

K, here comes the magic: the example uses a fact table with 100 million rows in it, clustered on a date column, with a columnstore index added.  The clustered index is still a B-tree; columnstore indexes are nonclustered.
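For reference, the definition of such an index looks something like the following (the fact table and column names are hypothetical, not the presenter’s demo):

-- Nonclustered columnstore index over the fact table's columns (hypothetical names)
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_FactSales
ON dbo.FactSales ( DateKey, StoreKey, ProductKey, Quantity, SalesAmount ) ;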

Running duplicate queries, using an index hint to force the optimizer to use the clustered index in one example.  Wow; 100,000,000 rows of data aggregated in a second on a two-year-old laptop.  50x speedup on this particular hardware.  According to the presenter: “this is the biggest enhancement to SQL Server since we bought the code from Sybase.”

And here’s the meat and potatoes: how does this work?  Vertical partitioning stores each column on its own set of pages.  Columnstore is based on the same code as PowerPivot and the BI engine; look up VertiPaq if you want to do more reading on this.  Columnstore data is highly compressed, so it has a smaller footprint to read from disk and can be stored in main memory.

New query execution plan: batch processing.  “The Edsel is the way of the future.”  Actually, the idea is that batches of rows are passed through the query plan as vectors; a highly efficient data representation.  We can also scale to more cores: tests are showing linear acceleration up to 32 cores.

Instead of storing data as a page, data is stored as a column segment which represents about 1,000,000 rows.
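If you want to poke at those segments yourself, there’s a catalog view for it; a quick sketch (assuming the sys.column_store_segments view as documented for Denali):

-- Rough look at per-column segment row counts and sizes
-- (assumes the Denali sys.column_store_segments catalog view)
SELECT  column_id, segment_id, row_count, on_disk_size
FROM    sys.column_store_segments
ORDER BY column_id, segment_id ;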

Questions have begun; some questions are good, but this is a 300 level session, folks.  If you don’t understand basic SQL syntax (like how to create an index), this may not be the session for you.  Great question about the relevance of traditional indexes after this is unveiled, and Hanson’s response: in most Decision Support Applications, columnstore is the way to go particularly for scans.

Some index hints for choosing the columnstore or ignoring it:

WITH (index(index_name))

OPTION (ignore_nonclustered_columnstore_index) <—use for bad plan selection if necessary.

Same traditional rules for index hints: trust the optimizer first, rewrite second, and then use hints last.

A couple of new icons for query execution plans: columnstore scan, and batch hash table processing.  Each execution operator now operates in either batch mode or row mode; batch mode is what you want for speed. 

New term of interest: dictionary.  A dictionary is storage for unique values with a lookup, so that a column can store highly compressed information.

Most things just work with SQL Server: Backup and Restore, Mirroring, SSMS, etc.

Lots of datatypes don’t work with columnstore: long decimals, binary, BLOB, uniqueidentifier, long datetimes, CLR, (n)varchar(max).

Query performance restrictions: OUTER JOINs, UNIONs.  Stick with INNER JOINs, star joins (need to look this one up), and aggregation.  About to show a query which doesn’t benefit from batch processing.  Essence is below:

SELECT t.ID, COUNT(t2.ID)
FROM t LEFT JOIN t2 ON t.ID=t2.ID
GROUP BY t.ID

A LEFT JOIN knocks it out of batch processing; you need to rewrite it as an INNER JOIN, but note that you lose the NULL rows, so you have to use a CTE.  Need to get the slides for his sample, but you do the INNER JOIN in the CTE, and then do an OUTER JOIN against it.

WITH CTE AS ( SELECT t.ID, COUNT(t2.ID) AS Cnt
              FROM t INNER JOIN t2 ON t.ID = t2.ID GROUP BY t.ID )
SELECT t.ID, COALESCE(CTE.Cnt, 0) AS Cnt   -- (sketch reconstructed from the t/t2 example above)
FROM t LEFT OUTER JOIN CTE ON t.ID = CTE.ID

Adding data to a columnstore; basic methods:

1.  Drop and re-add the index before the load.  Expensive, but works well with traditional daily builds.

2.  Partition switching (see the sketch after this list).  The sweet spot needs to be tested, but an easy one is the hour.  NOLOCK queries pre-empt the ability to do partitioned queries.  Need to read up on this, but it may be fixed in a future version.

3.  Trickle load can be done, but needs to be tested.
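A rough sketch of the partition-switch pattern from method 2 (all table, index, and partition names below are hypothetical; the staging table must match the target partition’s schema and constraints):

-- Load the new rows into an empty staging table that mirrors the fact table (hypothetical names)
INSERT INTO dbo.FactSales_Staging
SELECT * FROM dbo.NewRows ;

-- Build the columnstore index on the staging table...
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_staging
ON dbo.FactSales_Staging ( DateKey, StoreKey, ProductKey, Quantity, SalesAmount ) ;

-- ...then switch the staging table into the target partition (a metadata-only operation)
ALTER TABLE dbo.FactSales_Staging SWITCH TO dbo.FactSales PARTITION 42 ;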

Very awesome; I cannot wait until this is actually released in CTP 3, so I can play around with it.

#TSQL2sday: Emulating a FIRST aggregation


Jes Borland is hosting this month’s T-SQL Tuesday, and it’s all about aggregations.  Here’s an old coding trick of mine to emulate a FIRST aggregation in T-SQL.  Say we have a table that has three columns:

  • ID, a uniqueidentifier
  • Name, a varchar that represents something, and
  • DateStored, a datetime that is set when the row is written to the table

And we populate that table like so:

CREATE TABLE TSQL2sDay_FirstAgg
    (
      ID UNIQUEIDENTIFIER
    , NAME VARCHAR(20)
    , DateStored DATETIME DEFAULT GETUTCDATE()
    )

INSERT  INTO TSQL2sDay_FirstAgg ( ID, NAME )
VALUES  ( NEWID(), 'Peanut' )

WAITFOR DELAY '00:00:01'

INSERT  INTO TSQL2sDay_FirstAgg ( ID, NAME )
VALUES  ( NEWID(), 'Peanut' )

WAITFOR DELAY '00:00:01'

INSERT  INTO TSQL2sDay_FirstAgg ( ID, NAME )
VALUES  ( NEWID(), 'Orange' )

It’s easy to figure out the number of rows associated with each name:

-- SELECT data to verify order of DateStored
SELECT  ID
      , NAME
      , DateStored
FROM    TSQL2sDay_FirstAgg

-- Basic row count by name
SELECT  NAME
      , RowCnt = COUNT(*)
FROM    TSQL2sDay_FirstAgg
GROUP BY NAME

but how do we figure out what the first ID was for each name along with the number of rows?  You could work something out using the HAVING clause of the SELECT statement, or you could do something like the following:

--SELECT first ID and count of rows by Name
SELECT  FirstID = CONVERT(UNIQUEIDENTIFIER,
                          RIGHT(MIN(CONVERT(VARCHAR(24), DateStored, 121)
                                    + CONVERT(VARCHAR(36), ID)), 36))
      , NAME
      , RowCnt = COUNT(*)
FROM    TSQL2sDay_FirstAgg
GROUP BY NAME

It looks complicated, but it’s not; let’s step through it.

  1. We have to know some basic information about our data; in this case, we know that the datetime value associated with each row with a common name is different.  In other words, there are no two Peanuts with the same DateStored value.  This is important, because in order for there to be a first value, there must be some method of ALWAYS determining which one WAS first.  If two Peanuts showed up at the same time, the model is broken.
  2. The first thing we do is to CONVERT the DateStored value to a varchar; this allows us to concatenate it with other values.  The format of that varchar string is important; it must be precise, and it must sort in an ascending order.  The ODBC canonical format (with milliseconds) is a good candidate for this.
  3. We then CONVERT the uniqueidentifier to a varchar and append it to the DateStored varchar value.  This gives us a lengthy string which can be sorted by its leading datetime characters.
  4. We find the MIN of the string we constructed; because the datetime portion leads the string, the MIN corresponds to the row with the earliest DateStored value.
  5. We then take the RIGHT-most 36 characters (the length of a uniqueidentifier), and convert it back to a uniqueidentifier (so that we have our type back).

There are probably better solutions for this, but this is a simple trick that works under certain circumstances and is portable to several flavors of SQL.
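For what it’s worth, one of those better solutions on SQL Server 2005 and later is a windowed ROW_NUMBER; a quick sketch against the same table (less portable, but it doesn’t depend on string formats):

-- First ID per name via ROW_NUMBER (SQL 2005+); same result as the MIN trick above
SELECT  FirstID = ID
      , NAME
      , RowCnt
FROM    ( SELECT  ID
                , NAME
                , RowCnt = COUNT(*) OVER ( PARTITION BY NAME )
                , rn = ROW_NUMBER() OVER ( PARTITION BY NAME ORDER BY DateStored )
          FROM    TSQL2sDay_FirstAgg
        ) AS x
WHERE   rn = 1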

A simple codebuilder for parsing in T-SQL

If you’ve ever tried to parse a wide character column in T-SQL, you know two things:

  1. It’s a pain to do, and
  2. It’s a pain to do.

A lot of the data I deal with comes in syslog format, which can come in one of two flavors: positional (the location of the data element is related to the type of data) and named attributes (which usually only include delimiters for complex strings).  Although I haven’t had much luck automating positional parsing, I’ve recently begun using Excel to help me with the named attributes.

Here’s an example; I have a table with a message column that is pulling over syslog data from a firewall.  In a given day, I may have millions of rows like the following:

sn=AA17D5028EAA time="2011-01-26 13:40:14 UTC" fw=10.1.100.1 pri=1 c=512 m=522 msg="Malformed or unhandled IP packet dropped" n=1 src=10.1.1.23:32795:X1: dst=10.1.1.1:514:: proto=udp/17

Note that each attribute of this particular syslog message is identified with an attribute name (e.g., sn, time, fw).  In order to break out each of the elements in T-SQL, we can split the string using a combination of SUBSTRING and CHARINDEX, like so:

SELECT TOP 1
        m = CONVERT(INT, SUBSTRING(MESSAGE, CHARINDEX(' m=', MESSAGE) + 3,
                CHARINDEX(' ', MESSAGE, CHARINDEX(' m=', MESSAGE) + 3)
                - ( CHARINDEX(' m=', MESSAGE) + 3 )))
      , time = CONVERT(DATETIME, SUBSTRING(MESSAGE, CHARINDEX(' time="', MESSAGE) + 7,
                CHARINDEX('UTC"', MESSAGE, CHARINDEX(' time="', MESSAGE) + 7)
                - ( CHARINDEX(' time="', MESSAGE) + 7 )))
      , fw = CONVERT(VARCHAR(20), SUBSTRING(MESSAGE, CHARINDEX(' fw=', MESSAGE) + 4,
                CHARINDEX(' ', MESSAGE, CHARINDEX(' fw=', MESSAGE) + 4)
                - ( CHARINDEX(' fw=', MESSAGE) + 4 )))
FROM    syslogng (NOLOCK)

Note the repetition for each column; you need to find the position of a starting delimiter, the position of an ending delimiter, and supply to the SUBSTRING function the position of the starting delimiter and the difference between the two positions.  You also need to account for the length of the starting identifier, and then CONVERT the result to a specific data type.  Whee!

It gets even more fun when the attributes are optional; some syslog messages may have a proto code, and some may not.   When faced with this, you need to include a CASE option, like so:

SELECT TOP 1
        proto = CONVERT(VARCHAR(20),
            CASE WHEN CHARINDEX(' proto=', MESSAGE) = 0 THEN NULL
                 ELSE SUBSTRING(MESSAGE, CHARINDEX(' proto=', MESSAGE) + 7,
                          CHARINDEX(' ', MESSAGE, CHARINDEX(' proto=', MESSAGE) + 7)
                          - ( CHARINDEX(' proto=', MESSAGE) + 7 ))
            END)
FROM    syslogng (NOLOCK)

One of our developers is working on a syslog parser in .NET code, but I needed a proof-of-concept, and I didn’t want to keep cutting and pasting to see if it was working.  Looking at the parsing, it’s very formulaic SQL.  When I think formulas, I think Excel, and so I whipped out the following:

[screenshot: the Excel worksheet used to generate the parsing SQL]

Note that I have several input columns:

  • start, the starting delimiter
  • end, the ending delimiter (usually a space)
  • colname, the column name I want to use; usually the same as start, but stripped of extra characters.
  • type, the SQL type I want to convert the data to, and
  • optional, a column to decide if the attribute is optional per row or not.

I also have a hidden column (column F), which generates most of the SQL code:

=CONCATENATE("SUBSTRING(message, CHARINDEX('", A2, "', message)+ ", LEN(A2), ", CHARINDEX('", B2, "', message, CHARINDEX('", A2, "', message)+", LEN(A2), ") - (CHARINDEX('", A2, "', message)+", LEN(A2), "))")

This takes the starting and ending delimiters, the length of the starting delimiter, and plugs those values into a valid SQL statement.  I then create a SQL column, using the following formula:

=CONCATENATE(", ", C2, " = CONVERT(", D2, ", ", IF(E2="Y", CONCATENATE("CASE WHEN CHARINDEX('", A2, "', message) = 0 THEN NULL ELSE ", F2, " END"), F2), ")")

If I were better at Excel, I’d use named ranges, but for my purposes, this is OK.   I append a column to the beginning, specify the type, and include a CASE statement based on whether or not my optional column includes a “Y”.

It took me longer to write this blog post than it did to generate a proof-of-concept, parsing each of the named attributes out from a syslog message.

#TSQL2sDay: Resolutions

For this month’s T-SQL Tuesday, Jen McCown asks:

So tell us: what techie resolutions have you been pondering, and why?  Are you heading for a certification? An award? Are you looking to pick up CLR because that guy at the Summit said it’s “bitchin’”? Go crazy…

I’ve already covered a lot of my techie resolutions in this post, but here’s a recap, with some expanded thoughts:

  • I vow to learn something new every month.  I’ve already started on this one, but I need to keep working on it.   For example, I’m working on XML and XQuery this month; next month, I’m thinking SSIS.
  • I vow to be more involved in the technical community.  I’ve slipped out of tweeting (mostly because it’s blocked on our corporate network); I will do more.  I also want to read more blogs, as well as do a LOT more blogging myself.  For example, I plan to participate in every T-SQL Tuesday for 2011.  I also plan to present at least 6 times this year.
  • I will earn my MCITP: Database Developer certification this year.  Been meaning to do it; just haven’t invested the time to do so.

On a personal note, I want to tackle a few more technical projects that have been hovering over my head:

  • I want to do more with pictures and videos.  I have a nice digital camera, and a nice Flip video camera, but I don’t do squat with them.  I’m horrible about leaving them behind when I travel; I will use them as needed.
  • My fiancée is an iPod user (like 90% of the world); I am not (I have an Archos).  Merging our music into iTunes is not going to be fun (especially since I’ve never used it), but in the long run, it’ll be the right thing to do for us.
  • I want to work smarter, not harder, so I can play more.  There’s lots of little services out there (like Remember the Milk, Yodlee.com, Google calendars, etc) which will help me manage my life on the move (shuttling between my apartment, my fiancée’s house, and my office).

Short, sweet, but at least it’s submitted 🙂