Development

Stuff in the FROM Clause: Interfaces

IMG_0323[1]Finally wrapping up this blog series (while standing at my mobile office waiting on my daughter to finish track practice); I’ve previously discussed four different categories of database objects that can be used as the source of various SQL statements:

Base Structures

Virtual Structures

Temporary & Semi-Temporary structures

I want to wrap up with the following category: Interfaces.  Interfaces are methods by which queries can be executed on a remote server (and I use that term to mean remote to SQL Server, not necessarily remote to the host). and the results returned for consumption by a SQL statement.   There are basically three interfaces supported by SQL Server.

OPENDATASOURCE

OPENDATASOURCE is an ad-hoc method of connecting to any OLEDB provider; below is a bit of sample code from MSDN:

SELECT *
FROM OPENDATASOURCE(‘SQLNCLI’, ‘Data Source=London\Payroll;Integrated Security=SSPI’) .AdventureWorks2008R2.HumanResources.Employee

If you’re familiar at all with server naming notations for SQL Server, you can easily identify the four-part schema above (Server.Database.Schema.Object).  Note that OPENDATASOURCE is intended for limited use; heavy connections should consider using linked servers to take advantage of security management (particularly in the case of SQL Authentication).

OPENROWSET

OPENROWSET is similar to OPENDATASOURCE; it’s an ad-hoc method of calling data from an OLEDB provider.  The primary difference is that OPENROWSET supports a querystring, as well as pulling data from a single object.  This allows you a great deal of flexibility because you execute complicated queries using remote resources, and then only return the data that you need.  OPENROWSET can also be used for BULK insertion of data

OPENQUERY

OPENQUERY is intended for use with linked servers; the purpose is to allow code to be executed on a remote server first, and then the affected result sets being used on the local server.  In order to use the OPENQUERY method, you must first define a linked server, and then use a query string to do the actual heavy lifting on the other box.

Interfaces are relatively easy to understand and use, but they need to be carefully evaluated for each situation.  For example, OPENDATASOURCE allows you to quickly retrieve data from a single object on a remote server, but that means that all of the data is retrieved to the local instance.  If the remote server is across a network, then you have to deal with network latency.  OPENROWSET and OPENQUERY provide a work-around (through the use of a querystring), but the querystring can not be parameterized, so it’s a bit clunky to work with.  However, for quick and dirty movement of data from within T-SQL, interfaces can be a valuable tool at the right time.

Stuff in the FROM clause: Temporary & Semi-Temporary structures

So way back in October, I started a blog series on Stuff in the FROM clause; I never finished it.  I’m trying to return to writing, so I thought it would be best if I completed the remaining two posts (so I can tick one more item off my to-do list).  If you really want to catch up, here’s the links to the first two posts:

Stuff in the FROM clause: Base Structures

Stuff in the FROM clause: Virtual Structures

Today’s post is touching on two different categories of database objects used in the FROM clause: temporary and semi-temporary structures.  Temporary structures are database objects which are used within the context of a single scope or session in SQL Server; after their use, they are destroyed.  Semi-temporary structures can be used across multiple scopes or sessions, but they are destroyed when the server is restarted.

TEMPORARY STRUCTURES

INLINE SUBQUERIES

An inline subquery is much like a view; it’s a method of encapsulating a SELECT statement inside another piece of SQL code for performance gains or simplification of the outer SQL statement.  Inline subqueries exist only within the context of the outer SQL statement; they are not reusable from one statement to the next.

SELECT columnlist
FROM (SELECT columnlist
           FROM table) a

Inline subqueries must have be referenced by an alias (in the above example, “a” is the alias).  They can be used like any other object in the FROM clause.

COMMON TABLE EXPRESSIONS

A common table expression (CTE) is similar to an inline subquery; it’s a method of encapsulating a SELECT statement for reuse within another SQL statement; they are not reusable from one statement to the next.

; WITH a AS (SELECT columnlist FROM table)
SELECT columnlist
FROM a

CTE’s are very powerful piece of coding logic that deserves far more attention than this brief write-up; below are a couple of examples for additional reading (including how to do recursion in a CTE):

http://datachix.com/2010/02/10/use-a-common-table-expression-and-the-row_number-function-to-eliminate-duplicate-rows-3/

http://msdn.microsoft.com/en-us/library/ms186243.aspx

TABLE VARIABLES

Table Variables are another powerful temporary structure for managing data in SQL Server; a table variable is destroyed when it falls out of scope, which is more limited than the session of a temp table.  Although there might be some slight performance gains associated with using small table variables (since they are created in memory first), the real benefit stems form their use as building blocks inside user-defined functions and table valued parameters (both to be introduced soon).  The syntax for building a table variable is as follows:

DECLARE @t TABLE (ID int)

A note of caution: table variables are expected to be small (limited rows) by the SQL Server optimizer; execution plans may be negatively impacted if the amount of data being stored in a table variable is larger than x rows (and estimates for x varies greatly from 10 rows to 1000 rows).  Note that table variables can only have one index; a clustered primary key that is created on request during the declaration.

TABLE-VALUED PARAMETERS

Table-valued parameters are related to table variables; they are primarily used to pass entire result sets from one stored procedure to the next.  To be honest, I find the syntax a bit clunky, so I’m borrowing the sample code from Books Online to explain it:


USE AdventureWorks2008R2;
GO

/* Create a table type. */
CREATE TYPE LocationTableType AS TABLE
( LocationName VARCHAR(50)
,
CostRate INT );
GO

/* Create a procedure to receive data for the table-valued parameter. */
CREATE PROCEDURE usp_InsertProductionLocation
   
@TVP LocationTableType READONLY
   
AS
    SET NOCOUNT ON
    INSERT INTO
[AdventureWorks2008R2].[Production].[Location]
          
([Name]
          
,[CostRate]
          
,[Availability]
          
,[ModifiedDate])
       
SELECT *, 0, GETDATE()
       
FROM  @TVP;
       
GO

/* Declare a variable that references the type. */
DECLARE @LocationTVP
AS LocationTableType;

/* Add data to the table variable. */
INSERT INTO @LocationTVP (LocationName, CostRate)
   
SELECT [Name], 0.00
   
FROM
   
[AdventureWorks2008R2].[Person].[StateProvince];

/* Pass the table variable data to a stored procedure. */
EXEC usp_InsertProductionLocation @LocationTVP;
GO

 

Basically, you declare a type to hold your result set, and then create a variable to reference that result set; since types are scope- and session-safe, you can then move data around as a variable (as opposed to using a global temporary table).

LOCAL TEMP TABLEs

Local Temp Tables are a temporary structure; they are stored in tempdb, and look and act very similarly to base tables.  You can create indexes on them, you can ALTER their structure after creation, and you can use them exactly like a standard base table.  The primary difference between a local temp table and a standard table is that the temp table is available only to the session in which it was created, and it’s destroyed after that session is over. 

Syntactically, the CREATE statement for a local temp table is nearly identical to the CREATE statement for a base table; the only difference is that temp tables must have a single hash mark at the beginning of their name, i.e.:

CREATE TABLE #temp (ID int)

or

SELECT 1 as ID
INTO #temp

SEMI-TEMPORARY STRUCTURES

GLOBAL TEMP TABLEs

Global Temp Tables are a variation of  the local temp table; the same basic rules apply, except that global temp tables are not automatically destroyed at the end of a session.  Thus, they can be reused across multiple scopes and sessions just like a base table; however, if the server is restarted, the global temp tables are destroyed. 

Syntactically, the CREATE statement for a local temp table is nearly identical to the CREATE statement for a base table; the only difference is that global temp tables must have two hash marks at the beginning of their name, i.e.:

CREATE TABLE ##temp (ID int)

or

SELECT 1 as ID
INTO ##temp

 

Alright; one more to go: Interfaces.

A day with Andy (and Matt..) #SQLSAT111 #SQLPASS

I first met Andy Leonard at my first PASS Summit (2008); he was sitting at the back of the room in the Chapter Leaders meeting.  At the time, there was this newfangled social technology called Twitter (perhaps you’ve heard of it), and everybody was live tweeting during the meeting (shades of things to come).  I had been following his twitter stream for some time, and I saw a tweet of his go by.  I responded with something clever like “I’m watching you tweet @AndyLeonard”.  Immediately his head popped up, and looked around the room.  After the meeting, I walked over and introduced myself. 

Talking to Andy is probably one of the easiest things to do in the world; there have been time throughout my association with the SQL community where I’ve bumped into him at some event or another, and he always seems to remember who I am, and what’s going on with me.  I’m nowhere near the prolific writer that he is, and neither am I as plugged in to what’s going on, but yet he always seems to slide into a conversation with me like an old friend.  Andy is just a good guy to know, and I’m glad he’s a friend of mine.

Now, we haven’t always agreed on things; during the PASS elections of 2010, Andy and I looked at the process from two very different perspectives.  Some of our discussions got heated online, and going back and reading those posts today makes me a little sad.  But then I have to pause and think about the face-to-face conversations Andy and I had regarding the same subject, and it’s clear that he really does have a heart for the SQL community; even though I don’t always agree with him, I know that he’s trying to encourage people to do the right thing.  Again, he’s a good guy.

Why do I bring this up? Well, it’s because I’m on the team to bring SQL Saturday 111 to Atlanta on April 14.  We’ve invited Andy Leonard to do a precon for us on April 13; $100 to spend a day with Andy (and Matt Masson).  I know that Andy brings that same outlook to his teaching that he does to his community activities; with a certain amount of encouragement, people can do better.  If you’re not already registered for his class, you should.  You’ll learn something about SSIS, and maybe a little bit about friendship.

Quick and Easy SQL Script

I haven’t been blogging enough lately, so I’m going to try something new to get past this mild case of writer’s block.  Instead of trying to do a long, in-depth post on some esoteric subject, I’m going to just put out a simple SQL script.  This one’s a bit of a hack, but you’ll see a lot of variations on it: Coalescing rows using for XML.  For example, here’s a few questions from StackOverflow:

Cursor in stored procedure

How to Comma separate multiple rows obtained from a SQL Query

Convert multiple rows into one with comma as separator

The basic challenge is to denormalize your resultset so that multiple rows get returned as a single delimited string; while there are several methods for solving this problem, the one that I like (as of SQL 2005) is to use FOR XML PATH, like so:

BEGIN TRAN

DECLARE @t TABLE ( fruit VARCHAR(10) )

INSERT  INTO @t
       
( fruit
       
)
       
SELECT  'apple'
       
UNION ALL
       
SELECT  'banana'
       
UNION ALL
       
SELECT  'tomato'

SELECT  ',' + fruit
FROM    @t
FOR     XML PATH

SELECT  ',' + fruit
FROM    @t
FOR     XML PATH('')

SELECT  STUFF(( SELECT  ',' + fruit
               
FROM    @t
             
FOR
                XML
PATH('')
              ),
1, 1, '') AS fruits

ROLLBACK

 

The multiple results sets are there to illustrate each step in the process; for example, the first output shows what happens when you specify FOR XML PATH with no indicators; SQL Server automatically assigns each row in the resultset to a node named <row>:

<row>,apple</row>
<row>,banana</row>
<row>,tomato</row>

If you specify a tag name in the PATH function, each row will be replaced with that tag name; in the above example, we want an empty tag (‘’).  The result is an XML fragment with no tags:

,apple,banana,tomato

That’s almost what we want, except we need to get rid of the leading comma.  The STUFF command basically replaces the characters in the string supplied (the first parameter; our query FOR XML PATH) starting at a certain position (1 in the example above) and moving forward the specified number of character places (1, the third parameter) with the value of the last parameter (an empty space; the fourth parameter).  The final output is a comma delimited string:

apple,banana,tomato

Straightforward; gotta return to writing.

Stuff in the FROM clause: Virtual Structures

Continuing with my SQL Server basics series, I’m focusing this post on Stuff in the FROM clause: Virtual Structures.  My last post talked about the only Base structure in a relational database, but I’m now moving on to a slightly more complicated concept.  Virtual structures are database objects that don’t hold data in a tabular structure, but can interface with data in a tabular format.  It’s probably best to start with the most common virtual structure, the VIEW:

VIEWs

According to Books Online, VIEWs create "a virtual table whose contents (columns and rows) are defined by a query. Use this statement to create a view of the data in one or more tables in the database.”  A view encapsulates a SQL SELECT statement to return data, and can be used to provide an alternate representation of data from the original tables.  For example, the following VIEW returns columns from both the Person and Employees tables in the AdventureWorks sample database:


USE AdventureWorks2008R2 ;
GO
IF OBJECT_ID ('hiredate_view', 'V') IS NOT NULL
DROP VIEW hiredate_view ;
GO
CREATE VIEW hiredate_view
AS
SELECT
p.FirstName, p.LastName, e.BusinessEntityID, e.HireDate
FROM HumanResources.Employee e
JOIN Person.Person AS p ON e.BusinessEntityID = p.BusinessEntityID ;
GO

Views are not just read-only; in certain cases, VIEWS can be used to insert or update data into underlying tables.  However, discussion of how this works is beyond the intent of this introduction; refer to Books Online fore more information.

Table-Valued Functions (TVF)

Views are not the only virtual structures available in SQL Server; a subset of a user-defined function known as table-valued functions (TVFs) also provide interfaces into the underlying base structures.   Books Online defines table-valued functions as “a Transact-SQL or common language runtime (CLR) routine that accepts parameters, performs an action, such as a complex calculation, and returns the result of that action as… a table.”  There are three types of TVF’s (arranged below in order of increasing complexity): simple, complex, and Common Language Runtime.

Simple TVFs

A simple (or inline) table-valued function is probably the easiest virtual structure to grasp beyond a view; in fact, it’s easily compared to a view with parameters.  A simple TVF encapsulates a SQL statement (much like a view), but it allows for the use of parameters to filter the results returned.  For example, the following simple TVF pulls data from multiple tables for a specific StoreID:

USE AdventureWorks2008R2;
GO
IF OBJECT_ID (N'Sales.ufn_SalesByStore', N'IF') IS NOT NULL
   
DROP FUNCTION Sales.ufn_SalesByStore;
GO
CREATE FUNCTION Sales.ufn_SalesByStore (@storeid INT)
RETURNS TABLE
AS
RETURN
(
   
SELECT P.ProductID, P.Name, SUM(SD.LineTotal) AS 'Total'
   
FROM Production.Product AS P
   
JOIN Sales.SalesOrderDetail AS SD ON SD.ProductID = P.ProductID
   
JOIN Sales.SalesOrderHeader AS SH ON SH.SalesOrderID = SD.SalesOrderID
   
JOIN Sales.Customer AS C ON SH.CustomerID = C.CustomerID
   
WHERE C.StoreID = @storeid
   
GROUP BY P.ProductID, P.Name
);
GO

--use the new TVF in a query
SELECT * FROM Sales.ufn_SalesByStore (602);

Complex TVFs

Complex (or multi-statement) TVF’s are much like their simple counterparts, but allow for greater flexibility in T-SQL coding by adopting a procedural approach to returning tabular data.  A complex TVF requires that a table variable (to be covered later) be defined within the function, and then populated by a series of SQL statements.  The below example from Books Online shows how:

USE AdventureWorks2008R2;
GO
IF OBJECT_ID (N'dbo.ufn_FindReports', N'TF') IS NOT NULL
   
DROP FUNCTION dbo.ufn_FindReports;
GO
CREATE FUNCTION dbo.ufn_FindReports (@InEmpID INTEGER)
RETURNS @retFindReports TABLE
(
   
EmployeeID INT PRIMARY KEY NOT NULL,
   
FirstName NVARCHAR(255) NOT NULL,
   
LastName NVARCHAR(255) NOT NULL,
   
JobTitle NVARCHAR(50) NOT NULL,
   
RecursionLevel INT NOT NULL
)
--Returns a result set that lists all the employees who report to the
--specific employee directly or indirectly.*/
AS
BEGIN
WITH
EMP_cte(EmployeeID, OrganizationNode, FirstName, LastName, JobTitle, RecursionLevel) -- CTE name and columns
   
AS (
       
SELECT e.BusinessEntityID, e.OrganizationNode, p.FirstName, p.LastName, e.JobTitle, 0 -- Get the initial list of Employees for Manager n
       
FROM HumanResources.Employee e
          
INNER JOIN Person.Person p
          
ON p.BusinessEntityID = e.BusinessEntityID
       
WHERE e.BusinessEntityID = @InEmpID
       
UNION ALL
       
SELECT e.BusinessEntityID, e.OrganizationNode, p.FirstName, p.LastName, e.JobTitle, RecursionLevel + 1 -- Join recursive member to anchor
       
FROM HumanResources.Employee e
           
INNER JOIN EMP_cte
           
ON e.OrganizationNode.GetAncestor(1) = EMP_cte.OrganizationNode
          
INNER JOIN Person.Person p
          
ON p.BusinessEntityID = e.BusinessEntityID
       
)
-- copy the required columns to the result of the function
  
INSERT @retFindReports
  
SELECT EmployeeID, FirstName, LastName, JobTitle, RecursionLevel
  
FROM EMP_cte
  
RETURN
END
;
GO
-- Example invocation
SELECT EmployeeID, FirstName, LastName, JobTitle, RecursionLevel
FROM dbo.ufn_FindReports(1);

GO

Note that code in a multi-statement TVF can involve several steps before returning the final result-set; caution should be used when using complex TVF’s because the optimizer can not determine the cost of these statements during plan estimation.  In short, the more complex the TVF, the more likely that the plan used by SQL Server will be inaccurate.

CLR TVF’s

With the release of SQL Server 2005, Microsoft embedded a limited subset of the functionality from the .NET framework into SQL Server itself, calling it the Common Language Runtime (CLR).  The CLR allows for .NET developers to write and deploy database objects in a managed language (like VB.NET or C#), and allows for very flexible logic for handling data.  A CLR Function is invoked in a fashion similar to other table-valued functions, but the deployment is much more involved (see Books Online for an example using the following steps):

  1. Code is written and assembled by a compiler (like Visual Studio).
  2. The assembled code must be copied on to the SQL Server.
  3. An ASSEMBLY reference is created in T-SQL, which points to the deployed .dll file, and
  4. a FUNCTION is created which references the ASSEMBLY.

Next up…

A two-fer: Temporary and Semi-Temporary structures!

Stuff in the FROM clause: Base Structures

A few months ago, our user group started thinking about how we could get new members to start coming to our SQL Server user group meetings; we defined new user as one of two categories:

1: new to databases, or
2. new to SQL Server

In both cases, we realized that we needed to start having new content in our monthly sessions that was targeted specifically for them, so we decided to start adding a brief 10-minute fundamental session to the beginning of each meeting. I chose to do the first one, and selected a topic called Stuff in the FROM clause. I quickly realized that this could be a nice little blog series, so I’m attempting to expand a 10-minute presentation into a four-part series. Below is the mind map for the presentation:


Stuff in the FROM clause: Base Structures
Base Structures as a category name is a bit misleading, because there’s really only one base structure in any relational database: the table.  Tables are the core of any relational database; they provide storage for data in rows, arranged by columns.  There’s lots of background information on table design, so I won’t spend too much time on it (even that’s too basic for this discussion), but I will highlight some key features about tables:

  • From an entity design perspective, a table should represent a set of entities.
    • Rows are the un-ordered members of that set
    • Columns are the attributes of the entity.
  • Table attributes have basic types.
  • Tables should have a primary key constraint on one of the columns, which is used to uniquely identify a member of that set.
  • Microsoft SQL Server supports a syntax for computed columns, where data for an attribute can be calculated from other attributes.

For Microsoft SQL Server, the CREATE TABLE syntax may be found at: http://msdn.microsoft.com/en-us/library/ms174979.aspx  The simplest table creation statement looks like the following:

CREATE TABLE table_name (column_name column_type)

Next: Virtual Structures

#TSQL2sDay Roundup

So the launch was early, and the write-up is delayed.  Time has no meaning….  Truthfully, I’m sorry I didn’t manage to squeeze this summary post in a bit sooner; I have the typical DBA excuse: too much to do, and too little time.

One nice thing about this topic is that it seemed to resonate with several bloggers who had either strayed away from T-SQL Tuesday or had never participated; hopefully, the bug to write will stick (and I’m pointing the finger at myself for this one as well).

Beginning at the Beginning:

Pinal Dave in his gentle teaching style covers several different questions and answers about JOIN techniques. Great way to review the basics and get conversations started with beginners.

KenJ is a bit esoteric, but a great reminder of what resources there are to learn more about SQL Server.

 

JOINs In The Real World:

Tracy McKibben demonstrated a very simple tuning tip based on real production experience: the Nested JOIN

 

Rich Brown reminds us that good query construction often involves understanding how the optimizer handles JOINs under the covers.

Andy Galbraith (Yet Another SQL Andy!) reminds us to Respect Your JOINs. It’s funny how often we assume that database queries are logically well-written when we start looking for performance problems.

 

Bob Pusateri points out that WHERE clauses influence JOIN behaviors. You have to look at the entire SQL statement to make sure you’re getting what you’re supposed to be getting.

Matt Nelson reminds us that unnecessary JOINs are bad, and sometimes you need to dig a little deeper when performance tuning.

 

A Little Deeper:

Rob Farley toasted my brain a little on this post about Joins without JOINs.  It’s a well-written explanation of what an Ant-Semi JOIN is.

 

Brad Schulz finished the cooking job with a take on Semi-JOINs, among other things.  Is Brad really Rob in reverse?

Muthukkumaran Kaliyamoorthy covered the internal join mechanisms, and why the optimizer chooses merge, hash, or nested loop joins.

Robert Matthew Cook has a couple of great metaphors for explaining Merge, Hash, and Nested Loop JOINs.

 

Wayne Sheffield pointed out some interesting syntactical ways of writing JOINs. I’m not sure if I’m comfortable with some of them, but they may be of some use in certain edge scenarios. Or, if you just want to mess with the guy who reviews your code.

Richard Douglas offers up a short-but-sweet explanation of the relationship of Key Lookups to JOINs.

 

The Future is a Ticking Time Bomb…

Thomas Rushton points out that Denali may finally force us to clean up our code.

 

 

My Portugese is Not Up To Par…

 
Finally, Ricardo Leka wrote a post that I had to use Google Translate to interpret.  I think I lost something in  translation, because I’m not sure how the example relates to JOIN’s, but thought I would include it anyway.  http://leka.com.br/2011/10/04/t-sql-tuesday-23-joins/

#TSQL2sDay T-SQL Tuesday 23–Early edition

T-SQL Tuesday Logo

 

Time once again for another edition of T-SQL Tuesday!  What’s that, you say?  Early?  Why yes, indeed.  Due to the potential schedule of conflict of PASS’s Summit 2011 occurring on the second Tuesday of October (the normal date for T-SQL Tuesday), I’ve gotten special permission to bump it up a week. 

To participate in this month’s T-SQL Tuesday, your post must go live between midnight UTC on Tuesday, October 4, 2011 and midnight UTC on Wednesday, October 5.

 

Your post needs to link back to this blog, and if you use the image, anchor it to this post.  Make sure you leave a comment or a trackback to this post (so I can find it later, when I write up the summary).

Topic d’jour?  JOINS (I’m in a fundamentals mood lately).  Note that I also like creative and esoteric posts, so if you can find a way to apply SQL as a metaphorical language for community activity, I’ll read it and enjoy it.  If you just want to tell me in a simple fashion the difference between a HASH and MERGE join, I’m cool with that, too.

Do me a favor, though, and please spread the word about the early date.  Looking forward to reading your submissions.

#sqlsat89 XML 201 Slide decks uploaded

Just a quick note to say I uploaded my slides from last weekend’s SQL Saturday 89; the event was a blast, and I hope to have my wrap up posted soon.   Thanks to all of those who attended; I felt like it was a great class, and I’m hoping to continue to improve in the future.

SQL Server XML 201

The xml datatype in SQL Server expands the potential of the relational platform to store increasingly complex forms of data, but without the use of the appropriate query language, much of that potential can remain unused. This session will cover the basics of SQL Server XQuery and FLWOR; the assumption is that attendees will have some basic exposure to XML (including the use of XML in SQL Server). Learn how to “run queries within a query”, and how to extract XML data into a tabular format. Coverage will include the five basic XML methods (.exist(), .value(), .query(), .nodes(), and .modify()) and FLWOR (for, let, where, order by, and return).