October 2011

Stuff in the FROM clause: Virtual Structures

Continuing with my SQL Server basics series, I’m focusing this post on Stuff in the FROM clause: Virtual Structures.  My last post talked about the only Base structure in a relational database, but I’m now moving on to a slightly more complicated concept.  Virtual structures are database objects that don’t hold data in a tabular structure, but can interface with data in a tabular format.  It’s probably best to start with the most common virtual structure, the VIEW:

VIEWs

According to Books Online, VIEWs create "a virtual table whose contents (columns and rows) are defined by a query. Use this statement to create a view of the data in one or more tables in the database.”  A view encapsulates a SQL SELECT statement to return data, and can be used to provide an alternate representation of data from the original tables.  For example, the following VIEW returns columns from both the Person and Employees tables in the AdventureWorks sample database:


USE AdventureWorks2008R2 ;
GO
IF OBJECT_ID ('hiredate_view', 'V') IS NOT NULL
DROP VIEW hiredate_view ;
GO
CREATE VIEW hiredate_view
AS
SELECT
p.FirstName, p.LastName, e.BusinessEntityID, e.HireDate
FROM HumanResources.Employee e
JOIN Person.Person AS p ON e.BusinessEntityID = p.BusinessEntityID ;
GO

Views are not just read-only; in certain cases, VIEWS can be used to insert or update data into underlying tables.  However, discussion of how this works is beyond the intent of this introduction; refer to Books Online fore more information.

Table-Valued Functions (TVF)

Views are not the only virtual structures available in SQL Server; a subset of a user-defined function known as table-valued functions (TVFs) also provide interfaces into the underlying base structures.   Books Online defines table-valued functions as “a Transact-SQL or common language runtime (CLR) routine that accepts parameters, performs an action, such as a complex calculation, and returns the result of that action as… a table.”  There are three types of TVF’s (arranged below in order of increasing complexity): simple, complex, and Common Language Runtime.

Simple TVFs

A simple (or inline) table-valued function is probably the easiest virtual structure to grasp beyond a view; in fact, it’s easily compared to a view with parameters.  A simple TVF encapsulates a SQL statement (much like a view), but it allows for the use of parameters to filter the results returned.  For example, the following simple TVF pulls data from multiple tables for a specific StoreID:

USE AdventureWorks2008R2;
GO
IF OBJECT_ID (N'Sales.ufn_SalesByStore', N'IF') IS NOT NULL
   
DROP FUNCTION Sales.ufn_SalesByStore;
GO
CREATE FUNCTION Sales.ufn_SalesByStore (@storeid INT)
RETURNS TABLE
AS
RETURN
(
   
SELECT P.ProductID, P.Name, SUM(SD.LineTotal) AS 'Total'
   
FROM Production.Product AS P
   
JOIN Sales.SalesOrderDetail AS SD ON SD.ProductID = P.ProductID
   
JOIN Sales.SalesOrderHeader AS SH ON SH.SalesOrderID = SD.SalesOrderID
   
JOIN Sales.Customer AS C ON SH.CustomerID = C.CustomerID
   
WHERE C.StoreID = @storeid
   
GROUP BY P.ProductID, P.Name
);
GO

--use the new TVF in a query
SELECT * FROM Sales.ufn_SalesByStore (602);

Complex TVFs

Complex (or multi-statement) TVF’s are much like their simple counterparts, but allow for greater flexibility in T-SQL coding by adopting a procedural approach to returning tabular data.  A complex TVF requires that a table variable (to be covered later) be defined within the function, and then populated by a series of SQL statements.  The below example from Books Online shows how:

USE AdventureWorks2008R2;
GO
IF OBJECT_ID (N'dbo.ufn_FindReports', N'TF') IS NOT NULL
   
DROP FUNCTION dbo.ufn_FindReports;
GO
CREATE FUNCTION dbo.ufn_FindReports (@InEmpID INTEGER)
RETURNS @retFindReports TABLE
(
   
EmployeeID INT PRIMARY KEY NOT NULL,
   
FirstName NVARCHAR(255) NOT NULL,
   
LastName NVARCHAR(255) NOT NULL,
   
JobTitle NVARCHAR(50) NOT NULL,
   
RecursionLevel INT NOT NULL
)
--Returns a result set that lists all the employees who report to the
--specific employee directly or indirectly.*/
AS
BEGIN
WITH
EMP_cte(EmployeeID, OrganizationNode, FirstName, LastName, JobTitle, RecursionLevel) -- CTE name and columns
   
AS (
       
SELECT e.BusinessEntityID, e.OrganizationNode, p.FirstName, p.LastName, e.JobTitle, 0 -- Get the initial list of Employees for Manager n
       
FROM HumanResources.Employee e
          
INNER JOIN Person.Person p
          
ON p.BusinessEntityID = e.BusinessEntityID
       
WHERE e.BusinessEntityID = @InEmpID
       
UNION ALL
       
SELECT e.BusinessEntityID, e.OrganizationNode, p.FirstName, p.LastName, e.JobTitle, RecursionLevel + 1 -- Join recursive member to anchor
       
FROM HumanResources.Employee e
           
INNER JOIN EMP_cte
           
ON e.OrganizationNode.GetAncestor(1) = EMP_cte.OrganizationNode
          
INNER JOIN Person.Person p
          
ON p.BusinessEntityID = e.BusinessEntityID
       
)
-- copy the required columns to the result of the function
  
INSERT @retFindReports
  
SELECT EmployeeID, FirstName, LastName, JobTitle, RecursionLevel
  
FROM EMP_cte
  
RETURN
END
;
GO
-- Example invocation
SELECT EmployeeID, FirstName, LastName, JobTitle, RecursionLevel
FROM dbo.ufn_FindReports(1);

GO

Note that code in a multi-statement TVF can involve several steps before returning the final result-set; caution should be used when using complex TVF’s because the optimizer can not determine the cost of these statements during plan estimation.  In short, the more complex the TVF, the more likely that the plan used by SQL Server will be inaccurate.

CLR TVF’s

With the release of SQL Server 2005, Microsoft embedded a limited subset of the functionality from the .NET framework into SQL Server itself, calling it the Common Language Runtime (CLR).  The CLR allows for .NET developers to write and deploy database objects in a managed language (like VB.NET or C#), and allows for very flexible logic for handling data.  A CLR Function is invoked in a fashion similar to other table-valued functions, but the deployment is much more involved (see Books Online for an example using the following steps):

  1. Code is written and assembled by a compiler (like Visual Studio).
  2. The assembled code must be copied on to the SQL Server.
  3. An ASSEMBLY reference is created in T-SQL, which points to the deployed .dll file, and
  4. a FUNCTION is created which references the ASSEMBLY.

Next up…

A two-fer: Temporary and Semi-Temporary structures!

Stuff in the FROM clause: Base Structures

A few months ago, our user group started thinking about how we could get new members to start coming to our SQL Server user group meetings; we defined new user as one of two categories:

1: new to databases, or
2. new to SQL Server

In both cases, we realized that we needed to start having new content in our monthly sessions that was targeted specifically for them, so we decided to start adding a brief 10-minute fundamental session to the beginning of each meeting. I chose to do the first one, and selected a topic called Stuff in the FROM clause. I quickly realized that this could be a nice little blog series, so I’m attempting to expand a 10-minute presentation into a four-part series. Below is the mind map for the presentation:


Stuff in the FROM clause: Base Structures
Base Structures as a category name is a bit misleading, because there’s really only one base structure in any relational database: the table.  Tables are the core of any relational database; they provide storage for data in rows, arranged by columns.  There’s lots of background information on table design, so I won’t spend too much time on it (even that’s too basic for this discussion), but I will highlight some key features about tables:

  • From an entity design perspective, a table should represent a set of entities.
    • Rows are the un-ordered members of that set
    • Columns are the attributes of the entity.
  • Table attributes have basic types.
  • Tables should have a primary key constraint on one of the columns, which is used to uniquely identify a member of that set.
  • Microsoft SQL Server supports a syntax for computed columns, where data for an attribute can be calculated from other attributes.

For Microsoft SQL Server, the CREATE TABLE syntax may be found at: http://msdn.microsoft.com/en-us/library/ms174979.aspx  The simplest table creation statement looks like the following:

CREATE TABLE table_name (column_name column_type)

Next: Virtual Structures

#TSQL2sDay Roundup

So the launch was early, and the write-up is delayed.  Time has no meaning….  Truthfully, I’m sorry I didn’t manage to squeeze this summary post in a bit sooner; I have the typical DBA excuse: too much to do, and too little time.

One nice thing about this topic is that it seemed to resonate with several bloggers who had either strayed away from T-SQL Tuesday or had never participated; hopefully, the bug to write will stick (and I’m pointing the finger at myself for this one as well).

Beginning at the Beginning:

Pinal Dave in his gentle teaching style covers several different questions and answers about JOIN techniques. Great way to review the basics and get conversations started with beginners.

KenJ is a bit esoteric, but a great reminder of what resources there are to learn more about SQL Server.

 

JOINs In The Real World:

Tracy McKibben demonstrated a very simple tuning tip based on real production experience: the Nested JOIN

 

Rich Brown reminds us that good query construction often involves understanding how the optimizer handles JOINs under the covers.

Andy Galbraith (Yet Another SQL Andy!) reminds us to Respect Your JOINs. It’s funny how often we assume that database queries are logically well-written when we start looking for performance problems.

 

Bob Pusateri points out that WHERE clauses influence JOIN behaviors. You have to look at the entire SQL statement to make sure you’re getting what you’re supposed to be getting.

Matt Nelson reminds us that unnecessary JOINs are bad, and sometimes you need to dig a little deeper when performance tuning.

 

A Little Deeper:

Rob Farley toasted my brain a little on this post about Joins without JOINs.  It’s a well-written explanation of what an Ant-Semi JOIN is.

 

Brad Schulz finished the cooking job with a take on Semi-JOINs, among other things.  Is Brad really Rob in reverse?

Muthukkumaran Kaliyamoorthy covered the internal join mechanisms, and why the optimizer chooses merge, hash, or nested loop joins.

Robert Matthew Cook has a couple of great metaphors for explaining Merge, Hash, and Nested Loop JOINs.

 

Wayne Sheffield pointed out some interesting syntactical ways of writing JOINs. I’m not sure if I’m comfortable with some of them, but they may be of some use in certain edge scenarios. Or, if you just want to mess with the guy who reviews your code.

Richard Douglas offers up a short-but-sweet explanation of the relationship of Key Lookups to JOINs.

 

The Future is a Ticking Time Bomb…

Thomas Rushton points out that Denali may finally force us to clean up our code.

 

 

My Portugese is Not Up To Par…

 
Finally, Ricardo Leka wrote a post that I had to use Google Translate to interpret.  I think I lost something in  translation, because I’m not sure how the example relates to JOIN’s, but thought I would include it anyway.  http://leka.com.br/2011/10/04/t-sql-tuesday-23-joins/