Something New

Using an #Azure Logic App to move @AzureDevOps Work Items

A big part of my job these days is looking for opportunities to improve workflow. Automation of software is great, but identifying areas to speed up human processes can be incredibly beneficial to value delivery to customers. Here’s the situation I recently figured out how to do:

  1. My SRE team uses a different Azure DevOps project than our development team. This protects the “separation of duties” concept that auditors love, while still letting us transfer items back and forth.
  2. The two projects are in the same organization.
  3. The two projects use different templates, with different required fields.
  4. Our workflow process requires two phases of triage for bugs in the wild: a technical phase (provided by my team), and a business prioritization (provided by our Business Analyst).
  5. Moving a card between projects is simple, but there were several manual changes that had to be made:
    1. Assigning to a Business Analyst (BA)
    2. Changing the status to Proposed from Active
    3. Changing the Iteration and Area
    4. Moving the card.

To automate this, I decided to use Azure Logic Apps. There are probably other ways to approach this problem (like Powershell), but one of the benefits of the Logic Apps model is that it uses the same security settings as our Azure DevOps installation. It just simplifies some of the steps I must go through. The simplest solution I could implement was to move the work item when changing the Assigned To field to a Business Analyst. This allows us to work the card, add comments, notes, but when the time comes to hand over to our development team for prioritization, it’s a simple change to a key attribute and save.

Here’s the Logic Apps workflow overview:

The initial trigger is a timer; every 3 minutes, the app runs and looks for work items that exist in a custom AzureDevOps query. This functionality is built into the Logic Apps designer as an Action for the Azure DevOps connector. The query exists in our SRE project, and simply identifies WorkItems that have been assigned to our Business Analyst Group. Note that the BA group is a team in the SRE project.

SELECT
    [System.Id]
FROM workitems
WHERE
    [System.TeamProject] = @project
    AND [System.WorkItemType] <> ''
    AND [System.State] <> ''
    AND [System.AssignedTo] IN GROUP '[SRE]\BA <id:56e7c8c7-b8ef-4db9-ad9c-055227a30a26>'

Once this query returns a list of work items to the LogicApp, I then use a For Each step in the designer, and embed a Rest API action.

The Rest API action offers maximum flexibility to update values for a work item; there is also an Update action, but the options were limited. There was once gotcha; you have to add the content-type, or it throws an error: application/json-patch+json

The code is below; it’s JSON, and the syntax is that you specify an operation (“add” for both updates and creates), a path to the field you want to change (path), and the value you want to set it to. In this case, I’m changing the Project, The Area Path, the Iteration Path, the State of the Work Item, and adding a comment to the Symptom field.

[
  {
    "op": "add",
    "path": "/fields/System.TeamProject",
    "value": "Dev"
  },
  {
    "op": "add",
    "path": "/fields/System.AreaPath",
    "value": "Dev"
  },
  {
    "op": "add",
    "path": "/fields/System.IterationPath",
    "value": "Dev"
  },
  {
    "op": "add",
    "path": "/fields/System.State",
    "value": "Proposed"
  },
{ 
    "op": "add",
    "path": "/fields/Symptom",
    "value": "Triaged by SRE Team.  See Repro Steps tab for information."
  }
]

Sending monthly scheduled email from an Azure DevOps query

One of my tasks over the last few years is to keep management and senior management aware of software deployments for our hosted services. This started out as a CAB (Change Advisory Board), but all of our deployments quickly became standard, and it basically became a monthly review of what had happened (which is not what a CAB meeting is supposed to be). I figured a meeting wasn’t necessary, so I was looking for a way to show what we’ve done in an easy to digest method.

The problem is that Azure DevOps doesn’t offer a scheduled email functionality out of the box. There is a Marketplace scheduler that you can use as part of a build, but unfortunately, t didn’t work in our environment for some reason. I stumbled on the concept of Power Automate, but Azure DevOps is a premium connector. However, we do have an Azure subscription, so Logic Apps it is.

Below is the flow that I came up with. At first it seemed relatively straightforward to pull together, but the stumbling block was the fact that the HTML tables are VERY rudimentary. No styling, no hyperlinks, nothing. That’s the reason for the additional variable steps.

The initialize variable state is where I define a string variable to handle the output of the Create HTML Table step. It’s empty, until I set it later (in the Set Variable) step. The Create HTML table was mostly easy, except that I wanted a defined border, and a hyperlink that would allow recipients to click on the link and get to the specific work item.

[ba]https://your_org_here/your_project_here/_queries/edit/{{ID}}[ea]{{ID}}[ca]

The set variable then takes the output of the Create HTML table step, and replaces the placeholders with appropriate HTML tags. In this case, I added a table border, and made a hyperlink out of the ID column.

replace(replace(replace(replace(body('Create_HTML_table'), '<table>', '<table border=10>'), '[ba]', '<a href="'), '[ea]', '">'),'[ca]','</a>')

The email step then uses this variable in the body, and the final product looks something like this:

Stuff in the FROM clause: Virtual Structures

Continuing with my SQL Server basics series, I’m focusing this post on Stuff in the FROM clause: Virtual Structures.  My last post talked about the only Base structure in a relational database, but I’m now moving on to a slightly more complicated concept.  Virtual structures are database objects that don’t hold data in a tabular structure, but can interface with data in a tabular format.  It’s probably best to start with the most common virtual structure, the VIEW:

VIEWs

According to Books Online, VIEWs create "a virtual table whose contents (columns and rows) are defined by a query. Use this statement to create a view of the data in one or more tables in the database.”  A view encapsulates a SQL SELECT statement to return data, and can be used to provide an alternate representation of data from the original tables.  For example, the following VIEW returns columns from both the Person and Employees tables in the AdventureWorks sample database:


USE AdventureWorks2008R2 ;
GO
IF OBJECT_ID ('hiredate_view', 'V') IS NOT NULL
DROP VIEW hiredate_view ;
GO
CREATE VIEW hiredate_view
AS
SELECT
p.FirstName, p.LastName, e.BusinessEntityID, e.HireDate
FROM HumanResources.Employee e
JOIN Person.Person AS p ON e.BusinessEntityID = p.BusinessEntityID ;
GO

Views are not just read-only; in certain cases, VIEWS can be used to insert or update data into underlying tables.  However, discussion of how this works is beyond the intent of this introduction; refer to Books Online fore more information.

Table-Valued Functions (TVF)

Views are not the only virtual structures available in SQL Server; a subset of a user-defined function known as table-valued functions (TVFs) also provide interfaces into the underlying base structures.   Books Online defines table-valued functions as “a Transact-SQL or common language runtime (CLR) routine that accepts parameters, performs an action, such as a complex calculation, and returns the result of that action as… a table.”  There are three types of TVF’s (arranged below in order of increasing complexity): simple, complex, and Common Language Runtime.

Simple TVFs

A simple (or inline) table-valued function is probably the easiest virtual structure to grasp beyond a view; in fact, it’s easily compared to a view with parameters.  A simple TVF encapsulates a SQL statement (much like a view), but it allows for the use of parameters to filter the results returned.  For example, the following simple TVF pulls data from multiple tables for a specific StoreID:

USE AdventureWorks2008R2;
GO
IF OBJECT_ID (N'Sales.ufn_SalesByStore', N'IF') IS NOT NULL
   
DROP FUNCTION Sales.ufn_SalesByStore;
GO
CREATE FUNCTION Sales.ufn_SalesByStore (@storeid INT)
RETURNS TABLE
AS
RETURN
(
   
SELECT P.ProductID, P.Name, SUM(SD.LineTotal) AS 'Total'
   
FROM Production.Product AS P
   
JOIN Sales.SalesOrderDetail AS SD ON SD.ProductID = P.ProductID
   
JOIN Sales.SalesOrderHeader AS SH ON SH.SalesOrderID = SD.SalesOrderID
   
JOIN Sales.Customer AS C ON SH.CustomerID = C.CustomerID
   
WHERE C.StoreID = @storeid
   
GROUP BY P.ProductID, P.Name
);
GO

--use the new TVF in a query
SELECT * FROM Sales.ufn_SalesByStore (602);

Complex TVFs

Complex (or multi-statement) TVF’s are much like their simple counterparts, but allow for greater flexibility in T-SQL coding by adopting a procedural approach to returning tabular data.  A complex TVF requires that a table variable (to be covered later) be defined within the function, and then populated by a series of SQL statements.  The below example from Books Online shows how:

USE AdventureWorks2008R2;
GO
IF OBJECT_ID (N'dbo.ufn_FindReports', N'TF') IS NOT NULL
   
DROP FUNCTION dbo.ufn_FindReports;
GO
CREATE FUNCTION dbo.ufn_FindReports (@InEmpID INTEGER)
RETURNS @retFindReports TABLE
(
   
EmployeeID INT PRIMARY KEY NOT NULL,
   
FirstName NVARCHAR(255) NOT NULL,
   
LastName NVARCHAR(255) NOT NULL,
   
JobTitle NVARCHAR(50) NOT NULL,
   
RecursionLevel INT NOT NULL
)
--Returns a result set that lists all the employees who report to the
--specific employee directly or indirectly.*/
AS
BEGIN
WITH
EMP_cte(EmployeeID, OrganizationNode, FirstName, LastName, JobTitle, RecursionLevel) -- CTE name and columns
   
AS (
       
SELECT e.BusinessEntityID, e.OrganizationNode, p.FirstName, p.LastName, e.JobTitle, 0 -- Get the initial list of Employees for Manager n
       
FROM HumanResources.Employee e
          
INNER JOIN Person.Person p
          
ON p.BusinessEntityID = e.BusinessEntityID
       
WHERE e.BusinessEntityID = @InEmpID
       
UNION ALL
       
SELECT e.BusinessEntityID, e.OrganizationNode, p.FirstName, p.LastName, e.JobTitle, RecursionLevel + 1 -- Join recursive member to anchor
       
FROM HumanResources.Employee e
           
INNER JOIN EMP_cte
           
ON e.OrganizationNode.GetAncestor(1) = EMP_cte.OrganizationNode
          
INNER JOIN Person.Person p
          
ON p.BusinessEntityID = e.BusinessEntityID
       
)
-- copy the required columns to the result of the function
  
INSERT @retFindReports
  
SELECT EmployeeID, FirstName, LastName, JobTitle, RecursionLevel
  
FROM EMP_cte
  
RETURN
END
;
GO
-- Example invocation
SELECT EmployeeID, FirstName, LastName, JobTitle, RecursionLevel
FROM dbo.ufn_FindReports(1);

GO

Note that code in a multi-statement TVF can involve several steps before returning the final result-set; caution should be used when using complex TVF’s because the optimizer can not determine the cost of these statements during plan estimation.  In short, the more complex the TVF, the more likely that the plan used by SQL Server will be inaccurate.

CLR TVF’s

With the release of SQL Server 2005, Microsoft embedded a limited subset of the functionality from the .NET framework into SQL Server itself, calling it the Common Language Runtime (CLR).  The CLR allows for .NET developers to write and deploy database objects in a managed language (like VB.NET or C#), and allows for very flexible logic for handling data.  A CLR Function is invoked in a fashion similar to other table-valued functions, but the deployment is much more involved (see Books Online for an example using the following steps):

  1. Code is written and assembled by a compiler (like Visual Studio).
  2. The assembled code must be copied on to the SQL Server.
  3. An ASSEMBLY reference is created in T-SQL, which points to the deployed .dll file, and
  4. a FUNCTION is created which references the ASSEMBLY.

Next up…

A two-fer: Temporary and Semi-Temporary structures!

Three Myths about Agile Development

I recently attended Microsoft Tech Ed in Atlanta, and while there wasn’t much new being announced about SQL Server (I had heard about many of the features for Denali at PASS Summit 2010), I did find myself drawn to several sessions regarding Agile principles and development.  My shop has been using the Scrum method for about 2 years now, and it was nice to have a refresher.  I also participated in (and overheard) a lot of conversations about Agile methods, and it made me realize two very important things:

  1. Many people who claimed to be using Agile methods had never read the Agile Manifesto, and
  2. There are several misconceptions in play regarding Agile development.

The point of this blog post is dual-fold; first I want to encourage you to read the Agile Manifesto.  If you’ve read it before, read it again.  And then, read it a third time (it’s short, so easy to read).  Done that?  Good, because here’s the crux of my argument:

If you want to do Agile development, you must adhere to the principles of the Agile Manifesto.

It’s simple, really; you shouldn’t claim to be a SQL Server developer if you’ve never written a T-SQL statement.  You can’t call yourself a cubist if you haven’t studied the works of Picasso.  You shouldn’t claim to be doing Agile development if you don’t adhere to the principles of the Agile Manifesto.

And, that leads us to the second part of this post; I believe that lots of us think we’re adhering to the methods and principles of Agile development, but there are at least three basic myths about Agile development which keep development teams from being as agile as they can be; here’s my take on them:

Myth 1: Daily meetings with business people are an impediment to rapid development.

I actually got into a fervent discussion with gentleman at TechEd about this subject during a Birds of the Feather Session on Scrum.  he claimed to be a Scrum Master for 6 teams (including several overseas), and that he barred business people from entering into the daily standup in order to keep them from dragging the meeting astray.  I think that’s wrong, and here’s why (from the Agile Manifesto principles):

Business people and developers must work together daily throughout the project.

While it’s true that the daily standup in Scrum need not be the daily interaction, it makes sense that business people LISTEN (but not INTERACT) in that meeting in order to understand on what issues the development team is working, and how those issues interplay with each other (Note: scrum calls this the chicken and the pig; business people need to know what’s going on across the development team, but shouldn’t be involved at this point.  However, the daily standup can spur additional conversations).   If your development team chooses to have a daily standup without business people, your team members MUST interact with business people in order to handle changing requirements; they must also communicate at that time what the priorities of the development organization are, and why this particular project is not progressing because some other project takes priority.

Agile development depends on the interaction between developers and business people; to isolate half of the team from the other half of the team will cause disruption to the process.  That leads us to our second myth:

Myth 2: Your development team can be agile in a vacuum.

I call this the Agile-Waterfall mindset; your business organization is separate from your development team.  Your developers are practicing some form of Agile development, but the organization is used to handing off a set of requirements to the developers, and then having them return a product at periodic intervals.  Think of this as the complement to Myth 1; Business people aren’t deemed to be an impediment, but the organization hasn’t endorsed agile development throughout.  Daily meetings with developers aren’t deemed to be a priority by the business people; the organization has developed a culture of handing off responsibilities, and expecting them to be fulfilled without daily guidance.

By definition, you can not have an agile team without input from both developers and business people.   If you want to respond to changing requirements (as frustrating as that can be to developers), you must have input from business people as soon as those requirements change.  Again, you need to handle prioritization, as changing requirements do not necessarily merit immediate priority.

Myth 3: Self-organizing teams self-manage efficiently.

A couple of great principles from the Agile Manifesto deal with communication:

The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.

The best architectures, requirements, and designs emerge from self-organizing teams.

While I believe in the wisdom of these two principles, I don’t want to de-emphasize the need for good, basic software design principles.  Most enterprise development consists of intertwined projects and resources; in order to minimize maintenance issues, adherence to consistent programming standards is a must.  Developers have different naming standards, procedural methodology, and architectural perspectives; a good team has a playbook that ALL members of the development team (regardless of what project team they serve) follow. If you have one database developer that makes heavy use of schemas, and another one that doesn’t, maintaining each other’s code requires some additional effort on their parts.  Furthermore, when teams are self-formed of roughly equally-experienced developers, resolution of architectural decisions can be difficult.

Development teams need an enforcer; a good manager goes a long way toward resolving interpersonal conflicts before they get started.  Just because teams communicate well (and good communication includes conflict), it does not necessarily mean that those same teams will develop quality code in an efficient manner.  Good teams need good direction.

Summing Up.

If you’ve made it this far, I hope I’ve given you some food for thought, as well as encouraged you to go back and revisit the Agile Manifesto, as well as your own organizational processes.  Let me sum up with a final thought from the Agile Manifesto:

At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly.

Something new for 2011: bunches of little stuff

IMG_0015 OK, it’s the last day of March, and I’m phoning in this blog post.  It’s not that I haven’t spent a lot of time learning something new this month, it’s that I’ve got a pile of stuff to get through before leaving on vacation next week, and I really just don’t have more than 15 minutes to write up some of the things I’ve been working on.  So, here’s a short list with some links, and a commitment to do better next month.

Data Modeling

I spent a lot of time working on revisiting Data Modeling and Use Case Diagramming.  Although I’m still not a fan of UML, I have come to appreciate the benefit of simplifying the language we use to describe things to do.  I’ve been working a lot with someone who is very detail-oriented; as a conceptual person, it’s a challenge at times to bring those two paradigms together.

XQuery

I recently presented at SQL Saturday 70 on FLWOR, so I had to really brush up on my skills using XML.  I wanted to be able to answer all kinds of questions, so I did a deeper dive into the functions, and really focused on how to do some basic queries with XQuery in SQL Server.  I’m working on a blog series for this, but just haven’t found the time to put my fingers on the keyboard.

SSIS/StreamInsight

I’ve said it before; I’m probably the only ETL guy that uses SQL Server, but not SSIS.  A recent project at work caused me to have to build a small prototype using SSIS, and I learned quite a bit (and some newbie “gotchas”; again, I smell a blog post in the works).  Julie Smith, Rob Volk, and Andy Leonard all pointed me in the general direction of an interesting new product from Microsoft: StreamInsight.  I built the demos, and played with it, but I’ve got a long way to go before I can actually do something with it.

 

Anyway, sorry for the lack of insight; I need to dedicate more time to actually writing stuff down when I learn it, but perhaps that’s a lesson in and of itself.    

I’m doing it wrong…

me_doing_it_wrong At some point in your career, you have to realize that you’re going about it in the wrong way.   It may hit you like a ton of bricks, or it might be a subtle realization, but either way you realize that things aren’t working out for you like you expected.  I’ve had a couple of those moments throughout my career; one was shortly after I flunked out of graduate school.  Nothing says “you’re doing it wrong” than sitting outside of your advisor’s office for a meeting that never happens.

I’ve had other epiphanies in my career, such as the time when my ethical standards were a little higher than my employers; when I got sent home by a GM after a discussion over my responsibilities, I started polishing my resume.   I was doing it wrong by working for the wrong company.

Recently, I’ve begun to realize that I’m not living up to my full potential in my career.  I’ve spent the last several years building an enterprise solution for my company that has become the core product of that company.  It’s a good product, and I’m proud of it.  However, like many small companies that have grown up fast,  our company is built on a complex ecosystem of ever-changing goals and feature requests.  We built a system based on assumptions, and we’ve become one of the leaders of our industry because we’re often the first to deliver a product for a niche market.  Many of the assumptions we made didn’t pan out, and the applications we’ve built have slowly degenerated into a mass of tangled wire and unrealistic expectations.  I realized this as I’ve struggled to add a new feature and retrofit it into this existing solution; it’s taking more and more time to solve development problems because we’re not sure what features are still being used by some employee in a dark corner of the building.

As I was rewriting a stored procedure for the fifth time trying to eke out a few more milliseconds of performance, I realized that I was thinking like an engineer.  Engineers find creative solutions to problems in a very hands-on way; they worry about wiring things together so that they work, and they work well.  Engineers are worried about the microcosm; as every geek’s favorite engineer (Scotty from Star Trek) would say “In four hours, the ship blows up.”  That’s pretty straightforward; under condition x, outcome y is to be expected in a certain amount of time.

The problem?  My title says Architect.  I’m supposed to be thinking about the big picture, not just how a couple of applications are wired together.  I’m supposed to understand (and enforce) the rules about how events become data, and how data becomes information.  I should be more concerned with defining the specifications for our system than trying to figure out this damned stored procedure (for the fifth time).  Maybe we shouldn’t even have this particular stored procedure; maybe with a little tweaking, we could eliminate the problem altogether.

So what does this mean for me?  Well, as part of my New Year’s resolutions, I’ve been determined to learn something new every month.  This month, I’ve been focused on what does it mean to be a Data Architect, and I’ve been trying to find a little time every day to transform myself from an engineer to an architect.  I’m not going to master all of these subjects at once, but here’s my working list (from high-level goals to specific action items).  I expect this list to evolve, but it’s a start.

High Level Goal: A Data Architect needs to establish the standards for information and data in the enterprise.

  • I need to document the information architecture of our division of the company, using a standard data flow diagram notation.  I need to spend some time daily refreshing my memory on that notation.
  • I need spend time with employees throughout the organization, discovering what the business entities are, and what the vocabulary for those entities are. 
  • After discovery, I need to publish a standard vocabulary document and data-dictionary, showing how we capture that information today:
    • I need to propose changes to our business vocabulary, and
    • I need to propose changes to our database schema to standardize our notation.

High Level Goal: A Data Architect needs to understand the nature of the enterprise’s information on all levels: physical, logical, and procedural.

  • I need to talk to our production DBA’s an understand how our database servers are set up physically, including the clustering structure, the drive arrays, the SAN, etc.
  • I need to talk to our engineers to understand how data gets to the databases.
  • I need to talk to our product owners to understand what information they want from the data, and what’s the best way to deliver it.

High Level Goal: A Data Architect needs to recommend the best architecture for information management, including a plan on how to get there from here.

  • I need to refresh my memory on all aspects of SQL Server, not just the parts I use on a daily basis.
  • After discovery, I need to recommend ways to improve efficiency in our data capture processes.
  • I need to listen to all voices in the organization, even those I don’t normally agree with.  I can’t afford to throw away good ideas simply because I don’t always like the originator of those ideas.

More to come, but this is what I’ve been working on so far this month (February 2011).

Something new for 2011: XML & XSD, part 2

I’m continuing my study of XML and XSD’s for January, and I realize that I ended my last post a bit abruptly.  I explained that I can cast an XML datatype to a SQL Server datatype, without giving a lot of background on WHY that’s important.  

Understanding Types.

Without going into too much detail about type, the basic reason for specifying a type for data transformations is validity; if you are expecting integer data, and the XML provides a string, then the basic contract is broken.  An XSD defines a type of data expected, and if some other type is provided, the XML is invalid.

For example, run the following code:

IF NOT EXISTS ( SELECT  *
               
FROM    sys.xml_schema_collections xsc
               
WHERE   name = 'MismatchDataType' )
       
CREATE XML SCHEMA COLLECTION MismatchDataType  AS
       
'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
        <xsd:element name="IntValue" type="xsd:integer"/>
        </xsd:schema>'
       
GO

DECLARE @x XML(MismatchDataType)
SET @x = '<IntValue>100</IntValue>'
--SET @x = '<IntValue>String</IntValue>'

DROP XML SCHEMA COLLECTION MismatchDataType
GO

It runs fine, but if you uncomment the second SET statement (where a string value is specified), you get the following error:

Msg 6926, Level 16, State 1, Line 4
XML Validation: Invalid simple type value: ‘String’. Location: /*:IntValue[1]

What’s important to remember is that once you specify a type for an element, you may only cast that XML type to a matching SQL Server type (i.e., integer to integer, string to (n)varchar, etc.) when using the XQuery methods in SQL Server (.value(), etc.).  This is easily debuggable to a seasoned database professional; if the XML type is string, and you store a value as 100, you can easily convert that to either an integer or varchar value:

SELECT @x.value('IntValue[1]', 'integer'), @x.value('IntValue[1]', 'varchar(3)')

 

If you don’t specify a type, SQL Server can make certain assumptions regarding type conversion; however, typing your XML is one of those basic “good habits” that is foundational to application design.  Knowing what to expect from your data, regardless of whether or not it’s stored in XML or a database makes troubleshooting a lot easier in the future.

Complex vs. Simple Types

The examples I’ve used so far all rely on what is known as a simple type in XML; a simple type contains no sub-elements or attributes.  A complex element can  contain either sub-elements or attributes.  An XSD collection is especially useful when defining complex elements; the XSD allows database professionals to enforce validity in the shape of their XML, including which elements are required or not.

Most of the examples I’ve used so far have been simple elements, but a complex element enforced via an XSD would look something like  (apologies for the formatting):

CREATE XML SCHEMA COLLECTION XMLSample  AS
'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="Parent">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="Child" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
        </xsd:schema>'
       
GO

 

In essence, a complex type is the heart of a strongly-typed XML document;  one of the major benefits of XML is the ability to encapsulate hierarchical data, and a complex type enforces the relationship between the elements (and attributes) encapsulated in that hierarchy much like foreign keys do for a relational database.   The presence or absence of elements in the data when compared to the XSD validate the nature of the dataset.

A stopping point…

Unfortunately for you, I need to stop at this point.  I promised myself to learn something new every month, and I feel like I have.  However, there’s so much more to learn about this topic, and I’ve simply run out of time.  I debated about spending a few more weeks on this, but then realized that I need to move on (I can always return to it in a few months) in order to stay energized about learning something new.  When I do return to this topic, I’ll be sure to post a few summary links to keep everything related.

Something new for 2011: XML and XSD

As part of my New Year’s resolution for 2011, I vowed to do a deep-dive on something technical every month; for January, I’m focusing on XML.  I’ve been using XML and XQuery in SQL Server for a while now (even presenting on it), but I still don’t consider myself an expert in the area.  For example, I use a lot of untyped XML to transfer data between databases; I’ve never really tackled XSD (XML Schema Definition Language), and now’s the time.  I’m reading The Art of XSD by Jacob Sebastian to help get me started.

What’s XSD?  In a nutshell, it’s an XML document which validates the structure of another XML document.  From the perspective of a database developer, an XSD document describes how data should look in a dataset; if the data doesn’t match the description (i.e, if a table is missing a column), that dataset is invalid.  The XSD document can be very precise, or it can offer options for the dataset, but in either case, the point of an XSD is to document the expectations about the dataset.  XML without XSD is untyped; XML with an XSD is typed (although XSD’s do more than just provide information about the data types contained within the XML).

Let’s take a look at an untyped XML statement:

DECLARE @NoXSD XML
SET
@NoXSD = '<Test1>Hello World!</Test1>'
SELECT @NoXSD

 

Simple and straightforward; I created an XML variable, and populated it with an XML fragment.  I then pulled the data out of that fragment.  In this example, we have an element named Test1; what happens if we have a typo when we populate the variable?

SET @NoXSD = '<Test2>Hello World!</Test2>'
SELECT @NoXSD

 

Nothing happens.  It’s a well-formed XML fragment (no root tag, but it does have starting and ending tags); the XML engine in SQL Server doesn’t know that our fragment is supposed to have an element named Test1, so it accepts the fragment as valid.  This is where an XSD comes in:

IF EXISTS( SELECT * FROM sys.xml_schema_collections WHERE name = 'TestSchema' )
DROP XML SCHEMA  COLLECTION TestSchema
GO

CREATE XML SCHEMA COLLECTION TestSchema AS
'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="Test1" />
</xsd:schema>'
GO

DECLARE @XSD XML ( TestSchema --use the schema to validate (type) the xml
SET @XSD = '<Test1>Hello, World!</Test1>'

SELECT @XSD

 

Since the XML fragment matches the XSD,  the assignment of data works; what happens when we assign a fragment that doesn’t match?

SET @XSD = '<Test2>Hello, World!</Test2>'

We get a big fat error message:

XML Validation: Declaration not found for element ‘Test2’. Location: /*:Test2[1]

Straightforward, right?  But now what?  Well, let’s type the data in our schema:

IF EXISTS( SELECT * FROM sys.xml_schema_collections WHERE name = 'TestSchema' )
DROP XML SCHEMA  COLLECTION TestSchema
GO

CREATE XML SCHEMA COLLECTION TestSchema AS
'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="Test1" type="xsd:string"/>
</xsd:schema>'
GO

DECLARE @XSD XML ( TestSchema )
SET @XSD = '<Test1>Hello, World!</Test1>'

 

So; what does this mean?  It means that we can now use the XQuery methods built into SQL Server to cast the data from the XML datatype to a SQL Server data type.

SELECT @XSD .value ( '(//Test1)[1]' , 'varchar(50)' )

 

More to come, but that’s a good stopping place for now; we’ve built a simple XSD, and validated a simple datatype.  I plan to spend some time learning about optional data elements next.