I’m continuing my study of XML and XSD’s for January, and I realize that I ended my last post a bit abruptly. I explained that I can cast an XML datatype to a SQL Server datatype, without giving a lot of background on WHY that’s important.
Understanding Types.
Without going into too much detail about type, the basic reason for specifying a type for data transformations is validity; if you are expecting integer data, and the XML provides a string, then the basic contract is broken. An XSD defines a type of data expected, and if some other type is provided, the XML is invalid.
For example, run the following code:
IF NOT EXISTS ( SELECT *
FROM sys.xml_schema_collections xsc
WHERE name = 'MismatchDataType' )
CREATE XML SCHEMA COLLECTION MismatchDataType AS
'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="IntValue" type="xsd:integer"/>
</xsd:schema>'
GO
DECLARE @x XML(MismatchDataType)
SET @x = '<IntValue>100</IntValue>'
--SET @x = '<IntValue>String</IntValue>'
DROP XML SCHEMA COLLECTION MismatchDataType
GO
It runs fine, but if you uncomment the second SET statement (where a string value is specified), you get the following error:
Msg 6926, Level 16, State 1, Line 4
XML Validation: Invalid simple type value: ‘String’. Location: /*:IntValue[1]
What’s important to remember is that once you specify a type for an element, you may only cast that XML type to a matching SQL Server type (i.e., integer to integer, string to (n)varchar, etc.) when using the XQuery methods in SQL Server (.value(), etc.). This is easily debuggable to a seasoned database professional; if the XML type is string, and you store a value as 100, you can easily convert that to either an integer or varchar value:
SELECT @x.value('IntValue[1]', 'integer'), @x.value('IntValue[1]', 'varchar(3)')
If you don’t specify a type, SQL Server can make certain assumptions regarding type conversion; however, typing your XML is one of those basic “good habits” that is foundational to application design. Knowing what to expect from your data, regardless of whether or not it’s stored in XML or a database makes troubleshooting a lot easier in the future.
Complex vs. Simple Types
The examples I’ve used so far all rely on what is known as a simple type in XML; a simple type contains no sub-elements or attributes. A complex element can contain either sub-elements or attributes. An XSD collection is especially useful when defining complex elements; the XSD allows database professionals to enforce validity in the shape of their XML, including which elements are required or not.
Most of the examples I’ve used so far have been simple elements, but a complex element enforced via an XSD would look something like (apologies for the formatting):
CREATE XML SCHEMA COLLECTION XMLSample AS
'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="Parent">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="Child" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>'
GO
In essence, a complex type is the heart of a strongly-typed XML document; one of the major benefits of XML is the ability to encapsulate hierarchical data, and a complex type enforces the relationship between the elements (and attributes) encapsulated in that hierarchy much like foreign keys do for a relational database. The presence or absence of elements in the data when compared to the XSD validate the nature of the dataset.
A stopping point…
Unfortunately for you, I need to stop at this point. I promised myself to learn something new every month, and I feel like I have. However, there’s so much more to learn about this topic, and I’ve simply run out of time. I debated about spending a few more weeks on this, but then realized that I need to move on (I can always return to it in a few months) in order to stay energized about learning something new. When I do return to this topic, I’ll be sure to post a few summary links to keep everything related.