SQL Server A to Z – XML

I can’t believe it’s already December.  This year has just flown by.  The rush of holiday events has already started, everyone at work is pushing to have their projects completed before the end of the year.  It’s a madhouse, I tell ya.  I’ll be honest, though, for all the running around, I love this time of year.  The decorations, the carols, seeing friends and the odd family member, eggnog (don’t be stingy with the bourbon, people!), and of course the sheer gluttony.

But I’m also relieved when January 2 arrives.  :-)

The alphabet is also coming to a close.  We’re up to the letter X and that stands for… Xmas???  No, XML.

A lot of DBAs question the need for XML within a database, you should be storing your data in a relational format, after all. But XML has its uses, especially when exchanging data between disparate systems. So even if you’re not going to store your data in an XML format, it’s definitely handy to be able to output your data as XML.

Support for xml started with sql 2000, the FOR XML statement. FOR XML allowed you to return a row in an xml format, rather than a relational format. You had three modes to choose from: RAW, AUTO, and EXPLICIT. With EXPLICIT mode, you had quite a bit of flexibility in how you wanted the data formatted. The downside, however, was that that format needed to be designated and maintained in the query itself. In every query using XML formatting. Get where I’m going with this? It was complex and pretty darn unwieldy.

SQL Server 2005 brought relief with some new enhancements to the FOR XML statement. It also introduced an XML datatype. This made working with XML in SQL Server a much more pleasant experience. We can now store XML data and schemas directly in the database, or in variables, or return relational data as an XML type, not just as text in an XML format. Service Broker, event notifications, and DDL triggers, features introduced in SQL 2005, all use XML. So, like it or not, it’s in a DBA’s best interest to know at least the basics when it comes to XML.

XML Basics

Before we get into manipulating XML data in SQL Server, we should probably understand a little about the language itself. If you’re already familiar with XML, you may want to skip down the the next section, I won’t be offended.

Elements are the main building blocks of XML. If you’re familiar with HTML syntax, an element there would be the <BODY> or <IMG> tags. XML elements follow a similar structure. An attribute is information about the element. In the example below, customer, FName and LName are all elements, and id is an attribute.

<customer id="12">

In XML, you can store information in an element-centric format or in an attribute-centric format. Take a look at the examples below. In the first, we’re using an element-centric format. In the second, we’re storing the same information in an attribute centric format.



<customer id="12" FName="John" LName="Smith"></customer>

Since the customer element in our second, attribute-centric, example is empty, we could use a shortcut form for the element:

<customer id="12" FName="John" LName="Smith" />

See? Simple.

XML in SQL Server

Let’s take a look at a very basic example of returning relational data as an XML datatype within SQL Server.

select top 10 ContactID, Title, FirstName, LastName, Phone from Person.Contact FOR XML AUTO, TYPE

The first thing we notice is that this looks, for the most part, like any other T-SQL query. There’s really no fancy XML syntax for formatting. The difference is in the FOR XML clause. That’s telling SQL Server we want the results in XML format. The AUTO directive tells SQL Server to go ahead and format it based on the table structure. And lastly, the TYPE directive tells SQL Server to return that data as an XML datatype. If I were to remove that TYPE directive, it wouldn’t change the format of the returned result, but that result would be an nvarchar(max) type, rather than an XML datatype. So what do I get back? Rather than a whole bunch of rows and columns, I get a single value that looks like this.

If I click on that result in Management Studio, the data inside that XML datatype looks like this:

<Person.Contact ContactID="1" Title="Mr." FirstName="Gustavo" LastName="Achong" Phone="398-555-0132" />
<Person.Contact ContactID="2" Title="Ms." FirstName="Catherine" LastName="Abel" Phone="747-555-0171" />
<Person.Contact ContactID="3" Title="Ms." FirstName="Kim" LastName="Abercrombie" Phone="334-555-0137" />
<Person.Contact ContactID="4" Title="Sr." FirstName="Humberto" LastName="Acevedo" Phone="599-555-0127" />
<Person.Contact ContactID="5" Title="Sra." FirstName="Pilar" LastName="Ackerman" Phone="1 (11) 500 555-0132" />
<Person.Contact ContactID="6" Title="Ms." FirstName="Frances" LastName="Adams" Phone="991-555-0183" />
<Person.Contact ContactID="7" Title="Ms." FirstName="Margaret" LastName="Smith" Phone="959-555-0151" />
<Person.Contact ContactID="8" Title="Ms." FirstName="Carla" LastName="Adams" Phone="107-555-0138" />
<Person.Contact ContactID="9" Title="Mr." FirstName="Jay" LastName="Adams" Phone="158-555-0142" />
<Person.Contact ContactID="10" Title="Mr." FirstName="Ronald" LastName="Adina" Phone="453-555-0165" />

Notice how SQL Server has formatted the XML based on the table definition. Each contact is an element, and each column is an attribute (attribute-centric). We could assign that output to a variable. We could even store it in a table, if we wanted to.
We could also use is as a subquery, using XQuery functions. For example:

SELECT (SELECT top 10 ContactID, Title, FirstName, LastName, Phone FROM Person.Contact FOR XML AUTO, TYPE).query(
   for $c in /Person.Contact
     <Contact lastname="{data($c/@LastName)}"/>

This sifts through the XML data and returns only the LastName fields.

  <Contact lastname="Achong" />
  <Contact lastname="Abel" />
  <Contact lastname="Abercrombie" />
  <Contact lastname="Acevedo" />
  <Contact lastname="Ackerman" />
  <Contact lastname="Adams" />
  <Contact lastname="Smith" />
  <Contact lastname="Adams" />
  <Contact lastname="Adams" />
  <Contact lastname="Adina" />

So why would I want to do this?

Good question. There are a few reasons for using XML in SQL Server, other than portability between systems. It might be that you don’t know the structure of your data in advance, some of it might be structured, but some of it might not be. XML can be used to hold the unstructured data. If you need to retain order in your data when it’s stored, XML can do this. Or maybe you have sparse data, a table with a huge number of columns but most of them are usually empty. Storing this data in XML format might save space. Or perhaps you just want the ability to create indexes on your XML to expedite your searches.

But I think the main reasons still revolve around portability. XML is a very common standard for any type of system or rdbms. It’s a common language that everyone speaks. It makes sense to use it to transfer data between different environments.

Further Reading

What’s New in FOR XML in Microsoft SQL Server 2005

W3 Schools: XML Tutorial

Also recommended:

Leave a comment

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>