Mapping XML column with XSD schema validation in EF Code First

By Mirek on (tags: Entity Framework, SQL Server, XML, XSD, categories: architecture, code)

The XML data type allows storing a xml content in our SQL Server database. The great feature is that we can also attach an XSD schema to it and have the xml content automatically validated. Here I am going to show you how to set it up together with Entity Framework Code First.

Storing XSD Schema in database and mapping entity

One could ask: what is the point of validating the xml in SQL Server when I can validate it directly in my application logic before I send it to the db? Well, simply for one reason: when you allow database access from other clients, ex. Excel and you want to have the xml content being validated on db insert or update.
Let say we have such a simple xml

<?xml version="1.0" encoding="utf-8" ?>
<VendorsDetails>
  <CompanyName>Company Co. Ltd.</CompanyName>
  <EmployeeCount>385</EmployeeCount>
  <FoundingDate>2001-06-03</FoundingDate>
  <SaleStatistic year="2013">
    <TotalTransactions>156</TotalTransactions>
    <Income>15789438</Income>
    <Grow>35</Grow>
  </SaleStatistic>
  <SaleStatistic year="2014">
    <TotalTransactions>285</TotalTransactions>
    <Income>3452872</Income>
    <Grow>57</Grow>
  </SaleStatistic>
</VendorsDetails>

and corresponding XSD schema

<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="VendorsDetails"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="VendorsDetails">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="CompanyName" type="xs:string" />
        <xs:element name="EmployeeCount" type="xs:nonNegativeInteger" />
        <xs:element name="FoundingDate" type="xs:date" />
        <xs:element name="SaleStatistic" minOccurs="0" maxOccurs="unbounded" >
          <xs:complexType>
            <xs:all>
              <xs:element name="TotalTransactions"  type="xs:nonNegativeInteger" />
              <xs:element name="Income"  type="xs:nonNegativeInteger" default="0" />
              <xs:element name="Grow" type="xs:integer" />
            </xs:all>
            <xs:attribute name="year" type="xs:nonNegativeInteger" use="required" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Now we want to have the xml data accessed via Entity Framework context with this simple Vendor entity

public class Vendor
    {
        public int Id { get; set; }
 
        public string Name { get; set; }
 
        public string Address { get; set; }
 
        [Column(TypeName = "xml")]
        public string DetailsXml { get; set; }
 
        [NotMapped]
        public XDocument DetailsElement
        {
            get { return DetailsXml != null ? XDocument.Parse(DetailsXml) : null; }
            set { DetailsXml = value.ToString(); }
        }
    }

As you can see the DetailsXml property is decorated with a column attribute to tell the EF that this should be mapped to an xml data type column in SQL Server.

But what we want to achieve is to have the xml content stored in DetailsXml property be automatically validated on database insert and update. To have that we need to assure two things:

  1. Create a xml schema collection on our database providing required XSD file.
  2. Set the type of DetailsXml column to validated xml document

Since point one can be achieved only by raw sql I have decided to use a entity framework migrations to set that. You can also use a Seed method of database initializer.
Lets enable migrations on the project, make sure the database does not exists and add a initial migration. You should get a migration that will create your database structure from the scratch, like this

public partial class InitialCreate : DbMigration
{
    public override void Up()
    {
        CreateTable(
            "dbo.Vendors",
            c => new
                {
                    Id = c.Int(nullable: false, identity: true),
                    Name = c.String(),
                    Address = c.String(),
                    DetailsXml = c.String(storeType: "xml"),
                })
            .PrimaryKey(t => t.Id);
    }
 
    public override void Down()
    {
        DropTable("dbo.Vendors");
    }
}

Now to accomplish point 1 and 2 we need to add some sql in Up method

public override void Up()
{
    CreateTable(
        "dbo.Vendors",
        c => new
            {
                Id = c.Int(nullable: false, identity: true),
                Name = c.String(),
                Address = c.String(),
                DetailsXml = c.String(storeType: "xml"),
            })
        .PrimaryKey(t => t.Id);
    
    Sql(string.Format("CREATE XML SCHEMA COLLECTION dbo.VendorsDetailsSchema AS '{0}';", File.ReadAllText("VendorsDetails.xsd")));
    Sql("ALTER TABLE dbo.Vendors ALTER COLUMN DetailsXml XML(DOCUMENT dbo.VendorsDetailsSchema)");
}

As you can see first we create a xml schema collection providing the content of xsd file read from application execution directory. Then we alter the DetailsXml column to be xml validated column. We cannot do it during table creation because it is not supported in current version of EF.
That’s it. Now you will get a sql update exception when trying to store or update invalid xml content.

Updating XSD Schema

Now what about updating the xsd schema in the database. When your xml content schema change you will need to define a new xsd schema and upload it to the sql server. However all xml entries that already exist in the database  would not stick to new schema. This is actually solved in a SQL Server in a way that you are not allowed to change the existing xml schema, but instead you can add new xml schemas which targets different xml namespaces. So if we already have an xsd schema that targets an empty xml namespace, the only thing we can do is to add an xsd which targets some non empty namespace.

<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="VendorsDetailsVer2"
           targetNamespace="VendorsDetails.Ver2"
           xmlns="VendorsDetails.Ver2"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="VendorsDetails">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="CompanyName" type="xs:string" />
        <xs:element name="EmployeeCount" type="xs:nonNegativeInteger" />
        <xs:element name="CEOName" type="xs:string" />
        <xs:element name="FoundingDate" type="xs:date" />
        <xs:element name="SaleStatistic" minOccurs="0" maxOccurs="unbounded" >
          <xs:complexType>
            <xs:all>
              <xs:element name="TotalTransactions"  type="xs:nonNegativeInteger" />
              <xs:element name="Income"  type="xs:nonNegativeInteger" default="0" />
            </xs:all>
            <xs:attribute name="year" type="xs:nonNegativeInteger" use="required" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Our new xsd schema targets xml documents in namespace VendorsDetails.Ver2, contains extra field CEOName and lacks of one child element in SaleStatistic. To apply this schema to our database we need another database migration, which can be created with following command.

Add-Migration "VendorsDetails2" -Force

The we put some sql command in the Up method which will add new xsd schema to our database schema collection called VendorsDetailsSchema

Sql(string.Format("ALTER XML SCHEMA COLLECTION dbo.VendorsDetailsSchema ADD '{0}';", File.ReadAllText(@"VendorsDetails2.xsd")));

Now we can store xml documents which match new schema

<?xml version="1.0" encoding="utf-8" ?>
<VendorsDetails  
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="VendorsDetails.Ver2">
  <CompanyName>Company Co. Ltd.</CompanyName>
  <EmployeeCount>385</EmployeeCount>
  <CEOName>Richard Dean Anderson</CEOName>
  <FoundingDate>2001-06-03</FoundingDate>
  <SaleStatistic year="2013">
    <TotalTransactions>156</TotalTransactions>
    <Income>15789438</Income>
  </SaleStatistic>
  <SaleStatistic year="2014">
    <TotalTransactions>285</TotalTransactions>
    <Income>3452872</Income>
  </SaleStatistic>
</VendorsDetails>

Keep in mind that the xmlns attribute must match the namespace name which the xsd targets, otherwise the xml is validated incorrectly. Our xml schema collection now contains two schemas, first targets empty namespace and second targets VendorsDetails.Ver2 namespace. That means we are allowed to insert xml documents with both no namespace defined and with namespace set to VendorsDetails.Ver2  and both will be valid, since Sql Server will use appropriate schema to validate the content of the xml.

Updating XML content to new schema

Now there is another topic that I would like to write about. Since we have now two xml schemas and we can store two kind of xml documents we also have to handle two cases in our application logic. This might not be the desired state and we may want to update the “old” xml documents to stick to the latest version of xsd schema and get rid of the old xsd schema from the xml schema collection. In other words we might want to have kind of migration for the xml content in our database. Lets then see how it could look from the EF migration perspective. Our Up method of migration could look like this

   1: public override void Up()
   2: {
   3:     Sql("ALTER TABLE dbo.Vendors ALTER COLUMN DetailsXml XML;");
   4:  
   5:     Sql("DROP XML SCHEMA COLLECTION dbo.VendorsDetailsSchema;");
   6:  
   7:     //migrate xml content to the new schema here
   8:     
   9:     Sql(string.Format("CREATE XML SCHEMA COLLECTION dbo.VendorsDetailsSchema2 AS '{0}';", File.ReadAllText(@"D:\test\VendorsDetails2.xsd")));
  10:  
  11:     Sql("ALTER TABLE dbo.Vendors ALTER COLUMN DetailsXml XML(DOCUMENT dbo.VendorsDetailsSchema2);");
  12: }

As you can see first we need to remove the xml schema constraints on target column (line 3). This is required for next command (line 5) which drops the xml schema collection totally. Now it is the moment to migrate the content of all xml entries to the new schema and we should do it in line 7. After that we can create new xml schema collection with just one xsd schema (the most recent one) and again update the constraint on table column.

Now we will focus on the xml content migration. We have here two options

  1. XSLT transformations
  2. XML Data Modification Language (XML DML)

XML DML is a special language used in T-SQL that allows modification on direct content of xml field in sql table. It facilitates xml modifications by providing a comprehensible syntax like replace, delete, insert and so on. However I will leave this topic for your own research and will focus on xslt transformations in this post.
Since Sql Server does not support xslt transformations natively we have two options. Either we load the xml documents into our application’s memory, transform it and then update back to the sql server, or we send the xslt sheet and the transformation logic to the sql server and perform the transformation there on the server process. Since first approach seems to have quite big overhead, especially if xml’s are big or there are many of them, we will focus on the second approach.

What we need first is the xslt transformation method imported into sql server from external library. Lets create such library.
Create new library project in Visual Studio (let name it SQLHelpers) and add one class XslTransform to it.

   1: using System;
   2: using System.IO;
   3: using System.Xml;
   4: using System.Xml.Xsl;
   5: using System.Data.SqlTypes;
   6:  
   7: public class XslTransform
   8: {
   9:     public static SqlXml Transform(SqlXml inputDataXML, SqlXml inputTransformXML)
  10:     {
  11:         var memoryXml = new System.IO.MemoryStream();
  12:         var xslt = new XslCompiledTransform();
  13:         xslt.Load(inputTransformXML.CreateReader());
  14:         xslt.Transform(inputDataXML.CreateReader(), null, memoryXml);
  15:         return new SqlXml(memoryXml);
  16:     }
  17: }

This static method will be imported as a sql function in our database. Notice that it accepts parameters of SqlXml type which is properly mapped into a xml type in sql server.
Now we have to build our SQLHelpers.dll library and import it as assembly into sql server

create assembly XsltTransform
from 'D:\test\SQLHelpers.dll';

Next we create an sql function which maps to the Transform method.

create function ApplyXsltTransform( @inputXML xml, @inputTransform xml )
returns xml
as external name XsltTransform.XslTransform.Transform;

Please notice that here we refer to the XsltTransform assembly name, not the dll library.

In next sql script we create the stored procedure which facilitates the xml transformation, so we can easily call it from the application code.

CREATE PROCEDURE UpdateVendorsDetails @xslt_content nvarchar(MAX) 
AS
DECLARE @xslt XML = @xslt_content;
BEGIN
  UPDATE Vendors SET DetailsXml = dbo.ApplyXsltTransform(DetailsXml, @xslt );
END;

The result of executing this procedure will be the update on every xml document stored in Vendors.DetailsXml column. Every xml will be processed with xslt transformation provided as procedure parameter.
The last thing we need to assure before we are done, is if the clr integration feature is enabled. This is be default off and to switch it on we need to run following command

sp_configure 'clr enabled', 1;
RECONFIGURE;

As you can see there is quite a number of commands and operations required to have the xlt transformations possible on sql server. Moreover we need to have a server level permission to change last mentioned setting, which is not always an acceptable solution. However there is a way to do it and I just showed you how.
All above sql commands must be executed before we run the migration on the database. When we are ready with it, the last modification is to add a call to the UpdateVendorsDetails procedure and pass the content of xslt transformation to it.

public override void Up()
{
    Sql("ALTER TABLE dbo.Vendors ALTER COLUMN DetailsXml XML;");
 
    Sql("DROP XML SCHEMA COLLECTION dbo.VendorsDetailsSchema;");
 
    Sql(string.Format("EXEC    dbo.UpdateVendorsDetails @xslt_content = '{0}';", File.ReadAllText(@"D:\test\VendorDetail1To2.xsl")));
 
    Sql(string.Format("CREATE XML SCHEMA COLLECTION dbo.VendorsDetailsSchema2 AS '{0}';", File.ReadAllText(@"D:\test\VendorsDetails2.xsd")));
 
    Sql("ALTER TABLE dbo.Vendors ALTER COLUMN DetailsXml XML(DOCUMENT dbo.VendorsDetailsSchema2);");
}

Having that in place our xmls are automatically updated to the new schema. The xml schema collection is recreated with the only one schema VendorsDetails2.xsd and table column is appropriately updated.

The question is only if whole this effort is worth of such functionality. We must keep in mind that xslt transformation on a big amount of xml documents can still take a lot of resources and time, even if it is done on the sql server side. Maybe we should design it in a different way so we can avoid the xml transformation, or maybe we should allow to have many versions of xml simultaneously. We always need to answer those and similar question and the answers are very often different depending on what are the expectations and requirements.