Many customers ask for a way to validate documents based on their contents, not merely on their structure, which is all BizTalk checks by default.
Here's a simple method of achieving content validation within BizTalk Server 2004. Mind you, there is of course a performance trade-off, as BizTalk needs to load the entire message into a DOM tree in order to perform this validation.
First, make sure your schemas actually define your datatypes correctly. It's easy to define the structure and think you're done, but you rarely are. Another hint is to keep shared data definitions in a separate include schema (pulled in with xs:import in the example below), as these definitions are frequently reused. An example for a Dutch zipcode definition:
<?xml version="1.0" encoding="utf-16"?>
<xs:schema
    xmlns:basetypes="http://basetypes.macaw.nl"
    xmlns="http://SchemaValidation.inputSchema"
    xmlns:b="http://schemas.microsoft.com/BizTalk/2003"
    targetNamespace="http://schemas.macaw.nl/samples/inputSchema"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:import
      schemaLocation=".\definitions.xsd"
      namespace="http://basetypes.macaw.nl" />
  <xs:annotation>
    <xs:appinfo>
      <b:references>
        <b:reference
            targetNamespace="http://basetypes.macaw.nl" />
      </b:references>
    </xs:appinfo>
  </xs:annotation>
  <xs:element name="zipcode">
    <xs:complexType>
      <xs:sequence>
        <xs:element
            minOccurs="0"
            ref="basetypes:dutch-zipcode" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
followed by the include schema (definitions.xsd) that defines the dutch-zipcode element:
<?xml version="1.0" encoding="utf-16"?>
<xs:schema
    xmlns="http://basetypes.macaw.nl"
    xmlns:b="http://schemas.microsoft.com/BizTalk/2003"
    attributeFormDefault="unqualified"
    elementFormDefault="qualified"
    targetNamespace="http://basetypes.macaw.nl"
    version="200401"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="dutch-zipcode">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <!-- XSD patterns always match the whole value, so no ^ or $ anchors are needed -->
        <xs:pattern value="[1-9][0-9]{3} ?[A-Z]{2}" />
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
As you can see, the dutch-zipcode element has a pattern defined: a regular expression describing the format of a Dutch zipcode (four digits, the first of which is not zero, then an optional space and two uppercase letters). This ensures that any zipcode within the zipcode element conforms to this specification. However, BizTalk only checks the contents of messages when specifically instructed to do so, as it entails a performance trade-off. Telling BizTalk Server 2004 to check the actual contents of inbound messages is easily achieved by creating a custom pipeline into which you drop an out-of-the-box pipeline component called the XmlValidator.
Ensure you use the design-time property called "Document schemas" to designate our schema as the target for validation.
Deploy the pipeline, configure your ports, drop a correct message into the defined folder and see that it is processed.
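For reference, a message along the following lines should pass validation. This is only a sketch derived from the two schemas above: the ns0 and bt prefixes are arbitrary choices of mine, and only the namespace URIs come from the schemas.

```xml
<!-- Sample instance; the ns0 and bt prefixes are arbitrary, only the namespace URIs matter. -->
<ns0:zipcode xmlns:ns0="http://schemas.macaw.nl/samples/inputSchema">
  <!-- Four digits (the first one 1-9), an optional space, two uppercase letters. -->
  <bt:dutch-zipcode xmlns:bt="http://basetypes.macaw.nl">1234 AB</bt:dutch-zipcode>
</ns0:zipcode>
```

Note that the value has no leading or trailing whitespace. The type restricts xs:string, which preserves whitespace, so padding around the zipcode would count against the pattern.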
Subsequently, drop a message into the folder that does not adhere to the regular expression defined within the pattern and see BizTalk Server 2004 reject the message, setting its state to "Suspended - Not Resumable" and writing an event to the event log telling you why it was rejected.
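As an illustration, a message like the one below should be rejected. Again this is a hypothetical sample with arbitrary prefixes (ns0, bt), not one taken from an actual test run.

```xml
<!-- Sample instance that is structurally fine but fails the content check. -->
<ns0:zipcode xmlns:ns0="http://schemas.macaw.nl/samples/inputSchema">
  <!-- Invalid: the leading zero and the lowercase letters both violate the pattern. -->
  <bt:dutch-zipcode xmlns:bt="http://basetypes.macaw.nl">0123 ab</bt:dutch-zipcode>
</ns0:zipcode>
```

The structure is identical to that of a valid message, so without the XmlValidator in the pipeline a message like this would pass through untouched.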
That's all,
along with my apologies for being absent for such a long time!