XML Document Property Parsing in SharePoint (3 of 5): Determining Document Content Type for XML Parsing

This is the third in a five-part series on how to use the built-in XML parser in WSS V3 to promote and demote document properties in your XML files, including InfoPath 2007 forms.

Read part one here.

Read part two here.

In order for the built-in XML parser to determine the document’s content type, and thereby access the content type definition, the document itself must contain the content type as a document property. The parser looks for a special processing instruction in your XML documents to identify the document’s content type. You can include processing instructions that identify the document’s content type by content type, and/or document template.

How the Parser Determines Document Content Type

When a user uploads an XML document to a document library, WSS V3 invokes the built-in XML parser. Before the parser can promote document properties, it must first determine the content type, if any, of the document itself.

The parser first looks at the Field element in the document library schema that represents the content type ID column on the document library. The parser examines the Field element for the location in the document where the content type ID should be stored. The parser then determines if the content type ID is indeed stored in the document at this location. If no content type ID is specified at that location, the parser assigns the default content type to the document. The parser then uploads the document and promotes any document properties accordingly.

If the document does contain a content type ID at the specified location, the parser determines if the content type with that ID is also associated with the document library. If it is, the parser uploads the document and promotes any document properties accordingly.

If the parser doesn't find an exact match, it examines ID's of the content types on the document library to determine if one or more are children of the document content type. If so, the parser assigns the closest child content type to the document. The parser then uploads the document and promotes any document properties accordingly.

Note The parser examines the list for content types that are children of the document content type because, in most cases, the document will have been assigned a site content type. In such cases, the matching list content type will be a child of the site content type.

If the parser finds no content type match at all, it then looks at the Field element in the document library schema that represents the document template column on the document library, if such a column is present. If the document library does contain a document template column, the parser examines the Field element for the location in the document where the document template should be stored. The parser then determines if the document template is indeed stored in the document at this location.

If the document does contain a document template, the parser compares the template with the document templates specified in each content type on the document library. If the parser finds a content type with the same document template as the document, the parser assigns that content type to the document. If there are multiple content types with the same document template as the document, the parser simply assigns the first such content type it finds. The parser then uploads the document and promotes any document properties accordingly.

Finally, if the parser cannot find a content type match, the parser assigns the default content type to the document. The parser then uploads the document and promotes any document properties accordingly.

The flow chart below illustrates the checks the parser performs to determine a document's content type.

For more information on how the parser promotes and demotes specific document properties, part two of this series.

One aspect of the parser's operation that bears emphasis is the fact the parser looks to the document library's content type and document template columns to determine where in the XML file to look for those matching document properties. Therefore, in order for promotion and demotion to work correctly, all content types on a given document library must contain content type and document template column definitions that specify the same location for those document properties as the document library columns. Otherwise, the parser will be looking in the wrong location within the document for those properties.

In the next entry, we’ll take a look at how you actually specify the content type of the document, inside that document itself.