Document Parsers in SharePoint (4 of 4): Parser Schema and Interface

In this four-part series, I’m going over in detail how to construct and register a custom document parser that promotes and demotes properties between your custom file types and Windows SharePoint Services (WSS).

Read part one here.

Read part two here.

Read part three here.

Today, I’ll round out the document parser information I’m presenting by talking about how to register your custom parser with WSS. I’ll also give you a quick overview of the ISPDocumentParser interface, which your parser needs to implement to communicate with WSS.

Document Parser Definition Schema Overview

To register a custom document parser with WSS, you must add a node to the document parser definition file that identifies your parser and the file type or types it can parse.

You can specify the file type or types a document parser can parse either by file extension or by file type program ID.

WSS stores the document parser definition file, DOCPARSE.XML, at the following location:

Web Server Extensions\12\CONFIG\DOCPARSE.XML

The document parser definition schema is as follows:

<DocParsers>
    <ByExtension>
        <DocParser/>
    </ByExtension>
    <ByProgId>
        <DocParser/>
    </ByProgId>
</DocParsers>

Following is a list of the elements in the document parser definition schema.

DocParsers

Required. Represents the root element of the document parser definition schema.

ByExtension

Contains the DocParser elements that identify file types by file extension.

ByProgId

Contains the DocParser elements that identify file types by program ID.

DocParser

Required. Each DocParser element represents a document parser and its associated file type. This element contains the following attributes:

· Name: Required string. The file type associated with the parser. For DocParser elements within the ByExtension element, set the Name attribute to the file extension. For DocParser elements within the ByProgId element, set the Name attribute to the program ID of the file type. To associate a parser with multiple file types, add a DocParser element for each file type.

· ProgId: Required string. The program ID of the parser, that is, its ‘friendly name’. Because the entry refers to the parser by this name, you can upgrade the parser without editing its entry in the DOCPARSE.XML file. The trade-off is that you cannot install different versions of the same parser side by side.
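Because each file type needs its own entry, registering one parser for several file types means adding several DocParser nodes. The following is a minimal Python sketch of that bookkeeping using the standard library's xml.etree.ElementTree. It assumes the element and attribute names DocParsers, ByExtension, ByProgId, DocParser, Name, and ProgId; the register_parser helper is hypothetical and works on an in-memory tree, not on the live DOCPARSE.XML file.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def register_parser(root: ET.Element, prog_id: str,
                    extensions: Iterable[str] = (),
                    file_prog_ids: Iterable[str] = ()) -> None:
    """Add one DocParser entry per file type for the parser named by prog_id.

    Hypothetical helper: assumes root is the <DocParsers> element and that
    entries live under <ByExtension> and <ByProgId> sections.
    """
    for section_name, names in (("ByExtension", extensions),
                                ("ByProgId", file_prog_ids)):
        names = list(names)
        if not names:
            continue
        # Reuse the section if it already exists; otherwise create it.
        section = root.find(section_name)
        if section is None:
            section = ET.SubElement(root, section_name)
        for name in names:
            # Assumption: one parser per file type, so an existing entry for
            # the same Name is repointed instead of duplicated.
            existing = next((e for e in section.findall("DocParser")
                             if e.get("Name") == name), None)
            if existing is not None:
                existing.set("ProgId", prog_id)
            else:
                ET.SubElement(section, "DocParser", Name=name, ProgId=prog_id)


# Usage: build the entries for the ABC parser from an empty definition.
root = ET.fromstring("<DocParsers/>")
register_parser(root, "AdventureWorks.AWDocumentParser.ABCParser",
                extensions=["abc"], file_prog_ids=["AWApplication.Document"])
print(ET.tostring(root, encoding="unicode"))
```

Repointing an existing entry rather than adding a second one is a design choice of this sketch, made so that re-running registration is harmless; it is not documented WSS behavior.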

Document Parser Definition Example

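Below is an example of a document parser definition file. It registers the same parser twice: once for the .abc file extension and once for the AWApplication.Document program ID.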

<DocParsers>
    <ByExtension>
        <DocParser Name="abc" ProgId="AdventureWorks.AWDocumentParser.ABCParser"/>
    </ByExtension>
    <ByProgId>
        <DocParser Name="AWApplication.Document" ProgId="AdventureWorks.AWDocumentParser.ABCParser"/>
    </ByProgId>
</DocParsers>
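To make the two kinds of entry concrete, here is a rough Python sketch that loads a definition shaped like this example and resolves which parser ProgId handles a given file. The helper names are hypothetical, and two behaviors are assumptions rather than documented WSS rules: a program-ID match is tried before an extension match, and extensions are compared case-insensitively.

```python
import os
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# The example definition, embedded as a string for illustration.
DOCPARSE_EXAMPLE = """\
<DocParsers>
    <ByExtension>
        <DocParser Name="abc" ProgId="AdventureWorks.AWDocumentParser.ABCParser"/>
    </ByExtension>
    <ByProgId>
        <DocParser Name="AWApplication.Document" ProgId="AdventureWorks.AWDocumentParser.ABCParser"/>
    </ByProgId>
</DocParsers>
"""


def load_parser_map(xml_text: str) -> Dict[str, Dict[str, str]]:
    """Map each file type (per section) to the ProgId of its parser."""
    root = ET.fromstring(xml_text)
    table: Dict[str, Dict[str, str]] = {"ByExtension": {}, "ByProgId": {}}
    for section, mapping in table.items():
        for entry in root.findall(f"{section}/DocParser"):
            name, prog_id = entry.get("Name"), entry.get("ProgId")
            # Both attributes are required by the schema.
            if not name or not prog_id:
                raise ValueError(f"{section} entry is missing Name or ProgId")
            # Assumption: extensions are matched case-insensitively.
            key = name.lower() if section == "ByExtension" else name
            mapping[key] = prog_id
    return table


def find_parser(table: Dict[str, Dict[str, str]], file_name: str,
                file_prog_id: Optional[str] = None) -> Optional[str]:
    """Return the parser ProgId for a file, or None if no entry matches."""
    # Assumption: a program-ID entry takes precedence over an extension entry.
    if file_prog_id and file_prog_id in table["ByProgId"]:
        return table["ByProgId"][file_prog_id]
    ext = os.path.splitext(file_name)[1].lstrip(".").lower()
    return table["ByExtension"].get(ext)


table = load_parser_map(DOCPARSE_EXAMPLE)
print(find_parser(table, "report.abc"))
print(find_parser(table, "notes.txt", "AWApplication.Document"))
```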

Document Parser Interface Overview

For a custom document parser to perform document property promotion and demotion in WSS, it must implement the following document parser interfaces. These interfaces enable WSS to invoke the document parser, and enable the parser to send and receive document properties once invoked.

· ISPDocumentParser

Represents a custom document parser. This interface includes the methods WSS uses to invoke the document parser.

· IParserPropertyBag

Represents a property bag object used to transmit document properties between the document parser and WSS. Includes methods that enable the document parser to access the content type and document library schemas for the specified document.

· IEnumParserProperties

Represents an enumerable document property collection. Includes methods the document parser can use to enumerate through the document property collection.

· IParserProperty

Represents a single document property. Includes methods for the document parser to get and set the document property value and data type.
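The Python classes below are a conceptual model only: they mirror the roles of the four interfaces so the promotion and demotion round trip is easy to follow. The real interfaces are COM interfaces, and every method name here (promote, demote, get_value, set_value, find, add, enumerate) is hypothetical, not an actual WSS signature. The toy KeyValueParser and its key=value file format are likewise invented for illustration, and the schema-access methods of IParserPropertyBag are left out.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, Optional


class ParserProperty:
    """Models IParserProperty: one property with a value and a data type."""
    def __init__(self, name: str, value: object, data_type: str = "string"):
        self.name, self._value, self._data_type = name, value, data_type
    def get_value(self) -> object: return self._value
    def set_value(self, value: object) -> None: self._value = value
    def get_data_type(self) -> str: return self._data_type
    def set_data_type(self, data_type: str) -> None: self._data_type = data_type


class ParserPropertyBag:
    """Models IParserPropertyBag: carries properties between parser and WSS."""
    def __init__(self) -> None:
        self._props: Dict[str, ParserProperty] = {}
    def find(self, name: str) -> Optional[ParserProperty]:
        return self._props.get(name)
    def add(self, prop: ParserProperty) -> None:
        self._props[prop.name] = prop
    def enumerate(self) -> Iterator[ParserProperty]:
        # Stands in for IEnumParserProperties: iterate over a snapshot.
        return iter(list(self._props.values()))


class DocumentParser(ABC):
    """Models ISPDocumentParser: the entry points WSS calls."""
    @abstractmethod
    def promote(self, document: bytes, bag: ParserPropertyBag) -> None: ...
    @abstractmethod
    def demote(self, document: bytes, bag: ParserPropertyBag) -> bytes: ...


class KeyValueParser(DocumentParser):
    """Toy parser for a hypothetical file format of 'key=value' lines."""
    def promote(self, document: bytes, bag: ParserPropertyBag) -> None:
        # Promotion: copy properties out of the file into the bag.
        for line in document.decode("utf-8").splitlines():
            if "=" in line:
                key, value = line.split("=", 1)
                bag.add(ParserProperty(key.strip(), value.strip()))

    def demote(self, document: bytes, bag: ParserPropertyBag) -> bytes:
        # Demotion: write bag values back into the file.
        out, written = [], set()
        for line in document.decode("utf-8").splitlines():
            key = line.split("=", 1)[0].strip() if "=" in line else ""
            prop = bag.find(key) if key else None
            if prop is not None:
                out.append(f"{key}={prop.get_value()}")
                written.add(key)
            else:
                out.append(line)
        # Append properties that exist in the bag but not yet in the file.
        for prop in bag.enumerate():
            if prop.name not in written:
                out.append(f"{prop.name}={prop.get_value()}")
        return "\n".join(out).encode("utf-8")
```

In use, promote fills the bag from the file's contents; after a property is edited in the library (here, by calling set_value on a property from the bag), demote writes the new value back into the file bytes.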