Dates in XML Schemas: Specifying your own date format?

This topic came up on the win_tech_off_topic list, and it caught my attention. The issue is that a date in your XML document that does not conform to the ISO 8601 format cannot be validated as a date type; instead, you must contrive your own simple type. For processors that use the post-schema-validation infoset (PSVI) for strong typing, this presents some non-obvious problems.

I agree with the W3C putting a stake in the ground and declaring what a date should be, which greatly simplifies processing for XML consumers. On the other hand, conforming to the W3C specification of a date means you either change how the date is stored in the XML, which may not be feasible in all cases, or you validate the value as a string using your own simple type facets. Ultimately I lean toward ISO 8601 as the format for dates within XML, both for storage and for PSVI typing, but the idea of specifying a date format within the schema itself has some merit.

Instead of declaring the format of the date field within the schema, you might consider storing the date in a locale-agnostic format. ISO 8601 addresses this concern by defining a single format in which dates are stored consistently across locales, and it is the date format used by Microsoft products (including SQL Server). Note that there are other date specifications as well, such as RFC 822, which allow a different style of date (though that standard is largely US-centric). In either case, the stored form of a date will likely differ from the way that date is presented. Whatever mechanism consumes the XML should be responsible for converting the international standard format into the current locale's format for rendering.
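As a rough illustration of keeping storage and presentation separate, here is a minimal Python sketch. The helper names `render_us` and `render_european` and the sample value are my own assumptions for illustration, and the two output formats stand in for whatever the consumer's locale actually requires.

```python
from datetime import date


def render_us(d: date) -> str:
    """Render a date in the US-style MM/DD/CCYY form (hypothetical helper)."""
    return d.strftime("%m/%d/%Y")


def render_european(d: date) -> str:
    """Render a date in a DD.MM.CCYY form (hypothetical helper)."""
    return d.strftime("%d.%m.%Y")


# The value as it would be stored in the XML document (ISO 8601, xsd:date style).
stored = "2004-03-07"

# Parse once into a real date object; the stored text is unambiguous.
parsed = date.fromisoformat(stored)

# Presentation is decided by the consumer, not by the stored format.
us_text = render_us(parsed)
eu_text = render_european(parsed)
```

The stored string never changes; only the rendering step knows about locale conventions.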

Now, flipping back to the other side of the argument: I recognize that we cannot always control the format of the XML being validated; we might only be able to develop a schema for the instance document as it currently exists.

If something is in a format other than ISO 8601 (for instance, MM/DD/CCYY), then that data will not pass validation against the xsd:date type. If you know the pattern string for the date format (MM/DD/CCYY), then you can validate the date value according to its pattern instead of the intrinsic xsd:date datatype. For this type of validation, you can use a regular expression as a simple type restriction facet instead of specifying a date datatype to validate the contents. This is necessary because the value is not stored as a date (at least according to the XML Schema concept of a date). Basically, you make up a new type called my:date and define it as a simple type within the schema. This is similar to what happened with the support of dc:date in RSS, which addressed the same type of issue.

The real downside to the latter approach is that you lose the type semantics of the date value itself, and the date value is represented only as a string with constraints. Suppose you were going to use the schema to build a strongly-typed dataset. You would run into a problem when trying to specify a date value: you could not use the System.DateTime type directly; you would instead have to call its ToString() method with the date format to insert data. Strongly-typed datasets generated with the xsd.exe tool do not support simple type facets, so you lose the validation of your custom date unless you re-apply the schema to the serialized version of the date object. Mapping from object to XML, or vice versa, causes some loss of fidelity between types.
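To make the loss of date semantics concrete, here is a small Python sketch that mimics a pattern facet on a string. The regular expression is my own assumption of what a MM/DD/CCYY pattern might look like, and `matches_pattern` and `is_real_date` are hypothetical helpers rather than anything a schema processor provides. Schema patterns must match the whole value, which `fullmatch` approximates.

```python
import re
from datetime import datetime

# Assumed pattern for MM/DD/CCYY, similar to what a pattern facet might contain.
US_DATE_PATTERN = re.compile(r"(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/[0-9]{4}")


def matches_pattern(text: str) -> bool:
    """String-level check only: does the text have the MM/DD/CCYY shape?"""
    return US_DATE_PATTERN.fullmatch(text) is not None


def is_real_date(text: str) -> bool:
    """Date-level check: does the text name a day that exists on the calendar?"""
    try:
        datetime.strptime(text, "%m/%d/%Y")
        return True
    except ValueError:
        return False


# A well-formed value passes both checks.
ok_shape = matches_pattern("03/07/2004")
ok_date = is_real_date("03/07/2004")

# An ISO 8601 value does not fit the custom pattern at all.
iso_shape = matches_pattern("2004-03-07")

# February 31 has the right shape but is not a date; the pattern cannot tell.
bad_shape = matches_pattern("02/31/2004")
bad_date = is_real_date("02/31/2004")
```

The pattern constrains the characters, not the calendar, which is the sense in which the value remains a string with constraints.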

So, I can see that being able to specify the date format within the schema itself might be a better approach than using your own simple type, provided the semantics of the date type can be preserved. I also see the downside: it makes those dates difficult to represent in an object model that uses the PSVI for typing, because your date format may not be interpreted consistently across platforms, especially when only a 2-digit year is specified.
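The 2-digit year problem is easy to demonstrate. In the Python sketch below, `strptime` with `%y` follows the POSIX convention (69 through 99 map to 1969 through 1999, 0 through 68 map to 2000 through 2068), while `expand_year` is a hypothetical windowing rule with a different pivot, standing in for some other platform's behavior.

```python
from datetime import datetime


def expand_year(two_digit_year: int, pivot: int) -> int:
    """Hypothetical windowing rule: years at or above the pivot are 19xx, the rest 20xx."""
    if two_digit_year >= pivot:
        return 1900 + two_digit_year
    return 2000 + two_digit_year


text = "03/07/68"

# Python's own interpretation of the 2-digit year (POSIX convention).
python_year = datetime.strptime(text, "%m/%d/%y").year

# The same two digits under an assumed pivot of 30 on another platform.
other_year = expand_year(68, pivot=30)
```

The same stored text yields two dates a century apart, depending on who reads it.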

For now, to preserve type fidelity between the serialized and object representations of XML, store dates using ISO 8601 and render them using your locale-specific format. Maybe this type of flexibility is on the radar of a working group somewhere: after all (as Oleg noted), something like 9 different working drafts related to XPath and XQuery have been released recently.
