XML Element and Attribute Name Guidelines

08/12/2009

Some time ago, a dev team here at Microsoft asked me to review an XML vocabulary that they had designed. They wanted to know whether the element and attribute names in their design were good ones.


Note: I suspect that this is one of those religious issues. I’m not suggesting that this is the one and only way to design XML vocabularies – this is just what I do. There may be good reasons to use different rules. (I have broken every rule here, probably. Well, that’s the story of my life – make mistakes, suffer from the effects, and try not to do it again. :-)

Once you release some software that uses a specific vocabulary, it tends to be set in stone. Even if there’s a problem with the vocabulary, it often is more problematic to try to change it after the fact than to continue to work around the problem, so designing vocabularies deserves some thought beforehand.

Much of what I’m presenting here derives from the Guidelines for Names in the .NET Framework Design Guidelines. Krzysztof Cwalina and Brad Abrams are the authors of the book Framework Design Guidelines: Conventions, Idioms, and Patterns for Reusable .NET Libraries. Some important excerpts of that book are published on MSDN for easy reference, but I consider the book itself a ‘must read’ for professional .NET developers.

Pascal Casing For Tags Longer than Three Characters

Use Pascal casing for element and attribute names that are longer than three characters.

Three casing schemes are in common use:

Pascal casing: The first letter in the tag and the first letter of each subsequent concatenated word are capitalized. For example:

<ForegroundColor>Black</ForegroundColor>

Lower casing with hyphens: tags are lower-case, and hyphens separate words. XSLT uses this style:

<apply-templates />
<value-of select="title" />

Camel casing: The first letter of a tag is lowercase and the first letter of each subsequent concatenated word is capitalized. For example:

<fatalErrorMessage>Contact your IT support department.</fatalErrorMessage>
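As a rough illustration of how the three schemes relate, here is a small Python sketch that converts a tag name from any of them to the others. The helper names (split_words, to_pascal, to_camel, to_hyphenated) are made up for this example, and the word splitting is deliberately simple: it assumes ASCII letters and digits only.

```python
import re


def split_words(name):
    """Split a Pascal-, camel-, or hyphen-cased tag name into lowercase words."""
    words = []
    for part in name.split("-"):
        # Match a capitalized or lowercase word, an all-caps run, or digits.
        words.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part))
    return [word.lower() for word in words]


def to_pascal(name):
    """ForegroundColor style: every word starts with a capital letter."""
    return "".join(word.capitalize() for word in split_words(name))


def to_camel(name):
    """fatalErrorMessage style: like Pascal casing, but the first letter is lowercase."""
    pascal = to_pascal(name)
    return pascal[:1].lower() + pascal[1:]


def to_hyphenated(name):
    """apply-templates style: lowercase words separated by hyphens."""
    return "-".join(split_words(name))


print(to_pascal("apply-templates"))
print(to_camel("ForegroundColor"))
print(to_hyphenated("fatalErrorMessage"))
```

Under these assumptions, the three calls print ApplyTemplates, foregroundColor, and fatal-error-message.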

I prefer Pascal casing for two reasons:

I find the XML easier to read.
If you use an XML to Object mapper such as LINQ to XSD, then the types that the mapper generates will conform to the .NET Framework Design Guidelines. If you use the LINQ to XML approach of pre-atomization of element and attribute names, those classes will contain public fields that conform to the guidelines.

I find that I’m somewhat happy that XSLT uses lower casing with hyphens, because if I use Pascal casing for my tags, it makes the elements and attributes in sequence constructors a little easier to see:

<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
  <xsl:template match='/Books'>
    <SelectedAuthors>
      <Author>
        <xsl:value-of select='Author/@Name'/>
      </Author>
    </SelectedAuthors>
  </xsl:template>
</xsl:stylesheet>

Capitalization Rules for Acronyms

It’s better to use acronyms in tags only when they are widely known and well understood – HTML, for example. Per the .NET Framework Design Guidelines, capitalize both letters of a two-letter acronym, and capitalize only the first letter of acronyms that are three characters or longer.

<IO>true</IO>
<Html>false</Html>
<GenerateXhtml>true</GenerateXhtml>
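The acronym rule is mechanical enough to sketch in a few lines of Python. The function name pascal_case and its acronyms parameter are invented for this illustration; the caller has to say which words are acronyms, because that can't be inferred from the spelling.

```python
def pascal_case(words, acronyms=frozenset()):
    """Join words into a Pascal-cased name, applying the acronym rule.

    Two-letter acronyms are written in all capitals (IO). Everything else,
    including acronyms of three or more letters, gets only an initial capital
    (Html, Xhtml).
    """
    parts = []
    for word in words:
        lower = word.lower()
        if lower in acronyms and len(lower) == 2:
            parts.append(lower.upper())
        else:
            parts.append(lower.capitalize())
    return "".join(parts)


known_acronyms = {"io", "html", "xhtml"}  # assumption: supplied by the vocabulary designer
print(pascal_case(["io"], known_acronyms))
print(pascal_case(["html"], known_acronyms))
print(pascal_case(["generate", "xhtml"], known_acronyms))
```

With that acronym set, the calls print IO, Html, and GenerateXhtml, matching the tags above.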

Don’t capitalize each word in compound words that are written as a single word. Examples: Endpoint, Lifetime, Diskdrive, Hashtable, and Grandchild.

One word that I have problems remembering how to capitalize is FileName. Some dictionaries don’t define it as a closed-form compound word (but some do). The .NET Framework guidelines capitalize it as FileName.

Don’t Use Abbreviations

Unless you are designing a highly-specialized vocabulary such as Open XML markup, optimize for readability instead of shortness of tags.

One abbreviation that it’s ok to use is the abbreviation for identifier (Id). Use Pascal casing for Id.

Just for fun: The .NET Framework Design Guidelines say that ‘Ok’ is an abbreviation. Actually, it might be an acronym. There are a number of theories on the origin of ‘OK’, but my favorite comes from a slogan during the American Presidential election of 1840. That election resulted in the oldest written usage of ‘OK’. The Democratic candidate, President Martin Van Buren, was nicknamed ‘Old Kinderhook’ (after his birthplace in New York State), and his election campaign had a slogan, ‘Old Kinderhook is OK’. Van Buren wasn’t re-elected. But following the .NET Framework Design Guidelines, I would use Pascal casing for ‘Ok’.

The designers of Open XML had very good reasons for making element and attribute names short – documents can be very long, and an increase in name length can have an impact on performance and the memory that is used by tools, so they were justified in having names such as w:p, w:t, and w:sdt. Unless you are designing one of those highly specialized vocabularies, it’s better not to use single character tags.

Word Selection

Avoid language keywords such as ‘default’, ‘abstract’, ‘break’, and ‘event’. If you use a tool that generates C# or some other language for de-serialization of the XML, or if you use the approach of pre-atomization of XName objects, an identifier that exactly matches the element or attribute name won’t be valid code.

For convenience, here are links to the C#, VB.NET, and C++ references, so that it’s easy to validate that you are not using keywords:

C# Language Reference

Visual Basic Reference

C++ Reference
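A check like this is easy to automate. The Python sketch below compares candidate names against a keyword list. The set shown is only a small illustrative subset of the C# keywords; a real check would load the complete lists from the references above. The function name keyword_clashes and its ignore_case flag are hypothetical.

```python
# Illustrative subset only; the complete lists are in the language references.
CSHARP_KEYWORDS_SUBSET = frozenset({
    "abstract", "break", "class", "default", "event",
    "namespace", "object", "operator", "string",
})


def keyword_clashes(names, keywords=CSHARP_KEYWORDS_SUBSET, ignore_case=False):
    """Return the names that collide with a language keyword, sorted.

    Set ignore_case=True when targeting a case-insensitive language such as
    Visual Basic; C# keywords are case-sensitive.
    """
    clashes = []
    for name in names:
        candidate = name.lower() if ignore_case else name
        if candidate in keywords:
            clashes.append(name)
    return sorted(clashes)


print(keyword_clashes(["default", "Title", "event"]))
print(keyword_clashes(["Default", "Title"]))
print(keyword_clashes(["Default", "Title"], ignore_case=True))
```

In this sketch, the Pascal-cased name Default passes the case-sensitive check and is only flagged when ignore_case is set. The word is still worth avoiding if the vocabulary might ever be consumed from a case-insensitive language.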

Choose easily readable identifier names. For example, a property named HorizontalAlignment is more readable in English than AlignmentHorizontal. AdjustIndentation is more readable than IndentationAdjust.

Optimize for readability over brevity.

XML Namespaces

I put elements in namespaces, and don’t put attributes in namespaces. Elements need to be in namespaces for a variety of reasons. There are some tools that require namespaces.

But attributes are a different story. For one thing, for all practical purposes, attributes inherit the namespace of their element. You can’t access the attribute without accessing the element, and you can’t get to the element without using its namespace. The designers of XML had to allow for namespaces for attributes because there are some special purpose attributes that sometimes need to be added to elements in an existing vocabulary. The xml:space and xml:lang attributes are examples. By placing these in the special xml namespace, we avoid any possible collisions between names.

Another reason that attributes allow namespaces – I’ve written XSLT transforms where the first operation is to transform the XML into a new tree with new attributes on some elements. The purpose of these attributes is to aid further transforms. By creating these attributes in my own namespace, I can make sure that I avoid name collisions.

This is, I believe, why even if there is a default namespace for elements, attributes are always by default in no namespace.
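This behavior can be observed with any namespace-aware parser. The sketch below uses Python's standard xml.etree.ElementTree module, which reports namespace-qualified names as {namespace-uri}local-name. The urn:example:books namespace and the Book, Title, and Id names are made up for the example.

```python
import xml.etree.ElementTree as ET

# A default namespace is declared on the root element.
XML = """<Book xmlns="urn:example:books" Id="42" xml:lang="en">
  <Title>Example</Title>
</Book>"""

book = ET.fromstring(XML)

# Elements pick up the default namespace...
print(book.tag)
print(book[0].tag)

# ...but the unprefixed Id attribute stays in no namespace, while xml:lang
# is in the reserved xml namespace.
for name, value in sorted(book.attrib.items()):
    print(name, "=", value)
```

The element names come back qualified with urn:example:books, the Id attribute comes back as plain Id, and xml:lang comes back qualified with http://www.w3.org/XML/1998/namespace.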

Sometimes I use LINQ to XML trees as a means of passing more complicated configuration information into a method, or returning the results of a query into SQL or Open XML. In this case, I’m not really using LINQ to XML as XML, but as a hierarchical data store that is LINQ friendly. Of course, in this situation, using XML namespaces would be silly.

There are many more things to consider around designing XML vocabularies, such as when to use attributes vs. when to use child elements, avoiding magic values, etc. This is a much larger discussion, and could fill an entire book, I think.