Hierarchical Tags and XMP


A question came in via the blog from Hans regarding our implementation of hierarchical tags using XMP. He writes:



You chose to use a forward slash as a delimiter (path/path/keyword), which requires that this character cannot be used within a keyword.


Two problems with this approach:
1. This limitation is not in the XMP standard; XML also has an explicit, different way of handling special characters.
2. Since XMP allows the forward slash, all other apps allow it, too. You can freely enter “Bugs Bunny/Mickey Mouse” or “nuclear/atomic” as a keyword. BUT: Vista Photo Gallery (and Vista Explorer) will think of it as a hierarchical keyword. Not good.


Thanks for the question, Hans.  I’m sorry to hear that our implementation of hierarchical keywords is causing you problems. We actually made a conscious decision to allow users to manually enter a slash in a keyword (as in “Washington/Seattle”) and have that interpreted as a hierarchy. We chose this solution with the behavior of other XMP-supporting applications in mind.
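
To make that interpretation concrete, here is a minimal sketch of what reading a slash as a hierarchy separator amounts to. It is an illustration only, not Photo Gallery's actual code; the helper name `split_hierarchy` and the choice to trim whitespace and drop empty levels are assumptions.

```python
def split_hierarchy(keyword: str, delimiter: str = "/") -> list[str]:
    """Split a keyword into hierarchy levels on the delimiter.

    Hypothetical helper for illustration only: whitespace around each
    level is trimmed and empty levels are dropped.
    """
    return [part.strip() for part in keyword.split(delimiter) if part.strip()]


print(split_hierarchy("Washington/Seattle"))       # ['Washington', 'Seattle']
print(split_hierarchy("Bugs Bunny/Mickey Mouse"))  # ['Bugs Bunny', 'Mickey Mouse']
print(split_hierarchy("Seattle"))                  # ['Seattle']
```

The second call is the case raised in the question: a keyword that was meant as a single term is read as a two-level hierarchy.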


It is true that XMP doesn’t support any notion of hierarchical keywords, but we didn’t want to invent a brand new way of storing this metadata. We chose to use a simple character separator for hierarchical keywords so that the hierarchy would be visible and available for all of the existing applications that support XMP.
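
As a rough sketch of why a plain separator stays visible to existing applications, the example below reads keywords from a hand-written XMP fragment using Python's standard library. The fragment and the function name `read_keywords` are hypothetical, and the code assumes the keywords sit in a `dc:subject` bag; it is not taken from any shipping product.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF and Dublin Core as used in XMP packets.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical XMP fragment, written by hand for this illustration.
XMP = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="">
    <dc:subject>
      <rdf:Bag>
        <rdf:li>Washington/Seattle</rdf:li>
        <rdf:li>Vacation</rdf:li>
      </rdf:Bag>
    </dc:subject>
  </rdf:Description>
</rdf:RDF>
"""


def read_keywords(xmp: str) -> list[str]:
    """Return the dc:subject entries exactly as stored in the packet."""
    root = ET.fromstring(xmp.strip())
    items = root.findall(".//dc:subject/rdf:Bag/rdf:li", NS)
    return [item.text or "" for item in items]


# An application with no notion of hierarchy sees ordinary flat keywords.
flat = read_keywords(XMP)
print(flat)    # ['Washington/Seattle', 'Vacation']

# An application that follows the slash convention splits each entry.
nested = [keyword.split("/") for keyword in flat]
print(nested)  # [['Washington', 'Seattle'], ['Vacation']]
```

Both readers work from the same stored strings; only the interpretation of the slash differs.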


Why slash versus something else? There is no perfect choice here. We wanted to use a character that would show up correctly in every application that didn’t support hierarchical keywords (so extended Unicode characters were out of the question) and that would make sense when users saw it (which would not be true for, say, “}”). Additionally, we wanted to choose a character that was not as likely to be used in a normal keyword (which would rule out “.” or “-”, for instance).


This left us very few choices. The backslash was considered too DOS/Windows-specific. Ideally, we *do* want third-party applications that support XMP to adopt this convention, so we didn’t want to use a solution that seemed Windows-specific. The slash character existed in English punctuation long before computers. Other possibilities were “:” (similarly Mac-specific) and “|” (which is very hard to read when placed between words).


We knew there was a risk that someone might use a slash in a keyword (such as the examples you gave), but believed these cases would be rare and easily avoided. Without knowing the specifics of your scenario, I would imagine that you could enter “Bugs Bunny” and “Mickey Mouse” as separate tags, which would be more useful for searching and browsing.


Again, thank you for contacting us about this issue.  Getting feedback on how our solutions work in the real world is critical to making ongoing improvements in future versions.


– David Parlin, Principal Program Manager


Comments (3)

  1. Bil lD says:

    It seems the whole metadata construct must have been totally rushed for Vista and Microsoft Photo Info Tool.

    The whole point about XMP is that it is supposed to be extensible, so a company with MS’s resources should have been able to come up with an appropriate schema to address a set of hierarchies without having to resort to a common character such as "/" to delimit such hierarchies.

    The Photo Info Tool is also really flawed. There is a lot of information available on the internet about photo metadata that seems to have been ignored. See

    http://www.cpanforum.com/threads/4113

  2. Miniboss says:

    David,

    Thanks for discussing this issue. I am not so sure your decision for implementing hierarchical keywords under the Dublin Core element <dc:subject> cuts to the bone of the intention of this element. This is what Dublin Core says in the specs about <dc:subject>: "Typically, the topic will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary. To describe the spatial or temporal topic of the resource, use the Coverage element." So in essence, you could write whatever you want. But if you study the description of the standard, you can see that there are explicit suggestions to use Controlled Vocabularies (classification codes) or refining elements. Your brand new colleagues from iView (the future MS Expression Media), along with the Adobe Lightroom developers, did the only logical thing: they used a separate XMP namespace for hierarchical categories.

    With xmlns:MicrosoftPhoto, you already established such a new namespace, currently used by WPF. It would not be a big deal to simply talk to the WPF folks and extend that for this purpose. You can still write the keywords (without the path) into <dc:subject>. This type of dualism is already in place with the photo rating (gets written back into the XMP block as xmp:rating as well as MicrosoftPhoto:rating).

    What would you gain? Interoperability, less pain for the user, more compliance with the XMP standard. I would not underestimate the folks who have used Photoshop for many years and used the slash character. There is really no need to have such a limitation. I’d be more than delighted if I could make you revisit this issue for the next service pack.

    🙂

    Hans Fremuth

    http://www.thumbsplus.de

  3. Miniboss says:

    Practical example:

    Many people will probably have a category PLACES. Makes sense, unless you live in one of the largest cities in Germany, referred to as Frankfurt/Main. There is also another Frankfurt, referred to as Frankfurt/Oder. Germany does not add a state acronym behind the city; it is very common to identify this with a qualifier (Frankfurt/Main means the Frankfurt that is located on the Main river).

    Poor people in Frankfurt/Main or similar towns: they have to name their town differently if they want to tag them with Photo Gallery. Or, worse, they get a surprise in case they already tagged them in Photoshop or another application (which allows this).

    I bet there are other common occurrences of the slash being an integral part of a term that can serve as a meaningful tag.

    😉

    Hans Fremuth