Why (good) Xml is much better than plain text


There are many reasons, sure, and there are probably also reasons why plain text files can be better, but I would like to point out just one, because I am fighting with it right now:


Xml is human readable


Or at least, it should be.


I’m dealing with the HL7 standard for healthcare. HL7 files are text files with some strange delimiters such as ^ and |. Luckily, we can use the BizTalk HL7 Accelerator, which lets us abstract away the HL7 details.


A sample of an HL7 file:


MSH|^~\&|REG|MCM|BTS||199601121005||ADT^A04|000001|P|2.2
EVN|A04|199601121005||01||199601121000
PID|||191919^^^MYHOS^MR~123-45-6789^^^USSSA^SS|253763|SMITH^JOHN^Q||19560129|M|||123MAIN^^BUFFALO^NY^98052^""||(123)555-0100||S|M|10199925^^^MYHOS^AN|123-45-6789
PD1|S|F|NormalString^A^+1^-1^ISO^simpletext&Test&HCD^GI^simpletext&NormalString&ISO^I|NormalString^Test&Test^Test^Test^Test^Test^AE^simpletext^simpletext&Test&ISO^P^NormalString^M10^MC^simpletext&NormalString&HCD^A|N|simpletext|I|I|N|NormalString^+1^M11^simpletext&NormalString&L,M,N^RRI^simpletext&NormalString&HCD|NOVALUE^NormalString^Test^Test^NormalString^Test|N
PV1|1|I|2000^2012^01^hey&test&DNS^test^test^test^test^test||||004777^MILLER^CONNIE^A.|||SUR||||2|A0


Where is the patient name? It is “the substring between the fifth and the sixth | (pipe) in the third line (the line starting with PID)”. And remember that the parts of the name (family name, given name, middle initial) are separated by ^ (the strange little hat), not by spaces.
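To make the “between the fifth and sixth pipe” rule concrete, here is a minimal Python sketch that pulls the patient name (field PID-5) out of the sample. It is only an illustration under some assumptions: `patient_name` is a hypothetical helper, the separators are read from the MSH segment, segments are split on line breaks (`splitlines()` also handles the carriage returns HL7 uses on the wire), and escape sequences and repeated names are ignored.

```python
# MSH and PID segments from the sample above (PID trimmed after PID-8).
SAMPLE = (
    "MSH|^~\\&|REG|MCM|BTS||199601121005||ADT^A04|000001|P|2.2\n"
    "PID|||191919^^^MYHOS^MR~123-45-6789^^^USSSA^SS|253763|SMITH^JOHN^Q||19560129|M\n"
)


def patient_name(message: str) -> dict[str, str]:
    """Return the first components of PID-5 (patient name) as a dict.

    Hypothetical helper: escape sequences and repetitions are not handled.
    """
    segments = message.splitlines()  # handles "\r" (HL7 on the wire) and "\n"
    msh = segments[0]
    field_sep = msh[3]      # the character right after "MSH": "|"
    component_sep = msh[4]  # first encoding character (MSH-2): "^"
    for segment in segments:
        fields = segment.split(field_sep)
        if fields[0] == "PID":
            # fields[0] is the segment name, so fields[5] is PID-5,
            # the text between the fifth and sixth pipe.
            components = fields[5].split(component_sep)
            return dict(zip(("family", "given", "middle"), components))
    raise ValueError("no PID segment found")


print(patient_name(SAMPLE))  # {'family': 'SMITH', 'given': 'JOHN', 'middle': 'Q'}
```

Reading the separators from MSH instead of hard-coding them matters because HL7 v2 lets each message declare its own delimiters.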


The HL7 Accelerator comes with XSD schemas to map these flat files. The sample message above (message type ADT^A04) looks something like this in XML (just a small piece):


<ns0:ADT_A04_22_GLO_DEF xmlns:ns0="http://microsoft.com/HealthCare/HL7/2X">
 <EVN_EventType>
  <EVN.1_EventTypeCode>A04</EVN.1_EventTypeCode>
  <EVN.2_DateTimeOfEvent>199601121005</EVN.2_DateTimeOfEvent>
  <EVN.3_DateTimePlannedEvent>199601121000</EVN.3_DateTimePlannedEvent>
  <EVN.4_EventReasonCode>01</EVN.4_EventReasonCode>
 </EVN_EventType>
 <PID_PatientIdentification>
  <PID.1_SetIdPatientId>191919</PID.1_SetIdPatientId>
  <PID.2_PatientIdExternalId>
    <PID.5_PatientName>
       <PN.0_FamiliyName>Doe</PN.0_FamiliyName>
       <PN.1_GivenName>John</PN.1_GivenName>
    </PID.5_PatientName>
[…]


We still have to deal with HL7 codes and semantic structure, but it’s much easier to work with the patient name: it is located in “the FamilyName element under PatientIdentification” 🙂
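For comparison, a minimal Python sketch that reads the same information from the XML form using the standard library’s `xml.etree.ElementTree`. `SAMPLE_XML` is a hypothetical, trimmed-down, well-formed document modeled on the fragment above (element names, including the `FamiliyName` spelling, are copied from it); the real Accelerator schemas may name or nest things differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down document modeled on the fragment above.
# Element names (including the "FamiliyName" spelling) are copied from it.
SAMPLE_XML = """\
<ns0:ADT_A04_22_GLO_DEF xmlns:ns0="http://microsoft.com/HealthCare/HL7/2X">
  <PID_PatientIdentification>
    <PID.5_PatientName>
      <PN.0_FamiliyName>Doe</PN.0_FamiliyName>
      <PN.1_GivenName>John</PN.1_GivenName>
    </PID.5_PatientName>
  </PID_PatientIdentification>
</ns0:ADT_A04_22_GLO_DEF>
"""

root = ET.fromstring(SAMPLE_XML)
# Only the root is namespace-qualified; the children are unqualified,
# so plain element names work. ".//" searches at any depth below the root.
family = root.findtext(".//PID.5_PatientName/PN.0_FamiliyName")
given = root.findtext(".//PID.5_PatientName/PN.1_GivenName")
print(family, given)  # Doe John
```

The path says what it means (“the family name inside the patient name”) instead of counting delimiters, which is the point of the post.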


Comments (8)
  1. Dz says:

    XML is a good thing, just not for every case in everyone’s life. For example, what I don’t like about XML is that it takes a lot of bytes to describe info that is useful only to the developer, not to an end user with a DSL connection split across 8 computers 🙂

    It’s not only about internet bandwidth. What about the server CPU wasted on deciphering all that developer-oriented info?

    If you still want clarity in HL7, write your own formatter that builds a tree from the current HL7 block.

    It’s only my opinion; I have nothing against XML. It’s just that a calculator is for calculating, although you can also use it to kill flies.

  2. Tzury says:

    XML is the de facto standard for cross-platform data transfer.

    And now that all the major companies have sat down in one room and reached an agreement, let’s take advantage of this opportunity and make a newer version with a binary format. XML targets machines, not the human eye, anyway. Binary will make it faster and lighter!

  3. Travis Owens says:

    While XML does add a large amount of overhead to the file, this is totally irrelevant if the file is compressed before it is sent over the wire, as text compresses 11 to 1 or better.

    For example, a 5.5 MB XML file (2.5 MB as HL7) zips down to 500 KB.

    There are many articles about compressing web service traffic and using .zip in .NET.

  4. Travis Owens says:

    I also forgot to mention that since HIPAA is now in place, I would greatly frown upon sending a file as plain text over the wire, even if the connection is encrypted.

    I see compression as a weak form of encryption: it blocks the most basic forms of sniffing.

  5. dz says:

    Actually, it would be interesting to compare the performance of compressing the data and encrypting the dictionary versus 3DES.

  6. dhtoran says:

    Lots of comments! I just wanted to talk about ONE reason in favor of Xml over plain text, but there are reasons in both directions. The worst thing about Xml is, of course, the space it takes.

    <FamilyName>Smith</FamilyName> takes 30 characters just to say Smith! So if you need to save space or bandwidth, Xml is not a good fit.

    I think Xml (or any tagged format) was not popular in the late ’80s and early ’90s simply because of space and bandwidth. Now these things are usually not an issue.

  7. John says:

    XML is great. HL7 V3 uses XML BUT it’s very poorly done. You should have a look. You’ll soon LOVE pipes and carets.
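Dz’s suggestion in comment 1, a formatter that turns an HL7 block into a tree, can be sketched in a few lines of Python. This is a hypothetical illustration, not part of the HL7 Accelerator: `hl7_to_tree` and `print_tree` are made-up names, the separators are read from the MSH segment, and escape sequences are not decoded. Each field is nested as repetitions (~), components (^) and subcomponents (&), keyed by its HL7 field number.

```python
# Hypothetical tree shape (an assumption, not the Accelerator's):
# segment -> {field number: repetitions -> components -> subcomponents}
Field = list[list[list[str]]]


def hl7_to_tree(message: str) -> list[tuple[str, dict[int, Field]]]:
    """Split an HL7 v2 message into (segment name, fields) pairs."""
    segments = [s for s in message.splitlines() if s]
    msh = segments[0]
    field_sep = msh[3]  # "|"
    # "^", "~", "\" (read but not used here), "&"
    comp_sep, rep_sep, _escape, sub_sep = msh[4:8]
    tree = []
    for segment in segments:
        name, *raw_fields = segment.split(field_sep)
        # MSH-1 is the field separator itself, so MSH's first value is MSH-2.
        first = 2 if name == "MSH" else 1
        fields: dict[int, Field] = {}
        for number, raw in enumerate(raw_fields, start=first):
            if name == "MSH" and number == 2:
                fields[number] = [[[raw]]]  # encoding characters, kept verbatim
                continue
            fields[number] = [
                [component.split(sub_sep) for component in repetition.split(comp_sep)]
                for repetition in raw.split(rep_sep)
            ]
        tree.append((name, fields))
    return tree


def print_tree(tree: list[tuple[str, dict[int, Field]]]) -> None:
    """Print one indented line per non-empty component, e.g. 'PID-5.1: SMITH'."""
    for name, fields in tree:
        print(name)
        for number, repetitions in fields.items():
            for rep_index, components in enumerate(repetitions, start=1):
                # Only number repetitions when a field actually repeats.
                rep = f"[{rep_index}]" if len(repetitions) > 1 else ""
                for comp_index, subcomponents in enumerate(components, start=1):
                    value = "&".join(subcomponents)  # rejoined for display only
                    if value:
                        print(f"  {name}-{number}{rep}.{comp_index}: {value}")


SAMPLE = (
    "MSH|^~\\&|REG|MCM|BTS||199601121005||ADT^A04|000001|P|2.2\n"
    "PID|||191919^^^MYHOS^MR~123-45-6789^^^USSSA^SS|253763|SMITH^JOHN^Q\n"
)
print_tree(hl7_to_tree(SAMPLE))
```

The MSH segment is special-cased because its first field (MSH-1) is the field separator itself, so the value right after `MSH|` is MSH-2; numbering fields this way keeps MSH-9 pointing at `ADT^A04`.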
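The size arguments in comments 3 and 6 are easy to check on your own data. A minimal Python sketch using the standard `zlib` module: it confirms dhtoran’s 30-character count, then compresses the same hypothetical records written once as pipe-delimited text and once as XML. The records are made up and highly repetitive, so the printed ratios will not match Travis’s 5.5 MB to 500 KB figure or any real HL7 feed; measure real files before drawing conclusions.

```python
import zlib

# dhtoran's arithmetic: the tags cost 25 of the 30 characters.
element = "<FamilyName>Smith</FamilyName>"
print(len(element), len(element) - len("Smith"))  # 30 25

# Hypothetical payloads (an assumption, not real HL7 data): the same
# 1,000 patient names written as pipe-delimited text and as tagged XML.
names = [("SMITH", "JOHN"), ("DOE", "JANE"), ("MILLER", "CONNIE")]
rows = [(i, *names[i % len(names)]) for i in range(1000)]
pipes = "".join(f"PID|||{i}||{fam}^{giv}\n" for i, fam, giv in rows).encode()
xml = "".join(
    f"<PID><Id>{i}</Id><FamilyName>{fam}</FamilyName><GivenName>{giv}</GivenName></PID>\n"
    for i, fam, giv in rows
).encode()

for label, data in (("pipes", pipes), ("xml", xml)):
    packed = zlib.compress(data, 9)  # level 9: best compression
    print(f"{label}: {len(data)} bytes -> {len(packed)} bytes "
          f"({len(data) / len(packed):.1f} to 1)")
```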

