As part of a separate task, the XML team came up with a list of frequently encountered issues in System.Xml, mainly points that we felt were interesting because they caused our users a lot of difficulty. The issues ranged from rarely used (or misused) methods to difficult XML constructs, and we focused specifically on scenarios that are particularly hard to debug. When we had completed the exercise, it occurred to the team that we should publish the list.
What follows is the first in a multi-part series in which we will outline each of the questions we identified and explain the correct usage, along with some sample code. Our motives are not entirely selfless, however: we are hoping you will tell us in the comments section about any good questions we have missed. Let us know!
Q1: Invalid Literals
There are a number of reserved characters in XML that cannot be included as literals in an XML string, such as "&" and "<". These characters must be escaped, and the XML standard provides three methods for doing this: character references (&#38;), entity references (&amp;), and CDATA sections.
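To make the three methods concrete, here is a minimal sketch that parses the same text written each way and prints what the parser sees. The class name EscapingDemo, the helper RootText, and the element name r are made up for illustration; only XmlDocument.LoadXml and InnerText come from System.Xml.

```csharp
using System;
using System.Xml;

static class EscapingDemo
{
    // Parses a small XML string and returns the text content of its root element.
    public static string RootText(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.InnerText;
    }

    static void Main()
    {
        string[] samples =
        {
            "<r>A &#38; B</r>",          // character reference (decimal code point 38)
            "<r>A &amp; B</r>",          // predefined entity reference
            "<r><![CDATA[A & B]]></r>"   // CDATA section: content is not parsed as markup
        };

        // Each sample should yield the same text content, "A & B".
        foreach (string xml in samples)
        {
            Console.WriteLine("{0} -> {1}", xml, RootText(xml));
        }
    }
}
```

All three forms carry the same text to the application; the choice between them is mostly a matter of readability. CDATA tends to be convenient when a block of text contains many reserved characters.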
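This will seem like basic XML 101 to experienced XML users, but it can be a real pain point for those new to XML. The problem is compounded because the error messages returned by the XML processor can be confusing. It is also an easy error for a new user to stumble upon, because these characters often appear in the regular text used as the source for the XML data. Strictly speaking, only & and < are invalid everywhere in character data; the other characters below need escaping only in certain positions (for example, a quote inside an attribute value delimited by the same quote character), but escaping all five is always safe. The invalid literals are: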
· & (&amp;)
· < (&lt;)
· > (&gt;)
· ' (&apos;)
· " (&quot;)
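Rather than escaping these characters by hand, one option is to let an XmlWriter do it when the XML is produced. The sketch below assumes the raw text arrives as an ordinary string; WriterEscapingDemo and WrapInElement are hypothetical names used only for this example.

```csharp
using System;
using System.Text;
using System.Xml;

static class WriterEscapingDemo
{
    // Writes rawText as the content of a single element, letting XmlWriter
    // escape any reserved characters it encounters.
    public static string WrapInElement(string elementName, string rawText)
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement(elementName);
            writer.WriteString(rawText);   // & and < are emitted as &amp; and &lt;
            writer.WriteEndElement();
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(WrapInElement("note", "A & B < C"));
    }
}
```

Because the writer does the escaping, the resulting string can be loaded back with XmlDocument.LoadXml and the element's text will match the original input.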
Here is a sample list of incorrect literal strings, along with the correct usage and the exception message produced by the incorrect usage. Each message includes a line and position number, which greatly assists the debugging process.
Character String | Correct Usage | Exception Message
---------------- | ------------- | -----------------
A & B | A &amp; B | An error occurred while parsing EntityName. Line X, position Y.
A &c B | A &amp;c B | ' ' is an unexpected token. The expected token is ';'. Line X, position Y.
A &# B | A &amp;# B | Invalid syntax for a decimal numeric entity reference. Line X, position Y.
A < B | A &lt; B | Name cannot begin with the ' ' character, hexadecimal value 0x20. Line X, position Y.
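The line and position information is also available programmatically through the LineNumber and LinePosition properties of XmlException. The following is a rough sketch that feeds the incorrect strings listed above to XmlDocument.LoadXml and reports where each one fails; the wrapper element r and the names ParseErrorDemo and TryLoad are assumptions made for the example, and the exact message text may vary between framework versions.

```csharp
using System;
using System.Xml;

static class ParseErrorDemo
{
    // Returns null when the XML is well-formed, otherwise the XmlException
    // raised by the parser.
    public static XmlException TryLoad(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return null;
        }
        catch (XmlException ex)
        {
            return ex;
        }
    }

    static void Main()
    {
        string[] inputs =
        {
            "<r>A & B</r>",      // bare ampersand
            "<r>A &c B</r>",     // entity reference without a terminating ';'
            "<r>A &# B</r>",     // incomplete numeric character reference
            "<r>A < B</r>",      // bare less-than sign
            "<r>A &amp; B</r>"   // correctly escaped, parses cleanly
        };

        foreach (string xml in inputs)
        {
            XmlException error = TryLoad(xml);
            if (error == null)
            {
                Console.WriteLine("OK: {0}", xml);
            }
            else
            {
                Console.WriteLine("Line {0}, position {1}: {2}",
                    error.LineNumber, error.LinePosition, error.Message);
            }
        }
    }
}
```

Logging LineNumber and LinePosition alongside the message is usually enough to find the offending character in the source text.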
Refer to section 2.4 of the XML standard ("Character Data and Markup") for more information on invalid literals.
Note: There will be a later post that deals with invalid XML characters in general and the correct way to deal with them.
Some future FAQ post topics include but are not limited to:
- Conformance Levels
- Reporting Validation Warnings
- Correct Encodings
- ProhibitDTD setting
- XLinq Bridge Classes
Program Manager | Data Programmability