Reading XML File With JScript


I am Titus, an SDET on the JScript team. Some time back I came across a requirement to pass in an XML file and get a Tree Listing back. The Tree Listing should contain every node in the file with the proper parent/child relationships, and it should offer a good way to tell nodes with values apart from nodes without them. Let’s call nodes that have a value properties. I achieved this using JScript. In this post you will learn how to read and parse an XML file using Microsoft’s XML DOM and use the result to create the Tree Listing.

Let’s take a sample XML file, say test.xml (it can be a URL or a file on your system), to get a clear picture of the kind of Tree Listing required. Later we will look at the actual code.

 

The XML file can be viewed as follows:

Root Node, name is BookList, has two child nodes

  childnode0, name is Book and has two properties
    Prop0: Author has a value Paul
    Prop1: Price has a value 10.3

  childnode1, name is Book and has three properties
    Prop0: Author has a value Joe
    Prop1: Price has a value 20.95
    Prop2: Title has a value Web 2.0
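
The sample file itself is not reproduced in this post. As a rough sketch, a test.xml consistent with the description above could look like the string below. The element order, the XML declaration and the absence of attributes are assumptions, and the variable name testXmlText is made up for illustration.

```javascript
// Hypothetical reconstruction of test.xml, built only from the description
// above. Element order and the lack of attributes are assumptions.
var testXmlText = [
    '<?xml version="1.0"?>',
    '<BookList>',
    '    <Book>',
    '        <Author>Paul</Author>',
    '        <Price>10.3</Price>',
    '    </Book>',
    '    <Book>',
    '        <Author>Joe</Author>',
    '        <Price>20.95</Price>',
    '        <Title>Web 2.0</Title>',
    '    </Book>',
    '</BookList>'
].join("\n");
```

With MSXML, a string like this can be parsed with xmlDoc.loadXML(testXmlText) instead of loading it from a file or URL.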

The required Tree Listing after parsing test.xml is made up of nodes, each carrying these fields:

nName: NodeName
nValue: NodeValue
cNodes: List of Child Nodes
cProps: List of Child Properties
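
To make the four fields concrete, here is a hand-built sketch of the Tree Listing that the description of test.xml implies. It is not output captured from a real run. The XMLNode constructor is copied from the code listing so the sketch runs on its own, and makeNode is a hypothetical helper introduced only for this illustration.

```javascript
// Stand-in for the XMLNode constructor from the code listing,
// repeated here so that this sketch runs on its own.
function XMLNode(ndName, ndVal) {
    this.nName = ndName;
    this.nValue = ndVal;
    this.hasCNodes = false;
    this.cNodes = null;
    this.hasCProps = false;
    this.cProps = null;
}

// Hypothetical helper (not part of the original code): attaches
// property nodes and child nodes to a freshly created XMLNode.
function makeNode(name, props, children) {
    var node = new XMLNode(name, null);
    if (props != null) {
        node.cProps = props;
        node.hasCProps = true;
    }
    if (children != null) {
        node.cNodes = children;
        node.hasCNodes = true;
    }
    return node;
}

// Values are strings because they are read from text nodes.
var book0 = makeNode("Book",
    [new XMLNode("Author", "Paul"), new XMLNode("Price", "10.3")], null);
var book1 = makeNode("Book",
    [new XMLNode("Author", "Joe"), new XMLNode("Price", "20.95"),
     new XMLNode("Title", "Web 2.0")], null);

// The listing is an array whose only entry is the root node.
var treeListing = [makeNode("BookList", null, [book0, book1])];
```

Nodes without a value (BookList, Book) keep nValue as null and hang off cNodes, while nodes with a value (Author, Price, Title) hang off cProps.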

 

The ReadXMLFile function in the code listing below returns the Tree Listing as required.

Often you already know the contents of the XML file and are interested only in the list for one specific node. Calling ReadXMLFile with the node name as the second argument gives just such a list.

Referring to test.xml, a call to ReadXMLFile("test.xml", "Author") gives a list of the two Author properties (Paul and Joe), whereas a call to ReadXMLFile("test.xml", "Book") returns a list of the two Book nodes, each carrying its own properties.
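
As an illustration, the two lists could be written out by hand as follows. This is a sketch of the expected shape, assuming test.xml matches the earlier description, and not output captured from a real run. The helpers prop and book are hypothetical and exist only to keep the literals short.

```javascript
// Hypothetical helper for this sketch only: builds a property node
// with the same fields as XMLNode.
function prop(name, value) {
    return { nName: name, nValue: value,
             hasCNodes: false, cNodes: null,
             hasCProps: false, cProps: null };
}

// Hypothetical helper for this sketch only: builds a Book node
// that owns the given list of properties.
function book(props) {
    return { nName: "Book", nValue: null,
             hasCNodes: false, cNodes: null,
             hasCProps: true, cProps: props };
}

// Expected shape of ReadXMLFile("test.xml", "Author")
var authorList = [prop("Author", "Paul"), prop("Author", "Joe")];

// Expected shape of ReadXMLFile("test.xml", "Book")
var bookList = [
    book([prop("Author", "Paul"), prop("Price", "10.3")]),
    book([prop("Author", "Joe"), prop("Price", "20.95"), prop("Title", "Web 2.0")])
];
```

getXMLNodeList returns the child-node list when there is one and the property list otherwise, which is why the Author call yields properties and the Book call yields nodes.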

If you look carefully at the Tree Listing, cNodes and cProps are both Arrays, so by using the proper index value one can reach the desired node.
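
A minimal sketch of both styles of access follows: direct indexing into cNodes and cProps, and a recursive walk over the same arrays. The tree literal is a hand-built stand-in for what ReadXMLFile would return for test.xml, and describeTree is a hypothetical helper that is not part of the original code.

```javascript
// Hand-built stand-in for the listing ReadXMLFile would return for test.xml.
var tree = [{
    nName: "BookList", nValue: null, cProps: null,
    cNodes: [
        { nName: "Book", nValue: null, cNodes: null, cProps: [
            { nName: "Author", nValue: "Paul" },
            { nName: "Price", nValue: "10.3" }
        ] },
        { nName: "Book", nValue: null, cNodes: null, cProps: [
            { nName: "Author", nValue: "Joe" },
            { nName: "Price", nValue: "20.95" },
            { nName: "Title", nValue: "Web 2.0" }
        ] }
    ]
}];

// Direct index access: root node, second Book, third property.
var secondBookTitle = tree[0].cNodes[1].cProps[2].nValue;

// Hypothetical helper: collects one indented line per node and property.
function describeTree(nodes, indent, lines) {
    var i, j, node;
    for (i = 0; i < nodes.length; i++) {
        node = nodes[i];
        lines.push(indent + node.nName);
        if (node.cProps != null) {
            for (j = 0; j < node.cProps.length; j++) {
                lines.push(indent + "  " + node.cProps[j].nName +
                           " = " + node.cProps[j].nValue);
            }
        }
        if (node.cNodes != null) {
            // child nodes are indented one level deeper than their parent
            describeTree(node.cNodes, indent + "  ", lines);
        }
    }
    return lines;
}

var outline = describeTree(tree, "", []);
```

Indexing is enough when the layout of the file is known in advance. The recursive walk is the safer choice when the number of nodes or properties may vary.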

Here goes the actual code:

var NODE_ELEMENT = 1;
var NODE_ATTRIBUTE = 2;
var NODE_TEXT = 3;
 
/**** INTERNALLY USED FUNCTIONS ****/

/*
* Builds up xmlNode list on parentXMLNode
* by iterating over each node in childNodesLst
*/
function getXMLNodeList_1(childNodesLst, parentXMLNode)
{
    var i;
    var curNode;
    var arrLen;

    //traverse nodelist to get nodevalues and all child nodes
    for (i = 0; i < childNodesLst.length; i++) {
        //we will ignore all other node types like
        //NODE_ATTRIBUTE, NODE_CDATA_SECTION, …
        if (childNodesLst[i].nodeType == NODE_ELEMENT
        || childNodesLst[i].nodeType == NODE_TEXT) {
            if (childNodesLst[i].nodeType == NODE_TEXT) {
                //we got the value of the parent node, populate
                //parent node and return back
                parentXMLNode.nValue = childNodesLst[i].nodeValue;
                return;
            }

            //we have a new NODE_ELEMENT node
            curNode = new XMLNode(childNodesLst[i].nodeName, childNodesLst[i].nodeValue);

            //hasChildNodes is a method of the XML DOM, so call it
            if (childNodesLst[i].hasChildNodes()) {
                getXMLNodeList_1(childNodesLst[i].childNodes, curNode);
                if (curNode.nValue != null) {
                    //we need to add this as a property to the parent node
                    if (parentXMLNode.cProps == null) {
                        parentXMLNode.cProps = new Array();
                        parentXMLNode.hasCProps = true;
                    }
                    arrLen = parentXMLNode.cProps.length;
                    parentXMLNode.cProps[arrLen] = curNode;
                } else {
                    //we need to add this as child node to the parent node
                    if (parentXMLNode.cNodes == null) {
                        parentXMLNode.cNodes = new Array();
                        parentXMLNode.hasCNodes = true;
                    }
                    arrLen = parentXMLNode.cNodes.length;
                    parentXMLNode.cNodes[arrLen] = curNode;
                }
            } else {
                //no use of such a node
                //mark curNode as null so it can be garbage collected
                curNode = null;
            }
        }
    }
    return;
}
 
/*
* Generates appropriate XMLNodeList from nodes
* in childNodes
*/
function getXMLNodeList(childNodes)
{
    var xmlNode = new XMLNode(null, null);
    getXMLNodeList_1(childNodes, xmlNode);

    var xmlNodeList = null;
    if (xmlNode.hasCNodes) {
        xmlNodeList = xmlNode.cNodes;
    } else if (xmlNode.hasCProps) {
        xmlNodeList = xmlNode.cProps;
    }
    return xmlNodeList;
}
 
/**** INTERNALLY USED FUNCTIONS ****/

/* XMLNode DataStruct */
function XMLNode(ndName, ndVal)
{
    this.nName = ndName; //XMLNode name
    this.nValue = ndVal; //the value (if any) associated with XMLNode
    //As of now only property nodes have associated values
    this.hasCNodes = false; //Bool to mark presence of Child Nodes
    this.cNodes = null; //List of child nodes (of type XMLNode)
    this.hasCProps = false; //Bool to mark presence of Property Nodes
    this.cProps = null; //List of property nodes (of type XMLNode)
}
 
/* Exposed Functions */
function ReadXMLFile(fileName, tagName)
{
    if (arguments.length < 1 || arguments.length > 2)
        return null;

    var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");

    //load the file synchronously
    xmlDoc.async = false;
    try {
        //load() returns false when the file cannot be loaded or parsed
        if (!xmlDoc.load(fileName))
            return null;
    } catch(e) {
        //failed to load xml file
        return null;
    }

    //let's get the child nodes
    var childNodes = null;
    if (arguments.length == 2) {
        try {
            childNodes = xmlDoc.getElementsByTagName(tagName);
        } catch(e) {
            return null;
        }
    } else {
        childNodes = xmlDoc.childNodes;
    }
    return (getXMLNodeList(childNodes));
}
 
var xmlNodes;
xmlNodes = ReadXMLFile("http://www.noweb.com/test.xml");

//For a file on your system
//xmlNodes = ReadXMLFile("C:\\My Documents\\test.xml");

//root node name is
var RootNodeName = xmlNodes[0].nName;

xmlNodes = ReadXMLFile("http://www.noweb.com/test.xml", "Book");
var cntBooks = xmlNodes.length;

xmlNodes = ReadXMLFile("http://www.noweb.com/test.xml", "Author");
var authorName = xmlNodes[0].nValue;
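
ReadXMLFile returns null when it is called with the wrong number of arguments, when the file cannot be loaded, or when no matching nodes are found, so indexing the result directly will fail in those cases. One possible guard is sketched below. The helper name firstValueOr and the fallback text are assumptions made for illustration, and the two lists are hand-built stand-ins for ReadXMLFile results.

```javascript
// Hypothetical helper (not in the original listing): returns the value of
// the first entry in a list from ReadXMLFile, or fallback when there is none.
function firstValueOr(nodeList, fallback) {
    if (nodeList == null || nodeList.length == 0) {
        return fallback;
    }
    return nodeList[0].nValue;
}

// Stand-ins for what ReadXMLFile might return.
var failedRead = null;                                  // e.g. file not found
var authors = [{ nName: "Author", nValue: "Paul" }];    // e.g. "Author" lookup

var missingAuthor = firstValueOr(failedRead, "(unknown)");
var firstAuthor = firstValueOr(authors, "(unknown)");
```

A check like this keeps a script running under cscript from stopping with a runtime error when the URL or path is wrong.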

Hope you enjoyed the blog!

Thanks,

Titus

Comments (12)

  1. Jeria says:

    Instead of using JavaScript "constants" for the node types, why not implement it properly so that the element returns the proper integer code:

    http://developer.mozilla.org/en/docs/DOM:element.nodeType

  2. prakash says:

    How to make it work in firefox?

  3. how about e4x says:

    here’s an even better idea:

    http://en.wikipedia.org/wiki/E4X

    also already part of FF and ECMA4/ AS3.

    a lot less convoluted and a lot more elegant looking than the solution above.

  4. TNO says:

    I’m pretty sure the arguments collection is being deprecated. I know JScript.NET doesn’t support it, and the ES4 committee is looking for an excuse to drop it last I checked.

    Also, you have a typo on your XMLNode function.

    I’m also curious as to why you use new XMLNode() instead of just returning a new object. Your method takes up more memory (holding onto prototype) and takes longer to execute in comparison to:

    function XMLNode(ndName,ndVal){

       return {

           nName : ndName,

           nValue : ndVal,

           hasCNodes : false,

           cNodes : null,

           hasCProps : false,

           cProps : null

       }

    }

    It may sound like splitting hairs, but if I had to parse a few megabytes, I just might want to have things run faster and smaller.

  5. anphanax says:

    Is there ever a difference between MSXML.DOMDocument and Microsoft.XMLDOM? I checked the registry on my machine and they’re both going to CLSID {2933BF90-7B36-11D2-B20E-00C04F983E60}. That goes to "%SystemRoot%\system32\msxml3.dll", and my understanding is that that’s version 3 of the library.

    Why not use "MSXML2.DOMDocument.6.0", which maps to CLSID {88d96a05-f192-11d4-a65f-0040963251e5} (which uses c:\WINDOWS\system32\msxml6.dll) instead?

    Unless I’m insane or mis-remembering, Version 6 performs some operations a lot faster than 3 (like selectNodes()). Yeah, if a 7 comes out the code will need adjustment, but from what I’ve seen I don’t mind making a minor adjustment to a constant somewhere.

    See http://blogs.msdn.com/xmlteam/archive/2006/10/23/using-the-right-version-of-msxml-in-internet-explorer.aspx

  6. Yawn.  This is a great exercise for a CS 101 class, but converting an XML document to its JavaScript-native equivalent representation has been done a thousand times already and is a foundational skill for a web developer (I sometimes use it as an interview question).  Google "XML to JSON" and you get the idea.

    Now, some native browser support for E4X would be nice.

  7. Gerome says:

    What?!

    1.) Fix the Node Constants in JScript: [bug 256]

    http://webbugtrack.blogspot.com/2007/10/bug-256-dom-nodetype-constants-are-not.html

    2.) Why on earth are you using ActiveX for this?  What part of Web Standards slipped by you?

    Use XMLHTTPRequest (the "almost" native) one added in IE7, with a fallback to ActiveX _only_ if the user is on a really old version of IE.

    3.) Does the term JSON ring a bell? It has been around for ages, and does what you are trying to do a 100 times better.

  8. Titus says:

    @Gerome: The reason for using ActiveX was we wanted it to work even from non browser-hosts especially cscript.

    @TNO: Thanks for pointing out the extra memory consumption. The intended audience for the blog is mainly novice JScript programmers who want to learn how to parse an XML file using JScript, so we did not focus on memory or performance optimizations.

    Thanks all for your invaluable comments.

  9. Ion Todirel says:

    but but, this will only work in IE, I don’t think any developers who do cross-browser app’s will actually use this

  10. Dusan says:

    What a nice example telling us clearly why is JSON so much better than XML.

    Just store your books in an js file like this :

    var BookList =  

    [

     { Author: "Paul", Price : 10.30 } ,

     { Author: "Joe", Price : 20.95, Title: "Web 2.0" }

    ] ;

    Why XML ?

  11. Marcel says:

    More vendor lock-in with code that ties corporations to IE, I really wish no one used proprietary browser code in this day and age…

    If you want to provide a serious method for XML then E4X is still waiting to be put into IE. Please oh please by IE9.

  12. but but, this will only work in IE, I don’t think any developers who do cross-browser app’s will actually use this
