A nifty regular expression for replacing XML syntax


Someone recently asked me a regular expression question that I thought was interesting, so I'm posting it here as an example.  They wanted to find all occurrences of the following style of XML:

<foobar>

      < property = "blah" />

</foobar>

And they wanted to replace each occurrence of the above with a single tag:

<foobar property = "blah"/>

And here are the nifty Find what and Replace with expressions that do it all for you!  The tagged expressions are the parts enclosed in braces { }.

Find what:  \<{:w}\>:Wh*\<{:Wh*:a+:Wh*=:Wh*":a+":Wh*}/\>:Wh*\</\1\>

Notice that the < and > characters all need to be escaped with the escape character “\”, since we want to match the literal characters themselves

Also, recall that :w is any alphabetic string, :Wh is any white space, and :a is any single alphanumeric character (which is why the expression uses :a+ to match a run of them)
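
For readers more used to the common regex dialects, here is a rough Python translation of the Find what expression, broken out piece by piece. It is only a sketch: the mapping of :w to [a-zA-Z]+, :a to [a-zA-Z0-9] and :Wh to \s is an assumption about how the two syntaxes line up, the name PATTERN is made up for this example, and the sample uses plain straight quotes.

```python
import re

# Assumed mapping (not from the original post): :w -> [a-zA-Z]+,
# :a -> [a-zA-Z0-9], :Wh -> \s. Each comment shows the Visual Studio piece.
PATTERN = re.compile(r"""
    <([a-zA-Z]+)>          # \<{:w}\>    opening tag; tagged expression 1 is its name
    \s*                    # :Wh*
    <(                     # \<{         start of tagged expression 2
        \s*[a-zA-Z0-9]+    # :Wh*:a+     attribute name
        \s*=\s*            # :Wh*=:Wh*
        "[a-zA-Z0-9]+"     # ":a+"       double-quoted attribute value
        \s*                # :Wh*
    )/>                    # }/\>        end of tagged expression 2
    \s*                    # :Wh*
    </\1>                  # \</\1\>     closing tag must repeat tagged expression 1
""", re.VERBOSE)

sample = '<foobar>\n      < property = "blah" />\n</foobar>'
match = PATTERN.search(sample)
print(match.group(1))  # the tag name
print(match.group(2))  # the attribute text, including surrounding white space
```

The back-reference in the closing tag is what keeps mismatched pairs such as <a> ... </b> from matching.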

Based on my previous tutorials, I’ll leave it to you to thoroughly decipher this regex query 🙂

As for the replace:

Replace with: <\1\2/>

Your replace string becomes <TaggedExpression1TaggedExpression2/>
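
To see the find and replace working together outside Visual Studio, here is a minimal Python sketch of the same idea. The function name collapse is hypothetical, and the pattern is an assumed translation of the expression above ([a-zA-Z]+ for :w, [a-zA-Z0-9] for :a, \s for :Wh), written with straight double quotes.

```python
import re

def collapse(xml_text: str) -> str:
    """Fold '<tag> < attr = "value" /> </tag>' into '<tag attr = "value" />'."""
    # Group 1 is the tag name, group 2 is the attribute text with its white space.
    pattern = r'<([a-zA-Z]+)>\s*<(\s*[a-zA-Z0-9]+\s*=\s*"[a-zA-Z0-9]+"\s*)/>\s*</\1>'
    # \1\2 in the replacement plays the role of the two tagged expressions.
    return re.sub(pattern, r'<\1\2/>', xml_text)

sample = '<foobar>\n      < property = "blah" />\n</foobar>'
print(collapse(sample))
```

One thing to watch in this sketch: the white space after the inner < is captured as part of group 2, so if the input has none there, the tag name and attribute name end up joined together in the output.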

Regards,

Fiona


Comments (6)

  1. Replacing XML with text based regular expressions is usually wrong unless you are 100 percent sure of what the contents of the file will be.

  2. Rob says:

    Are there plans to include full support for the regular expressions, as they are currently supported in .NET, in some future version of VS.NET? It would be great not to have to keep track of two different sets of "regular expression" syntax… for instance, \w instead of :a, \s instead of :Wh, etc…

  3. Fiona says:

    We are looking to integrate .NET regular expressions into a future version of VS. Thanks for your post!

    -Fiona

  4. rien says:

    Well, all this seems nice but:

    – your starting example is not well formed XML, it is not even XML at all;

    – stating that element types are only made of alphabetic characters is very restrictive;

    – attribute values may be enclosed in single quotes as well as double quotes, a case your expression does not cover;

    – and after all, isn't XSLT supposed to do the job much better than those regexp thingies?