String.Unformat: I've created a monster

I don’t know about you, but when coding away I often find myself wishing for a String.Unformat function: call it the evil twin of String.Format. With String.Format I can build up strings like this:

    var result = String.Format(
        "https://{0}:{1}/{2}",
        "localhost",
        "12345",
        "TestPage.aspx");

... which will return "https://localhost:12345/TestPage.aspx".

But what if I want to do the opposite: split out the contents of this URL in one easy statement, much like String.Format? What if I could do this...?

    string input = @"https://localhost:12345/TestPage.aspx";
    object[] results = input.Unformat(@"https://{0}:{1}/{2}");

    CollectionAssert.AreEquivalent(
        new object[] { "localhost", "12345", "TestPage.aspx" },
        results);

Of course something similar is already possible with regular expressions, but I find them much more unwieldy: they result in longer code, they’re overkill for what I’m trying to achieve, and their syntax is difficult to remember (oh, and even harder to get right when coding late at night!). Take this equivalent example and judge for yourself:

    string input = @"https://localhost:12345/TestPage.aspx";
    string matchingformat = @"^https://(?<C1>.+):(?<C2>.+)/(?<C3>.+)$";

    Regex expression = new Regex(matchingformat);
    Match match = expression.Match(input);

    Assert.AreEqual<object>("localhost", match.Groups["C1"].Value);
    Assert.AreEqual<object>("12345", match.Groups["C2"].Value);
    Assert.AreEqual<object>("TestPage.aspx", match.Groups["C3"].Value);

To get around this I have created my own String.Unformat function... all it does is take a simple format string and convert it into a regular expression by applying a number of transforms to it. And how do we do these transforms? With regular expressions, of course :)

Now before we dive in here, I must make a point. This is not a replacement for regular expressions. It will be difficult to get the precision you need for anything complex – I’m only after a quick and simple solution to the basic case problem. Knowing this, read on...
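To make the precision caveat concrete, here is a small sketch of how the greedy .+ groups behave when the input contains more separators than the format string expects. It reuses the hand-written pattern from the example above; the class and method names (GreedyMatchDemo, Split) are made up for illustration and are not part of the attached code.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class GreedyMatchDemo
{
    // Same pattern as the hand-written regular expression example in this post.
    private const string Pattern = @"^https://(?<C1>.+):(?<C2>.+)/(?<C3>.+)$";

    public static string[] Split(string input)
    {
        Match m = Regex.Match(input, Pattern);

        // Assumption: an empty array signals "no match".
        if (!m.Success) return new string[0];

        return new[]
        {
            m.Groups["C1"].Value,
            m.Groups["C2"].Value,
            m.Groups["C3"].Value
        };
    }
}
```

Calling GreedyMatchDemo.Split("https://localhost:12345/a/b.aspx") gives "localhost", "12345/a" and "b.aspx": because .+ is greedy, the second group swallows the extra path segment. If you need the port on its own in a case like that, a hand-written regular expression is the better tool.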

How it does it

My sample code (and a bunch of demonstrations in the form of unit tests) is attached. Basically there are five phases to the code:

1. It escapes any special Regular Expression characters in the format string (e.g. ^, [, ], $ and so on) with a backslash so that they don’t accidentally affect the matching... they are assumed to be literal characters.

2. It replaces each {n} placeholder with a regular expression named group, so {0} becomes (?<C1>.+).

3. It adds begin (^) and end ($) markers to the match string.

4. It performs the match.

5. It loops through the results, extracting matches and adding them to an object array to return.

Have a look in the attached and see what you think.
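For readers without the attachment to hand, the five phases could be sketched roughly as follows. This is not the attached code, just a minimal reconstruction from the description above. The class name StringUnformatSketch, the helper ToPattern, and the choice to return an empty array when nothing matches are all my assumptions. It leans on the fact that Regex.Escape puts a backslash before { but leaves } alone, so after phase 1 a placeholder looks like \{0}.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sketch of the five phases; not the attached implementation.
public static class StringUnformatSketch
{
    // Finds an escaped placeholder such as \{0} (Regex.Escape escapes '{' but not '}').
    private static readonly Regex Placeholder =
        new Regex(@"\\\{(\d+)\}", RegexOptions.Compiled);

    public static string ToPattern(string format)
    {
        // Phase 1: make every regex metacharacter in the format string literal.
        string escaped = Regex.Escape(format);

        // Phase 2: turn {n} into the named group C(n+1), so {0} becomes (?<C1>.+).
        string withGroups = Placeholder.Replace(
            escaped,
            m => "(?<C" + (int.Parse(m.Groups[1].Value) + 1) + ">.+)");

        // Phase 3: anchor the pattern so the whole input has to match.
        return "^" + withGroups + "$";
    }

    public static object[] Unformat(this string input, string format)
    {
        // Phase 4: perform the match.
        var regex = new Regex(ToPattern(format));
        Match match = regex.Match(input);

        // Assumption: an empty array signals "no match".
        if (!match.Success) return new object[0];

        // Phase 5: collect the C1, C2, ... groups in placeholder order.
        return regex.GetGroupNames()
            .Where(name => name.StartsWith("C", StringComparison.Ordinal))
            .OrderBy(name => int.Parse(name.Substring(1)))
            .Select(name => (object)match.Groups[name].Value)
            .ToArray();
    }
}
```

With this sketch, "https://localhost:12345/TestPage.aspx".Unformat("https://{0}:{1}/{2}") returns "localhost", "12345" and "TestPage.aspx", and ToPattern produces the same expression as the hand-written example earlier in the post.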

But...

The regular expressions I’m using are compiled where possible... but of course I’m applying multiple transforms (i.e. regular expressions) to a string just to do a simple match.

What does this mean? It means this will always be slower than if you just wrote a regular expression yourself (check out here and here if you want to)... so if performance really is that key to your code, or you’re doing this in a long loop, you might want to avoid it. If on the other hand you can see a great use for this when performance isn’t key – give it a go and let me know what you think!
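If you do want to use something like this in a long loop, one option (my suggestion here, not something the attached code necessarily does) is to pay for the transforms once per format string and reuse the resulting Regex. A minimal sketch, assuming a hypothetical UnformatCacheSketch class and the same {n} to (?<Cn+1>.+) conversion described above:

```csharp
using System;
using System.Collections.Concurrent;
using System.Text.RegularExpressions;

// Hypothetical cache, for illustration only.
public static class UnformatCacheSketch
{
    // One compiled Regex per distinct format string.
    private static readonly ConcurrentDictionary<string, Regex> Cache =
        new ConcurrentDictionary<string, Regex>();

    public static Regex GetRegex(string format)
    {
        return Cache.GetOrAdd(format, f =>
        {
            // Escape literals, swap \{n} for named groups, then anchor.
            string pattern = "^" + Regex.Replace(
                Regex.Escape(f),
                @"\\\{(\d+)\}",
                m => "(?<C" + (int.Parse(m.Groups[1].Value) + 1) + ">.+)") + "$";

            return new Regex(pattern, RegexOptions.Compiled);
        });
    }
}
```

Repeated calls with the same format string then hand back the same Regex instance, so only the final match runs on each iteration. It still won't beat a hand-written expression, and the cache grows with every distinct format string you feed it.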

I have also not had time to write a full suite of real unit tests, so if you find a bug shout up.

Conclusion

What do you think? Have I created a monster, or cooked up a treat?

After writing this I came across an interesting conversation here... it seems I have not been alone in my desires for String.Unformat.

 

StringFormatting.zip