Regex 101 Discussion S6 - Change the extension on a file

Regex 101 Exercise S6 - Change the extension on a file

Given a filename including path, change the extension to .out.

Input string example: 

C:\utility\Processor.cs

*****

I said in the exercise description that this would take a bit of care. At first blush (what a weird turn of phrase), one might think that this is a simple problem. But when you dig into it a little deeper, you will find that it remains a simple problem, mostly because the regex defaults give you the right behavior in this case. But not always, so I'm going to use this as a jumping-off point to talk about something that is close to many people's hearts this time of the year.

I'd like to talk about greed.

Though Michael Douglas may have said that "Greed is Good" in Wall Street, things aren't so clear-cut in the world of regular expressions. I started to write something about greediness and non-greediness, but then I realized that I already had. So go read that, and get back to me.

Now, back to the exercise.

If you are a seasoned regex professional, you are likely used to writing non-greedy expressions more often, because they are on the whole better behaved than greedy ones. So, here's the first thing you probably wrote:

(?<Path>.+?)
\.
(?<Extension>.+)

Which works fine on the example I gave, but if you add in a few more test cases:

C:\utility\processor.test.cs
C:\utility\fun.stuff\processor.cs

You'll find that it's not working correctly. The problem is that the first match is a non-greedy one, so it's giving you a minimal match - a match up to the first period, not to the last one. If you switch to greedy on the path match, things work right:

(?<Path>.+)
\.
(?<Extension>.+)

and the replacement string to use with this is simply:

${Path}.out
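If you want to see the difference for yourself, here is a small sketch that runs both versions against the three test filenames. It's written in Python rather than the regex flavor used above, so a few things are assumptions on my part: Python spells named groups (?P<Path>...) and refers to them in the replacement as \g<Path> instead of ${Path}, re.VERBOSE stands in for whatever option lets the pattern span several lines, and change_extension is just a hypothetical helper name.

```python
import re

# Assumption: this is a Python port of the patterns above. Python's re module
# writes named groups as (?P<name>...) and replacement references as \g<name>.
# re.VERBOSE ignores whitespace and comments inside the pattern.
NON_GREEDY = re.compile(r"""
    (?P<Path>.+?)       # minimal match: stops at the first period
    \.
    (?P<Extension>.+)
""", re.VERBOSE)

GREEDY = re.compile(r"""
    (?P<Path>.+)        # maximal match: gives back only up to the last period
    \.
    (?P<Extension>.+)
""", re.VERBOSE)


def change_extension(pattern, filename):
    """Hypothetical helper: replace the match with the Path group plus '.out'."""
    return pattern.sub(r"\g<Path>.out", filename)


samples = [
    r"C:\utility\Processor.cs",
    r"C:\utility\processor.test.cs",
    r"C:\utility\fun.stuff\processor.cs",
]

for name in samples:
    print(name)
    print("  non-greedy:", change_extension(NON_GREEDY, name))
    print("  greedy:    ", change_extension(GREEDY, name))
```

With the non-greedy path, the second and third filenames come back as C:\utility\processor.out and C:\utility\fun.out, because everything after the first period gets swallowed by the Extension group. The greedy path gives C:\utility\processor.test.out and C:\utility\fun.stuff\processor.out.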

That was less than earthshaking, but I did notice that most of the respondents to the original post got the answer wrong, so at least it wasn't totally trivial.

Bonus exercise. Change the "Extension" match to be non-greedy (.+?), and explain the results.

So that's the last of the simple exercises, though looking at the intermediate ones, they don't really get that much harder.