Regex 101 Discussion I6 - Remove font directives from HTML

Regex 101 Exercise I6 - Remove font directives from HTML

Remove all <font…> or </font> directives from an HTML string.

*****

I've decided to start linking my answers back to the original posts, since the answers given there are often as good as or better than the one that I give.

The most obvious way to write this is:

<font.*>|</font>

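That's pretty straightforward - match either a <font...>, or a </font>. But it's also wrong: .* is greedy, so it grabs as much text as it can, and the ">" in the first part ends up matching the last ">" in the string rather than the one that closes the font tag. We need the non-greedy quantifier: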

<font.*?>|</font>

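That does what we want it to do, assuming we use the singleline option (so that "." also matches a newline inside a tag) and the ignorecase option (so that <FONT> is matched as well as <font>).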
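
To see the difference between the greedy and non-greedy versions, here is a minimal sketch in Python rather than .NET. The sample HTML string and the variable names are made up for illustration, and re.DOTALL and re.IGNORECASE stand in for the singleline and ignorecase options.

```python
import re

# Hypothetical sample input, not taken from the original exercise.
html = '<p><font face="Arial" size="2">Hello</font> <b>world</b></p>'

# re.DOTALL plays the role of singleline, re.IGNORECASE of ignorecase.
FLAGS = re.DOTALL | re.IGNORECASE

# Greedy: .* runs on to the last ">" in the string, so far too much is removed.
greedy = re.compile(r"<font.*>|</font>", FLAGS)

# Non-greedy: .*? stops at the first ">" after "<font".
lazy = re.compile(r"<font.*?>|</font>", FLAGS)

greedy_result = greedy.sub("", html)
lazy_result = lazy.sub("", html)

print(repr(greedy_result))
print(repr(lazy_result))
```

With this input the greedy pattern swallows everything from "<font" to the end of the string, while the non-greedy one removes only the two font tags and leaves the text and the other tags alone.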

Other ways of doing this showed up in the comments. Maurits suggested using three regexes, or a single, simpler one:

</?font.*?>

I don't know whether I prefer that one over mine. It is shorter, though it's a bit harder for me to read the /? part.

Kbiel suggested a version without the non-greedy quantifier:

</?font[^>]*>

which also works well, though I prefer the non-greedy version due to readability.
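
As a rough check that the three patterns behave the same way, the sketch below runs each of them over one test string. It is written in Python rather than .NET, and the sample string, the labels, and the variable names are assumptions made for illustration only.

```python
import re

# re.DOTALL and re.IGNORECASE stand in for singleline and ignorecase.
FLAGS = re.DOTALL | re.IGNORECASE

# The three patterns discussed in this post, with made-up labels.
patterns = {
    "alternation": r"<font.*?>|</font>",
    "optional slash": r"</?font.*?>",
    "negated class": r"</?font[^>]*>",
}

# Hypothetical input with mixed case and a newline inside a tag.
html = '<FONT\n color="red">one</FONT> and <font size="3">two</font>'

# Strip the font tags with each pattern and collect the results.
results = {name: re.sub(p, "", html, flags=FLAGS) for name, p in patterns.items()}

for name, cleaned in results.items():
    print(f"{name}: {cleaned!r}")
```

All three leave just the text "one and two" for this input. One practical difference: [^>]* matches newlines on its own, so the negated-class version does not depend on the singleline option, whereas the two .*? versions do.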