Regex 101 Exercise I7 – Make sure all characters inside <> are uppercase



Comments (13)

  1. Maurits says:

    Hmm… you mean, replace each lowercase character inside <> with its uppercase equivalent?  Probably best done with a MatchEvaluator…

    Have the regex look like this: (<.*?>)

    Have the MatchEvaluator return the String.ToUpper of the captured string

    That should do it!

    Of course, a cheap way to do it is just ToUpper()-ify the whole darn string… meets the requirements 😉

  2. Sheva says:

    Good call Maurits, but if you actually implement it in code, you will find it’s really quite tricky:

    Regex regex = new Regex(@"<(?<slash>/?)(?<tag>[^>]+)>", RegexOptions.Compiled | RegexOptions.IgnoreCase);
    String resultHtml = regex.Replace(inputHtml, delegate(Match match)
    {
        String slash = match.Groups["slash"].Value;
        String tag = match.Groups["tag"].Value.ToUpperInvariant();
        return String.Equals(slash, "/") ? String.Format("</{0}>", tag) : String.Format("<{0}>", tag);
    });

  3. Sheva says:

    Wait a minute, why not make all HTML tag names lowercase instead? Lowercase tag names are what the XHTML spec requires anyway.

    Sheva

  4. Maurits says:

    I was thinking more like this… after checking the docs I realized I don’t even need the parentheses.

    using System;
    using System.Text;
    using System.Text.RegularExpressions;

    class RegExSample
    {
        static string CapText(Match m)
        {
            return m.Value.ToUpper();
        }

        static void Main()
        {
            string text = "<b><u><i>example</i></u></b>";
            string pattern = "<.*?>";
            System.Console.WriteLine("text=[" + text + "]");
            string result = Regex.Replace(text, pattern,
                new MatchEvaluator(RegExSample.CapText));
            System.Console.WriteLine("result=[" + result + "]");
        }
    }

  5. Maurits says:

    Sheva, how did you get your lines to indent in the comment?

  6. Sheva says:

    Great, Maurits, you pointed out something important: using a capture here doesn’t actually make any sense.

    Regex regex = new Regex(@"<[^>]+>", RegexOptions.IgnoreCase);
    String resultHtml = regex.Replace(inputHtml, delegate(Match match)
    {
        return match.Value.ToUpperInvariant();
    });
    Console.WriteLine(resultHtml);

    As to your question, I just write the code in VS and copy it from there to here :)

    Sheva

  7. kbiel says:

    I too am confused about what Eric is trying to do with "Make sure all characters inside <> are uppercase".  We seem to be missing some context: make sure by what action?  Should we replace lowercase with uppercase, or do we just want to reject tags that have lowercase in them for some reason?  What is the point of this exercise?

    Since Maurits and Sheva have shown ways to match and ToUpper, I’ll go the second route.  A match with the following pattern is a reject:

    (?<=<[^>]*)[a-z](?=[^>]*>)
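
    A minimal sketch of how this "reject" check could be wired up in C#. The class name TagCaseValidator and the helper TagsAreUppercase are hypothetical names made up for illustration, and the sample strings are not from the exercise:

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class TagCaseValidator
    {
        // The pattern from this comment: a lowercase ASCII letter preceded by a '<'
        // and followed by a '>', with no '>' in between on either side.
        private static readonly Regex LowercaseInsideTag =
            new Regex(@"(?<=<[^>]*)[a-z](?=[^>]*>)");

        // Hypothetical helper: true when no tag contains a lowercase a-z letter.
        public static bool TagsAreUppercase(string html)
        {
            return !LowercaseInsideTag.IsMatch(html);
        }

        public static void Main()
        {
            string[] samples = { "<B>plain text</B>", "<b>plain text</b>", "<Br>" };
            foreach (string sample in samples)
            {
                Console.WriteLine("{0} -> {1}", sample,
                    TagsAreUppercase(sample) ? "accept" : "reject");
            }
        }
    }
    ```

    Lowercase text between tags is left alone, because the lookbehind cannot cross the '>' that closes the preceding tag. The variable-length lookbehind is a .NET feature, so the pattern may not port unchanged to other regex engines.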

  8. Sheva says:

    kbiel, your regex pattern can only match single-character tag names, for instance <b>, <i>, <p>, etc. Use this instead: (?<=<[^>]*)[a-zA-Z]+(?=[^>]*>)

  9. Maurits says:

    Actually, kbiel’s pattern does work and correctly rejects <aBC>, <AbC>, <ABc>, etc. while permitting <ABC>.
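
    One way to check this is to list what each of the two patterns matches on those inputs. This is only a sketch: PatternComparison and FindAll are hypothetical names, and the two constants are the patterns quoted in the comments above.

    ```csharp
    using System;
    using System.Linq;
    using System.Text.RegularExpressions;

    public static class PatternComparison
    {
        public const string KbielPattern = @"(?<=<[^>]*)[a-z](?=[^>]*>)";
        public const string ShevaPattern = @"(?<=<[^>]*)[a-zA-Z]+(?=[^>]*>)";

        // Hypothetical helper: returns the text of every match of the pattern in the input.
        public static string[] FindAll(string pattern, string input)
        {
            return Regex.Matches(input, pattern)
                        .Cast<Match>()
                        .Select(m => m.Value)
                        .ToArray();
        }

        public static void Main()
        {
            foreach (string tag in new[] { "<aBC>", "<AbC>", "<ABc>", "<abc>", "<ABC>" })
            {
                Console.WriteLine("{0}: kbiel=[{1}] sheva=[{2}]",
                    tag,
                    string.Join(",", FindAll(KbielPattern, tag)),
                    string.Join(",", FindAll(ShevaPattern, tag)));
            }
        }
    }
    ```

    The single [a-z] is enough because each lowercase letter inside a tag produces its own match, whatever the tag's length. The [a-zA-Z]+ variant also matches uppercase letters, so used as a reject test it would flag <ABC> as well.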

  10. Maurits says:

    Here’s another way, inspired by a simplified version of kbiel’s regex:

    <          # the start of a tag
    [^>]*      # any amount of stuff INSIDE THE TAG
    [\p{Ll}]   # EGADS! A HORRIBLE LOWER-CASE CHARACTER! GET THE TORCHES AND PITCHFORKS!
    .*?        # the rest of the tag
    >          # the end of the tag
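
    A commented pattern like this needs RegexOptions.IgnorePatternWhitespace before .NET will accept it. Here is a rough sketch, assuming the lowercase test is written as \p{Ll} (the Unicode "lowercase letter" category); the class name VerbosePatternDemo and the sample strings are made up for illustration:

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class VerbosePatternDemo
    {
        // IgnorePatternWhitespace makes the engine skip unescaped whitespace
        // and treat everything from # to the end of the line as a comment.
        public static readonly Regex TagWithLowercase = new Regex(@"
            <          # the start of a tag
            [^>]*      # any amount of stuff inside the tag
            [\p{Ll}]   # a lowercase letter (Unicode category Ll)
            .*?        # the rest of the tag
            >          # the end of the tag
            ", RegexOptions.IgnorePatternWhitespace);

        public static void Main()
        {
            foreach (string sample in new[] { "<ABC>", "<ABc>", "<B>text</B>" })
            {
                Console.WriteLine("{0} -> {1}", sample,
                    TagWithLowercase.IsMatch(sample) ? "reject" : "accept");
            }
        }
    }
    ```

    Unlike the [a-z] version, \p{Ll} also catches non-ASCII lowercase letters such as é.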