Regex 101 Exercise S2 - Verify a string is a hex number - Discussion

[Update: got rid of the 0x in front of the sample string...]

Our task was the following:

S2 - Verify a string is a hex number

Given a string, verify that it contains only the digits 0-9 and the letters a through f (either in uppercase or lowercase).

Sample string:

00A838FF

-------------------------------------

We talked about character classes last time. The character class to match the valid characters is:

[0-9a-fA-F]

We now need to get a string of those. The simplest way to do this is to use one of the predefined quantifiers:

[0-9a-fA-F]+

where "+" means "one or more matches". There are two other predefined quantifiers: "*" means "zero or more matches", and "?" means "zero or one match". All three of these quantifiers are what is known in regex circles as "greedy": they match as many characters as possible. In other words, if you use "+", the engine will choose a match of 100 characters over a match of 1 character when both are possible. In this case, that means that if you match against:

0000ABCD

the expression will match that whole string, not just the first "0". We will talk about greediness at length in later exercises, so don't fret about it now.
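
To see the greedy behavior in action, here is a minimal sketch in Python (my choice of language for illustration; the exercise itself is not tied to any one engine). The helper name first_hex_run is made up for this example.

```python
import re

# Unanchored pattern from the text: one or more hex digits.
HEX_RUN = re.compile(r"[0-9a-fA-F]+")

def first_hex_run(text):
    """Return the first run of hex digits found in text, or None.

    first_hex_run is a hypothetical helper name used only for this sketch.
    """
    m = HEX_RUN.search(text)
    return m.group(0) if m else None

# "+" is greedy, so the whole run is consumed rather than just the first "0".
print(first_hex_run("0000ABCD"))
```

Without anchors, the same pattern will also happily match the "12" at the front of "12G4", which is why the next step matters.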

As we did last week, we add anchors to make sure we're matching the whole string:

^[0-9a-fA-F]+$

and we've satisfied the goal of the exercise. Note that because of the anchors, there is only one possible match, and greediness doesn't enter into the picture.
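
As a rough illustration of the anchored expression, the following Python sketch wraps it in a small validation function. The name is_hex is an assumption of mine, not something from the exercise.

```python
import re

# Anchored pattern from the text: the whole string must be hex digits.
HEX_ONLY = re.compile(r"^[0-9a-fA-F]+$")

def is_hex(text):
    """Return True if text consists only of hex digits (at least one).

    is_hex is a hypothetical helper name used only for this sketch.
    """
    return HEX_ONLY.match(text) is not None

print(is_hex("00A838FF"))    # the sample string from the exercise
print(is_hex("0x00A838FF"))  # "x" is not a hex digit, so this fails
```

One engine-specific caveat: in Python, "$" also matches just before a trailing newline, so if that matters to you, re.fullmatch with the unanchored pattern is the stricter choice.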

Are we done? Well, mostly. It might be that what you wanted was to limit the hex number to 8 characters (so it would fit into 4 bytes). Doing that is left as an exercise to the reader...

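All these shortcut quantifiers - "*", "+", and "?" - are really just simpler ways of writing quantifiers that can also be expressed with the full version of the "{<n>}" syntax that I discussed last week: "*" is the same as "{0,}", "+" is the same as "{1,}", and "?" is the same as "{0,1}".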
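
If you want to convince yourself that the shortcuts and the brace forms behave the same, here is a small Python sketch that compares them on a few sample strings. The names HEX, EQUIVALENT, same_result, and SAMPLES are mine, chosen for illustration, and a handful of samples is a sanity check rather than a proof.

```python
import re

HEX = "[0-9a-fA-F]"

# Each shortcut quantifier paired with its brace-syntax spelling.
EQUIVALENT = {
    "*": "{0,}",   # zero or more
    "+": "{1,}",   # one or more
    "?": "{0,1}",  # zero or one
}

def same_result(shortcut, braces, text):
    """Return True if both spellings agree on whether text matches in full."""
    a = re.fullmatch(HEX + shortcut, text)
    b = re.fullmatch(HEX + braces, text)
    return (a is not None) == (b is not None)

SAMPLES = ["", "0", "00A838FF", "xyz"]

for shortcut, braces in EQUIVALENT.items():
    agree = all(same_result(shortcut, braces, s) for s in SAMPLES)
    print(shortcut, braces, agree)
```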