Regex 101 Exercise S3 - Validate a zip+4 zip code - Discussion

Exercise S3 - Validate a zip+4 zip code.

The US has a 5-digit zip code with an optional 4-digit suffix. Write a regex to validate that the input is in the proper format.

Sample strings

98008
98008-4893

****

This one is fairly similar to what we've done in the past. The most obvious place to start is with the first chunk of digits ("chunk" is a regex "term of art" that refers to a section of characters that you want to match (not really...)). We can match that with:

\d{5}

And we can easily match the second version with:

\d{5}-\d{4}

I got an email recently in which the writer said, "I've heard that it sometimes makes more sense to use two regexes rather than a single more complex one." That's true. Though crafting a single regex that covers all the cases can be an interesting intellectual exercise (a good idea if you want to avoid the heartbreak of flabby neurons), it sometimes makes more sense to cut your losses and simply use several regexes in sequence, and get out of work before happy hour is over.
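
For the curious, here is a minimal sketch of that "several regexes in sequence" approach. It is written in Python purely for illustration, and the names ZIP5, ZIP_PLUS4, and is_zip are made up for this example. It uses fullmatch so that each pattern has to cover the whole input, which stands in for the anchors we'll add further down.

```python
import re

# Two simple patterns, tried one after the other.
# The names are hypothetical, chosen just for this sketch.
ZIP5 = re.compile(r"\d{5}")
ZIP_PLUS4 = re.compile(r"\d{5}-\d{4}")

def is_zip(text):
    """Return True if text is a 5-digit zip or a zip+4."""
    # fullmatch succeeds only if the entire string matches the pattern,
    # so a longer run of digits is rejected rather than partially matched.
    return bool(ZIP5.fullmatch(text) or ZIP_PLUS4.fullmatch(text))

print(is_zip("98008"))       # True
print(is_zip("98008-4893"))  # True
print(is_zip("98008-48"))    # False
```

Each pattern stays trivially readable, at the cost of a second match attempt when the first one fails.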

Which is a long-winded way to say that I could just declare victory at this point, but that wouldn't be very educational (I desperately hoped to link to a .wav file of Daryl Hannah saying "edu cational" from Splash, but alas, repeated web searches proved fruitless). So, onward.

Regex provides an "or" option where you can match one of several things. To do that in this case, we would write:

^
(
\d{5}-\d{4} # zip + 4 format
| # or
\d{5} # standard zip format
)
$

which would match either of these. This is a reasonable way to write this match.
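
If you want to try the alternation version yourself, here is a hedged sketch in Python (again, just my choice of language for illustration). The comments and line breaks inside the pattern only work when the engine is told to ignore pattern whitespace, which in Python is the re.VERBOSE flag. ZIP_EITHER and matches_either are hypothetical names.

```python
import re

# re.VERBOSE lets the pattern span lines and carry # comments.
ZIP_EITHER = re.compile(r"""
    ^
    (
        \d{5}-\d{4}   # zip + 4 format
      |               # or
        \d{5}         # standard zip format
    )
    $
    """, re.VERBOSE)

def matches_either(text):
    """Return True if text matches one of the two alternatives."""
    return ZIP_EITHER.match(text) is not None

for sample in ["98008", "98008-4893", "980084893"]:
    print(sample, matches_either(sample))
```

Because the whole group sits between ^ and $, the order of the two alternatives doesn't change which strings are accepted here.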

The final way is to use one of the quantifiers I discussed before. If we use the "?" quantifier, we can write:

^
\d{5}           # 5 digit zip code
(-\d{4})?       # optional "+4" suffix
$

I think this would be my preferred solution.
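
Here is a small sketch of the "?" version, assuming Python and its re.VERBOSE flag to allow the comments and whitespace in the pattern. ZIP_OPTIONAL and is_valid_zip are names I invented for the example.

```python
import re

# Same pattern as above, compiled with whitespace and comments ignored.
ZIP_OPTIONAL = re.compile(r"""
    ^
    \d{5}        # 5 digit zip code
    (-\d{4})?    # optional "+4" suffix
    $
    """, re.VERBOSE)

def is_valid_zip(text):
    """Return True if text is a zip code with or without the +4 suffix."""
    return ZIP_OPTIONAL.match(text) is not None

for sample in ["98008", "98008-4893", "98008-", "9800", "98008-48930"]:
    print(sample, is_valid_zip(sample))
```

The first two samples are accepted and the last three are rejected: a dangling hyphen, too few digits, and a five-digit suffix all fail, because the "?" makes the suffix all-or-nothing.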

Though we used parentheses for grouping here, they have other uses in regex as well. Tune in next week, where the word for the day will be "capture". Or, perhaps, "spongiferous".

What do you think of the series so far? If you've used regex before, it should seem simple to you. What would you change? What would you leave the same?