Regex 101 Exercise S3 – Validate a zip+4 zip code – Discussion


Exercise S3 – Validate a zip+4 zip code.


The US has a 5-digit zip code with an optional 4-digit suffix. Write a regex to validate that the input is in the proper format:


Sample strings


98008
98008-4893


****


This one is fairly similar to what we’ve done in the past. The most obvious place to start is by matching the first chunk of digits (“chunk” is a regex “term of art” that refers to a section of characters that you want to match (not really…)). We can do that with:


\d{5}


And we can easily match the second version with:


\d{5}-\d{4}


I got an email recently where the writer said, “I’ve heard that it sometimes makes more sense to use two regexes rather than a single more complex one.” Though crafting a single regex that covers all the cases can be an interesting intellectual exercise (a good idea if you want to avoid the heartbreak of flabby neurons), it sometimes makes more sense to cut your losses and simply use several regexes in sequence, and get out of work before happy hour is over.
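For the curious, the several-regexes-in-sequence approach might look something like the sketch below. It is written in Python purely as an assumption for illustration (the patterns themselves are not tied to any one language), and the names ZIP5, ZIP_PLUS_4 and is_zip are made up for this example.

```python
import re

# Two simple patterns, tried one after the other.
# The names here are illustrative, not from the exercise.
ZIP5 = re.compile(r"\d{5}")
ZIP_PLUS_4 = re.compile(r"\d{5}-\d{4}")

def is_zip(text):
    """Return True if text is a 5-digit zip or a zip+4."""
    # fullmatch only succeeds when the pattern covers the whole string,
    # so no explicit ^ or $ anchors are needed here.
    return bool(ZIP5.fullmatch(text) or ZIP_PLUS_4.fullmatch(text))

print(is_zip("98008"))       # True
print(is_zip("98008-4893"))  # True
print(is_zip("98008-48"))    # False
```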


Which is a long-winded way to say that I could just declare victory at this point, but that wouldn’t be very educational (I desperately hoped to link to a .wav file of Daryl Hannah saying “edu cational” from Splash, but alas, repeated web searches proved fruitless). So, onward.


Regex provides an “or” option where you can match one of several things. To do that in this case, we would write:


^
(
\d{5}-\d{4}     # zip + 4 format
|              # or
\d{5}           # standard zip format
)
$


which would match either of these. This is a reasonable way to write this match.
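As a rough sketch of how the “or” version could be tried out, here it is in Python (an assumption made for illustration). Python's re.VERBOSE flag allows the same multi-line layout with comments; the name ZIP_EITHER is invented for this example.

```python
import re

# The alternation pattern, laid out over several lines with comments.
# re.VERBOSE tells Python to ignore the whitespace and the # comments.
ZIP_EITHER = re.compile(r"""
    ^
    (
    \d{5}-\d{4}     # zip + 4 format
    |               # or
    \d{5}           # standard zip format
    )
    $
    """, re.VERBOSE)

for text in ["98008", "98008-4893", "98008-", "9800"]:
    matched = ZIP_EITHER.fullmatch(text) is not None
    print(f"{text!r}: {matched}")
```

The two sample strings match, while the truncated inputs do not.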


The final way is to use one of the quantifiers I discussed before. If we use the “?” quantifier, we can write:


^
\d{5}           # 5 digit zip code
(-\d{4})?       # optional “+4” suffix
$


I think this would be my preferred solution.
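To see the “?” version in action, here is a minimal sketch, again in Python as an assumption for illustration. It checks the expanded, commented form side by side with the same pattern condensed onto one line; the names ZIP_EXPANDED and ZIP_CONDENSED are hypothetical.

```python
import re

# Expanded form: re.VERBOSE ignores the whitespace and the # comments.
ZIP_EXPANDED = re.compile(r"""
    ^
    \d{5}           # 5 digit zip code
    (-\d{4})?       # optional "+4" suffix
    $
    """, re.VERBOSE)

# The same pattern condensed onto a single line.
ZIP_CONDENSED = re.compile(r"^\d{5}(-\d{4})?$")

for text in ["98008", "98008-4893", "98008-", "99999-999999999"]:
    expanded = ZIP_EXPANDED.fullmatch(text) is not None
    condensed = ZIP_CONDENSED.fullmatch(text) is not None
    print(f"{text!r}: expanded={expanded}, condensed={condensed}")
```

Both forms should agree on every input: the two sample strings are accepted, and a dangling hyphen or an overlong suffix is rejected.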


Though we used parentheses for grouping, they actually have other uses as well in regex. Tune in next week, where the word for the day will be “capture”. Or, perhaps, “spongiferous”.


What do you think of the series so far? If you’ve used regex before, it should seem simple to you. What would you change? What would you leave the same?


 

Comments (8)

  1. Vladimit says:

    To validate an input, I was using this pattern:

    ————————

    ^\d{5}(?:-\d{4})?$

    ————————

    it is very similar to yours, but makes input like this:

    99999-999999999 incorrect

  2. ericgu says:

    Vladimit,

    I don’t understand your comment. The only difference between your regex and mine is that you used the non-capture "?:" inside the parentheses.

    Eric

  3. Ron Mexico says:

    I like the series. I have already read up on regexes fairly well, but for some reason I just can’t deal with them too well. In my opinion no discussion on regex is too simple.

  4. Chris Haas says:

    I like the series, too. I’ve done some pretty deep regex before for screen scraping but I always enjoy re-learning something since I usually learn at least one new thing. And right away on the first or second day I learned about IgnorePatternWhitespace, something I always saw but never took the time to figure out what it was. The only thing that would be nice for me would be if you showed the regex in both expanded form with comments as well as the condensed one line version. After so many years of seeing single lined regular expressions I actually have a harder time reading them when you put them on multiple lines. But otherwise keep it up!

  5. dru says:

    -----------------v------

    ^\d{5}(?:-\d{4})?$

    -----------------^------

    I saw this ($) as the key element in Vladimit's regex.

    Love the series. Can’t wait for Search and Replace.

  6. Steve Henke says:

    Great series. This is a nice way to gradually learn and relearn concepts that I haven’t taken the time to fully understand.

    What is the history of some of these patterns and symbols, e.g. why ^ and $? Are there pattern matching differences between .NET and other implementations, or only in the way the .NET classes provide for handling of searches, matches, etc.?

  7. Brian says:

    I’m enjoying the series. I’ve never really had to use regex before, though I can see where it will be useful in the future.

  8. It’s a great series. For those that don’t have regular expression experience, it is very valuable.

    I don’t have experience of this kind myself, all I’ve done is read a book or two, and I am finding the classroom format (read, assignment, discussion) to be very helpful.