Regular Expressions (REGEX): Basic symbols


Welcome back to the RegEx guide. Last post we talked a little bit about the basics of RegEx and its uses. I mentioned the most important thing is to understand the symbols. Today we'll ease in with some of the basics to get us going, but later we will expand on these and see some other options we have.

. represents any single character except a newline, so it will feel very similar to the Windows wildcard ?
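
Here is a small sketch of the dot in action, using -match as in the last post. The sample strings are made up for illustration.

```powershell
# "." must be filled by exactly one character (anything except a newline)
$oneLetter = 'cat' -match 'c.t'   # True: "a" fills the dot
$oneDigit  = 'c9t' -match 'c.t'   # True: any single character works, not just letters
$nothing   = 'ct'  -match 'c.t'   # False: there is no character for the dot to consume

$oneLetter, $oneDigit, $nothing
```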

\ is the escape character for RegEx. It has two jobs:

  1. Take special properties away from special characters: \. represents a literal dot character, and \\ represents a literal backslash character.
  2. Add special properties to a normal character: \d looks for any digit (we'll see more of these in a bit)
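
Both jobs of the escape character can be seen in a quick sketch like this one. The file names and strings are made up for illustration, and the patterns are in single quotes so PowerShell passes the backslashes through untouched.

```powershell
# Job 1: take the special meaning away
$unescaped = 'fileAtxt' -match 'file.txt'    # True: the bare dot happily matches "A"
$escaped   = 'fileAtxt' -match 'file\.txt'   # False: \. only matches a real dot
$literal   = 'file.txt' -match 'file\.txt'   # True
$backslash = 'C:\Temp'  -match 'C:\\Temp'    # True: \\ matches one literal backslash

# Job 2: give a normal character a special meaning
$digit     = 'Room 7'   -match 'Room \d'     # True: \d matches the "7"

$unescaped, $escaped, $literal, $backslash, $digit
```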

We can use {} to specify quantity in a few different ways by attaching them to characters or symbols.

  1. {exact number} so something like \d{2} says "look for exactly two digits"
  2. {min,max} so something like \d{2,4} says "look for at least two digits, and keep grabbing them up to a maximum of four"
  3. {min,} will check for a minimum with no max cap, so \d{2,} says "look for at least 2 digits, and keep grabbing them until you see something that isn't a digit"
  4. + is a shortcut for {1,} so you can say "one or more"
  5. * is a shortcut for {0,} so you can say "zero or more" (be careful with that one, since it also succeeds when it finds nothing at all!)
  6. ? is a shortcut for {0,1} so you can say "this may or may not be here". This could be useful for links that may or may not have an "s", as in "http"/"https"
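
To see what each quantifier actually grabs, here is a rough sketch that runs them against one made-up string and reads the result from $Matches[0]. The sample string and URLs are assumptions for illustration only.

```powershell
$sample = 'id 123456'

$null = $sample -match '\d{2}'      # exactly two digits
$exact = $Matches[0]                # 12

$null = $sample -match '\d{2,4}'    # two to four digits, as many as it can get
$ranged = $Matches[0]               # 1234

$null = $sample -match '\d{2,}'     # two or more digits
$open = $Matches[0]                 # 123456

$null = $sample -match '\d+'        # one or more digits
$plus = $Matches[0]                 # 123456

$null = $sample -match '\d*'        # zero or more: succeeds right at the "i"
$star = $Matches[0]                 # empty string, because zero digits is allowed

# ? makes the "s" optional, so both forms match
$http  = 'http://example.com'  -match 'https?://'   # True
$https = 'https://example.com' -match 'https?://'   # True
```

The \d* line is why * needs care: the match succeeds, but it matched nothing.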

Character classes like \d are the real meat & potatoes for building out RegEx and getting some useful patterns. These are case sensitive, so use the lowercase letter here; we will talk about the uppercase versions in another post. These three are the most common to get started with:

  1. \d looks for digits
  2. \s looks for whitespace
  3. \w looks for word characters

We will talk about \p in a future post to match more specific symbol groups.
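
As a quick sketch of the three classes, here they are against a couple of made-up strings (the strings are assumptions for illustration). Note that \w covers letters, digits, and the underscore.

```powershell
$sample = 'Order 66 shipped'

$null = $sample -match '\d+'        # one or more digits
$digits = $Matches[0]               # 66

$null = $sample -match '\s'         # a single whitespace character
$space = $Matches[0]                # the space after "Order"

$null = $sample -match '\w+'        # one or more word characters
$word = $Matches[0]                 # Order

$null = 'user_01!' -match '\w+'     # letters, underscore and digits all count
$wordChars = $Matches[0]            # user_01

$null = $sample -match '\w+\s\d+'   # classes chained together
$combined = $Matches[0]             # Order 66
```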

Let's put it together and try a couple of things. We'll still use -match and $matches[0] for now, but we'll use some other things to leverage RegEx once we are comfortable with the basic symbols.

We'll use the same shell as we had in the last post and the same MOCK_DATA as before. This time, let's match emails. Try it yourself first!

Hint:

 
The emails all seem to be a first initial followed by a last name, so just some word characters. Then an @ symbol, more word characters, then a dot, then more word characters!

Answer:

"\w+@\w+\.\w+"

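Before running it over the whole file, the pattern can be tried on a single string. This is only a sketch: the comma-separated line layout below is an assumption (the real MOCK_DATA columns may differ), and the second address is made up to show where this simple pattern stops.

```powershell
$regex = '\w+@\w+\.\w+'

# Hypothetical line layout; only the email itself comes from the results
$line = '1,Made,Up,bseamon0@bbc.co'
$null = $line -match $regex
$found = $Matches[0]      # bseamon0@bbc.co

# \w does not include dots, so extra dots cut the match short
$null = 'first.last@mail.example.com' -match $regex
$partial = $Matches[0]    # last@mail.example
```

That is fine for this data set, where every address fits the simple shape from the hint.
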
Putting it together:

# grab our data
$file = Get-Content "$PSScriptRoot\MOCK_DATA.txt"

# make our pattern
$regex = "\w+@\w+\.\w+"

# loop through each line
foreach ($line in $file)
{
    # if our line contains our pattern, write the matched data to the screen
    if ($line -match $regex)
    {
        $matches[0]
    }
}

Results:

bseamon0@bbc.co
 jbrotherwood1@house.gov
 jduffan2@google.ru
 eleates3@home.pl
 spaquet4@about.com
 ltrainer5@squarespace.com
 kgrotty6@pinterest.com
 chilliam7@amazon.co
 mlumber8@reference.com
 chuitson9@free.fr
 ntrewa@imgur.com
 iferneyhoughb@jigsy.com
 washlingc@slideshare.net
 acrushamd@flavors.me
 blundbecke@unblog.fr
 aadairf@spiegel.de
 bwilderg@photobucket.com
 mcurrmh@shareasale.com
 baberkirderi@netscape.com
 rgrzelewskij@twitpic.com
 rproomk@reddit.com
 tnernl@deviantart.com
 cgodartm@reverbnation.com
 xbosdetn@xing.com
 ktippetto@webs.com
 ameneyerp@illinois.edu
 jhicksq@amazon.com
 bspoorsr@answers.com
 lbriffetts@businessweek.com
 tmethringhamt@instagram.com
 mberryu@businessinsider.com
 mschankev@blog.com
 lgoodredgew@tinyurl.com
 dgoaksx@timesonline.co
 ncornuauy@about.com
 msculleyz@wisc.edu
 abenettolo10@dot.gov
 ipaaso11@cdc.gov
 hdowse12@usatoday.com
 splacidi13@dyndns.org
 rdadswell14@newsvine.com
 csalsberg15@telegraph.co
 cpimmocke16@senate.gov
 jvader17@disqus.com
 amerton18@jimdo.com
 eclitsome19@clickbank.net
 jmelmore1a@elpais.com
 hscotney1b@soundcloud.com
 rcouling1c@statcounter.com
 ecowpland1d@myspace.com

Once again, you can find it on git here.

Hope you're enjoying RegEx so far, and starting to see how it can be pretty useful! Next time we will take a look at grouping to extract different pieces of data, and at using [regex] instead of just $matches.

As always, don't forget to rate, comment and share! Let me know what you think of the content and what topics you'd like to see me blog about in the future.

Comments (1)

  1. Dave Rendón says:

    very useful, thanks for sharing!
