Regular Expressions (REGEX): Grouping & [RegEx]


Welcome back to the RegEx crash course. Last time we talked about the basic symbols we plan to use as our foundation. This week, we will be learning a new way to leverage our patterns for data extraction and how to rip our extracted data into pieces we care about.

[RegEx]

The [Regex] data type has some cool static members, but we're mostly going to play with the plural method Matches(<data>,<pattern>). If you don't know what static members are, you can check this post or this help data.

A lot of the time, when we work with RegEx we are using it to extract everything that matches our pattern in a large amount of data. Using $matches like we did in the previous posts means we have to write a lot of looping and if statements. With [regex]::Matches() we can condense all that, and it works on a big blob of text instead of just a list of individual lines. This means that if there is more than 1 match per line we can still get it!
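As a rough sketch of that difference, here is a single line with two addresses in it. The sample line and the variable names ($line, $first, $all) are made up for illustration and are not from the earlier posts.

```powershell
# Hypothetical sample line containing two email addresses
$line  = "Contact alice@example.com or bob@example.org for help"
$regex = "\w+@\w+\.\w+"

# -match stops at the first hit, so $matches[0] only holds one address
$null  = $line -match $regex
$first = $matches[0]

# [regex]::Matches() returns every hit in the string
$all = [regex]::Matches($line, $regex).Value

$first   # alice@example.com
$all     # alice@example.com and bob@example.org
```

The -match version would need a loop (or a split) to find the second address, while the static method hands back both in one call.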

If we take a look at some sample data that it returns, we can see that we actually get a pretty rich match object:

 
Groups : {0}
Success : True
Name : 0
Captures : {0}
Index : 3534
Length : 23
Value : ecowpland1d@myspace.com

The thing we care about is the Value property, but you'll notice it even tells you the starting character position (Index) and how many characters long the match is (Length).
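To show how Index and Length relate to Value, here is a small sketch on a made-up string ($text and $m are illustrative names). Cutting the original text at Index for Length characters should give back the same thing as Value.

```powershell
# Hypothetical sample text; the address starts at character 12 (zero-based)
$text = "Reach me at alice@example.com today"

# Grab the first match object from the collection
$m = [regex]::Matches($text, "\w+@\w+\.\w+")[0]

$m.Index    # 12
$m.Length   # 17
$m.Value    # alice@example.com

# Index and Length describe exactly where Value sits in the original text
$slice = $text.Substring($m.Index, $m.Length)
$slice      # alice@example.com
```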

Let's take a look at how we might modify the email match from the earlier post to use this:

#grab our data as one big blob (-raw)
$file = get-content "$PSScriptRoot\MOCK_DATA.txt" -raw

#make our pattern
$regex = "\w+@\w+\.\w+"

#extract all matches and display the value property
[RegEx]::Matches($file,$regex).value

Now it all looks a lot sleeker, go [RegEx]!

Grouping

Grouping is a way that we can logically break up our extraction. I use this for 2 main reasons:

  1. The data I want isn't unique on its own, but the data around it is. Now I can match the unique piece and rip out what I want to use.
  2. The data I want is all there, but I plan to use pieces of it for different things

Grouping can be done by wrapping sections of your pattern in parentheses. The full pattern will always match as group 0, which is why we were typing $matches[0] to start. Each individual group then gets pulled out in numerical order.
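Here is a quick sketch of the first reason above, using a made-up settings string ($log is a hypothetical name). The number 30 isn't unique on its own, but the label in front of it is, so we match the label and group only the digits.

```powershell
# Hypothetical data: several numbers, only one of which we want
$log = "retries: 3, timeout: 30, port: 8080"

# Match the unique label, but wrap only the digits in a group
$null = $log -match "timeout:\s+(\d+)"

$whole   = $matches[0]   # timeout: 30  (group 0 is the full match)
$timeout = $matches[1]   # 30           (group 1 is just the digits)
```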

Maybe we grab some data by copy/pasting out of Outlook and it looks like this: Brenda Seamon <bseamon0@bbc.co> If it's in a large block of text, we might want to use RegEx to extract it like we have before. This time, we want to grab the First, Last, and Email to use for things. They're all in our data, and we can use grouping to pull them out individually.

Let's start by finding a pattern that gets all of our data. I used this one: $pattern = "\w+\s+\w+\s+<\w+@\w+\.\w+>"

  1. 1+ word characters (first name)
  2. 1+ space characters
  3. 1+ word characters (last name)
  4. 1+ space characters
  5. <
  6. The rest of our email pattern, like we used before.
  7. >

We can see that it works with our test:

 
$data = "Brenda	Seamon	<bseamon0@bbc.co>"
$pattern = "\w+\s+\w+\s+<\w+@\w+\.\w+>"
$data -match $pattern
$matches[0]

Now that we know it works, let's try grouping up the pieces we want by putting parens around the first, last and email sections: $pattern = "(\w+)\s+(\w+)\s+<(\w+@\w+\.\w+)>"

 
$data = "Brenda	Seamon	<bseamon0@bbc.co>"
$pattern = "(\w+)\s+(\w+)\s+<(\w+@\w+\.\w+)>"
$data -match $pattern
$matches[0]

"All match: {0}
First name: {1}
Last name: {2}
Email: {3}
" -f $matches[0],$matches[1],$matches[2],$matches[3]
 
All match: Brenda	Seamon	<bseamon0@bbc.co>
First name: Brenda
Last name: Seamon
Email: bseamon0@bbc.co

We can also name these groups using ?<NAME> inside of the parens. This makes our pattern start to look really bananas if we saw it without context: $pattern = "(?<first>\w+)\s+(?<last>\w+)\s+<(?<email>\w+@\w+\.\w+)>"

 
$data = "Brenda	Seamon	<bseamon0@bbc.co>"
$pattern = "(?<first>\w+)\s+(?<last>\w+)\s+<(?<email>\w+@\w+\.\w+)>"
$data -match $pattern
$matches[0]

"All match: {0}
First name: {1}
Last name: {2}
Email: {3}
" -f $matches[0],$matches["first"],$matches["last"],$matches["email"]
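As a small aside, here is a sketch of two other ways to peek at those named groups. It assumes the same pattern as above, with spaces instead of tabs in the sample string; $keys and $email are illustrative names.

```powershell
$data    = "Brenda Seamon <bseamon0@bbc.co>"
$pattern = "(?<first>\w+)\s+(?<last>\w+)\s+<(?<email>\w+@\w+\.\w+)>"
$null    = $data -match $pattern

# List everything $matches is holding: 0 plus one entry per named group
# (the order of the keys is not guaranteed)
$keys = $matches.Keys
$keys

# Dot notation reads a named group the same way the ["email"] index does
$email = $matches.email
$email   # bseamon0@bbc.co
```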

Notice $matches is a hash table, and our group names become the keys. Let's try grabbing the groups using [RegEx]:

$data = "Brenda	Seamon	<bseamon0@bbc.co>"
$pattern = "(?<first>\w+)\s+(?<last>\w+)\s+<(?<email>\w+@\w+\.\w+)>"
$results = [Regex]::Matches($data,$Pattern)
$results[0].groups["email"].value

It's a bit more work, since we need to keep track of which result we are on in a group of matches. This scales nicely for looping though!

 
$data = "Brenda	Seamon	<bseamon0@bbc.co>, Joeann	Brotherwood	<jbrotherwood1@house.gov>, Jake	Duffan	<jduffan2@google.ru>"
$pattern = "(?<first>\w+)\s+(?<last>\w+)\s+<(?<email>\w+@\w+\.\w+)>"
$results = [Regex]::Matches($data,$Pattern)
$people = @()

foreach($person in $results)
{
    $obj = [pscustomobject]@{
        "First Name" = $person.Groups["first"].value
        "Last Name" = $person.Groups["last"].value
        Email = $person.Groups["email"].value
    }

    $people += $obj
}

$People
 
First Name Last Name   Email                  
---------- ---------   -----                  
Brenda     Seamon      bseamon0@bbc.co        
Joeann     Brotherwood jbrotherwood1@house.gov
Jake       Duffan      jduffan2@google.ru 

Hopefully you've had fun playing with RegEx so far. We will take a look at some other symbols and some little tricks we can use grouping and ? for in future posts!

As always, don't forget to rate, comment and share! Let me know what you think of the content and what topics you'd like to see me blog about in the future.
