PowerShell for Non-N00bs: Formatting Time With RegEx

So, last episode we learned how to format the output of the LastBootUpTime property from the Win32_OperatingSystem WMI query into something human-readable using WMI.  What if that didn't exist?

PSH> (Get-WmiObject -Query 'SELECT LastBootUpTime FROM Win32_OperatingSystem').LastBootUpTime
20090712112652.125000-420

So, we have this ugly string.  How do we convert it to something we can use (namely, a string that can be fed into a DateTime object)?  That's where RegEx named matches come in.

PSH> (Get-WmiObject -Query 'SELECT LastBootUpTime FROM Win32_OperatingSystem').LastBootUpTime -Match "^(?<year>\d\d\d\d)(?<mon>\d\d)(?<day>\d\d)(?<hr>\d\d)(?<min>\d\d)(?<sec>\d\d)(?<etc>.*)"
True

Say what?  It's actually fairly straightforward.  Let's roll it back to the simplest case: 

PSH> (Get-WmiObject -Query 'SELECT LastBootUpTime FROM Win32_OperatingSystem').LastBootUpTime -Match "^(?<year>\d\d\d\d)"
True

Okay, not as daunting.  But "^(?<year>\d\d\d\d)" has almost all the basic elements that comprise the rest of the long (and intimidating) string.  Let's look at them one at a time:

  • ^ in this context means "At the beginning of the line."  It doesn't match a particular character in the target string; it anchors the pattern to the start of the target string.  If it's absent, the RegEx parser is free to start the match at the first place the pattern fits, which might not be the first character in the target string.
  • () in this context means "remember the matching string for this pattern."  If it's absent, the RegEx parser will tell us whether or not the pattern matches the target string, but not which part of the target string matches the pattern.  If we just want a pass/fail result, that's fine; here we want to tokenize and extract data, so we want to return the portion of the target string matching the pattern.
  • ?<year> in this context (specifically, in a saved match, that is, a parenthetical string in a RegEx pattern) gives this matching string a name so we can refer to its contents later.  In this case, the magic variable $matches will have a key named "year" that will contain the portion of the target string matching the pattern.
  • \d\d\d\d in this context means "four digits".  By itself, \d means "one digit", that is, 0 - 9.  Here, we want four of them.

All together, we're saying, "At the beginning of the target string, take the first four digits and allow us to refer to them later as the 'year' portion of the matching string."
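To poke at those pieces without touching WMI, here's a small sketch that runs the simple pattern against the sample string from earlier.  The variable names ($stamp, $year, $whole, $anchored, $floating) and the 'boot: 20090712' test string are made up for illustration.

```powershell
# Sample value copied from the WMI output shown earlier; $stamp is an illustrative name.
$stamp = '20090712112652.125000-420'

# Anchored, named capture: grab the first four digits of the string.
if ($stamp -match '^(?<year>\d\d\d\d)') {
    $year  = $matches.year   # 2009
    $whole = $matches[0]     # 2009 (here the whole match is just those four digits)
}

# With ^ the digits must sit at the very start, so this one is False.
$anchored = 'boot: 20090712' -match '^(?<year>\d\d\d\d)'

# Without ^ the pattern matches wherever it first fits, so this one is True.
$floating = 'boot: 20090712' -match '(?<year>\d\d\d\d)'
```

The last two lines are the whole point of the anchor: same pattern, same string, and the only difference is whether the match is allowed to start somewhere other than the first character.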

The (one or more) parts of the string that match the pattern(s) are stored in a single variable, $matches.  $matches[0] is always the string that satisfies the whole regular expression.  Then it gets weird.  If the remembered matches (the parts of the pattern in parentheses) have names (the ?<name> rigamarole), then those matching parts are accessed as keys to the $matches hash.  Setting the named matches aside, any unnamed but remembered matches (lacking the ?<name> string) are numbered in order and accessed as if they were elements of a $matches array.  Got that?

If we named our remembered matches, we can access them as if $matches is a hash, e.g., $matches.year.  If we forgot to name them, then we have to access them as if $matches is an array, e.g., $matches[1].  $matches[0] is always the whole matching string, so $matches[1] is the first unnamed remembered match, $matches[2] is the second, and so on.  This holds true even if the pattern is something like "(\d\d\d\d)(?<mon>\d\d)(\d\d)": the data will be accessed as $matches[1], $matches.mon, then $matches[2], even though that last remembered match is the third remembered match.
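Here's a quick sketch of that mixed case, again using the sample string rather than a live WMI call.  The names $stamp, $year, $month, $day and $joined are just for illustration.

```powershell
# Sample value copied from the WMI output shown earlier.
$stamp = '20090712112652.125000-420'

# Year and day are unnamed groups; only the month gets a name.
if ($stamp -match '^(\d\d\d\d)(?<mon>\d\d)(\d\d)') {
    $year   = $matches[1]     # first unnamed group  -> 2009
    $month  = $matches.mon    # the named group      -> 07
    $day    = $matches[2]     # second unnamed group -> 12
    $joined = "$year-$month-$day"
    $joined                   # 2009-07-12
}
```

Note that the day lands in $matches[2], not $matches[3], because the named group in the middle doesn't use up a number.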

Best way to avoid this drain bamage?  Always name our remembered matches.

Okay, where were we?  Oh, that's right.  We spent umpteen hundred words to get the first four digits from the LastBootUpTime string.  That's the bad news.  The good news is that everything we just underwent applies to the month, day, hour, minute and second values. 

This leaves one last snippet of that ugly RegEx pattern: "(?<etc>.*)"  We know what the parentheses do, and what "?<etc>" means.  All that's left is ".*" 

  • "." in this context means, "match any single character."  It's the universal catch-all (strictly, anything except a newline, which doesn't matter for our one-line string).
  • "*" in this context means, "zero or more of whatever comes right before it."  "a*" would mean "zero or more a characters."  ".*" means "zero or more of any character," which is a roundabout way of saying "everything else."
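To see what "everything else" actually scoops up, here's a sketch that runs the full pattern against the sample string, and then against a made-up string with nothing after the seconds.  $stamp, $pattern, $sec, $etc and $emptyEtc are illustrative names.

```powershell
# Sample value copied from the WMI output shown earlier.
$stamp   = '20090712112652.125000-420'
$pattern = '^(?<year>\d\d\d\d)(?<mon>\d\d)(?<day>\d\d)(?<hr>\d\d)(?<min>\d\d)(?<sec>\d\d)(?<etc>.*)'

if ($stamp -match $pattern) {
    $sec = $matches.sec    # 52
    $etc = $matches.etc    # .125000-420  (the leftovers after the seconds)
}

# "Zero or more" really does include zero: with nothing after the seconds,
# the pattern still matches and etc is just an empty string.
if ('20090712112652' -match $pattern) {
    $emptyEtc = $matches.etc
    $emptyEtc.Length       # 0
}
```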

So what's with the 'True' that -match returns?  -match is a Boolean operator.  "Does this string match that pattern?"  It has to return a true or false value.  So let's put it to use:

PSH> if ((Get-WmiObject -Query 'SELECT LastBootUpTime FROM Win32_OperatingSystem').LastBootUpTime -Match "^(?<year>\d\d\d\d)(?<mon>\d\d)(?<day>\d\d)(?<hr>\d\d)(?<min>\d\d)(?<sec>\d\d)(?<etc>.*)") { Get-Date ("{0}-{1}-{2} {3}:{4}:{5}" -f $matches.year, $matches.mon, $matches.day, $matches.hr, $matches.min, $matches.sec); }

Sunday, July 12, 2009 11:26:52 AM

That looks vaguely familiar.
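For anyone following along on a box without WMI handy, the same conversion can be sketched against the sample string instead of a live query.  This is only a sketch: $stamp, $pattern, $text and $boot are illustrative names, and it leaves the fractional seconds and the UTC offset sitting in the etc group untouched.

```powershell
# Sample value copied from the WMI output shown earlier.
$stamp   = '20090712112652.125000-420'
$pattern = '^(?<year>\d\d\d\d)(?<mon>\d\d)(?<day>\d\d)(?<hr>\d\d)(?<min>\d\d)(?<sec>\d\d)(?<etc>.*)'

if ($stamp -match $pattern) {
    # Reassemble the named pieces into a string Get-Date can digest.
    $text = "{0}-{1}-{2} {3}:{4}:{5}" -f $matches.year, $matches.mon, $matches.day,
                                         $matches.hr, $matches.min, $matches.sec
    $boot = Get-Date $text

    $text                  # 2009-07-12 11:26:52
    $boot.GetType().Name   # DateTime
}
```

Once it's a real DateTime, the usual properties and methods ($boot.Hour, $boot.AddDays(1), and so on) are all available.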