PowerShell Diversion #3: Discussion

[Follow-up discussion for PowerShell Diversion #3]

So, it looks like 35 people attended the different sessions for PowerShell Week.  The following one-liner can generate a list of their names and email addresses from the individual attendee lists (linked in the original post):

1..5 |
    ForEach-Object { Import-Csv ("Session$_.csv") } |
        Select-Object Name, Email -Unique |
            Sort-Object @{Expression={$_.Name.Split(" ")[-1]}} |
                Export-Csv CombinedList.csv –NoTypeInformation

How does this work?  Well, the first two lines are reading the entries from all of the individual CSV files:

1..5 | ForEach-Object { Import-Csv ("Session$_.csv") } |

The Range Operator (..) is a nice way to quickly loop through a set of consecutive numbers.  We then take advantage of PowerShell’s excellent variable substitution to generate the name of the current file.  That is, “Session$_.csv” becomes “Session1.csv”, “Session2.csv”, etc.  This result sin every entry from each of the CSV files being sent along the pipeline for processing.

Next, the Select-Object cmdlet grabs the Name and Email fields for each entry and the “-Unique” switch ensures that any duplicates (and there are quite a few across the files) are discarded:

Select-Object Name, Email -Unique

At this point we have a collection of objects, one for each unique attendee.   Technically, we’ve done enough to now send the email to each attendee as mentioned in the original post.  However, for ease of use (and because the original problem asked for it), we should sort the names alphabetically.

Since the names are given as a single string containing all parts of the name, we need a way to break this into the individual words, so we can sort on the last name.   The good news is that Sort-Object allows us to do all of this in a single step:

Sort-Object @{Expression={$_.Name.Split(" ")[-1]}}

We simply supply the cmdlet with a hash table that defines an expression who’s result is used as the item to be sorted – in this case it will be the last name of the attendee. 

The expression uses the Split() method of string objects to generate an array where the first name is in position 0, the second name in position 1, etc.  In order to make sure we use the last name in all cases, we use the index “-1”, which translates as “the last item in the array” (“-2”, would be the “second to last item”, and so on).  If we didn’t use this and had assumed all attendees had only a first and last name, then “James Clerk Maxwell” would be sorted incorrectly, and really that’s no way to treat one of the greatest physicists of all time.

Finally, we use Export-Csv to generate a new file with the combined list of attendees.  The “-NoTypeInformation” switch prevents the cmdlet adding the object type name to the head of the file.

To answer the additional question about who attended the most/fewest sessions, we need to modify the code like this:

1..5 |
    ForEach-Object { Import-Csv ("Session$_.csv") } |
        Group-Object Name |
            Sort-Object Count –Descending

From the results, it is clear that “Marie Curie” attended the most sessions (all five – she can’t get enough scripting), and several people (including “Grant Imahara”,  “Leonhard Euler” and  “Douglas Adams”) need a bit more convincing about the wonders of Powershell as they attended only a single sessions each.