Retrieving the Two Code Groups – VB


There are two groups of paragraphs in our document that are styled as “Code”.  The first group contains the C# code that we want to test.  The second group contains a single paragraph that is the output of the code in the first group.  Next in the process of formulating our query, we want to retrieve each block of code as a separate group.


The problem is that the GroupBy extension method doesn’t do what we want.  It puts all items with the same key into a single group, regardless of whether they are separated by other items.  It would join our two groups of code, which we want to keep separate.
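To see the problem in isolation, here is a minimal sketch that runs GroupBy over a hypothetical list of paragraph style names in document order. The module name, function name, and sample data are made up for illustration and are not part of the program in this post.

```vb
Option Infer On

Imports System
Imports System.Collections.Generic
Imports System.Linq

Module GroupByDemo
    ' Hypothetical paragraph styles in document order: two runs of "Code"
    ' separated by one "Normal" paragraph.
    Public ReadOnly SampleStyles As String() = {"Code", "Code", "Normal", "Code"}

    ' Returns one "key:count" string per group that GroupBy produces.
    Public Function GroupByCounts(ByVal source As IEnumerable(Of String)) As List(Of String)
        Return source _
            .GroupBy(Function(s) s) _
            .Select(Function(g) g.Key & ":" & g.Count().ToString()) _
            .ToList()
    End Function

    Sub Main()
        ' GroupBy merges both runs of "Code" into a single group.
        For Each line As String In GroupByCounts(SampleStyles)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

GroupBy returns its groups in the order in which each key first appears, so this prints Code:3 and then Normal:1: the two separate runs of Code paragraphs end up in one group.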

For instance, if we amend the code to group the paragraphs, adding one more query to the bottom of our string of queries, as follows:

Dim defaultStyle As String = _
    CStr( _
            ( _
                From style In styleDoc.Root _
                    .Elements(w + "style") _
                Where( _
                    CStr(style.Attribute(w + "type")) = "paragraph" AndAlso _
                    CStr(style.Attribute(w + "default")) = "1") _
            ) _
            .First() _
            .Attribute(w + "styleId") _
        )
Dim paragraphs = _
    mainPartDoc.Root _
        .Element(w + "body") _
        .Descendants(w + "p") _
        .Select(Function(p) _
            New With { _
                .ParagraphNode = p, _
                .Style = GetParagraphStyle(p, defaultStyle) _
            } _
        )
Dim r As XName = w + "r"
Dim ins As XName = w + "ins"
Dim paragraphsWithText = _
    paragraphs.Select(Function(p) _
        New With { _
            .ParagraphNode = p.ParagraphNode, _
            .Style = p.Style, _
            .Text = p.ParagraphNode _
                .Elements() _
                .Where(Function(z) z.Name = r OrElse z.Name = ins) _
                .Descendants(w + "t") _
                .StringConcatenate(Function(s) CStr(s)) _
        } _
    )
' GroupBy puts every paragraph with the same style into one group,
' even when those paragraphs are not adjacent.
Dim groupedCodeParagraphs = _
    paragraphsWithText.GroupBy(Function(p) p.Style)
For Each g In groupedCodeParagraphs
    Console.WriteLine("Group of paragraphs styled {0}", g.Key)
    For Each p In g
        Console.WriteLine("{0} {1}", _
                    p.Style.PadRight(12), _
                    p.Text)
    Next
Next

Then we see:

Group of paragraphs styled Heading1
Heading1     Parsing WordprocessingML with LINQ to XML
Group of paragraphs styled Normal
Normal       The following example prints to the console.
Normal       This example produces the following output:
Group of paragraphs styled Code
Code         using System;
Code         class Program {
Code             public static void Main(string[] args) {
Code                 Console.WriteLine("Hello World");
Code             }
Code         }
Code         Hello World

This grouped the “Hello World” with the code, which is not what we want.

As it turns out, there isn’t a standard query operator that does exactly what we want.  We want an operator that groups only adjacent items with a common key.  So let’s write one.  In addition to the GroupAdjacent extension method, we need a GroupOfAdjacent class that we can iterate through for each grouping.  It only takes a couple dozen lines of code to implement this.

Unlike the C# version, the Visual Basic implementation of GroupAdjacent is not lazy.  Before it returns the first group, it iterates through the entire source collection and builds a list of groups, each of which holds its own list of items.  In practice this doesn’t noticeably affect performance, even for large documents.

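To use GroupAdjacent, we pass it a lambda that selects a key from each item; whenever that key changes from one item to the next, the operator starts a new group.  GroupAdjacent returns a list of groups.  Each group is a GroupOfAdjacent that contains a sequence of items of type TElement and exposes their shared key through its Key property.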
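As a small illustration of that contract, here is a self-contained sketch run over the same kind of input as a plain GroupBy would see. It is a simplified stand-in, not the implementation in the listing: the hypothetical GroupAdjacentPairs returns key/list pairs instead of GroupOfAdjacent objects, and the module name and sample data are also made up.

```vb
Option Infer On

Imports System
Imports System.Collections.Generic
Imports System.Runtime.CompilerServices

Module GroupAdjacentDemo
    ' Simplified, eager sketch (hypothetical name): returns (key, items) pairs,
    ' starting a new group whenever the key differs from the previous item's key.
    <Extension()> _
    Public Function GroupAdjacentPairs(Of TElement, TKey)( _
            ByVal source As IEnumerable(Of TElement), _
            ByVal keySelector As Func(Of TElement, TKey)) _
            As List(Of KeyValuePair(Of TKey, List(Of TElement)))
        Dim comparer As EqualityComparer(Of TKey) = EqualityComparer(Of TKey).Default
        Dim groups As New List(Of KeyValuePair(Of TKey, List(Of TElement)))()
        For Each item As TElement In source
            Dim thisKey As TKey = keySelector(item)
            If groups.Count = 0 OrElse _
                    Not comparer.Equals(thisKey, groups(groups.Count - 1).Key) Then
                groups.Add(New KeyValuePair(Of TKey, List(Of TElement))( _
                    thisKey, New List(Of TElement)()))
            End If
            ' Value is a List, so adding to it updates the stored group.
            groups(groups.Count - 1).Value.Add(item)
        Next
        Return groups
    End Function

    ' Hypothetical paragraph styles in document order.
    Public ReadOnly SampleStyles As String() = {"Code", "Code", "Normal", "Code"}

    Sub Main()
        For Each g As KeyValuePair(Of String, List(Of String)) In _
                SampleStyles.GroupAdjacentPairs(Function(s) s)
            Console.WriteLine("{0}:{1}", g.Key, g.Value.Count)
        Next
    End Sub
End Module
```

For the input Code, Code, Normal, Code this prints Code:2, Normal:1, and Code:1. The trailing Code item gets a group of its own because a Normal item separates it from the first run, which is exactly the behavior GroupBy lacks.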

Here is the listing:

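Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq
Imports System.Text
Imports System.Xml
Imports System.Xml.Linq
Imports DocumentFormat.OpenXml.Packaging

' One run of adjacent items that share the same key.
Public Class GroupOfAdjacent(Of TElement, TKey)
    Implements IEnumerable(Of TElement)
    Private _key As TKey
    Private _groupList As List(Of TElement)
    Public Property GroupList() As List(Of TElement)
        Get
            Return _groupList
        End Get
        Set(ByVal value As List(Of TElement))
            _groupList = value
        End Set
    End Property
    Public ReadOnly Property Key() As TKey
        Get
            Return _key
        End Get
    End Property
    Public Function GetEnumerator() As System.Collections.Generic.IEnumerator(Of TElement) _
            Implements System.Collections.Generic.IEnumerable(Of TElement).GetEnumerator
        Return _groupList.GetEnumerator()
    End Function
    Public Function GetEnumerator1() As System.Collections.IEnumerator _
            Implements System.Collections.IEnumerable.GetEnumerator
        Return _groupList.GetEnumerator()
    End Function
    Public Sub New(ByVal key As TKey)
        _key = key
        _groupList = New List(Of TElement)
    End Sub
End Class

Module Module1
    ' Groups only adjacent items that share a key.  Not lazy: the whole
    ' source is read before the list of groups is returned.
    <System.Runtime.CompilerServices.Extension()> _
    Public Function GroupAdjacent(Of TElement, TKey)(ByVal source As IEnumerable(Of TElement), _
            ByVal keySelector As Func(Of TElement, TKey)) As List(Of GroupOfAdjacent(Of TElement, TKey))
        ' EqualityComparer handles Nothing keys and value-type keys safely.
        Dim comparer As EqualityComparer(Of TKey) = EqualityComparer(Of TKey).Default
        Dim currentGroup As GroupOfAdjacent(Of TElement, TKey) = Nothing
        Dim allGroups As New List(Of GroupOfAdjacent(Of TElement, TKey))()
        For Each item In source
            Dim thisKey As TKey = keySelector(item)
            ' Start a new group for the first item and whenever the key changes.
            If currentGroup Is Nothing OrElse Not comparer.Equals(thisKey, currentGroup.Key) Then
                currentGroup = New GroupOfAdjacent(Of TElement, TKey)(thisKey)
                allGroups.Add(currentGroup)
            End If
            currentGroup.GroupList.Add(item)
        Next
        Return allGroups
    End Function

    <System.Runtime.CompilerServices.Extension()> _
    Public Function GetPath(ByVal el As XElement) As String
        Return el _
            .AncestorsAndSelf _
            .InDocumentOrder _
            .Aggregate("", Function(seed, i) seed & "/" & i.Name.LocalName)
    End Function

    <System.Runtime.CompilerServices.Extension()> _
    Function StringConcatenate(Of T) _
            (ByVal source As IEnumerable(Of T), ByVal projectionFunc As Func(Of T, String)) _
            As String
        Return source.Aggregate(New StringBuilder, _
            Function(sb, i) sb.Append(projectionFunc(i)), _
            Function(sb) sb.ToString)
    End Function

    Public Function LoadXDocument(ByVal part As OpenXmlPart) _
            As XDocument
        Using sr As StreamReader = New StreamReader(part.GetStream())
            Using reader As XmlReader = XmlReader.Create(sr)
                Return XDocument.Load(reader)
            End Using
        End Using
    End Function

    ' Returns the paragraph's w:pStyle value, or defaultStyle if it has none.
    Public Function GetParagraphStyle(ByVal para As XElement, _
                                      ByVal defaultStyle As String) As String
        Dim w As XNamespace = _
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
        Dim paraStyle = CStr(para.Elements(w + "pPr") _
                       .Elements(w + "pStyle") _
                       .Attributes(w + "val") _
                       .FirstOrDefault())
        If paraStyle Is Nothing Then
            Return defaultStyle
        Else
            Return paraStyle
        End If
    End Function

    Sub Main()
        Dim w As XNamespace = _
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
        Dim filename As String = "SampleDoc.docx"
        Using wordDoc As WordprocessingDocument = _
            WordprocessingDocument.Open(filename, True)
            Dim mainPart As MainDocumentPart = _
                wordDoc.MainDocumentPart
            Dim styleDefinitionPart As StyleDefinitionsPart = _
                mainPart.StyleDefinitionsPart
            Dim mainPartDoc As XDocument = LoadXDocument(mainPart)
            Dim styleDoc As XDocument = LoadXDocument(styleDefinitionPart)
            ' The queries shown earlier, with GroupBy replaced by GroupAdjacent.
            Dim defaultStyle As String = _
                CStr( _
                        ( _
                            From style In styleDoc.Root _
                                .Elements(w + "style") _
                            Where( _
                                CStr(style.Attribute(w + "type")) = "paragraph" AndAlso _
                                CStr(style.Attribute(w + "default")) = "1") _
                        ) _
                        .First() _
                        .Attribute(w + "styleId") _
                    )
            Dim paragraphs = _
                mainPartDoc.Root _
                    .Element(w + "body") _
                    .Descendants(w + "p") _
                    .Select(Function(p) _
                        New With { _
                            .ParagraphNode = p, _
                            .Style = GetParagraphStyle(p, defaultStyle) _
                        } _
                    )
            Dim r As XName = w + "r"
            Dim ins As XName = w + "ins"
            Dim paragraphsWithText = _
                paragraphs.Select(Function(p) _
                    New With { _
                        .ParagraphNode = p.ParagraphNode, _
                        .Style = p.Style, _
                        .Text = p.ParagraphNode _
                            .Elements() _
                            .Where(Function(z) z.Name = r OrElse z.Name = ins) _
                            .Descendants(w + "t") _
                            .StringConcatenate(Function(s) CStr(s)) _
                    } _
                )
            ' Each run of consecutive paragraphs with the same style becomes one group.
            Dim groupedCodeParagraphs = _
                paragraphsWithText.GroupAdjacent(Function(p) p.Style)
            For Each g In groupedCodeParagraphs
                Console.WriteLine("Group of paragraphs styled {0}", g.Key)
                For Each p In g
                    Console.WriteLine("{0} {1}", _
                                p.Style.PadRight(12), _
                                p.Text)
                Next
            Next
        End Using
    End Sub
End Module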
