Create a Concordance for a Word Document from C# using Automation


I was talking to my son about some of his homework assignments. He had to write several essays. We talked about the power of Microsoft Word and how I had to count the words manually when writing a 5000 word essay on a typewriter, but Word shows the word count easily.

Then we talked about the utility of a concordance and how to create one.

Below is a simple C# program that you can run that will open a Word document (or you can have it create a new one, and have it type some text), retrieve the text, count the occurrences of each word into a dictionary, then output the result into an Excel spreadsheet. The sample needs nothing more than office installed and Visual Studio. I suspect that any version of each will suffice.

The code uses C# dynamic for which types are not checked at compile time, and you don’t get intellisense.

Of course, you can accomplish the same thing with VBA in a macro (Excel or Word).

Years ago, I wrote the code to enable office intellisense in Visual FoxPro, even though it has no strong typing like C# dynamic. It still works today

clip_image001

clip_image002

Under the covers, dynamic and Visual Foxpro user something called OLE Automation which is still very useful today.

Of course, the code below can be modified to have an exclusion list (the number of “the”s in a document may not be interesting) and even to directly insert the concordance data directly into the target document.

<code sample>

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Windows;
using System.Windows.Controls;

namespace WpfApplication1
{
  // Start Visual Studio
  // File->New->Project->C# WPF application
  // paste in this code
  public partial class MainWindow : Window
  {
    public MainWindow()
    {
      InitializeComponent();
      this.Loaded += (ol, el) =>
      {
        try
        {
          /*
          when i don't know how to manipulate Word or Excel, i do this:
          View->Macro->Record a macro
          then edit the macro
          or i look at MSDN documentation:
          Excel: https://msdn.microsoft.com/en-us/library/office/ff194068.aspx
          Word: https://msdn.microsoft.com/en-us/library/office/ff837519.aspx
          or open a VS developer command prompt and 
            run oleview on "C:\Program Files (x86)\Microsoft Office\Office16\Excel.exe"
          or open oleview, navigate to TypeLibraries->Microsoft Word 16.0 Object Library

          */
          var dict = new Dictionary<string, int>();
          var dtStart = DateTime.Now;
          // also works with 
          //    VisualFoxPro.Application
          //    PowerPoint.Application
          //    Outlook.Application
          //    
          var typeWord = Type.GetTypeFromProgID("Word.Application");
          dynamic oWord = Activator.CreateInstance(typeWord);
          oWord.Visible = true;
          //dynamic oDoc = oWord.documents.add(); // to create a new document
          dynamic oDoc = oWord.RecentFiles(1).Open();
          // you can type text like so
          //oWord.Selection.TypeText("Hello there");

          // save the file
          //var filename = System.IO.Path.Combine(
          //    Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments),
          //    "temp.docx");
          //oDoc.saveas(filename);

          int nWrdCnt = 0;
          // expand the selection to everything
          oWord.Selection.WholeStory();
          // get all the text of the document
          var allText = oWord.Selection.Text as string;
          // split the text into an array of words
          var words = allText.Split(new[] { ' ' });

          foreach (var txt in words)
          {
            // Is it in our dictionary?
            if (dict.ContainsKey(txt))
            {
              dict[txt]++;
            }
            else
            {
              dict[txt] = 1;
            }
            nWrdCnt++;
          }
          oWord.Quit();

          // it's actually faster to output the data as a CSV file, 
          // then start excel with it,
          // but that's less flexible
          //*
          var sb = new StringBuilder();
          // add a header
          sb.AppendLine("Word, Count");
          // iterate through the dictionary in descending order by value
          foreach (var kvp in dict.OrderByDescending(kvp => kvp.Value))
          {
              sb.AppendLine($@"""{kvp.Key}"",{kvp.Value}");
          }
          var fileName = System.IO.Path.Combine(
              System.IO.Path.GetTempPath(),
              "t.csv");
          System.IO.File.WriteAllText(fileName, sb.ToString());
          System.Diagnostics.Process.Start(fileName);
          /*/
          var typeExcel = Type.GetTypeFromProgID("Excel.Application");
          dynamic oExcel = Activator.CreateInstance(typeExcel);
          var oWrkBook = oExcel.Workbooks.Add();
          var oSheet = oWrkBook.ActiveSheet;
          int nRow = 1;
          foreach (var kvp in dict.OrderByDescending(kvp => kvp.Value))
          {
            oSheet.Cells[1][nRow] = kvp.Key.StartsWith("=") ? @"'=" : kvp.Key;
            oSheet.Cells[2][nRow] = kvp.Value;
            nRow++;
          }
          oExcel.Visible = true;
          //*/

          this.Title = $"# of words = {nWrdCnt} Duration = {(DateTime.Now - dtStart).TotalSeconds}";
          var lv = new ListView()
          {
            ItemsSource = from kvp in dict
                          orderby kvp.Value descending
                          select new
                          {
                            V = $"{kvp.Value} {kvp.Key}"
                          }
          };
          this.Content = lv;

        }
        catch (Exception ex)
        {
          this.Content = ex.ToString();
        }

      };
    }
  }
}

</code sample>

Comments (1)

  1. Ram says:

    Very nice thought process and its implementation too.

Skip to main content