OpenXML & VSTO & VBA – Finding a reliable mechanism for reading the correct value of CharactersWithSpaces ‘extended-properties’ in Word documents [part 2/2].


 

This article is split across two blog posts, and this is part #2. Part #1 was published as a separate post.

In part #1 of this article, I demonstrated how words are counted using OpenXML and I warned about the dangers of not getting it done correctly.

Here is a short summary of the alternatives we have:   

Can we obtain 100% reliable statistics?

Method #1:
       When Word receives a query about the document statistics (either manually through the Ribbon
       graphical interface or through the VBA object model), it computes the results and always reports
       the correct numbers. All we have to do next is save and close the file, then send it to a program
       which accesses the OpenXML structure and reads those values.

       The only difficult part is finding a way to trigger the update every time, to ensure that the
       information stored in the file is up to date.

       Advantages:
              > easy to implement;
              > the OpenXML code used to read the values is very simple;
              > works for all kinds of input files, even very complex ones
                (containing embedded charts, shapes, nested tables, etc.);

       Disadvantages:
              > we can only force the Word Count update if we rely on a VBA / VSTO
                add-in installed on the client side;
              > the automated statistics-update add-in has to be deployed to all
                end users somehow;
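To give an idea of how little code the reading side of Method #1 needs, here is a minimal sketch in Python. The choice of Python and the names read_statistics and STAT_NAMES are my own assumptions for illustration; any language with zip and XML support would do. It assumes the extended properties live in docProps/app.xml, which is where Word writes them.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the extended-properties part (docProps/app.xml).
EP_NS = "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"

# The statistics discussed in this article (illustrative selection).
STAT_NAMES = ("Pages", "Words", "Characters", "Lines",
              "Paragraphs", "CharactersWithSpaces")


def read_statistics(docx):
    """Return the statistics stored in docProps/app.xml as a dict of ints.

    `docx` may be a file path or a binary file-like object.
    Properties that are absent from the file are simply left out.
    """
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("docProps/app.xml"))

    stats = {}
    for name in STAT_NAMES:
        element = root.find(f"{{{EP_NS}}}{name}")
        if element is not None and element.text:
            stats[name] = int(element.text)
    return stats
```

Calling read_statistics("report.docx") (a hypothetical file name) returns whatever numbers were stored at the last save, so the result is only as reliable as the update that preceded that save.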

 Method #2:
       Write our own OpenXML code and count the words ourselves.

       Advantages:
              > no need for 'helper' tools;

       Disadvantages:
              > because the OpenXML format is VERY complex, the code will run
                reliably only for basic input files. If you want to extend the program
                to handle all kinds of input documents, you will find that the
                complexity of the code increases to the point where it is no longer
                feasible to continue with the project (you will very likely be forced
                to write individual rules for all kinds of exceptions and special
                conditions for XML text tags, which may appear in different
                combinations);
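To make the trade-off of Method #2 concrete, here is a deliberately naive counter, sketched in Python (my choice for illustration, not the code from part #1). It only looks at w:t text inside the paragraphs of word/document.xml and splits on whitespace. It ignores tabs, line breaks, fields, footnotes, headers, footers and everything else Word takes into account, so it should not be expected to match Word's numbers for anything beyond basic files. The names naive_counts and _own_text are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
P_TAG = f"{{{W_NS}}}p"   # paragraph
T_TAG = f"{{{W_NS}}}t"   # text run content


def _own_text(paragraph):
    """Concatenate the w:t text of one paragraph in document order.

    Nested paragraphs (for example inside text boxes) are skipped here,
    because the caller visits them separately.
    """
    parts = []
    pending = list(paragraph)
    while pending:
        node = pending.pop(0)
        if node.tag == P_TAG:
            continue
        if node.tag == T_TAG:
            parts.append(node.text or "")
        pending[0:0] = list(node)  # visit children before later siblings
    return "".join(parts)


def naive_counts(docx):
    """Count words and characters (with spaces) in the main document part only."""
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    words = 0
    chars_with_spaces = 0
    for paragraph in root.iter(P_TAG):
        text = _own_text(paragraph)
        words += len(text.split())
        chars_with_spaces += len(text)
    return {"Words": words, "CharactersWithSpaces": chars_with_spaces}
```

Even this small sketch already needs a special rule for nested paragraphs, which hints at how quickly the list of exceptions grows once real documents are involved.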

In this article, I would like to present the first method, where we let Word compute the statistics and use VBA to trigger the update. But first, we have to reproduce the problem:

 

Triggering the Word statistics mismatch problem

1.   Create a new Word document, type "=rand(1)" (without the quotes), then press the Enter key;
2.   Save the file as .docx, then close it;
3.   Open the file using an OpenXML editor, or rename the document from .docx to .zip, open the docProps folder and then open the app.xml file. Note the values of these XML items:
      >  Pages;
      >  Words;
      >  Characters;
      >  Lines;
      >  Paragraphs;
      >  CharactersWithSpaces;

4.   Close the editor or, if you renamed the file to .zip, restore its original extension. Open it again in Word;
5.   On the Review tab, in the Proofing group, click Word Count;
6.   Compare the statistics in the Word Count dialog with those noted from the app.xml file;

 

 Result: we easily notice that the numbers are different ...

7.   Close the document again. You should be prompted to save it; go ahead and click OK to store the updated
      document information;
8.   Open its internal OpenXML structure and this time you should see that the numbers match; 
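Instead of comparing the numbers by eye, the before and after states of app.xml can be diffed with a few lines of script. The following is a minimal sketch in Python, assuming you kept a copy of the file as it was after step 2 and another as it was after step 7. The function names stored_properties and changed_properties are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET


def stored_properties(docx):
    """Map every child element of docProps/app.xml to its text, keyed by local name."""
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("docProps/app.xml"))
    # Strip the "{namespace}" prefix that ElementTree puts in front of tag names.
    return {child.tag.split("}")[-1]: (child.text or "") for child in root}


def changed_properties(before, after):
    """Return {name: (old, new)} for every property whose stored value differs."""
    names = sorted(set(before) | set(after))
    return {
        name: (before.get(name), after.get(name))
        for name in names
        if before.get(name) != after.get(name)
    }
```

For example, changed_properties(stored_properties("before.docx"), stored_properties("after.docx")) lists only the items that Word rewrote, with "before.docx" and "after.docx" standing in for your two copies.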
 

If we slightly change the order of execution for the aforementioned steps:

   > create a new file;
   > add some text;
   > save the file;
   > keep the document open, then go to 'Review' > 'Proofing';
   > click on 'Word Count';
   > close the file;
   > open it using an OpenXML editor, or rename the document from .docx to .zip, open the docProps
      folder and then edit the app.xml file;

... you'll notice that the correct statistics information is stored into the document.

But something interesting happens ... when we try to close the Word document after viewing the Statistics, even though we didn't add any modification (we simply clicked on 'Word Count') the application prompts us to save the file again!

The same behavior occurs if I go to VBA and execute:

Debug.Print ActiveDocument.ComputeStatistics(wdStatisticCharactersWithSpaces)

What this means is that when we first saved the file, Word just entered a rough estimation of the Character count, but when we clicked on 'Word Count', the application updated its Statistics.

Since we know that it’s enough to click on the 'Word Count' button or execute the VBA instruction to have the correct value stored in OpenXML, we can take advantage of it to force a computation before the user triggers a Save in Word.

In this way, each time the end user sends a file to an automated program or script, it will contain the most up-to-date statistics.

 

A simple solution using VBA ...
 
I wrote a simple Word DOTM add-in which will trigger the statistics update before each save:

                      DISCLAIMER

Sample Code is provided for the purpose of illustration only and is not intended to be used in a production environment!

THIS SAMPLE CODE AND ANY RELATED INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE.

We grant You a nonexclusive, royalty-free right to use and modify the
Sample Code and to reproduce and distribute the object code form of
the Sample Code, provided that You agree:
   (i)   to not use Our name, logo, or trademarks to market Your
         software product in which the Sample Code is embedded;
   (ii)  to include a valid copyright notice on Your software
         product in which the Sample Code is embedded; and
   (iii) to indemnify, hold harmless, and defend Us and Our
         suppliers from and against any claims or lawsuits,
         including attorneys’ fees, that arise or result from the
         use or distribution of the Sample Code.


Module1
-----------------------------------------------------------------

Option Explicit 

Public clsWd As Class1

Sub AutoExec()

   Debug.Print "WordFixStatistics: AutoExec fired"
   Set clsWd = New Class1

End Sub

Class1
------------------------------------------------------------------
Public WithEvents clsWd As Application

Private Sub Class_Initialize()

Debug.Print "WordFixStatistics: Class_Initialize fired"
Set clsWd = Application

End Sub

Private Sub clsWd_DocumentBeforeSave(ByVal Doc As Document, _
                                     SaveAsUI As Boolean, _
                                     Cancel As Boolean)

    ' Querying the statistics makes Word recompute them before the file is written.
    Debug.Print "WordFixStatistics: DocumentBeforeSave event fired"
    Debug.Print "The document: " & Doc.Name & " has [" & _
                Doc.ComputeStatistics(wdStatisticCharactersWithSpaces) & _
                "] CharactersWithSpaces."

End Sub

 

As you can see, everything seems to be working:

 

 .. but not for all scenarios.

Let's suppose the end user starts by opening an older .DOC file, then does a SaveAs to store it in an OpenXML format. In this case, the newly saved .DOCX file will contain unreliable word count information. But why?

It seems this issue is being caused by the fact that my code is receiving a handle that points at the old .DOC file when this event occurs:

Private Sub clsWd_DocumentBeforeSave(ByVal Doc As Document ..

.. therefore Word computes the correct Statistics for a different file. This is not a problem in a normal Save action, but with a SaveAs, we get a handle on the new document only after we exit the BeforeSave event handler, and by that time it is too late for the code to act.

A less simple solution using VBA 🙂 ...

There is no AfterSave event in Word, but I tried to simulate one:

>  first I am saving the name of the present document (DOC);
>  then I am executing a delayed call to another function (timerCallback) where I check if the
    active document (it becomes a DOCX format after SaveAs completes) has the same name
    as the one I recorded before;
>  if the names are not identical, we probably executed a SaveAs so we trigger the computation
    again;
>  I chose to introduce a 200ms delay .. but it doesn’t seem to matter whether this interval is
    smaller or larger; The Application.OnTime call schedules a timer callback at the first available
    moment, but it is done only after the SaveAs function completes .. so by that time we already
    have a new document name and can detect if we have to trigger a new computation;

The source code became more complex now … and with the added complexity, there may be problems. It’s up to you to decide if you want to keep the first design and just instruct your users to perform a normal Save after they convert a document, or keep this more complicated design.


 Module1
-----------------------------------------------------------------

Option Explicit

Public currentDocName As String
Public clsWd As Class1

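' 64-bit builds of Office (VBA7) require the PtrSafe keyword on Declare statements.
#If VBA7 Then
    Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
    Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If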

Sub AutoExec()

   Debug.Print "WordFixStatistics: AutoExec fired"
   Set clsWd = New Class1

End Sub

Sub timerCallback()

  DoEvents
  Sleep 200

  Debug.Print "TimerCallback triggered."

  ' ActiveDocument raises a run-time error when no document is open,
  ' so check Documents.Count before touching it.
  If Documents.Count > 0 And Len(currentDocName) > 0 Then

    If ActiveDocument.Name <> currentDocName Then
      Debug.Print "TimerCallback: SaveAs detected. Old doc. name: " & _
                  currentDocName & ", New doc. name: " & _
                  ActiveDocument.Name

      ' Recompute the statistics for the newly saved document.
      Debug.Print "The document: " & ActiveDocument.Name & " has [" & _
                  ActiveDocument.ComputeStatistics(wdStatisticCharactersWithSpaces) & _
                  "] CharactersWithSpaces."
    End If

  End If

End Sub

Class1
------------------------------------------------------------------

Option Explicit

Public WithEvents clsWd As Application

Private Sub Class_Initialize()

  Debug.Print "WordFixStatistics: Class_Initialize fired"
  Set clsWd = Application

End Sub

Private Sub clsWd_DocumentBeforeSave(ByVal Doc As Document, _
                                     SaveAsUI As Boolean, _
                                     Cancel As Boolean)

  Debug.Print "WordFixStatistics: DocumentBeforeSave event fired"
  Debug.Print "The document: " & Doc.Name & " has [" & _
              Doc.ComputeStatistics(wdStatisticCharactersWithSpaces) & _
              "] CharactersWithSpaces."

  ' Remember the current name so timerCallback can detect a SaveAs.
  currentDocName = Doc.Name

  ' Schedule the check; it runs only after Word has finished saving.
  Application.OnTime Now(), "timerCallback"

End Sub

Here is the output:


As I have shown inside the areas highlighted with red, at first we open an older document format: rand1.doc. When we trigger a SaveAs, my BeforeSave code runs but it just counts the words inside this old file and exits. The Save action is automatically performed by Word, and only after we exit that internal function (which is not reachable to us) we notice that the ActiveDocument has changed to rand9.docx.

When the control returns to the VBA macros and they get their chance to run (the Application.OnTime schedules my task to execute at the first available slot), my code performs a comparison and detects the new document: rand9.docx which is highlighted in blue. It will trigger another word count and exit the routine.


The End.

I hope you enjoyed my article.

 

For any questions, feel free to add a comment or write me at cristib@microsoft.com.

 

 

 
