Manipulating Office document contents – On Steroids!

Yes, it has been a long time. Sorry about that, but let’s not spend time on the reasons; that is not going to help anyone :)

Here is the new story: how many times have you tried manipulating document contents (for example, adding or removing something) only to find that the performance was not what you expected? I’ve been there. Part of the reason is that when things work great, I generally don’t get to see them :)

While working on an issue, I came across a way to manipulate Office documents with great performance: Open XML. Yes, I know what you are thinking, because I thought the same thing: “How on earth can you use Open XML to manipulate a loaded document? You can’t even open it with the Open XML SDK!” The answer lies in one of my previous posts, where I talked about Flat OPC (though not explicitly). I am using the same thing for document manipulation. The core idea is:

  1. Get the “System.IO.Packaging.Package” stream for the document
  2. Open it using the Open XML SDK (yes, you can open a memory stream with the Open XML SDK)
  3. Convert it to Flat OPC
  4. Manipulate whatever you want
  5. Use InsertXML to insert it back into the document
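
To make step 3 a little more concrete, here is a minimal sketch of the shape of a Flat OPC document: a single pkg:package element containing one pkg:part per package part, with XML parts wrapped in pkg:xmlData. It is built by hand with LINQ to XML rather than taken from a real Word document, so the class name FlatOpcShape and the sample text are purely illustrative, and a real package carries more parts (relationships, styles and so on).

```csharp
using System;
using System.Xml.Linq;

public static class FlatOpcShape
{
    // Namespaces used by Flat OPC and by WordprocessingML.
    public static readonly XNamespace Pkg =
        "http://schemas.microsoft.com/office/2006/xmlPackage";
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Builds a pkg:package element holding one hand-written main document part.
    // Illustrative only: Word would not accept this as a complete document.
    public static XElement BuildSample()
    {
        return new XElement(Pkg + "package",
            new XAttribute(XNamespace.Xmlns + "pkg", Pkg.NamespaceName),
            new XElement(Pkg + "part",
                new XAttribute(Pkg + "name", "/word/document.xml"),
                new XAttribute(Pkg + "contentType",
                    "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"),
                new XElement(Pkg + "xmlData",
                    new XElement(W + "document",
                        new XAttribute(XNamespace.Xmlns + "w", W.NamespaceName),
                        new XElement(W + "body",
                            new XElement(W + "p",
                                new XElement(W + "r",
                                    new XElement(W + "t", "Hello"))))))));
    }
}
```

Calling Console.WriteLine(FlatOpcShape.BuildSample()) prints the nested structure, which is a handy way to see what step 4 ends up querying.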

That is the idea; how you use it is left to your imagination. I have already built a reusable library that achieves the same results without your having to worry about what is going on under the hood, but it still needs a good plugin system. You will get it, for sure.

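Below is one example of what you can achieve with this. In this example I am removing all the “Editors” from the document (because having a lot of editors might mean a lot of network calls). In the markup, editors show up as w:permStart and w:permEnd range permission markers, so those are the elements that get removed.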

    // Requires these namespaces at the top of the file:
    //   System, System.IO, System.Linq, System.Xml, System.Xml.Linq, System.Xml.XPath,
    //   DocumentFormat.OpenXml.Packaging, Microsoft.Office.Tools.Ribbon
    private void button1_Click(object sender, RibbonControlEventArgs e)
    {
        object missing = Type.Missing;
        string openxml = string.Empty;

        wdApp.ScreenUpdating = false;
        try
        {
            wdApp.ActiveDocument.Content.Select();

            // Get the stream for the range. This is the System.IO.Packaging.Package stream.
            // Any other range works too, for example a single paragraph:
            // OpcHelper.GetPackageStreamFromRange(wdApp.ActiveDocument.Paragraphs[1].Range)
            using (Stream packageStream = OpcHelper.GetPackageStreamFromRange(wdApp.Selection.Range))
            // Use the Open XML SDK to process it.
            using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(packageStream, true))
            {
                // Convert to Flat OPC using this in-memory package.
                XDocument xDoc = OpcHelper.OpcToFlatOpc(wordDoc.Package);

                // XML namespace names are plain identifiers and must match exactly,
                // so this is "http://", not "https://".
                XmlNamespaceManager xnm = new XmlNamespaceManager(xDoc.CreateReader().NameTable);
                xnm.AddNamespace("w", "http://schemas.openxmlformats.org/wordprocessingml/2006/main");

                // ToList() materializes the matches before any of them is removed.
                xDoc.XPathSelectElements("//w:permStart", xnm).ToList().ForEach(a => a.Remove());
                xDoc.XPathSelectElements("//w:permEnd", xnm).ToList().ForEach(a => a.Remove());

                openxml = xDoc.ToString();
            }

            // Insert this Flat OPC XML back over the whole document.
            wdApp.ActiveDocument.Select();

            // Unprotect raises an error on a document that is not protected, so check first.
            if (wdApp.ActiveDocument.ProtectionType !=
                Microsoft.Office.Interop.Word.WdProtectionType.wdNoProtection)
            {
                object emptyPassword = "";
                wdApp.ActiveDocument.Unprotect(ref emptyPassword);
            }

            wdApp.Selection.Range.InsertXML(openxml, ref missing);
        }
        finally
        {
            // Restore screen updating even if something above throws.
            wdApp.ScreenUpdating = true;
        }
    }
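
If you want to try the removal step without Word or the Open XML SDK, the following is a minimal sketch that runs the same kind of XPath query against a small hand-written WordprocessingML fragment. The names EditorRemovalSketch, RemovePermissionMarkers and Sample are made up for this illustration, and the sample markup is an assumption about what a typical editable range looks like, not output captured from Word.

```csharp
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

public static class EditorRemovalSketch
{
    public const string WordNs =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Removes every w:permStart / w:permEnd element and returns how many were removed.
    public static int RemovePermissionMarkers(XDocument doc)
    {
        var xnm = new XmlNamespaceManager(new NameTable());
        xnm.AddNamespace("w", WordNs);

        // Materialize the matches first; removing nodes while the query
        // is still being enumerated is not safe.
        var markers = doc.XPathSelectElements("//w:permStart | //w:permEnd", xnm).ToList();
        markers.ForEach(m => m.Remove());
        return markers.Count;
    }

    // Hypothetical sample: one paragraph whose run sits inside an editable range.
    public static XDocument Sample()
    {
        return XDocument.Parse(
            "<w:document xmlns:w='" + WordNs + "'><w:body><w:p>" +
            "<w:permStart w:id='1' w:edGrp='everyone'/>" +
            "<w:r><w:t>Editable text</w:t></w:r>" +
            "<w:permEnd w:id='1'/>" +
            "</w:p></w:body></w:document>");
    }
}
```

Only the markers go away; the run between them, and therefore the visible text, stays where it was.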

Now, I am sure you are looking for an explanation of some of the things here, which I will surely provide, but in the next post. This one is for you to figure out what’s happening :) But don’t worry, I am not going to leave you in the dark. Below is the OpcHelper that is being used:

    using System;
    using System.IO;
    using System.IO.Packaging;
    using System.Linq;
    using System.Text;
    using System.Xml;
    using System.Xml.Linq;
    using Microsoft.Office.Interop.Word;

    namespace WordAddIn2
    {
        public static class OpcHelper
        {
            /// <summary>
            /// Returns the part contents as a Flat OPC pkg:part element.
            /// </summary>
            /// <param name="part">System.IO.Packaging.PackagePart</param>
            /// <returns>A pkg:part element holding xmlData or base64 binaryData.</returns>
            static XElement GetContentsAsXml(PackagePart part)
            {
                // Namespace names must match exactly: "http://", not "https://".
                XNamespace pkg =
                    "http://schemas.microsoft.com/office/2006/xmlPackage";
                if (part.ContentType.EndsWith("xml"))
                {
                    using (Stream partstream = part.GetStream())
                    using (StreamReader streamReader = new StreamReader(partstream))
                    {
                        string streamString = streamReader.ReadToEnd();
                        XElement newXElement =
                            new XElement(pkg + "part",
                                new XAttribute(pkg + "name", part.Uri),
                                new XAttribute(pkg + "contentType", part.ContentType),
                                new XElement(pkg + "xmlData", XElement.Parse(streamString)));
                        return newXElement;
                    }
                }
                else
                {
                    using (Stream str = part.GetStream())
                    using (BinaryReader binaryReader = new BinaryReader(str))
                    {
                        int len = (int)binaryReader.BaseStream.Length;
                        byte[] byteArray = binaryReader.ReadBytes(len);
                        // The following expression creates the base64 string, then chunks
                        // it into lines of 76 characters.
                        string base64String = (System.Convert.ToBase64String(byteArray))
                            .Select(
                                (c, i) => new
                                {
                                    Character = c,
                                    Chunk = i / 76
                                }
                            )
                            .GroupBy(c => c.Chunk)
                            .Aggregate(
                                new StringBuilder(),
                                (s, i) =>
                                    s.Append(
                                        i.Aggregate(
                                            new StringBuilder(),
                                            (seed, it) => seed.Append(it.Character),
                                            sb => sb.ToString()
                                        )
                                    )
                                    .Append(Environment.NewLine),
                                s => s.ToString()
                            );

                        return new XElement(pkg + "part",
                            new XAttribute(pkg + "name", part.Uri),
                            new XAttribute(pkg + "contentType", part.ContentType),
                            new XAttribute(pkg + "compression", "store"),
                            new XElement(pkg + "binaryData", base64String)
                        );
                    }
                }
            }

            /// <summary>
            /// Converts a package to a Flat OPC XDocument.
            /// </summary>
            /// <param name="package">System.IO.Packaging.Package</param>
            /// <returns>The Flat OPC representation of the package.</returns>
            public static XDocument OpcToFlatOpc(Package package)
            {
                XNamespace pkg =
                    "http://schemas.microsoft.com/office/2006/xmlPackage";
                XDeclaration declaration = new XDeclaration("1.0", "UTF-8", "yes");
                XDocument doc = new XDocument(
                    declaration,
                    new XProcessingInstruction("mso-application", "progid=\"Word.Document\""),
                    new XElement(pkg + "package",
                        new XAttribute(XNamespace.Xmlns + "pkg", pkg.ToString()),
                        package.GetParts().Select(part => GetContentsAsXml(part))
                    )
                );
                return doc;
            }

            /// <summary>
            /// Returns a System.IO.Packaging.Package stream for the given range.
            /// </summary>
            /// <param name="range">Range in the Word document</param>
            /// <returns>An in-memory stream containing the package.</returns>
            public static Stream GetPackageStreamFromRange(Range range)
            {
                // Range.WordOpenXML returns the range as Flat OPC XML.
                XDocument doc = XDocument.Parse(range.WordOpenXML);
                XNamespace pkg =
                    "http://schemas.microsoft.com/office/2006/xmlPackage";
                XNamespace rel =
                    "http://schemas.openxmlformats.org/package/2006/relationships";
                Package InmemoryPackage = null;
                MemoryStream memStream = new MemoryStream();
                using (InmemoryPackage = Package.Open(memStream, FileMode.Create))
                {
                    // Add all parts (but not relationships).
                    foreach (var xmlPart in doc.Root
                        .Elements()
                        .Where(p =>
                            (string)p.Attribute(pkg + "contentType") !=
                            "application/vnd.openxmlformats-package.relationships+xml"))
                    {
                        string name = (string)xmlPart.Attribute(pkg + "name");
                        string contentType = (string)xmlPart.Attribute(pkg + "contentType");
                        if (contentType.EndsWith("xml"))
                        {
                            // XML part: write the single child of pkg:xmlData.
                            Uri u = new Uri(name, UriKind.Relative);
                            PackagePart part = InmemoryPackage.CreatePart(u, contentType,
                                CompressionOption.SuperFast);
                            using (Stream str = part.GetStream(FileMode.Create))
                            using (XmlWriter xmlWriter = XmlWriter.Create(str))
                                xmlPart.Element(pkg + "xmlData")
                                    .Elements()
                                    .First()
                                    .WriteTo(xmlWriter);
                        }
                        else
                        {
                            // Binary part: strip the line breaks and decode the base64 data.
                            Uri u = new Uri(name, UriKind.Relative);
                            PackagePart part = InmemoryPackage.CreatePart(u, contentType,
                                CompressionOption.SuperFast);
                            using (Stream str = part.GetStream(FileMode.Create))
                            using (BinaryWriter binaryWriter = new BinaryWriter(str))
                            {
                                string base64StringInChunks =
                                    (string)xmlPart.Element(pkg + "binaryData");
                                char[] base64CharArray = base64StringInChunks
                                    .Where(c => c != '\r' && c != '\n').ToArray();
                                byte[] byteArray =
                                    System.Convert.FromBase64CharArray(base64CharArray,
                                    0, base64CharArray.Length);
                                binaryWriter.Write(byteArray);
                            }
                        }
                    }
                    foreach (var xmlPart in doc.Root.Elements())
                    {
                        string name = (string)xmlPart.Attribute(pkg + "name");
                        string contentType = (string)xmlPart.Attribute(pkg + "contentType");
                        if (contentType ==
                            "application/vnd.openxmlformats-package.relationships+xml")
                        {
                            // Add the package-level relationships.
                            if (name == "/_rels/.rels")
                            {
                                foreach (XElement xmlRel in
                                    xmlPart.Descendants(rel + "Relationship"))
                                {
                                    string id = (string)xmlRel.Attribute("Id");
                                    string type = (string)xmlRel.Attribute("Type");
                                    string target = (string)xmlRel.Attribute("Target");
                                    string targetMode =
                                        (string)xmlRel.Attribute("TargetMode");
                                    if (targetMode == "External")
                                        InmemoryPackage.CreateRelationship(
                                            new Uri(target, UriKind.Absolute),
                                            TargetMode.External, type, id);
                                    else
                                        InmemoryPackage.CreateRelationship(
                                            new Uri(target, UriKind.Relative),
                                            TargetMode.Internal, type, id);
                                }
                            }
                            else
                            {
                                // Add part-level relationships. A name such as
                                // /word/_rels/document.xml.rels belongs to /word/document.xml.
                                string directory = name.Substring(0, name.IndexOf("/_rels"));
                                string relsFilename = name.Substring(name.LastIndexOf('/'));
                                string filename =
                                    relsFilename.Substring(0, relsFilename.IndexOf(".rels"));
                                PackagePart fromPart = InmemoryPackage.GetPart(
                                    new Uri(directory + filename, UriKind.Relative));
                                foreach (XElement xmlRel in
                                    xmlPart.Descendants(rel + "Relationship"))
                                {
                                    string id = (string)xmlRel.Attribute("Id");
                                    string type = (string)xmlRel.Attribute("Type");
                                    string target = (string)xmlRel.Attribute("Target");
                                    string targetMode =
                                        (string)xmlRel.Attribute("TargetMode");
                                    if (targetMode == "External")
                                        fromPart.CreateRelationship(
                                            new Uri(target, UriKind.Absolute),
                                            TargetMode.External, type, id);
                                    else
                                        fromPart.CreateRelationship(
                                            new Uri(target, UriKind.Relative),
                                            TargetMode.Internal, type, id);
                                }
                            }
                        }
                    }
                    InmemoryPackage.Flush();
                }
                // Rewind so callers read the package from the start.
                memStream.Position = 0;
                return memStream;
            }
        }
    }
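
The LINQ expression that chunks the base64 string into 76-character lines is the densest part of the helper. As a side note, and not as a description of the code above, the framework has a built-in option that produces the same kind of line-broken output. The sketch below uses it; the class name Base64ChunkSketch is made up for the illustration. One small difference: the helper appends a line break after the last chunk as well, while the built-in option only inserts breaks between lines.

```csharp
using System;

public static class Base64ChunkSketch
{
    // Encodes bytes as base64 with a line break after every 76 characters,
    // using the framework's built-in formatting option.
    public static string ToChunkedBase64(byte[] data)
    {
        return Convert.ToBase64String(data, Base64FormattingOptions.InsertLineBreaks);
    }

    // Decodes the chunked text again. Convert.FromBase64String ignores
    // white space, including the inserted line breaks.
    public static byte[] FromChunkedBase64(string text)
    {
        return Convert.FromBase64String(text);
    }
}
```

Either form decodes with the helper's GetPackageStreamFromRange, because it strips '\r' and '\n' before decoding.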

Stay tuned for the next set of entries, where I will attempt to explain some of the things we have used here. We will also have a full-fledged library that supports add-ins, and you will be able to contribute.