How to get the HTML contents of a Word document using IDataObject

Sure, I know there are many ways of doing it: you can save the document as HTML and then read the HTML file, you can use the HTMLProject items from the Word OM, or you can select all the contents of the file, copy them to the clipboard, and read them from there.

But I'm going to show you a way other than the usual ones. Big deal? Depends ...

My situation was -

I didn't want to save the file to a temp location as HTML, because that turns my active document into an HTML document. I could do a "Save As" under a different name, but even then the current active document becomes the HTML file. In that second case, if the original was previously saved as a Word document, at least that file stays intact; the only problem is that if you want that doc back, you'll need to reopen it.

Depending on a lot of things, I might or might not be okay with this approach, so I ruled it out ...

The HTMLProject items were removed in Word 2007, so that's not an option for me ...

The clipboard option is possible, but I don't like the idea of using the clipboard from my custom code (I've seen a lot of issues with it ...). So I thought of using a different approach, which is based on IDataObject.

Here is the code -

using System;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using Word = Microsoft.Office.Interop.Word;
using System.Reflection;
using System.Runtime.InteropServices;
using COM = System.Runtime.InteropServices.ComTypes;

namespace IDataObjectFromWord
{
    public partial class Form1 : Form
    {
        [DllImport("kernel32.dll", ExactSpelling = true, SetLastError = true)]
        private static extern IntPtr GlobalLock(HandleRef handle);

        [DllImport("kernel32.dll", ExactSpelling = true, SetLastError = true)]
        private static extern bool GlobalUnlock(HandleRef handle);

        // GlobalSize returns a SIZE_T, which is pointer-sized, so use UIntPtr rather than int.
        [DllImport("kernel32.dll", ExactSpelling = true, SetLastError = true)]
        private static extern UIntPtr GlobalSize(HandleRef handle);

        Word.Application wdApp;
        Word._Document wdDoc;
        object o = Missing.Value;

        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            // Start Word and create a document with some sample text.
            wdApp = new Word.Application();
            wdDoc = wdApp.Documents.Add(ref o, ref o, ref o, ref o);
            wdDoc.Range(ref o, ref o).Text = "Hello World ...";
            wdApp.Visible = true;

            // Cast the document object to the COM interfaces we need.
            COM.IDataObject dt = (COM.IDataObject)wdDoc;
            COM.IPersistFile pp = (COM.IPersistFile)wdDoc;

            // Ask for the "HTML Format" clipboard format, delivered in an HGLOBAL.
            COM.FORMATETC format = new COM.FORMATETC();
            COM.STGMEDIUM stgmedium = new COM.STGMEDIUM();
            // Registered format IDs can exceed short.MaxValue, hence the unchecked cast.
            format.cfFormat = unchecked((short)DataFormats.GetFormat(DataFormats.Html).Id);
            format.dwAspect = COM.DVASPECT.DVASPECT_CONTENT;
            format.lindex = -1;
            format.tymed = COM.TYMED.TYMED_HGLOBAL;

            stgmedium.tymed = COM.TYMED.TYMED_HGLOBAL;
            stgmedium.pUnkForRelease = null;

            dt.GetData(ref format, out stgmedium);

            IntPtr pointer = stgmedium.unionmember;
            HandleRef handleRef = new HandleRef(null, pointer);

            byte[] rawArray = null;
            try
            {
                // Lock the global memory block and copy its bytes into a managed array.
                IntPtr ptr1 = GlobalLock(handleRef);

                int length = (int)GlobalSize(handleRef).ToUInt32();

                rawArray = new byte[length];

                Marshal.Copy(ptr1, rawArray, 0, length);
            }
            catch (Exception exp)
            {
                System.Diagnostics.Debug.WriteLine("HtmlFromIDataObject.GetHtml -> Html Import threw an exception: " + Environment.NewLine + exp.ToString());
            }
            finally
            {
                GlobalUnlock(handleRef);
            }

            //return rawArray;
            pp.Save(@"C:\upload\Something.doc", false);
        }
    }
}
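The rawArray above is just the raw bytes of the global memory block, not a ready-to-use HTML string. Assuming Word fills it according to the documented HTML Clipboard Format (a short ASCII header with StartHTML and EndHTML byte offsets, followed by UTF-8 markup), a minimal sketch for pulling out the HTML could look like the following. The CfHtml class and its ExtractHtml and ReadOffset methods are hypothetical helpers of mine, not part of the sample above, and I have not checked them against every Word version.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the original sample): extracts the markup
// from a buffer that is assumed to follow the HTML Clipboard Format layout.
static class CfHtml
{
    // Returns the text between the StartHTML and EndHTML byte offsets,
    // or null when the header fields are missing or out of range.
    public static string ExtractHtml(byte[] raw)
    {
        if (raw == null) return null;

        // The header is ASCII; decoding with ASCII yields one char per byte,
        // which is enough to find the offset fields.
        string header = Encoding.ASCII.GetString(raw, 0, Math.Min(raw.Length, 256));

        int start = ReadOffset(header, "StartHTML");
        int end = ReadOffset(header, "EndHTML");
        if (start < 0 || end < start || end > raw.Length) return null;

        // The offsets count bytes from the start of the buffer; the payload is UTF-8.
        return Encoding.UTF8.GetString(raw, start, end - start);
    }

    // Reads a "Name:0000000123" style field; returns -1 when it is absent.
    static int ReadOffset(string header, string name)
    {
        Match m = Regex.Match(header, name + @":(-?\d+)");
        return m.Success
            ? int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture)
            : -1;
    }
}
```

With this in place, the commented-out return in the sample could become something like string html = CfHtml.ExtractHtml(rawArray);. Cutting at EndHTML also drops any padding bytes at the end of the block, since GlobalSize can report more bytes than were actually written.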


