Removing Page and Section Breaks from a Word Document


In today’s post I am going to show you how to remove page breaks and section breaks from a Word document using the Open XML SDK. The two tasks are similar, but each needs its own approach. Let’s start with page breaks.


This post uses version 2 of the SDK.


If you just want to jump straight into the code, feel free to download this solution here.


Solution to Remove Page Breaks


To remove page breaks in a document we need to take the following actions:



  1. Open the Word document via the Open XML SDK

  2. Get access to the main document part

  3. Find all page breaks within the main document part

  4. For every page break found, remove it from the document

  5. Save changes

For the sake of this example, let’s say I am starting with the following Word document:



This document has a page break (shown outlined in red) on the first page.


The Code


The code is pretty straightforward and follows the steps listed in the solution section above:









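// Requires: using System.Collections.Generic; using System.Linq;
//           using DocumentFormat.OpenXml.Packaging; using DocumentFormat.OpenXml.Wordprocessing;
static void RemovePageBreaks(string filename)
{
    // Open the document for editing (the second argument makes it editable).
    using (WordprocessingDocument myDoc = WordprocessingDocument.Open(filename, true))
    {
        MainDocumentPart mainPart = myDoc.MainDocumentPart;

        // Keep only breaks typed as page breaks; a break with no type is a line break and is left alone.
        // ToList() takes a snapshot so that removing elements does not disturb the enumeration.
        List<Break> breaks = mainPart.Document.Descendants<Break>()
            .Where(b => b.Type != null && b.Type.Value == BreakValues.Page)
            .ToList();

        foreach (Break b in breaks)
        {
            b.Remove();
        }

        mainPart.Document.Save();
    }
}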

Pretty easy stuff!


End Result


Running this code I should end up with a document that looks like the following:



Now let’s see how to remove section breaks within a document. Before I actually jump into the solution of removing sections, I want to talk a bit about section breaks within a Word document.


Section Breaks in a Word Document


WordprocessingML does not natively store the concept of pages, since it is based on paragraphs and runs. Instead it uses sections to specify groups of paragraphs that have a specific set of page properties.


Every Word document has at least one section, and each section specifies page properties (such as page size, orientation, and margins), header/footer references, column information, and so on. Given this information, there are really two high-level types of sections:



  1. A section as a paragraph property (w:p/w:pPr/w:sectPr) – A document may have zero or more of these types of sections

  2. A document final section property (w:body/w:sectPr) – A document can have only one of these types of sections
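To make the two kinds of sections a little more concrete, here is a minimal sketch that counts each kind in a document. It assumes the same SDK classes used elsewhere in this post; the class name SectionCounter and the method PrintSectionCounts are hypothetical names I made up for illustration, not part of the SDK.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class SectionCounter
{
    // Hypothetical helper (not an SDK API): reports how many sections of each kind a document has.
    public static void PrintSectionCounts(string filename)
    {
        // Open read-only (second argument false), since nothing is modified.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(filename, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;

            // Kind 1: section properties stored inside a paragraph's properties (w:p/w:pPr/w:sectPr).
            int paragraphLevel = body.Descendants<ParagraphProperties>()
                .Count(pPr => pPr.GetFirstChild<SectionProperties>() != null);

            // Kind 2: section properties stored as a direct child of the body (w:body/w:sectPr).
            int documentFinal = body.Elements<SectionProperties>().Count();

            Console.WriteLine("Paragraph-level sections: {0}", paragraphLevel);
            Console.WriteLine("Document-final sections: {0}", documentFinal);
        }
    }
}
```

Calling SectionCounter.PrintSectionCounts on a document before and after removing section breaks is one simple way to check that only the paragraph-level sections were affected.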

In today’s post I am going to show you how to remove all sections that are a paragraph property.


Solution to Remove Section Breaks


To remove section breaks in a document we need to take the following actions:



  1. Open the Word document via the Open XML SDK

  2. Get access to the main document part

  3. Find all paragraph properties that contain section properties

  4. For every paragraph property found, remove its section properties child element

  5. Save changes

For the sake of this example, let’s say I am starting with the following Word document:



This document has a section break (shown outlined in red) on the first page, which separates a one-column section from a two-column section.


The Code


This code is also pretty straightforward and follows the steps listed in the solution section above:









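// Requires: using System.Collections.Generic; using System.Linq;
//           using DocumentFormat.OpenXml.Packaging; using DocumentFormat.OpenXml.Wordprocessing;
static void RemoveSectionBreaks(string filename)
{
    // Open the document for editing (the second argument makes it editable).
    using (WordprocessingDocument myDoc = WordprocessingDocument.Open(filename, true))
    {
        MainDocumentPart mainPart = myDoc.MainDocumentPart;

        // Collect the paragraph properties that carry a section break (w:pPr/w:sectPr).
        List<ParagraphProperties> paraProps = mainPart.Document.Descendants<ParagraphProperties>()
            .Where(pPr => IsSectionProps(pPr)).ToList();

        foreach (ParagraphProperties pPr in paraProps)
        {
            // Remove only the section properties; the paragraph and its other properties stay.
            pPr.RemoveChild<SectionProperties>(pPr.GetFirstChild<SectionProperties>());
        }

        mainPart.Document.Save();
    }
}

// Returns true when the paragraph properties contain section properties.
static bool IsSectionProps(ParagraphProperties pPr)
{
    return pPr.GetFirstChild<SectionProperties>() != null;
}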

End Result


Running this code I should end up with a document that looks like the following:



Notice how the document now has two columns. This solution removed the first section property, which specified a one-column section, so the text that belonged to that section is now part of the following two-column section.
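If you want to see this effect without opening Word, a rough sketch like the one below lists the column count of each section in document order. The names SectionColumns and PrintColumnCounts are hypothetical, and the sketch makes a simplifying assumption: it reads only the column count attribute and treats a missing value as one column, so sections with custom-width columns may not be reported accurately.

```csharp
using System;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class SectionColumns
{
    // Hypothetical helper (not an SDK API): prints the column count of each section.
    public static void PrintColumnCounts(string filename)
    {
        // Open read-only (second argument false), since nothing is modified.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(filename, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;
            int index = 1;

            // Descendants<SectionProperties>() yields paragraph-level sections first,
            // followed by the document-final section, in document order.
            foreach (SectionProperties sectPr in body.Descendants<SectionProperties>())
            {
                Columns cols = sectPr.GetFirstChild<Columns>();

                // Assumption: no columns element, or no count on it, means a single column.
                int count = (cols != null && cols.ColumnCount != null)
                    ? (int)cols.ColumnCount.Value
                    : 1;

                Console.WriteLine("Section {0}: {1} column(s)", index, count);
                index++;
            }
        }
    }
}
```

Running it before and after RemoveSectionBreaks should show the section list shrinking, with the remaining document-final section keeping its own column setting.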


Zeyad Rajabi

Comments (3)

  1. Jason says:

    This is slightly OT, but I’ve noticed that in Word 2007 section breaks misbehave *within content controls*.

    For example:

    1. a section break at the end of an sdtContent (ie w:p/w:pPr/w:sectPr/w:type immediately before the </w:sdtContent>), you can’t delete in the Word UI

    2. what is in the OpenXML a section break of type "continuous", is displayed in the Word UI as a section break of type "Next Page"

    Any insight into this?  Is there a bug tracking system somewhere I can access in which this could be reported?

    thanks

    Jason

  2. Zeyad Rajabi says:

    Jason,

    For issue #1 try using Shift-Delete. That should delete the section break. For issue #2 I am unable to repro what you are seeing. I see Word correctly displaying the breaks as continuous.