A tool to import texts from Project Gutenberg

Here's the tool I wrote to import "Project Gutenberg" (link: http://www.gutenberg.org) texts into OneNote. The first link goes to the setup files, and the second has the code if you want to see that.

Update as of April 6, 2009: the updated download file is below my signature at the end of this article.  The "setup" and source files are combined as well - if you just want to install the powertoy, simply run the GutenWin.exe file.






Remember to run the setup.exe file if you install this. It sets some registry keys (more on that later) which the tool needs in order to break the book into chapters correctly. I also include the legal information in a header page and a footer page to meet the requirements of Project Gutenberg.


Another goal was to be able to put an imported book into a specific notebook during import. I did not necessarily want all the new pages to go into the Unfiled Notes section. I created a simple tree control which shows all the notebooks you have open in OneNote and lets you choose which notebook receives the new work of prose. My tool even makes a guess at the name of the imported book to use as the new section name. If you downloaded "Pride and Prejudice" last week, it should get the name correctly. That book is included in the setup.zip file as well so you can use it to test the tool.
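As a rough illustration of how a title guess could work (this is not the code from GutenWin, which is not shown here), the Python sketch below assumes the text file carries a "Title:" line near the top, as many Project Gutenberg files do. The function name guess_title and the fallback name are hypothetical.

```python
import re

def guess_title(text, default="Imported Book"):
    """Return the value of the first 'Title:' header line, or a default."""
    # Assumption: the header sits near the top, so only scan the first 100 lines.
    for line in text.splitlines()[:100]:
        match = re.match(r"\s*Title:\s*(.+)", line)
        if match:
            return match.group(1).strip()
    # No header line found: fall back to a generic section name.
    return default
```

A file without such a header line simply gets the fallback name, which mirrors the "guess" nature of the feature described above.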


And if you don't choose a specific notebook as an import location, I default to Unfiled Notes. Throw in a simple status bar (which reports the percentage complete based on the number of chapters imported so far) and a completion dialog at the end, and I'm done!


Now for the limitations. It just so happened that the first few books I tested were prose books with clearly delineated chapters: Mark Twain and Jane Austen books, specifically. The tool worked great. Then I tried "Ulysses" by Joyce (link: http://www.gutenberg.org/dirs/etext03/ulyss12.txt) and got garbled results. That book doesn't use the word "Chapter" or "CHAPTER" to delineate chapters. It just uses a Roman numeral at the beginning of each chapter. In this case, I could cook up a scheme to look for Roman numerals on a line by themselves and use them as chapter breaks. Unfortunately, it gets worse, since this particular text puts a pair of dashes on either side of the Roman numeral as a visual aid to make the chapter break easier to see on screen. I looked around at a few more books and found even more variety: some used a table of contents that gives each chapter a unique name, some had unique chapter names but no table of contents, some used Roman numerals or Arabic numbers on a line by themselves, and so on.
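To make the chapter-break problem concrete, here is a minimal Python sketch (not the tool's actual code). It starts a new chunk at any line that begins with a delimiter word, and also at a line holding only a Roman numeral, optionally wrapped in dashes. The exact dash layout ("-- II --"), the function name split_chapters, and the deliberately loose Roman numeral pattern are all assumptions made for illustration.

```python
import re

def split_chapters(text, delimiters=("Chapter", "CHAPTER")):
    """Split text into chunks, starting a new chunk at each heading line."""
    # A heading is a line that starts with one of the delimiter words.
    keyword = re.compile(r"\s*(?:%s)\b" % "|".join(map(re.escape, delimiters)))
    # Alternative heading: Roman numeral letters alone on a line, optionally
    # wrapped in dashes, e.g. "-- IV --". The pattern is loose on purpose and
    # would also accept non-numerals such as "MIX".
    roman = re.compile(r"\s*-*\s*[IVXLCDM]+\.?\s*-*\s*$")

    chapters, current = [], []
    for line in text.splitlines():
        if keyword.match(line) or roman.match(line):
            # Close the chunk collected so far and start a new one.
            if current:
                chapters.append("\n".join(current))
            current = [line]
        else:
            current.append(line)
    if current:
        chapters.append("\n".join(current))
    return chapters
```

Any text before the first heading (a preface, the legal header) ends up as its own first chunk. Books that name chapters only in a table of contents would still defeat this approach.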


Then someone inside Microsoft imported a book of poetry. Ugh. Removing the line break at the end of each line makes no sense for poetry. That logic only applies to prose, where the text is organized in paragraphs. I don't recommend this tool for importing poetry: the formatting gets totally lost, and you wind up with three pages or so of bogus text.
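The line-break removal described above can be sketched in a few lines of Python. This is an illustration under the assumption that paragraphs are separated by blank lines, not the tool's actual implementation, and unwrap_paragraphs is a hypothetical name.

```python
import re

def unwrap_paragraphs(text):
    """Join hard-wrapped lines; treat blank lines as paragraph breaks."""
    # Assumption: one or more blank lines separate paragraphs.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Inside a paragraph, collapse every line break and run of spaces to one space.
    return [" ".join(p.split()) for p in paragraphs]
```

Run on a poem, the same function merges every line of a stanza into one long line, which is exactly the damage described above.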


What I decided to do was get it to work for the books I was interested in reading and commenting on. "Wuthering Heights" was my final test. If that worked, I could "ship" my tool. It did and I did.


I left this slightly extensible so that users who don't have Visual Studio, or who don't want to rewrite or add to the code, can get around the limitation of using only the word "Chapter" to break out individual chapters. The separator words are stored in a string registry value named "delim" under HKEY_CURRENT_USER\Software\Guinsoft\GutenWin. Add any new keywords you want to use as separators to the end of the list, with a comma between entries.
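For readers who want to inspect or reuse that list from a script, the following Python sketch reads the "delim" value and splits it on commas. The key path and value name come from the paragraph above. The fallback string "Chapter,CHAPTER" and both function names are assumptions, and how GutenWin itself treats whitespace around entries is not documented here.

```python
def parse_delimiters(raw):
    """Turn a comma-separated string such as 'Chapter,CHAPTER' into a list."""
    # Strip whitespace around each entry and drop empty ones.
    return [word.strip() for word in raw.split(",") if word.strip()]

def read_delimiters(default="Chapter,CHAPTER"):
    """Read the 'delim' value on Windows; use an assumed default otherwise."""
    try:
        import winreg  # standard library, available on Windows only
        with winreg.OpenKey(winreg.HKEY_CURRENT_USER,
                            r"Software\Guinsoft\GutenWin") as key:
            raw, _value_type = winreg.QueryValueEx(key, "delim")
    except (ImportError, OSError):
        # Not on Windows, or the key or value does not exist.
        raw = default
    return parse_delimiters(raw)
```

For example, a value of "Chapter,CHAPTER,BOOK" would yield three separator words.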


I learned quite a bit about our XML schema when writing this tool. Since English gives writers a free rein to create books in any manner, it's very difficult to guess the author's intention. A side goal of this particular tool was to give me a reason to create a "notebook picker" to choose where to send data I'm adding to OneNote. Let me know if you like this.

 Questions, comments, concerns and criticisms always welcome,



Comments (12)
  1. Back when was working on my text file importer and the Project Gutenberg addin ( http://blogs.msdn.com

  2. bob says:

    How does it work?  Is a button installed on a toolbar?  If so, one was not installed during setup.

  3. One of our MVPs, Kathy Jacobs (link to her site http://www.lockergnome.com/nexus/callkathy ), asked last

  4. I was reading the newsgroups a few weeks ago and saw this question : "Will onenote run better if I keep

  5. Scoop0901 says:

    Are there directions for how to use this PowerToy to import the texts?  I visit Project Gutenberg often, and though too busy to help proofread as I did years ago, still support its mission, but, more than that, I love the texts available.  Getting them into OneNote, even more simply, would be great!

  6. JohnGuin says:

    Just run the setup program and it adds a shortcut to the Start menu.  When you start it, you can choose which file you want to import and which notebook and section should receive the book.  Click the "Convert to onenote" button and it lets you know when it is done.

    Hope this helps,


  7. I was getting ready to post this list of the addins the OneNote Test Team worked on in the year 2007

  8. Amanda says:

    Hi there

    I can’t see any note as to why the download for this awesome-sounding powertoy is disabled? Please make it available again (blink, blink)

  9. John says:

    The link to the download is below my signature – the former server got turned off.

    I just tested the download and it worked for me.  Is it failing for you?

  10. Hadi says:

    I've downloaded the file from the link below your signature, but it has no setup.exe like the one you mentioned. It contains a program that works correctly on my notebook but not on my desktop PC, and it also contains the source of the application.

    I looked in my PC's registry. The key you mentioned wasn't there.

    I would be grateful if you could send the answer to my email: OneNote2007@live.com

    Hadi Mollaei

Comments are closed.
