New Beta API: Page Recall

Hey everybody! I'm very happy to announce that we've made a new API available in beta today: page content recall!

You can use this to retrieve the contents of a page in HTML form. Without any further ado, let's get right to it.

GET https://www.onenote.com/api/beta/pages/{pageId}/content

The pageId parameter here is the id returned on a page when querying ~/pages or ~/sections/{sectionId}/pages (see Query Pages Now in Beta for more details).
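
To make the call concrete, here's a minimal Python sketch of that request using only the standard library. The helper names (`build_page_content_request`, `fetch_page_html`) are made up for this example. It also assumes that the OAuth access token is sent as a standard Bearer token in the Authorization header and that the response body is UTF-8, so check both against your own setup.

```python
import urllib.request

BASE_URL = "https://www.onenote.com/api/beta"


def build_page_content_request(page_id: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for a page's HTML content."""
    url = f"{BASE_URL}/pages/{page_id}/content"
    # Assumption: the access token travels as a Bearer token.
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def fetch_page_html(page_id: str, access_token: str) -> str:
    """Send the request and return the page HTML as text."""
    request = build_page_content_request(page_id, access_token)
    with urllib.request.urlopen(request) as response:
        # Assumption: the body is UTF-8 encoded text/html.
        return response.read().decode("utf-8")
```

You would call `fetch_page_html(page_id, token)` with an id taken from a ~/pages response and a token that carries one of the scopes listed further down.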

The HTML returned by the API mirrors the HTML that you would use to create a page: it's the subset of standard HTML5 that can be used inside OneNote pages (see Gareth's overview at HTML in the OneNote API). This means that it should be very easy for you to read pages, re-style them as needed, and render them in your application. The HTML is intended to be a logical, semantic representation of the content, not necessarily a visually perfect one, so you may want to re-style it if you render it directly.
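
If you do want to re-style, one possible first step is to drop the inline style attributes and apply your own stylesheet. Here's a rough sketch of that idea with Python's built-in html.parser. The `StyleStripper` class and `strip_styles` function are names invented for this example, not part of the API, and the sketch drops comments and doctype declarations.

```python
from html import escape
from html.parser import HTMLParser


class StyleStripper(HTMLParser):
    """Re-serialize HTML while leaving out every inline style attribute."""

    def __init__(self):
        # Keep entity and character references as written in the input.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _tag(self, tag, attrs, closing):
        kept = "".join(
            f" {name}" if value is None else f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
            if name != "style"
        )
        self.out.append(f"<{tag}{kept}{closing}")

    def handle_starttag(self, tag, attrs):
        self._tag(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._tag(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def strip_styles(page_html: str) -> str:
    parser = StyleStripper()
    parser.feed(page_html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

Everything else (tags, other attributes, text) passes through, so you can layer your own CSS on top of the result.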

Some highlights of page recall

  • Images and embedded files - since the API response content type is just regular text/html (we didn't want to have to hand you back multipart MIME), image and embedded file data are referenced via URL. These URLs still require authentication (since it's still private content), so they won't render correctly in a browser.

    <img src="https://www.onenote.com/api/beta/resources/{imageId}/$value" />

    <object data="https://www.onenote.com/api/beta/resources/{fileId}/$value" />

  • OCR text - You can now get back the text that has been OCR'ed from your picture via the API! Here's what that would look like.

    
    <img src="…" data-extracted-text="recognized text" /> 
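
Since the resource URLs need authentication, a client will usually pull them out of the page HTML and fetch them separately. Here's a small sketch of that, which also picks up the OCR text while it's there. `ResourceCollector`, `collect_resources`, and `authorized_request` are hypothetical names for this example, and the Bearer header is an assumption about how your token is sent.

```python
import urllib.request
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect image and embedded-file URLs, plus any OCR text on images."""

    def __init__(self):
        super().__init__()
        self.images = []  # (src URL, extracted text or None)
        self.files = []   # data URLs from <object> elements

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> also arrive here.
        attr = dict(attrs)
        if tag == "img" and attr.get("src"):
            self.images.append((attr["src"], attr.get("data-extracted-text")))
        elif tag == "object" and attr.get("data"):
            self.files.append(attr["data"])


def collect_resources(page_html: str):
    parser = ResourceCollector()
    parser.feed(page_html)
    parser.close()
    return parser.images, parser.files


def authorized_request(resource_url: str, access_token: str) -> urllib.request.Request:
    # Assumption: resources accept the same Bearer token as the page call.
    return urllib.request.Request(
        resource_url, headers={"Authorization": f"Bearer {access_token}"}
    )
```

Pass each collected URL through `authorized_request` and open it with `urllib.request.urlopen` to download the bytes.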
    

You can try out this API yourself here: OneNote Beta API Apigee console

What this looks like

Here is a (relatively simple) sample OneNote page.

[Image: sample OneNote page]

And this is what the output HTML for this page looks like.

<html xmlns="http://www.w3.org/1999/xhtml" lang="en-us">
<head>
 <title>Sample Study Notes</title>
 <meta name="created" content="2014-09-17T14:52:00.0000000" />
</head>
<body>
 <div data-structure="outline" style="position:absolute;width:720px;height:28.8000011444092px;left:48px;top:115.200004577637px">
 <h1 style="font-family:Calibri;font-size:16px;color:#1e4e79">
 <span lang="en-US">American History 101: Moon Landing</span>
 </h1>
 <br />
 <p style="font-family:Calibri;font-size:11px;text-align:left">
 <span lang="en-US">First moon landing - July 20, 1969 with Apollo 11 (Eagle)</span>
 </p>
 <br />
 <p style="font-family:Calibri;font-size:11px;text-align:left">
 <span lang="en-US" style="font-weight:bold">Apollo 11 Astronauts</span>
 </p>
 <table style="border:1px">
 <tr>
 <td>
 <p style="font-family:Calibri;font-size:11px;text-align:left"><span lang="en-US">Neil Armstrong</span></p>
 </td>
 <td>
 <p style="font-family:Calibri;font-size:11px;text-align:left"><span lang="en-US">Commander</span></p>
 </td>
 </tr>
 <tr>
 <td>
 <p style="font-family:Calibri;font-size:11px;text-align:left"><span lang="en-US">Buzz Aldrin</span></p>
 </td>
 <td>
 <p style="font-family:Calibri;font-size:11px;text-align:left"><span lang="en-US">LM Pilot</span></p>
 </td>
 </tr>
 <tr>
 <td>
 <p style="font-family:Calibri;font-size:11px;text-align:left"><span lang="en-US">Michael Collins</span></p>
 </td>
 <td>
 <p style="font-family:Calibri;font-size:11px;text-align:left"><span lang="en-US">Command Module Pilot</span></p>
 </td>
 </tr>
 </table>
 <br />
 <a href="https://en.wikipedia.org/wiki/File:Apollo_11_insignia.png">
 <img alt="Circular insignia: Eagle with wings outstretched holds olive branch on Moon with earth in background, in blue and gold border." width="20" height="20" src="https://www.onenote.com/api/beta/resources/0-9a5b47d1dc3b41abac7ae175c6baeee3!1-93EDDFFF21CC550B!47985/$value" data-fullres-src="https://www.onenote.com/api/beta/resources/0-9a5b47d1dc3b41abac7ae175c6baeee3!1-93EDDFFF21CC550B!47985/$value" />
 </a>
 <br />
 <a href="https://en.wikipedia.org/wiki/File:Apollo_11.jpg">
 <img alt="Three astronauts in spacesuits without helmets sitting in front of a large photo of the Moon." width="70.5" height="55.5" src="https://www.onenote.com/api/beta/resources/0-bb5dbd48cb5d41bb8dc83296fe5dbbf4!1-93EDDFFF21CC550B!47985/$value" data-fullres-src="https://www.onenote.com/api/beta/resources/0-bb5dbd48cb5d41bb8dc83296fe5dbbf4!1-93EDDFFF21CC550B!47985/$value" />
 </a>
 <br />
 <p style="font-family:Calibri;font-size:11px;text-align:left">
 <span lang="en-US">References: </span>
 </p>
 <p style="font-family:Calibri;font-size:11px;text-align:left">
 <a href="https://en.wikipedia.org/wiki/Apollo_11" lang="en-US">https://en.wikipedia.org/wiki/Apollo_11</a>
 </p>
 <p style="font-family:Calibri;font-size:11px;text-align:left">
 <a href="https://www.nasa.gov/mission_pages/apollo/missions/apollo11.html" lang="en-US">https://www.nasa.gov/mission_pages/apollo/missions/apollo11.html</a>
 </p>
 <br />
 <br />
 </div>
</body>
</html>

Known Beta Issues

Part of releasing this API to Beta is that we know it's not yet perfect. You may have been able to tell that from the sample above. Here are two issues with the output that we plan to fix quickly:

  1. Verbosity - the HTML currently contains some extra tags (usually <span> tags) and redundant styles where they're not really necessary. We're working on cleaning up the output to make the underlying content easier to understand.
  2. Pretty printing - the HTML above has been cleaned up for this post. Page HTML from the API is currently returned without newlines as a single large block of text. This works fine for parsing, but is a little hard to read (hence the cleanup) if you're trying to debug or just read the response. We're planning on formatting the output to make it a little easier to read by default.
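
Until the output is formatted by default, you can indent the single-line response yourself while debugging. Here's a rough sketch, not the cleanup used for this post: `PrettyPrinter` and `pretty_print` are names invented for the example. It puts every tag and text run on its own line, so it's meant for reading, not for round-tripping the HTML.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not add indentation.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class PrettyPrinter(HTMLParser):
    """Emit one tag or text run per line, indented by nesting depth."""

    def __init__(self, indent="  "):
        super().__init__()
        self.indent = indent
        self.depth = 0
        self.lines = []

    def _emit(self, text):
        self.lines.append(self.indent * self.depth + text)

    @staticmethod
    def _attrs(attrs):
        return "".join(
            f" {name}" if value is None else f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        self._emit(f"<{tag}{self._attrs(attrs)}>")
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        self._emit(f"<{tag}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        self.depth = max(self.depth - 1, 0)
        self._emit(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives unescaped, so re-escape it; skip whitespace-only runs.
        if data.strip():
            self._emit(escape(data.strip(), quote=False))


def pretty_print(page_html: str) -> str:
    printer = PrettyPrinter()
    printer.feed(page_html)
    printer.close()
    return "\n".join(printer.lines)
```

Calling `print(pretty_print(response_text))` on a raw API response gives you something much closer to the sample above.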

OData Usage

This URI is RESTful, but there's also an OData variation that you can use by switching out the /content portion for /$value instead. This conforms to the OData "raw value" convention, which makes sense because the "raw value" of a page corresponds to the content of that page.
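
If you already build the /content URL in your code, switching to the OData form is a simple suffix swap. Here's a tiny sketch; `to_odata_value_url` is a hypothetical helper name and `abc123` below is a placeholder page id.

```python
def to_odata_value_url(content_url: str) -> str:
    """Turn .../pages/{pageId}/content into .../pages/{pageId}/$value."""
    suffix = "/content"
    if not content_url.endswith(suffix):
        raise ValueError(f"expected a URL ending in {suffix!r}: {content_url!r}")
    return content_url[: -len(suffix)] + "/$value"


# Example with a placeholder page id.
print(to_odata_value_url("https://www.onenote.com/api/beta/pages/abc123/content"))
```

Both forms should return the same page HTML, so pick whichever fits the rest of your client.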

OAuth Scopes

Like other query endpoints (hierarchy, page list), page content recall requires permissions to be able to retrieve content. You'll need to have one of these scopes requested to use this new API: office.onenote, office.onenote_update_by_app, office.onenote_update (basically any scope other than office.onenote_create).
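
If your app tracks which scopes the user granted, you can check up front whether page recall will work before making the call. Here's a minimal sketch; the scope names come from the list above, while `RECALL_SCOPES` and `can_recall_pages` are names invented for this example.

```python
# Scopes that allow page content recall, per the list above.
RECALL_SCOPES = frozenset(
    {"office.onenote", "office.onenote_update_by_app", "office.onenote_update"}
)


def can_recall_pages(granted_scopes) -> bool:
    """Return True if any granted scope permits page content recall.

    granted_scopes is an iterable of scope names (a list or set),
    not a single space-separated string.
    """
    return not RECALL_SCOPES.isdisjoint(set(granted_scopes))
```

For example, `can_recall_pages(["office.onenote_create"])` is False, while adding `office.onenote_update` to that list makes it True.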

Help us make it even better!

As with all the APIs that we release, we're releasing this to Beta first to hear your feedback! Please use it, let us know where you find issues, where you'd like us to tweak it, where we can make it better. Or even if it's great and you just want us to release it in the v1.0 endpoint, we always want to hear more from you. We're planning to leave this feature in Beta for at least a month to hear and iterate on your feedback before releasing this to production.

We on the OneNote API team will continue to work very hard on our feature backlog to enable more APIs for you in the future. You can continue to help guide our backlog based on what you'd like to see!

Leave your page recall feedback and suggestions as comments here.

Post new feature requests or vote on other ideas at our UserVoice.

Let us know if you have any problems or questions about using our APIs on our StackOverflow board.

Thank you for helping us build an awesome API, and as always, happy coding!

-Brian