New! API to Auto-Extract Business Cards, Recipe URLs, and Product URLs Now in Beta!


Hi there, Greg Akselrod here with an exciting new feature for the OneNote API. Earlier today, we announced Business Card Scanning in Office Lens, and previously, we announced recipe and product clipping with the OneNote Web Clipper. Today, we're also opening up this functionality to OneNote developers via the OneNote Beta API (https://www.onenote.com/api/beta/).

We've designed auto-extraction to be as easy as possible for developers to use. Just include an empty <div> tag with a few additional properties, and the OneNote API will detect whether it can extract its content onto the page and replace the <div> with a simplified rendering of the content being captured.

Here’s an example of scanning a business card image to OneNote with auto-extraction:

<div 
data-render-method="extract"
data-render-src="name:scanned-image" />


Types of Auto-Extraction Currently Available

Input Type        Content Types Supported (as of Dec 2014)
Scanned Images    Business cards
URLs              Recipes, Products

Input

Scanned Image Extraction

If your app or device captures images to the OneNote API, you can take advantage of business card scanning by including the following markup:

<div data-render-method="extract" data-render-src="name:scanned-image" />
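
As a rough sketch of how that markup might travel with the image, the Python below assembles a multipart request body by hand. It assumes the page HTML goes in a part named "Presentation" and that the image part is named "scanned-image" to match data-render-src; the helper name build_create_page_body, the page title, and the placeholder image bytes are all hypothetical, and nothing is actually sent.

```python
import uuid

# Page HTML containing the extraction <div>; "name:scanned-image" must match
# the name of the image part added to the multipart body below.
PAGE_HTML = """<!DOCTYPE html>
<html>
  <head><title>Scanned business card</title></head>
  <body>
    <div data-render-method="extract" data-render-src="name:scanned-image" />
  </body>
</html>"""


def build_create_page_body(html, image_bytes, image_part="scanned-image",
                           image_type="image/jpeg"):
    """Return (content_type_header, body_bytes) for a multipart request.

    Assumption: the HTML part is named "Presentation". Check the OneNote API
    documentation for the exact part names the service expects.
    """
    boundary = "part-" + uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = [
        delimiter,
        b'Content-Disposition: form-data; name="Presentation"',
        b"Content-Type: text/html",
        b"",
        html.encode("utf-8"),
        delimiter,
        ('Content-Disposition: form-data; name="%s"' % image_part).encode("ascii"),
        ("Content-Type: %s" % image_type).encode("ascii"),
        b"",
        image_bytes,
        delimiter + b"--",  # closing delimiter
        b"",
    ]
    return "multipart/form-data; boundary=" + boundary, b"\r\n".join(lines)
```

The returned header value and body could then be handed to whatever HTTP client your app already uses for its create page call, along with the user's authorization token.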

URL Extraction

If your app captures web content to the OneNote API, you can take advantage of recipe and product clipping by including the following markup:

<div data-render-method="extract" data-render-src="http://allrecipes.com/recipe/beef-stroganoff-iii/" />
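
If your app builds the page HTML from a URL the user supplies, a small helper along these lines may be useful. It is only a sketch: the function name url_extract_page is hypothetical, and escaping the URL with html.escape is our assumption about how to keep characters such as & from breaking the attribute.

```python
from html import escape


def url_extract_page(title, url):
    """Build page HTML whose body is a single auto-extraction <div> for url."""
    # Escape the URL so characters like & and " stay valid inside the attribute.
    div = '<div data-render-method="extract" data-render-src="%s" />' % escape(url, quote=True)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head><title>%s</title></head>\n"
        "<body>\n"
        "%s\n"
        "</body>\n"
        "</html>" % (escape(title), div)
    )


page = url_extract_page(
    "Beef Stroganoff",
    "http://allrecipes.com/recipe/beef-stroganoff-iii/",
)
```

Because the source is a URL rather than an image part, this page can be sent as a plain HTML body with no multipart wrapper.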

Result

Business Cards


The following business card data is recognized and extracted:

  • Name
  • Title
  • Organization
  • Phone & fax numbers (made into a tel: link)
  • Mailing/physical address (with a link to map it on Bing)
  • Email addresses (made into a mailto: link)
  • Websites

In addition, a vCard (.VCF file) with the extracted information is embedded in the page so OneNote users can easily import the contact details into Outlook or their phone’s contact list. The vCard is also a convenient way to recall this information from the OneNote API.
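
Once your app has the vCard text in hand, reading the fields back is straightforward. The sketch below is a minimal parser for simple vCard content lines; the sample card is invented for illustration, and which vCard version and properties OneNote actually emits is not stated here, so treat the property names as assumptions.

```python
# Invented sample data; a real card from OneNote may use other properties.
SAMPLE_VCARD = """BEGIN:VCARD
VERSION:3.0
FN:Jane Example
ORG:Contoso
TEL;TYPE=WORK:+1-555-0100
EMAIL:jane@example.com
END:VCARD"""


def parse_vcard(text):
    """Return {PROPERTY: [values]} for the content lines of one vCard."""
    # Unfold continuation lines: a line starting with a space or tab
    # continues the previous line.
    lines = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        elif raw.strip():
            lines.append(raw)

    fields = {}
    for line in lines:
        name_part, sep, value = line.partition(":")
        if not sep:
            continue  # not a "NAME:VALUE" content line
        # Drop parameters such as ";TYPE=WORK" and normalize the name.
        name = name_part.split(";", 1)[0].upper()
        if name in ("BEGIN", "END"):
            continue
        fields.setdefault(name, []).append(value)
    return fields


fields = parse_vcard(SAMPLE_VCARD)
```

This ignores escaping, grouping, and encodings, so a full vCard library is a better fit for anything beyond quick lookups.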

Business card recognition works best for English cards right now, but we plan to improve accuracy in other languages in the coming months.

Recipes

The following recipe information is extracted:

  • Title
  • Hero image
  • Ingredients
  • Preparation Steps
  • Prep time
  • Cook time
  • Total time

Recipes can be extracted from many top sites such as AllRecipes.com.

Products

The following product detail information is extracted:

  • Title
  • Rating
  • Primary image
  • Description
  • Features
  • Specifications

Products can be extracted from a number of top sites such as Amazon.com, HomeDepot.com, and Sears.com.

Fallback behavior

When you use auto-extraction in your scenario, consider what should happen if the OneNote API cannot extract anything. By default, it renders the image or URL onto the page.

You can control the fallback behavior with data-render-fallback. Note that fallback occurs only if OneNote was unable to extract anything. If extraction was partially successful or contains inaccuracies, fallback is not invoked.

In general, we recommend including the original image or URL on the page. That way, if the OneNote API can’t extract some or all of the information, an image of the original input is always available to the user on the OneNote page. For example:

Business cards

<div 
data-render-method="extract"
data-render-src="name:scanned-image"
data-render-fallback="none" />
<img src="name:scanned-image" />

Recipe and Product URLs
 
<div 
data-render-method="extract"
data-render-src="http://allrecipes.com/recipe/beef-stroganoff-iii/"
data-render-fallback="none" />
<img data-render-src="http://allrecipes.com/recipe/beef-stroganoff-iii/" />
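
The two patterns above differ only in how the original is referenced, so an app that handles both kinds of input could generate them from one place. This is a sketch under that assumption; the function name extract_with_original is hypothetical.

```python
from html import escape


def extract_with_original(src):
    """Return an extraction <div> with fallback disabled, plus the original.

    src is either "name:<part-name>" for an image sent in the same request,
    or an absolute URL to a web page.
    """
    safe = escape(src, quote=True)
    div = ('<div data-render-method="extract" data-render-src="%s" '
           'data-render-fallback="none" />' % safe)
    if src.startswith("name:"):
        # Image part from the multipart request: embed it directly.
        img = '<img src="%s" />' % safe
    else:
        # Web page: ask the API to render the URL as an image.
        img = '<img data-render-src="%s" />' % safe
    return div + "\n" + img


card_markup = extract_with_original("name:scanned-image")
recipe_markup = extract_with_original("http://allrecipes.com/recipe/beef-stroganoff-iii/")
```

Setting data-render-fallback to "none" here avoids a second copy of the original when extraction fails, since the <img> already puts it on the page.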

Reference for Auto-Extraction

Here’s the full syntax:

<div 
data-render-method="extract"
data-render-src="URL-to-render | name:Multipart-Message-Part-Name"
[data-render-fallback="render|none"] />
  • data-render-method is required and must be set to "extract", "extract.businesscard", "extract.recipe", or "extract.product". If your scenario is general purpose, we recommend using "extract" and letting the API automatically detect the content type. If your scenario is limited to a certain content type, you can specify an explicit content type. In certain cases, specifying an explicit type improves results.
  • data-render-src can either be an absolute URL or a multipart message part name referencing an image. data-render-src is required.
  • data-render-fallback controls what should happen if the OneNote API is unable to auto-extract content. If set to "none" and extraction fails, the <div> tag is ignored and does not result in any OneNote content being generated. If set to "render", the content is inserted in the page as an image as if <img data-render-src="…"> was used. If data-render-fallback is omitted, it defaults to "render".
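
If you generate this markup in code, you may want to check the attribute values against the rules above before sending the request. The following is a client-side sketch only; the function name validate_extract_attrs is hypothetical, and treating "absolute URL" as "has a scheme and a host" is our assumption rather than the API's definition.

```python
from urllib.parse import urlparse

ALLOWED_METHODS = {"extract", "extract.businesscard", "extract.recipe", "extract.product"}
ALLOWED_FALLBACKS = {"render", "none"}


def validate_extract_attrs(method, src, fallback=None):
    """Return a list of problems with the attribute values (empty if none)."""
    errors = []

    if method not in ALLOWED_METHODS:
        errors.append("data-render-method must be one of: " + ", ".join(sorted(ALLOWED_METHODS)))

    if not src:
        errors.append("data-render-src is required")
    elif src.startswith("name:"):
        if src == "name:":
            errors.append("data-render-src part name is empty")
    else:
        parsed = urlparse(src)
        # Assumption: "absolute" means the URL has both a scheme and a host.
        if not parsed.scheme or not parsed.netloc:
            errors.append("data-render-src must be an absolute URL or name:<part-name>")

    # fallback is optional; when omitted the API defaults to "render".
    if fallback is not None and fallback not in ALLOWED_FALLBACKS:
        errors.append('data-render-fallback must be "render" or "none"')

    return errors
```

An empty list means the values match the syntax described above; it says nothing about whether extraction will succeed.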

Try It Now!

To try this now, head over to our OneNote Beta API Console and try one of the above HTML snippets in a create page API call. The API is currently in Beta, and we’d love to hear what you think. You can let us know by leaving a comment on this blog post.

Help make it better

Business card scanning works best on English-based business cards right now, but we plan to add additional language support in the future. You can help our recognition algorithms get smarter:

  1. Upload your collection of scanned business cards to a folder on OneDrive.com, Dropbox.com, or any other cloud drive.
  2. Create a sharing link. Here are instructions for: OneDrive and Dropbox.
  3. Email the sharing link to OneNoteBizCards@microsoft.com.

We’ll only use the images to improve our algorithms.

 

-Greg, Prasad, Ajitesh, Prashant, Julia, Yan, Donny, & Scott with help from Bing and Microsoft Research

Comments (3)

  1. Nicole Kergy says:

    Some of these are really inspiring. I am pretty sure those awesome business cards from printers brisbane (http://www.copycatprint.com.au/). Thank you so much!

  2. GregAk says:

    Hi JL- What country are your business cards from? The business card extractor works best with US English cards right now.

    Also, are you using data-render-method="extract" or "extract.businesscard"? When specifying extract.businesscard, the extractor is a bit more aggressive in the extraction — it emits a result as long as it's able to extract a name and any one additional piece of information. When using bare "extract", the extractor requires name + email and any one additional piece of information. This is to ensure that "extract" can be used for general scanning purposes without generating false positives.

    This is a new beta API and we're open to your feedback — let us know how we can do better! Also, if you email the business cards you're trying to scan to OneNoteBizCards@microsoft.com, we'll use them to improve the extractor.

    Thanks!

    Greg

  3. JL says:

    Having mixed experiences with auto extraction. Some cards work very well. Others just appear as a picture with no contact card extracted, and with no obvious reason why it would work for one card but not another.

    Is there a way in OneNote to trigger the logic to re-evaluate that picture as a contact? Even partially extracted results are better than nothing. (OneNote file is in OneDrive personal).
