Last month, I wrote about the exciting new auto-extraction capabilities of the OneNote API. Starting today, auto-extraction is available to all developers via the production OneNote API (https://www.onenote.com/api). We’ve also published official documentation, and you can play with these APIs from our API Reference as usual.
Business Card Extraction Improvements
During the beta release, you told us that business card extraction wasn’t working on certain cards. We’ve taken steps to improve this, and we’ll continue to improve it over time. You can still email a sharing link (OneDrive, Dropbox, or any other cloud drive) for your scanned business cards to OneNoteBizCards@microsoft.com. We’ll use any images submitted to further expand the language and format coverage.
Rehash: Auto-Extraction Capabilities
Last month, we announced Business Card Scanning in Office Lens, and previously, we announced recipe and product clipping with the OneNote Web Clipper. Today, we’re releasing the same underlying auto-extraction magic for use by anyone via the OneNote API (https://www.onenote.com/api).
We've designed auto-extraction to be as easy as possible for developers to use. Just include an empty <div> tag with a few additional properties, and the OneNote API will detect whether it can extract its content onto the page and replace the <div> with a simplified rendering of the content being captured.
Here’s an example of scanning a business card image to OneNote with auto-extraction:
Types of Auto-Extraction Currently Available
Content Types Supported (as of Dec 2014)
- Business cards
Scanned Image Extraction
If your app or device captures images to the OneNote API, you can take advantage of business card scanning by including the following markup:
<div data-render-method="extract" data-render-src="name:scanned-image" />
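As a rough illustration, here is a minimal Python sketch of how a request body carrying that markup and the scanned image might be assembled with the standard library. It assumes the page HTML travels in a multipart part named "Presentation" and the image in a part named "scanned-image" (matching the name: reference in the markup). The helper name build_card_request, the boundary string, and the placeholder image bytes are made up for the example, and nothing is actually sent.

```python
import uuid
from html import escape

EXTRACT_DIV = '<div data-render-method="extract" data-render-src="name:scanned-image" />'


def build_card_request(image_bytes, title="Business card", boundary=None):
    """Return (content_type, body) for a multipart page-creation request.

    Assumption: the HTML part is named "Presentation" and the image part
    name matches the "name:scanned-image" reference in the markup.
    """
    boundary = boundary or "Part_" + uuid.uuid4().hex
    page = (
        "<!DOCTYPE html><html><head><title>{}</title></head>"
        "<body>{}</body></html>"
    ).format(escape(title), EXTRACT_DIV)

    def part(name, content_type, payload):
        # One multipart section: headers, blank line, payload, CRLF.
        header = (
            "--{}\r\n"
            'Content-Disposition: form-data; name="{}"\r\n'
            "Content-Type: {}\r\n\r\n"
        ).format(boundary, name, content_type)
        return header.encode("utf-8") + payload + b"\r\n"

    body = (
        part("Presentation", "text/html", page.encode("utf-8"))
        + part("scanned-image", "image/jpeg", image_bytes)
        + "--{}--\r\n".format(boundary).encode("utf-8")
    )
    return "multipart/form-data; boundary=" + boundary, body


# Placeholder bytes stand in for a real JPEG scan.
content_type, body = build_card_request(b"\xff\xd8 fake jpeg bytes", boundary="Part_demo")
```

The returned content type and body would then be POSTed to the API's page-creation endpoint with the user's access token, using whichever HTTP client your app already has.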
If your app captures web content to the OneNote API, you can take advantage of recipe and product clipping by including the following markup:
<div data-render-method="extract" data-render-src="http://allrecipes.com/recipe/beef-stroganoff-iii/" />
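For web content, the page HTML only needs the <div> pointing at the URL. The Python sketch below builds such a page, escaping the URL so that characters like & stay valid inside the attribute. The function name recipe_page_html and the default title are illustrative, not part of the API.

```python
from html import escape


def recipe_page_html(url, title="Clipped page"):
    """Build page HTML whose body is a single auto-extraction <div>."""
    div = '<div data-render-method="extract" data-render-src="{}" />'.format(
        escape(url, quote=True)  # keep the URL safe inside the attribute
    )
    return (
        "<!DOCTYPE html><html><head><title>{}</title></head>"
        "<body>{}</body></html>"
    ).format(escape(title), div)


page = recipe_page_html("http://allrecipes.com/recipe/beef-stroganoff-iii/")
print(page)
```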
The following business card data is recognized and extracted:
- Phone & fax numbers (made into a tel: link)
- Mailing/physical address (with a link to map it on Bing)
- Email addresses (made into a mailto: link)
In addition, a vCard (.VCF file) with the extracted information is embedded in the page so OneNote users can easily import the contact details into Outlook or their phone’s contact list. The vCard is also a convenient way to recall this information from the OneNote API.
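If you retrieve that vCard in your own app, reading the fields back is straightforward. The sketch below is a deliberately simple Python parser, shown on an invented sample card; the exact properties OneNote writes into the embedded vCard are not specified here, so the sample fields are assumptions. It also rebuilds tel: and mailto: links like those described in the list above.

```python
def parse_vcard(text):
    """Parse a simple vCard into {PROPERTY: [values]}.

    Handles folded lines and property parameters; does not handle
    escaping or grouped properties.
    """
    # Unfold continuation lines, which start with a space or tab.
    lines = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        elif raw.strip():
            lines.append(raw)

    fields = {}
    for line in lines:
        name_and_params, _, value = line.partition(":")
        name = name_and_params.split(";")[0].upper()
        if name in ("BEGIN", "END", "VERSION"):
            continue
        fields.setdefault(name, []).append(value)
    return fields


# Invented sample; a real card from OneNote may carry different properties.
SAMPLE = """BEGIN:VCARD
VERSION:3.0
FN:Jane Example
TEL;TYPE=WORK:+1-555-0100
TEL;TYPE=FAX:+1-555-0101
EMAIL:jane@example.com
END:VCARD
"""

card = parse_vcard(SAMPLE)
tel_links = ["tel:" + number for number in card.get("TEL", [])]
mailto_links = ["mailto:" + address for address in card.get("EMAIL", [])]
```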
Business card recognition works best for English cards right now, but we plan to improve accuracy in other languages in the coming months.
The following recipe information is extracted:
- Hero image
- Preparation steps
- Prep time
- Cook time
- Total time
Recipes can be extracted from many top sites such as AllRecipes.com.
The following product detail information is extracted:
- Primary image
When you use auto-extraction, consider what should happen if the OneNote API is unable to extract anything. By default, OneNote then renders the image or URL onto the page.
You can control the fallback behavior with data-render-fallback. Note that fallback occurs only if OneNote is unable to extract anything; if extraction is partially successful or contains inaccuracies, fallback is not invoked.
In general, we recommend including the original image or URL on the page. That way, if the OneNote API can’t extract some or all of the information, an image of the original input is always available to the user on the OneNote page. For example:
<img src="name:scanned-image" />
<img data-render-src="http://allrecipes.com/recipe/beef-stroganoff-iii/" />
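One way to combine the original input with the extraction markup is sketched below in Python. It places the <img> first and the <div> after it, and sets data-render-fallback="none" on the <div> so that a failed extraction does not add a second copy of the image. That pairing is a design choice assumed for this example rather than something the API requires, and body_with_original is a hypothetical helper name.

```python
from html import escape


def body_with_original(src, fallback="none"):
    """Return body markup: the original input, then the extraction <div>."""
    quoted = escape(src, quote=True)
    if src.startswith("name:"):
        # Multipart image part: show the scan directly.
        original = '<img src="{}" />'.format(quoted)
    else:
        # Web page: ask the API to render an image of the URL.
        original = '<img data-render-src="{}" />'.format(quoted)
    extract = (
        '<div data-render-method="extract" data-render-src="{}" '
        'data-render-fallback="{}" />'
    ).format(quoted, fallback)
    return original + "\n" + extract


print(body_with_original("name:scanned-image"))
print(body_with_original("http://allrecipes.com/recipe/beef-stroganoff-iii/"))
```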
Reference for Auto-Extraction
Here’s the full syntax:
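<div
  data-render-method="extract | extract.businesscard | extract.recipe | extract.product"
  data-render-src="URL-to-render | name:Multipart-Message-Part-Name"
  data-render-fallback="render | none" />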
- data-render-method is required and must be set to "extract", "extract.businesscard", "extract.recipe", or "extract.product". If your scenario is general purpose, we recommend using "extract" and letting the API automatically detect the content type. If your scenario is limited to a certain content type, you can specify an explicit content type. In certain cases, specifying an explicit type improves results.
- data-render-src can either be an absolute URL or a multipart message part name referencing an image. data-render-src is required.
- data-render-fallback controls what should happen if the OneNote API is unable to auto-extract content. If set to "none" and extraction fails, the <div> tag is ignored and no OneNote content is generated for it. If set to "render", the content is inserted in the page as an image, as if <img data-render-src="…"> were used. If data-render-fallback is omitted, it defaults to "render".
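The rules above can be captured in a small helper that checks the attribute values before building the tag. This Python sketch is only an illustration: extract_div is a hypothetical name, and treating "absolute URL" as "http or https with a host" is an assumption of the example, not a documented constraint.

```python
from html import escape
from urllib.parse import urlparse

METHODS = {"extract", "extract.businesscard", "extract.recipe", "extract.product"}
FALLBACKS = {"render", "none"}


def extract_div(src, method="extract", fallback=None):
    """Build an auto-extraction <div>, validating each attribute value."""
    if method not in METHODS:
        raise ValueError("unsupported data-render-method: " + method)
    if fallback is not None and fallback not in FALLBACKS:
        raise ValueError("unsupported data-render-fallback: " + fallback)

    if src.startswith("name:"):
        # Reference to a multipart message part.
        if len(src) == len("name:"):
            raise ValueError("multipart part name is empty")
    else:
        # Assumption: an absolute URL means http(s) with a host.
        parsed = urlparse(src)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError("data-render-src must be an absolute URL or name:<part>")

    attrs = [("data-render-method", method), ("data-render-src", src)]
    if fallback is not None:
        # Omitting the attribute leaves the default ("render") in effect.
        attrs.append(("data-render-fallback", fallback))
    rendered = " ".join('{}="{}"'.format(k, escape(v, quote=True)) for k, v in attrs)
    return "<div {} />".format(rendered)


print(extract_div("name:scanned-image", method="extract.businesscard"))
print(extract_div("http://allrecipes.com/recipe/beef-stroganoff-iii/", fallback="none"))
```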
Try It Now!
To try this now, head over to our OneNote API Console and try one of the above HTML snippets in a create page API call. We’d love to hear what you think. You can let us know by leaving a comment on this blog post.
Help make it better
Business card scanning works best on English-language business cards right now, but we plan to add support for more languages in the future. You can help our recognition algorithms get smarter:
- Upload your collection of scanned business cards to a folder on OneDrive.com, Dropbox.com, or any other cloud drive.
- Create a sharing link. Here are instructions for: OneDrive and Dropbox.
- Email the sharing link to OneNoteBizCards@microsoft.com.
We’ll only use the images to improve our algorithms.
-Greg, Prasad, Ajitesh, Prashant, Julia, Yan, Donny, & Scott with help from Bing and Microsoft Research