Audio transcriptions and annotations with OneNote

I just got a great tip from a colleague at work (Richard Sprague) that I wanted to share. Since it is related, I thought I’d also include a tip I got earlier from Barry Brahier.

Richard came up with a way to transcribe audio recordings into OneNote! The recording can be one made in OneNote or can come from anywhere, really (for example, one made on a solid-state recorder and uploaded to your PC). Of course you can put this transcription into any app that supports speech input – I just happen to like OneNote for this.

CAVEAT: Your recording has to be excellent for the transcription to be anything other than gibberish: high sample rate and bit rate, no background noise, clean crisp audio environment (no echoes, background chatter). This means recordings of meeting room environments through your built-in laptop mic are NOT going to work.

Let’s say for example that you have a voice recording that you made using a dedicated recorder and want to put a transcription of it into OneNote (maybe in support of meeting minutes and other notes).

Follow Richard’s instructions to set up your audio input for speech. In OneNote, place the insertion point on the page where you want the text to appear. Using the speech TIP (available on Tablet PC or Vista), set the dictation mode ON. Here’s where to find some of these settings on a Tablet PC (use the settings in order 1, 2, 3):

Now play back the audio as you would normally. Voilà, text appears in OneNote that is vaguely similar to a transcription (see caveat above).

If you want to use recordings made in OneNote, be aware that the default recording quality for OneNote is not meant for speech recognition. We use a voice codec and bit rate/sample rate designed to compress spoken word audio as small as can be while still usable by human beings. In OneNote 2007 we increased the settings slightly to make audio search work better, but speech recognition (transcription) requires a much higher level of quality.

To set up your future recordings in OneNote to be transcribable, first go to Tools/Options/Audio and Video. Switch the codec to Windows Media Audio 9.1 Professional. If this isn’t available, consider downloading the latest set of codecs for Windows Media (they should come with WM Player 10). Otherwise just pick the highest settings available (e.g. 44 kHz, 440 kbps) for now – you can experiment with lower settings later.
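If you are wondering what those higher settings cost in disk space, here is a back-of-the-envelope sketch in Python. It is not anything OneNote runs. It just converts a constant bit rate into megabytes per hour. The function name is made up for illustration, and only the 440 kbps figure comes from the example above; the other rates are arbitrary values I picked for comparison. Real files will differ somewhat because of container overhead and variable bit rate encoding.

```python
def megabytes_per_hour(bitrate_kbps: float) -> float:
    """Approximate size in MB of one hour of audio at a constant bit rate.

    Assumes 1 kbps = 1000 bits per second and 1 MB = 1,000,000 bytes.
    """
    bits_per_hour = bitrate_kbps * 1000 * 3600  # bits recorded in one hour
    bytes_per_hour = bits_per_hour / 8          # 8 bits per byte
    return bytes_per_hour / 1_000_000


if __name__ == "__main__":
    # 440 kbps is the example setting above; 64 and 128 are illustrative only.
    for kbps in (64, 128, 440):
        print(f"{kbps:>4} kbps is about {megabytes_per_hour(kbps):.1f} MB/hour")
```

So an hour at the highest setting is a sizable file, which is one more reason to experiment with lower settings once you know what your recognizer can tolerate.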

Once your recording is made, you can use Richard’s tip to transcribe it later. Just place the insertion point on the page, then press the play button for the recording in OneNote. What’s neat, of course, is that if you have, say, a 1-hour recording with linked audio notes, you can press the audio playback icon next to the notes you wrote and get a transcription of just the portion of your recording starting at that point – no need to transcribe the whole hour just to get the answer to a question, for example.

Barry Brahier sent me a tip on how to use the linked audio notes feature of OneNote using a pre-made recording. Here’s his tip. The basic idea is similar to what Richard came up with. Essentially you loop the playback of your existing recording through the sound mixer in your PC and re-record it into OneNote, where you can apply linked audio annotations as you would if you were doing the recording directly into OneNote in the first place. Thanks for the great tip Barry!

Update: with OneNote 2007, you can annotate existing audio just by typing/writing new text/ink while the audio is playing. Anything created while it is playing is linked. You can add more annotations to existing audio recordings or annotate a new recording.

Another Update: I received several questions asking how to do this if you don’t have Vista or a Tablet PC. You can get the Microsoft Speech Recognition tools another way. If you have Office XP or Office 2003, try Tools/Speech in Word and follow the instructions. After that the speech tools are installed and available on the floating “language bar”. You can also download speech recognition for free at:

but note that the recognizer in the free download version is not as good as the one we built for Office.

Comments (21)

  1. Erik Paul says:

    I don’t think that I will be able to take advantage of the transcription.  Our classroom has a computer that records the lectures and then posts them online via FTP.  The recordings are pretty bad.  BUT, I will definitely take advantage of Barry’s tip.

    Is it possible to add an audio annotation after the initial annotation?  Suppose I’m in class and I take notes on my tablet.  Then, after class I am reading my notes and I listen to a certain section again.  I realize that I missed writing down an important concept, so I write it down.  Is it possible for me to link that annotation to the audio?

  2. Ben says:

    thank you for spamming the first comment!

  3. (That wasn’t spam; that was a trackback.)

    I wonder if this would work better with podcasts or radio broadcasts (which have clearer quality). Would be great for people studying a foreign language.

  4. audionote says:

    Now I understand why we can’t see the keywords used for audio search in OneNote — low audio quality.

    In that case I would love to see an option to store higher-quality audio notes (like the option to store pressure sensitivity information for better-quality inking) to enable a better experience when using audio notes.  

    That would (in theory) enable transcriptions from audio notes and better audio text searches, right?

    I would love to see this process refined so it’s possible to paste audio notes from a dedicated voice recorder into OneNote without going through all this workaround stuff.  Automatic transcription for search and reading would be great, and would complement the rest of OneNote well.  

    It would also be great to be able to play a pasted-in audio note directly in OneNote and take ink notes along with it instead of going through this workaround.  

    With OneNote Mobile on a PocketPC/SmartPhone and voice notes, that would be a great step.  

  5. audionote: You already have the option to store higher quality audio notes – I included a picture above where you can change the setting. The audio search doesn’t get better linearly with audio quality. It has more to do with what quality the system is trained with. We trained our audio search to work with the quality of recording we use by default. It doesn’t get worse when you improve the recording quality but it doesn’t get way better either. Speech rec is trained on higher quality audio because it can’t be made to work well without that high quality. Another criterion for audio notes is that the audio not take up a lot of space, so going with high compression gets us 5MB/hour vs. high quality, which is 5-10x that.

    Including transcription in OneNote directly is something we’d like to do but at the moment the results you get don’t match people’s expectations and I am not keen to introduce features that underdeliver vs people’s expectations. I’m hoping that this "hack" will get more people trying it and giving us feedback on whether it works well enough for them.

    As I mentioned in the "update" at the end of the post you can now (in 2007 Beta 2 coming up) just drag an audio recording into OneNote 2007, start playback, and take notes – that will automatically link to the audio.

  6. This looks very useful for me. I find that while the audio recordings in OneNote are very useful because of the time-linking with notes, I have problems with the sound of my writing or typing also being recorded – and the Lenovo X41 battery isn’t good enough for me to record and take notes all day – so I often use a Sony memory stick recorder. Just being able to replay and link the audio in will be good.

    On the high quality recording issue: could there be an option to record at high quality for the recognition – and then automatically downsample to the smaller files to keep storage down, perhaps done in the idle time like the re-indexing already is?

  7. Andrew Brown says:

    I’m a journalist and was sold on OneNote largely by the way that annotations are automatically linked into audio recordings. I can see this would be more useful if I had a tablet PC, but it is useful enough on an ordinary ThinkPad.

    Nonetheless, I’d be astonished if speech recognition was as useful as people are hoping. Transcription is hard, even for trained audio typists. Accents make it much harder: my wife, like me, is British, and she has had huge trouble transcribing tapes/files of Americans talking. The BBC has some of the best transcribers you can hope for anywhere, and even they have trouble with foreign accents or languages. So, although this feature sounds really cool, I doubt it will be half as much use in practice as the drawing tools and the calculator, which much more closely approach the idea of "intelligent paper".

  8. Mary: there could be such an option – however we don’t have time to add it, sorry. I’m not sure exactly what the quality settings for the recording have to be to get good quality speech-to-text. They probably don’t have to be as high as I showed above.

    Andrew: I completely agree – hence my caveat above, and why we have not prioritized this feature. However, it is worth noting that as long as your expectations are low (not "human transcriber" quality) and you just want 80% correct or so, it actually can give you the gist of a recording, or the start of a transcription. The Tablet PC also offers a "UK English" recognition engine for speech. Of course with the variety of accents in the UK this is no panacea, but it represents a recognition engine developed using samples collected from a cross section of the population in the UK. The US one is based on a collection of accents too, but it tends to do better on the Midwest or Mid-Atlantic accents (the bland, newscaster accent – or what most Canadians like me speak :-))

  9. erik paul says:

    Chris,  my study partners have all bought tablets, principally based on these audio recordings.  I keep talking up 2007 and the wonderful features of indexing the audio recordings.  They keep asking me if the audio and text indexing will be able to recognize medical terms (we are medical students).  This is very important to us.  I know that you can get dictionary add ons for the office suite.  Any ideas?

  10. erik paul: good question. The audio indexing does not use a "dictionary". It converts typed phrases to phonetic equivalents (based partly on a dictionary but also on rules to handle words it doesn’t know). It then matches on phonemes that appear in the audio stream (matches their wave forms). So it may very well work for medical terms. It’s probably best if you try it yourself in beta 2.

    Speech recognition is different. Since it has to produce text, it does need a dictionary. The current engine in Tablet PC does not include specialized terms. I’m afraid I don’t know much about that area, but a great person to ask is Richard Sprague whom I linked to at the top of the article.

  11. Nancy says:

    How useful is this for, say, a doctor? I am looking at applying OneNote for notes/dictation. Can it hold 4-8 hours of continuous patient dictation and transcribe it into text? Ideas, comments appreciated. Thanks

  12. Nancy, OneNote can hold hundreds of hours of dictation at super high quality (you can record a year of audio at the normal quality). The transcription process works by playing back the audio, so it takes as long to transcribe as the recording takes. As I mentioned, it’s important to try this out to see if you can get the quality you need.

  13. Barry_Lewis says:

    Chris, any idea whether Microsoft has considered adding the "Presenter View" to OneNote? This would make note taking in a seminar great, rather than trying to add extra slides to PowerPoint anytime you wanted the class to create a list of something. It would also help in later exporting the list and returning it to the attendees. It doesn’t look like any MS products other than PowerPoint offer this option.


  15. I enjoy reading your blog. I use OneNote and am looking forward to the introduction of OneNote Mobile for the pocket PC.

    I use Dragon NaturallySpeaking speech recognition to input text into OneNote. If I’m in a hurry, I often initially enter things with pen input and later use speech recognition to tidy it up.

    See the following post in my blog:-


    Speech Empowered Computing

  16. Andrew Brown says:

    Very low tech thing that would make OneNote much easier to use: keyboard shortcuts for the audio playback. If these exist, I can’t find them in the help system, yet when I am "fleshing out" the notes I have made during an interview with more detailed transcriptions from the audio, I really don’t want to have to keep grabbing the mouse to pause the playback; it would also be nice to have keys to move the recording back 10 seconds.

    (and a pony)

    But, really, these would be the kind of low-tech tweaks that make a lot of difference.

  17. I don’t know about you, but when I was in college, there were times I wish I could have sent someone
