A quick test and tip for OneNote audio search

While looking through the newsgroup this weekend, I came across this comment: basically, this person was having some difficulty getting audio search to work well. I had some problems with this in the past and wanted to share a tip that may get it working better for you.

First, the quality of the audio should be as high as possible. Dan wrote about this during the OneNote 2007 development phase and documented some tips on his blog.

This tip might get you a little further along the line. Here's the testing I performed along with the tip.

The first step was to get an audio file. I downloaded an audio recording of Patrick Henry's "Give me liberty or give me death" speech (read by Richard Shulman) from https://www.history.org/media/audio.cfm. It's 2.76 MB and about as high quality as you can get, so search results for this file should be very accurate.

I added it to a page and made sure audio indexing was enabled in Tools | Options | Audio and Video:

clip_image001

Since the audio indexer only runs when OneNote is idle, and at low priority, I went away for a day's worth of meetings. That gave the indexer plenty of time to run. The rule of thumb is to let the indexer run for 2-3 times the length of the recording. If you have a day's worth of audio, letting the indexer run overnight is probably a good idea.
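To put numbers on that rule of thumb, here is a minimal Python sketch. The function name and the 10-minute recording length are made up for illustration; the only thing taken from above is the 2-3x multiplier.

```python
from datetime import timedelta


def indexing_wait(recording: timedelta, factor: float) -> timedelta:
    """Rough idle time to leave for the indexer: factor x recording length."""
    return recording * factor


# Hypothetical 10-minute recording, using the 2x and 3x ends of the rule of thumb.
recording = timedelta(minutes=10)
low = indexing_wait(recording, 2)
high = indexing_wait(recording, 3)
print(f"Leave OneNote idle for roughly {low} to {high}")
```

This is only an estimate. Since the indexer runs at low priority, anything else keeping the machine busy will stretch the time.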

Now, I know the words "I know not" (what course others may take) are in this speech, so I searched for them. OneNote found two results:

clip_image002

Already we have some erratic results. The phrase "I know not" appears in the speech only at the 7:54 mark. The text of the speech at 7:35 is "idle what is", so that second hit is a false match. Still, this shows the inexact nature of phonetic matching.

Now for the tip: what to do if no results are found and you really expected some. If no matches were found, the "View More" link at the bottom of the results page lets you change the threshold for audio searches. Click that.

clip_image003

If some results were found, the UI will look like this:

clip_image004

Obviously, you want to click the "Click here to view matches" link…

Down at the bottom of this task pane is the dropdown to let you lower the confidence level the audio indexer uses to find a match:

clip_image005

Lowering this threshold should cause more potential results to be found, but the quality of the results may be lower. In other words, you may get more "false positive" results. Conversely, raising it to 0.8 applies a stricter threshold and narrows the results accordingly. In this case, changing it to 0.3 gives the results I expected:

clip_image006

Lowering it to 0.3 finds one extra result, this time the word "inviolate" at the 5:05 mark. Its confidence level was 0.4, which explains why it was not shown with the default of 0.5. At 0.1, a result from "there is no longer any room for hope" gets returned at the 4:58 mark. Lowering the threshold did find more results, but with this test file, they were clearly not correct.
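If it helps to picture what the dropdown is doing, here is a small Python sketch of threshold filtering. This is my own illustration, not OneNote's code. Only the 0.4 score for the 5:05 hit comes from the test above; the other scores are placeholder values I picked so that each hit shows up at the same threshold it did in the test, and the "at or above the threshold" comparison is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Match:
    timestamp: str      # position in the recording
    confidence: float   # score the indexer assigned to this hit


# Only the 0.4 at 5:05 was reported in the test; the rest are placeholders.
MATCHES = [
    Match("7:54", 0.9),  # placeholder score (shown at the default 0.5)
    Match("7:35", 0.6),  # placeholder score (shown at the default 0.5)
    Match("5:05", 0.4),  # score reported in the test
    Match("4:58", 0.2),  # placeholder score (only shown at 0.1)
]


def visible_matches(matches, threshold):
    """Keep hits whose confidence is at or above the chosen threshold."""
    return [m for m in matches if m.confidence >= threshold]


for threshold in (0.5, 0.3, 0.1):
    shown = [m.timestamp for m in visible_matches(MATCHES, threshold)]
    print(threshold, shown)
```

The lower the threshold, the longer the list, and the more of it you have to check by ear.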

One last quick test here. Since audio searching is based on the phonetic representation of words, a phrase like "I know not" should produce the same results as "eye no knot." A search for that second phrase finds:

clip_image007

Which is exactly what I expected. The test passed!
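To show why the two spellings land on the same results, here is a toy Python sketch. It is not how OneNote's indexer works internally (it matches on sounds, not on a word list). The tiny homophone table and the function name are mine, purely for illustration.

```python
# Hypothetical homophone table: maps each word to one canonical spelling.
HOMOPHONES = {"eye": "i", "no": "know", "knot": "not"}


def phonetic_key(phrase):
    """Reduce a phrase to a crude 'sounds like' key, one entry per word."""
    words = [w.strip(".,!?\"'") for w in phrase.lower().split()]
    return tuple(HOMOPHONES.get(w, w) for w in words)


print(phonetic_key("I know not") == phonetic_key("eye no knot."))
```

Both phrases reduce to the same key, so a search built on sound rather than spelling cannot tell them apart.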

Finally, to answer the last question from the original thread: how can you tell whether audio indexing has finished? An easy way (from the test point of view) is to look in the audio cache folder for an FI file for each embedded audio or video file. That location on my Vista machine is C:\Users\John\AppData\Local\Microsoft\OneNote\12.0\Audio Cache. To help find pages with embedded files, you can get my powertoy.
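If you would rather not eyeball the folder, a few lines of Python can list what is there. This is a rough sketch under one big assumption: that the FI files carry a ".fi" extension. I have not confirmed the exact file naming, so adjust the suffix to whatever you see in your own cache folder. The function name is mine.

```python
from pathlib import Path


def indexed_files(cache_dir, suffix=".fi"):
    """List files in the cache folder whose extension matches suffix.

    The ".fi" default is an assumption about how FI files are named.
    """
    folder = Path(cache_dir)
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    )


# Example call, using the cache location from my Vista machine:
# indexed_files(r"C:\Users\John\AppData\Local\Microsoft\OneNote\12.0\Audio Cache")
```

Compare the count against the number of embedded recordings in your notebook. If a recording has no matching file yet, the indexer probably has not finished with it.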

Questions, comments, concerns and criticisms always welcome,

John