Microsoft Speech API SDK


The Speech API Software Development Kit (SAPI SDK) contains the documentation, samples, header files, and library files needed to create applications and utilities that use speech recognition and voice synthesis. The SAPI SDK can also be used to create speech recognition and voice synthesis engines for use by other applications.

Generally, the version of SAPI is determined by the platform that shipped it. SAPI 5.1 was included with Windows XP along with the Microsoft Sam text-to-speech (TTS) engine. The initial release of Windows XP did not include a speech recognition engine, but the Tablet PC Edition of Windows XP included version 6.1 of Microsoft’s speech recognition engine. That engine also shipped with Office 2003, which additionally included two SAPI TTS voices from Lernout & Hauspie, called LH Michael and LH Michelle. Note also that some vendors include their own speech recognition and TTS engines with their products. For example, my laptop came with speech recognition and TTS engines provided by Toshiba.

With Windows Vista, the installed version of SAPI is 5.3. We have replaced the Microsoft Sam voice with a new female voice, Microsoft Anna, built on next-generation technology. We have also made major improvements to the speech recognition engine (now version 8.0), which is included in all editions of Windows Vista.

To build applications and engines for Windows XP and Windows Server 2003, you can download the SAPI 5.1 SDK. These applications and engines should also be forward-compatible with SAPI 5.3 on Windows Vista and later. The SAPI 5.1 SDK is a stand-alone package, separate from the other Microsoft SDKs.

With SAPI 5.3, we integrated our SDK into the main Windows SDK (sometimes known as the Platform SDK). You can use the Windows SDK to create applications for Windows Vista, Windows XP, and Windows Server 2003. The OS version you target is selected at compile time, which prevents features that exist only in later versions from being available.
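Compile-time targeting of this kind is driven by preprocessor macros. The following C sketch assumes the `_WIN32_WINNT` convention used by the Windows SDK headers, where 0x0501 means Windows XP and 0x0600 means Windows Vista. It shows how code meant only for a later version is excluded before it ever reaches the compiler. The helper functions `target_winnt`, `vista_only_code_compiled`, and `print_target` are hypothetical names for illustration, not SDK APIs. The real SDK headers also consult `WINVER` and `NTDDI_VERSION`, which this sketch leaves out.

```c
/* Choose the target OS before any header is included (or pass
   -D_WIN32_WINNT=0x0501 / /D_WIN32_WINNT=0x0501 to the compiler).
   0x0501 = Windows XP. */
#ifndef _WIN32_WINNT
#define _WIN32_WINNT 0x0501
#endif

#include <assert.h>
#include <stdio.h>

/* The Windows SDK's sdkddkver.h defines these same values; they are repeated
   here only so that this sketch compiles without the SDK installed. */
#ifndef _WIN32_WINNT_WINXP
#define _WIN32_WINNT_WINXP 0x0501
#endif
#ifndef _WIN32_WINNT_VISTA
#define _WIN32_WINNT_VISTA 0x0600
#endif

/* Hypothetical helper: returns the OS version this file was compiled for. */
unsigned target_winnt(void)
{
    /* This sketch only handles targets from Windows XP onward. */
    assert(_WIN32_WINNT >= _WIN32_WINNT_WINXP);
    return (unsigned)_WIN32_WINNT;
}

/* Hypothetical helper: returns 1 if Vista-and-later code was compiled in.
   The #if is resolved by the preprocessor, so when targeting XP the
   Vista-only branch is never seen by the compiler at all. */
int vista_only_code_compiled(void)
{
#if _WIN32_WINNT >= _WIN32_WINNT_VISTA
    /* Hypothetical: calls to Vista-only APIs would go here. */
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: prints the compile-time target and which branch won. */
void print_target(void)
{
    printf("_WIN32_WINNT = 0x%04X, Vista-only code %s\n",
           target_winnt(),
           vista_only_code_compiled() ? "included" : "excluded");
}
```

Rebuilding the same file with `_WIN32_WINNT` set to 0x0600 makes `vista_only_code_compiled()` return 1. No runtime OS check is involved, which is why code built for an XP target cannot accidentally reference Vista-only declarations.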

You can get an ISO image to burn the SDK to a DVD here:

http://www.microsoft.com/downloads/details.aspx?familyid=7614FE22-8A64-4DFB-AA0C-DB53035F40A0

To selectively download and install various components of the Windows SDK, go here:

http://www.microsoft.com/downloads/details.aspx?FamilyId=C2B1E300-F358-4523-B479-F53D234CDCCF

Also new is the Managed Speech API. Codenamed SpeechFX, it is part of the Microsoft .NET Framework 3.0. The new System.Speech namespace provides managed classes for speech recognition and synthesis, which makes it much easier to write speech applications in managed languages such as C# or Visual Basic .NET.

The Managed Speech API documentation is included with the Windows SDK. Applications that use .NET Framework 3.0 will work on Windows Vista, Windows XP and Windows Server 2003. Note that you have to redistribute the .NET Framework 3.0 with your application for Windows XP and Windows Server 2003. The framework is already included with Windows Vista.

Comments (10)

  1. sergiman says:

    Hello, mister.

    First of all: I love Windows Vista. You guys invested a lot of hard work, and the results are totally amazing!

    Do you know where I can find a German (Austria or Germany) language pack for Microsoft Speech Recognizer 8.0?

    Thank you for your time.

  2. ITag says:

    Hi,

    I can’t get any System.Speech recognition events to fire on Vista RTM. The same assembly runs fine on XP.

    I’ve tried posting to MSDN forums but I just get stuck on the preview page without any error saying why.

    Here is the usenet post I made.

    http://groups.google.com/group/microsoft.public.speech_tech.sdk/browse_frm/thread/cd0ae93618a9e3b6

    Any help would be greatly appreciated.

    Thanks.

  3. VOE says:

    I haven’t worked with the SDK yet. I was just using Adobe Reader 7.0 today in reed out loud mode, reading a free Introduction to VB.NET e-book in PDF.

    Boy, is Sam I am bad! Reading the table of contents, it waits through all the points in the listing, then says “point” and moves on to the next line of the table.

    They should stop making tables of contents the old way and put the page number first so the poor lil’ ol’ narrator can get it right.

    Then in regular pages, can you imagine how Sam reads Visual Basic code out loud?

    In fact, your page does not render for Sam unless there is something I need to enable. It did reed this, though. Thank you for putting up with spelling mistakes so that Sam I am can reed it right.

  4. Dave I says:

    I am writing an application targeting XP and Vista and would love to have Anna available on XP (I know it comes with Vista.)

    Is Anna available as a redistributable? Can I distribute Anna with my application somehow? Please?

    Anna is such a massive improvement over previous Microsoft TTS offerings that I really hope so.

    I know many other developers are pleading for this as well.

  5. MSDNArchive says:

    Dave, thanks for writing.  At present there are no plans to make the Microsoft Anna engine available for downlevel operating systems.

    Your feedback is helpful, and it’s possible we’ll make it available in the future.

    In the meantime, there are a number of third-party SAPI TTS engines available, many with high quality.

  6. Dave I says:

    Thanks, Charles. Your feedback is disappointing but appreciated 🙂

    As a lone developer my budget is nonexistent, so "Mike" is the best of a bad bunch, I suppose.

    Can you tell me if this is possible? I would like to highlight each phoneme of a word as it is spoken: the bouncing ball of karaoke, but on a word basis, and obviously not as cheesy. I can handle the visualisation bit.

    I’m using managed C#. While browsing through MSDN I saw that SpeakProgressEventArgs exposes a character position, and I thought "aha!" But the progress event only seems to be raised between words.

    Aside from subscribing to the PhonemeReached event and mapping the phoneme text back to the written text, do you have any suggestions? Much appreciated.

  7. Anais says:

    What about the reverse? Is it possible to use Microsoft Sam on a Vista system? Installing the 5.1 SDK on Vista Home Premium yields "Speak Error: ActiveX component can’t create object" when Mary or Sam is selected.

    Anna is not usable for my purposes due to faulty interaction of <pron> tags with punctuation, as in: Speak this <pron sym="t eh 1 k s t">text</pron> . Now speak this.

  8. kite says:

    Hi~ Where can I get the SAPI 5.3 sample code? Thx~

  9. Anurag says:

    Hi…

    I’m working on an application that requires speech-to-text conversion. Can I automate the speech-to-text feature of Microsoft Office 2003 using C# .NET?

    I need help urgently.

    Thanking you,

    Anurag
