How to Extend Your App with Talking Maps

In a previous blog post we took a lap around the new support for Custom Geospatial Data in the Bing Spatial Data Services (SDS). This time around we build on that tutorial and extend the app so that we can talk to it and have it talk back.

Check out the video to see and hear what we’re going to build.

In order to achieve this, we leverage the Bing Speech Recognition Control for Windows 8.1 as well as the Windows 8.1 SDK for speech synthesis.

The documentation for the Bing Speech Recognition Control contains detailed instructions on how to register and install the control and how to enable a project for speech recognition, so we won't dive too deep into that here. Instead, we start from our previous project and assume that:

  • You signed up for the Bing Speech Recognition Control in the Windows Azure Marketplace
  • You registered an application and created a Client ID and Client Secret in the Azure Marketplace
  • You downloaded and installed the Bing Speech Recognition Control
  • You downloaded and installed the Windows SDK for Windows 8.1

Speech-Enabling Our Project

Once we are all set up, we open the project that we created for the previous blog post and add references to Bing Speech and the Visual C++ 2013 Runtime.

The Visual C++ Runtime requires that we compile the project for individual platforms rather than for all CPUs at the same time. Therefore, we open the Configuration Manager and select our first platform (here x64).

We also need to modify the package.appxmanifest in the code view by adding the "microphone" device capability and an Extensions section just below Capabilities.

<Capabilities>
  <Capability Name="internetClient" />
  <DeviceCapability Name="microphone" />
  <DeviceCapability Name="location" />
</Capabilities>
<Extensions>
  <Extension Category="windows.activatableClass.inProcessServer">
    <InProcessServer>
      <Path>Microsoft.Speech.VoiceService.MSSRAudio.dll</Path>
      <ActivatableClass ActivatableClassId="Microsoft.Speech.VoiceService.MSSRAudio.Encoder" ThreadingModel="both" />
    </InProcessServer>
  </Extension>
  <Extension Category="windows.activatableClass.proxyStub">
    <ProxyStub ClassId="5807FC3A-A0AB-48B4-BBA1-BA00BE56C3BD">
      <Path>Microsoft.Speech.VoiceService.MSSRAudio.dll</Path>
      <Interface Name="IEncodingSettings" InterfaceId="C97C75EE-A76A-480E-9817-D57D3655231E" />
    </ProxyStub>
  </Extension>
  <Extension Category="windows.activatableClass.proxyStub">
    <ProxyStub ClassId="F1D258E4-9D97-4BA4-AEEA-50A8B74049DF">
      <Path>Microsoft.Speech.VoiceService.Audio.dll</Path>
      <Interface Name="ISpeechVolumeEvent" InterfaceId="946379E8-A397-46B6-B9C4-FBB253EFF6AE" />
      <Interface Name="ISpeechStatusEvent" InterfaceId="FB0767C6-7FAA-4E5E-AC95-A3C0C4D72720" />
    </ProxyStub>
  </Extension>
</Extensions>

Changing the UX

In the head of default.html, we add references to the Bing Speech control's stylesheet and script.

<!-- Bing Speech -->
<link href="/Bing.Speech/css/voiceuicontrol.css" rel="stylesheet" />
<script src="/Bing.Speech/js/voiceuicontrol.js"></script>

In the body of default.html, we replace the existing buttons with a button that initiates the Bing Speech Recognizer and a div element that hosts the SpeechRecognizerUx control.

<div id="divPanel" >
    <div id="divSpeechControl" data-win-control="BingWinJS.SpeechRecognizerUx"></div>
    <input id="btnSpeech" type="button" value="Talk to Me" />
</div>

We also add a style for the new button in default.css.

#btnSpeech {
    position: relative;
    left: 0px;
    top: 0px;
    margin: 10px;
    background-color: black;
    width: 330px;
}

Modifying the JavaScript

In default.js we add a few more global variables for the Bing Speech credentials and the Speech Recognizer.

// Bing Speech
var speechRecognizer = null;
var bsClientID = "YOUR_BING_SPEECH_ID";
var bsClientSecret = "YOUR_BING_SPEECH_SECRET";

In the app.onactivated event handler we replace the event listeners for the removed buttons with one for btnSpeech. We also create the Bing Speech Recognizer. The modified function looks like this:

app.onactivated = function (args) {
    if (args.detail.kind === activation.ActivationKind.launch) {
        if (args.detail.previousExecutionState !== activation.ApplicationExecutionState.terminated) {
            // TODO: This application has been newly launched. Initialize your application here.
        } else {
            // TODO: This application has been reactivated from suspension. Restore application state here.
        }
        args.setPromise(WinJS.UI.processAll().done(function () {
            var btnShowPanel = document.getElementById("btnShowPanel");
            btnShowPanel.addEventListener("click", togglePanel, false);
            divPanel = document.getElementById("divPanel");

            var btnSpeech = document.getElementById("btnSpeech");
            btnSpeech.addEventListener("click", talkToMe, false);
            var credentials = new Bing.Speech.SpeechAuthorizationParameters();
            credentials.clientId = bsClientID;
            credentials.clientSecret = bsClientSecret;
            speechRecognizer = new Bing.Speech.SpeechRecognizer("en-US", credentials);

            document.getElementById("divSpeechControl").winControl.tips = new Array(
                "For more accurate results, try using a headset microphone.",
                "Speak with a consistent volume.",
                "Speak in a natural rhythm with clear consonants.",
                "Speak with a slow to moderate tempo.",
                "Background noise may interfere with accurate speech recognition."
            );

            Microsoft.Maps.loadModule("Microsoft.Maps.Map", { callback: getMap, culture: "en-US", homeRegion: "US" });
        }));
    }
};
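Note that togglePanel, getMap, locateMe, getBoundary, and getPOI all come from the previous tutorial. If you are following along without that code, a minimal stand-in for togglePanel might look like the sketch below; the show/hide logic is our own placeholder, not the original implementation.

// Minimal placeholder for togglePanel from the previous tutorial (our own sketch):
// it simply shows or hides the panel that hosts our controls.
function togglePanel() {
    divPanel.style.display = (divPanel.style.display === "none") ? "block" : "none";
}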

Finally, we add a new function, talkToMe, that handles tapping or clicking the speech button. When this event fires, we start the Speech Recognizer and evaluate the result. Depending on the keywords we recognize, we synthesize a spoken response and call the corresponding function from our previous tutorial.

function talkToMe() {
    document.getElementById("divSpeechControl").winControl.speechRecognizer = speechRecognizer;
    speechRecognizer.recognizeSpeechToTextAsync().then(function (result) {
            if (typeof (result.text) == "string") {
                //document.getElementById("divResultText").innerHTML = result.text;

                // The object for controlling and playing audio.
                var audio = new Audio();

                // The object for controlling the speech synthesis engine (voice).
                var synth = new Windows.Media.SpeechSynthesis.SpeechSynthesizer();

                if (result.text.indexOf("locate") > -1) {
                    synth.synthesizeTextToStreamAsync("Locating You.").then(function (markersStream) {
                        var blob = MSApp.createBlobFromRandomAccessStream(markersStream.ContentType, markersStream);
                        audio.src = URL.createObjectURL(blob, { oneTimeOnly: true });
                        audio.play();

                        locateMe();
                    });
                }

                else if (result.text.indexOf("district") > -1) {
                    synth.synthesizeTextToStreamAsync("Searching Bing Spatial Data Ser-vices for school district.").then(function (markersStream) {
                        var blob = MSApp.createBlobFromRandomAccessStream(markersStream.ContentType, markersStream);
                        audio.src = URL.createObjectURL(blob, { oneTimeOnly: true });
                        audio.play();

                        getBoundary();
                    });
                }

                else if (result.text.indexOf("schools") > -1) {
                    synth.synthesizeTextToStreamAsync("Searching Bing Spatial Data Ser-vices for school sites.").then(function (markersStream) {
                        var blob = MSApp.createBlobFromRandomAccessStream(markersStream.ContentType, markersStream);
                        audio.src = URL.createObjectURL(blob, { oneTimeOnly: true });
                        audio.play();

                        getPOI();
                    });
                }

                else if (result.text.indexOf("awesome") > -1) {
                    synth.synthesizeTextToStreamAsync("I know.").then(function (mark-ersStream) {
                        var blob = MSApp.createBlobFromRandomAccessStream(markersStream.ContentType, markersStream);
                        audio.src = URL.createObjectURL(blob, { oneTimeOnly: true });
                        audio.play();
                    });
                }

                else if (result.text.indexOf("built") > -1) {
                    var Ssml = "<speak version='1.0' " +
                        "xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'>" +
                        "Justin, ,<phoneme alphabet='x-microsoft-ups' ph='S1 P R AH . M I L'></phoneme>, Yi, Gevorg, Doug, and <phoneme alphabet='x-microsoft-ups' ph='S1 P R AH S I . D AH'></phoneme>." +
                        "<break time='500ms' />" +
                        "These guys rock" +
                        "</speak>";
                    synth.synthesizeSsmlToStreamAsync(Ssml).then(function (markersStream) {
                        var blob = MSApp.createBlobFromRandomAccessStream(markersStream.ContentType, markersStream);
                        audio.src = URL.createObjectURL(blob, { oneTimeOnly: true });
                        audio.play();
                    });
                }
            }
            else {
                // Handle quiet or unclear speech here.
            }
        },
        function (error) {
            // Put error handling here.
        }
    );
}
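
Each keyword branch above repeats the same synthesize-and-play pattern. If you like, you can factor that pattern into a small helper. Here is a minimal sketch; the helper name speak and its onSpoken callback are our own invention, while the calls inside mirror the code above:

// Hypothetical helper (not part of the Bing Speech or WinRT APIs):
// speaks a phrase and then invokes an optional callback.
function speak(text, onSpoken) {
    var audio = new Audio();
    var synth = new Windows.Media.SpeechSynthesis.SpeechSynthesizer();
    synth.synthesizeTextToStreamAsync(text).then(function (markersStream) {
        // Wrap the WinRT stream in a blob so the HTML audio element can play it.
        var blob = MSApp.createBlobFromRandomAccessStream(markersStream.ContentType, markersStream);
        audio.src = URL.createObjectURL(blob, { oneTimeOnly: true });
        audio.play();
        if (onSpoken) {
            onSpoken();
        }
    });
}

With this helper the first branch collapses to speak("Locating You.", locateMe); and the other branches follow the same pattern.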

And that’s already it. Run your app and check it out. You’ll find the complete source code here.

- Bing Maps Team
