Bot.Builder.Extensions: Extensions for Microsoft’s Bot Framework

After over a year of using Microsoft's Bot Framework to build bots for myself and others, I felt the community could use a set of extensions that will improve the experience of writing a bot, in particular a bot that works well on speech-enabled channels.

If you're working on a chatbot built on the Bot Framework or a Cortana Skill, Bot.Builder.Extensions makes your job easier. In essence, by doing a couple of Search & Replace operations in your existing bot, you can expect a decent speech-enabled experience on any Bot Framework channel with no regression in how it performs on today's channels that don't support speech.

Where to get it

You can get the Bot Builder Extensions library for your BotFramework bot on NuGet.

Documentation

IDialogContext extensions

IDialogContext.WithSpeech()

This sets your IDialogContext object up to perform well on Bot Framework channels that support speech. The best part? It works on channels that don't support speech as well!

Suggested use: simply search & replace uses of context. throughout your bot with context.WithSpeech(). and you'll get all the magic.

Once you set your context up using WithSpeech() you get a few new methods:

PostAsync(string text, string speak = null, string locale = null, bool asPrompt = true, bool willSayMore = false, bool endConversation = false, CancellationToken cancellationToken = default(CancellationToken))

PostAsync(IMessageActivity msg, string speak = null, string locale = null, bool asPrompt = true, bool willSayMore = false, bool endConversation = false, CancellationToken cancellationToken = default(CancellationToken))

The PostAsync method on your new context object accepts some new parameters:

  • speak - the text you want spoken on channels that support speech
  • asPrompt - is this message a prompt to the user? On speech channels, this opens the microphone after the message is sent so the user can respond
  • willSayMore - will your bot immediately send another message? If so, this ensures the microphone can't be used in the meantime. This is mutually exclusive with asPrompt.
  • endConversation - is this the last message in the conversation? If so, the bot will send the message to the user, then send an EndOfConversationActivity to the channel. On speech-capable channels, like Cortana, this ends the conversation, and the next speech sent to the mic is treated as a new command to the channel (outside your bot).

Example Usage
private async Task MessageReceivedAsync(IDialogContext context, IAwaitable<IMessageActivity> result)
{
    // The three snippets below are independent scenarios, shown together for brevity.

    // want to ask the user a question?
    await context.WithSpeech().PostAsync("<question here>", asPrompt: true);
    context.Wait(MessageReceivedAsync);

    // send the user multiple messages, ending with a question?
    // Awaiting each call in turn guarantees the second message is sent (and awaited) after the first.
    await context.WithSpeech().PostAsync(@"<first message>", willSayMore: true);
    await context.WithSpeech().PostAsync(@"<second message>", asPrompt: true);

    // user says 'done'?
    await context.WithSpeech().PostAsync(@"Thanks! Bye!", endConversation: true);
}
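
To make the interplay between asPrompt and willSayMore more concrete, here is a minimal, self-contained sketch of how the two flags could map onto the Bot Framework's input hint values. This is not the library's actual code: the InputHintSketch class and its Choose method are hypothetical names, and the rule that willSayMore wins over asPrompt is an assumption inferred from the example above (where willSayMore: true is passed while asPrompt keeps its default of true).

```csharp
using System;

// Hypothetical helper, for illustration only; not part of Bot.Builder.Extensions.
public static class InputHintSketch
{
    // These string values mirror the inputHint field of the Bot Framework activity schema.
    public const string AcceptingInput = "acceptingInput";
    public const string ExpectingInput = "expectingInput";
    public const string IgnoringInput = "ignoringInput";

    // Assumption: willSayMore takes precedence over asPrompt.
    public static string Choose(bool asPrompt, bool willSayMore)
    {
        if (willSayMore)
        {
            // Another message is on its way, so the microphone should stay closed.
            return IgnoringInput;
        }

        // A prompt opens the microphone; anything else leaves input passively available.
        return asPrompt ? ExpectingInput : AcceptingInput;
    }
}
```

Under these assumptions, InputHintSketch.Choose(asPrompt: true, willSayMore: true) yields the "ignoring input" hint, which matches the "<first message>" call in the example.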

IDialogContext.SayAsync(string text, string speak = null, string locale = null, CancellationToken cancellationToken = default(CancellationToken))

Your new SayAsync method has an additional parameter:

  • speak - the text you want spoken on channels that support speech.

If you don't specify speak, .WithSpeech() contexts will automatically speak the value of the text parameter.

IDialogContext.SupportsSpeech()

Returns true if the channel for the current IDialogContext supports speech. This means usage of .Speak, .InputHint, and .SayAsync() will have an effect.

IDialogContext.EndConversation()

The EndConversation() method built in to the BotBuilder SDK (v3.5.9+) requires a code be sent. This convenience method defaults to EndOfConversationCodes.CompletedSuccessfully (the majority of uses).

Message extensions

These extensions help create a quick and easy way of formulating speech based on the current setup of a message.

IMessageActivity.GetOptionSpeech()

If you present your user with options in the form of CardAction objects (aka Buttons), this method will turn those button options into SSML you can shove into .Speak to speak them out. If you don't have any .SuggestedActions on your message, it will look for and use the buttons on the first card in the .Attachments collection.
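
As a rough illustration of the idea (not the library's implementation), turning a handful of button titles into a speakable phrase could look something like the sketch below. OptionSpeechSketch and Describe are hypothetical names, the "A, B, or C" phrasing is my assumption, and the result is plain text that would still need to be escaped and wrapped as SSML.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only; not part of Bot.Builder.Extensions.
public static class OptionSpeechSketch
{
    // Joins button titles into a phrase such as "Yes or No" or "A, B, or C".
    public static string Describe(IEnumerable<string> buttonTitles)
    {
        // Ignore missing or blank titles so they don't produce awkward pauses.
        var titles = (buttonTitles ?? Enumerable.Empty<string>())
            .Where(t => !string.IsNullOrWhiteSpace(t))
            .Select(t => t.Trim())
            .ToList();

        if (titles.Count == 0) return string.Empty;
        if (titles.Count == 1) return titles[0];
        if (titles.Count == 2) return titles[0] + " or " + titles[1];

        // Three or more: comma-separate all but the last, then add ", or <last>".
        return string.Join(", ", titles.Take(titles.Count - 1)) + ", or " + titles[titles.Count - 1];
    }
}
```

In this sketch the titles would come from the Title of each CardAction, taken from .SuggestedActions first and falling back to the first card's buttons, as described above.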

IMessageActivity.GetSpeechForCarousel()

Similarly, if you add a carousel to your message (e.g. populate .Attachments with a bunch of cards), this method will turn the cards you put into the carousel into speech which you can shove into .Speak.

Card extensions

These extensions apply to types of cards provided by the BotBuilder SDK. At the time of this writing that includes ReceiptCard, SigninCard, HeroCard, ThumbnailCard, AudioCard, VideoCard, and AnimationCard.

[Attachment|*Card].GetSpeech()

This takes the content of a card and turns it into usable SSML. For most cards, this means taking the Title, Subtitle, and Text properties and putting them together in a spoken paragraph.

For ReceiptCard, this will read out "<Title> includes <number> item(s). Your total is <total>. You can say <option speech>" (see GetOptionSpeech() above).

For SigninCard, this will read out "<Text>. <Option speech>" (see GetOptionSpeech() above).

Since the other card types can be, and often are, put into carousels, the spoken response does not include speech for any CardAction objects (.Buttons) on the cards. If you post one card back in a message and also want its options spoken, add them on by using GetOptionSpeech() on the message itself.
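
To give a feel for what "putting them together in a spoken paragraph" might involve, here is a small self-contained sketch. It is not the library's code: CardSpeechSketch and Describe are hypothetical names, and the choices to skip empty fields, add missing sentence-ending punctuation, and wrap everything in a single SSML p element are my assumptions.

```csharp
using System;
using System.Linq;
using System.Security;

// Hypothetical helper, for illustration only; not part of Bot.Builder.Extensions.
public static class CardSpeechSketch
{
    // Builds one spoken paragraph from a card's Title, Subtitle, and Text values.
    public static string Describe(string title, string subtitle, string text)
    {
        var sentences = new[] { title, subtitle, text }
            // Skip fields the card doesn't populate.
            .Where(part => !string.IsNullOrWhiteSpace(part))
            // Escape XML-significant characters so the result stays well-formed.
            .Select(part => SecurityElement.Escape(part.Trim()))
            // End each part with punctuation so the synthesizer pauses between them.
            .Select(part => ".!?".IndexOf(part[part.Length - 1]) >= 0 ? part : part + ".");

        return "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">"
            + "<p>" + string.Join(" ", sentences) + "</p>"
            + "</speak>";
    }
}
```

The xml:lang value is hard-coded to en-US here purely to keep the sketch short.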

Example
 var card = new HeroCard
{
    Title = "BotFramework Hero Card",
    Subtitle = "Your bots — wherever your users are talking",
    Text = "Build and connect intelligent bots to interact with your users naturally wherever they are, from text/sms to Skype, Slack, Office 365 mail and other popular services.",
    Images = new[] { new CardImage("https://sec.ch9.ms/ch9/7ff5/e07cfef0-aa3b-40bb-9baa-7c9ef8ff7ff5/buildreactionbotframework_960.jpg") },
    Buttons = new[] { new CardAction(ActionTypes.OpenUrl, "Get Started", value: "https://docs.botframework.com/en-us/") }
};

// put the hero card on a reply to the user
var msg = context.MakeMessage();
msg.Attachments.Add(card.ToAttachment());

// let WithSpeech do all the speech work for you
await context.WithSpeech().PostAsync(msg);

// or pull out speech from the message, set it in to the 'Speak' property, and post with the usual APIs

var spokenCard = card.GetSpeech();  // Gets SSML for the title, subtitle, and text of the above-defined hero card

var optionSpeech = msg.GetOptionSpeech();   // Gets SSML from the 'Buttons' property of the first card on the message (the hero card 'card'). In this case: "Get Started"

// Set either 'spokenCard' and/or 'optionSpeech' in to the 'Speak' property on the created message
msg.Speak = spokenCard;
// or
msg.Speak = optionSpeech;

await context.PostAsync(msg);

PromptDialogEx

PromptDialogEx is a prompt dialog built for speech. While the prompts from BotBuilder will act as a prompt from a microphone control standpoint on speech-enabled channels, they won't automatically speak the prompt. This fixes that.

Simply search & replace usage of PromptDialog. with PromptDialogEx. and you're good to go!

SSML Extensions

When you're trying to manipulate the SSML you get back from the Card, Message, and/or Context extensions, it can be challenging. These methods aim to alleviate that.

In the example above I retrieved two sets of SSML: one for the HeroCard instance, and one for the buttons on the hero card after I added it to msg. But how do we shove them both into .Speak on msg? Simple:

string.CombineSsml(string)

Example

Modifying the bottom portion of code above:

 msg.Speak = spokenCard.CombineSsml(optionSpeech);   // combines the card SSML with the option speech for the message as a whole (in this case 'Buttons' property of the card)

IEnumerable<CardAction>.AsOptionsSsml()

If you didn't want to have to put a card in to a message just to extract the speech for the options you put on the card, you can use this method to get speech for a collection of CardAction instances directly. Using this in conjunction with CombineSsml() we'd augment the above sample like so:

 ...
var spokenCard = card.GetSpeech();
var optionSpeech = card.Buttons.AsOptionSpeech();

var msg = context.MakeMessage();
msg.Attachments.Add(card.ToAttachment());
msg.Speak = spokenCard.CombineSsml(optionSpeech);

Helpers

Finally, here's a set of methods that I think might just help folks out. At this time they're really geared toward working with speech responses.

WrapSsml(string, bool = true)

Give this method a string, and it'll convert it into a fully-compliant SSML document. This includes the <?xml ... declaration and a root <speak... tag for you.

StripXmlDocDeclaration(string)

If you're trying to put together a bunch of SSMLs (e.g. in CombineSsml), you can't have multiple <?xml ... declarations, so here's a method to strip them out if you need to.

StripSpeakTag(string)

Need to combine the contents of multiple SSML documents in to a large SSML document? Use this method to strip out the <speak... tag, leaving you with just the SSML content.
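
To show how these helpers fit together, here is a rough, self-contained sketch of the wrap / strip / combine idea using regular expressions. It is not the library's implementation: the SsmlSketch class and its Wrap, StripDeclaration, StripSpeak, and Combine methods are hypothetical stand-ins, the meaning I give the bool parameter (whether to emit the XML declaration) is a guess, and the attributes on the speak tag are assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helpers, for illustration only; not part of Bot.Builder.Extensions.
public static class SsmlSketch
{
    private static readonly Regex XmlDeclaration = new Regex(@"^\s*<\?xml[^>]*\?>\s*", RegexOptions.IgnoreCase);
    private static readonly Regex SpeakOpen = new Regex(@"^\s*<speak\b[^>]*>", RegexOptions.IgnoreCase);
    private static readonly Regex SpeakClose = new Regex(@"</speak>\s*$", RegexOptions.IgnoreCase);

    // Removes a leading <?xml ...?> declaration, if there is one.
    public static string StripDeclaration(string ssml)
    {
        return XmlDeclaration.Replace(ssml, string.Empty);
    }

    // Removes the declaration and the root <speak> element, leaving only the inner content.
    public static string StripSpeak(string ssml)
    {
        var body = SpeakOpen.Replace(StripDeclaration(ssml), string.Empty);
        return SpeakClose.Replace(body, string.Empty).Trim();
    }

    // Wraps content in a root <speak> element, optionally preceded by an XML declaration.
    public static string Wrap(string content, bool includeDeclaration = true)
    {
        var document = "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">"
            + content + "</speak>";
        return includeDeclaration ? "<?xml version=\"1.0\"?>" + document : document;
    }

    // Merges two SSML documents into one with a single declaration and a single root.
    public static string Combine(string first, string second)
    {
        return Wrap(StripSpeak(first) + " " + StripSpeak(second));
    }
}
```

Combine reduces both inputs to their inner content before re-wrapping, which is why the result never ends up with two declarations or nested speak tags.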