UWP Sample: Voice Input with Continuous Speech Recognition

07/22/2016

#wpjp #wpdev_jp #win10jp #w10mjp

Windows 10 comes with Cortana. Cortana is a Windows 10 feature built from speech recognition, AI, and search, but the speech recognition part is also exposed by the OS itself, so an application can use it on its own.

■The speech recognition engine in UWP

The API for this speech recognition is the SpeechRecognizer class.

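SpeechRecognizer offers two recognition modes. In the first, the recognition engine starts, recognizes one utterance, and stops, so you invoke it once per sentence. In the second, it keeps listening the way Cortana does while it waits for you, and produces a result each time you say something. This article implements the second mode, continuous recognition (ContinuousRecognitionSession).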

So I put together a simple sample. To follow along, create a UWP application in Visual Studio.
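One prerequisite worth checking right after creating the project: a UWP app can capture audio only if the microphone capability is declared in Package.appxmanifest. The fragment below is a minimal sketch that assumes the default project template. The internetClient entry is normally generated by the template already, and it is listed here on the assumption that dictation relies on an online service.

```xml
<Capabilities>
  <!-- Usually present by default; assumed necessary for online dictation -->
  <Capability Name="internetClient" />
  <!-- Required so the app is allowed to use the microphone -->
  <DeviceCapability Name="microphone" />
</Capabilities>
```

Instead of editing the XML by hand, you can also tick Microphone on the Capabilities tab of the manifest designer in Visual Studio.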

■The simplest speech recognition implementation: UI

Add the following two TextBlock elements to the UI in MainPage.xaml. The upper one displays the final results; the lower one displays the interim results.

    <Grid Background="LightGray">
        <TextBlock x:Name="output" FontSize="24" Padding="10" Text="" Margin="0,0,0,54" TextWrapping="Wrap"></TextBlock>
        <TextBlock x:Name="textBlock1" Foreground="White" TextAlignment="Center" FontSize="24" Text="Waiting..." VerticalAlignment="Bottom" Margin="10"/>
    </Grid>

■Implementing speech recognition and what the code does

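Next, add the following code to MainPage.xaml.cs. It first creates a SpeechRecognizer instance and calls CompileConstraintsAsync; because no constraints are added, the default dictation grammar is compiled. It then starts recognition with StartAsync. While you are speaking, the HypothesisGenerated event fires repeatedly, and each time the handler shows the interim text on screen. After you finish speaking and pause for a moment, the ResultGenerated event fires, and its handler treats the text as final and processes it (here, by appending it to the screen).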
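As a hedged variation on the initialization step described above, the sketch below adds the dictation grammar explicitly and checks whether compilation succeeded before recognition starts. The DictationSetup class and its CreateAsync method are hypothetical names introduced for illustration; they are not part of the sample.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Media.SpeechRecognition;

// Hypothetical helper, not part of the original sample.
public static class DictationSetup
{
    // Creates a recognizer with an explicit dictation constraint.
    // Returns null if the grammar could not be compiled.
    public static async Task<SpeechRecognizer> CreateAsync()
    {
        var recognizer = new SpeechRecognizer();

        // The dictation grammar that is otherwise used implicitly, spelled out.
        recognizer.Constraints.Add(
            new SpeechRecognitionTopicConstraint(
                SpeechRecognitionScenario.Dictation, "dictation"));

        SpeechRecognitionCompilationResult result =
            await recognizer.CompileConstraintsAsync();

        if (result.Status != SpeechRecognitionResultStatus.Success)
        {
            // Compilation failed, so release the recognizer and report failure.
            recognizer.Dispose();
            return null;
        }

        return recognizer;
    }
}
```

Checking the compilation status gives the app a chance to show a message instead of failing later when StartAsync is called.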

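Incidentally, if you initialize with new SpeechRecognizer(new Language("en-US")); what you say is recognized as English. (Say "Hotta imo ijitta na" and you get "What time is it now?") Be careful, though: if the English speech language is not installed in the OS, this throws an exception.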
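One way to avoid that exception is to check the available languages before constructing the recognizer. The sketch below assumes that SpeechRecognizer.SupportedTopicLanguages reflects the dictation languages usable on the current device; RecognizerFactory and Create are hypothetical names introduced for illustration.

```csharp
using System;
using System.Linq;
using Windows.Globalization;
using Windows.Media.SpeechRecognition;

// Hypothetical helper, not part of the original sample.
public static class RecognizerFactory
{
    // Returns a recognizer for preferredTag (for example "en-US") when that
    // language is listed as supported; otherwise falls back to the default
    // recognizer, which uses the system speech language.
    public static SpeechRecognizer Create(string preferredTag)
    {
        Language match = SpeechRecognizer.SupportedTopicLanguages
            .FirstOrDefault(l => string.Equals(
                l.LanguageTag, preferredTag, StringComparison.OrdinalIgnoreCase));

        return match != null
            ? new SpeechRecognizer(match)
            : new SpeechRecognizer();
    }
}
```

With this helper, the initialization line would become something like contSpeechRecognizer = RecognizerFactory.Create("en-US");.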

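The handlers update the screen through dispatcher.RunAsync. Recognition events are raised on a thread other than the UI thread, so the handlers cannot touch UI elements directly. Instead they use the dispatcher to queue the work onto the UI thread, which is the app's primary thread, and write to the UI from there. This is a common technique whenever asynchronous code needs to update the screen.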

// using directives needed at the top of MainPage.xaml.cs
using System;
using Windows.Media.SpeechRecognition;
using Windows.UI.Core;
using Windows.UI.Xaml.Controls;
using Windows.UI.Xaml.Navigation;

public sealed partial class MainPage : Page
{
    public MainPage()
    {
        this.InitializeComponent();
    }

    // Objects used for continuous speech recognition
    private SpeechRecognizer contSpeechRecognizer;
    private CoreDispatcher dispatcher;

    protected async override void OnNavigatedTo(NavigationEventArgs e)
    {
        // Dispatcher for calling into the UI thread from a background thread
        dispatcher = CoreWindow.GetForCurrentThread().Dispatcher;

        // Initialization: with no constraints added, the default dictation grammar is compiled
        contSpeechRecognizer =
            new Windows.Media.SpeechRecognition.SpeechRecognizer();
        await contSpeechRecognizer.CompileConstraintsAsync();

        // Register the handlers for interim and final results
        contSpeechRecognizer.HypothesisGenerated +=
            ContSpeechRecognizer_HypothesisGenerated;
        contSpeechRecognizer.ContinuousRecognitionSession.ResultGenerated +=
            ContinuousRecognitionSession_ResultGenerated;

        // Start recognition
        await contSpeechRecognizer.ContinuousRecognitionSession.StartAsync();
    }

    private async void ContSpeechRecognizer_HypothesisGenerated(
        SpeechRecognizer sender, SpeechRecognitionHypothesisGeneratedEventArgs args)
    {
        // Show the interim text while recognition is still in progress
        await dispatcher.RunAsync(CoreDispatcherPriority.Normal, () =>
        {
            textBlock1.Text = args.Hypothesis.Text;
        });
    }

    private async void ContinuousRecognitionSession_ResultGenerated(
        SpeechContinuousRecognitionSession sender, SpeechContinuousRecognitionResultGeneratedEventArgs args)
    {
        // Append the final text once recognition of the utterance is complete
        await dispatcher.RunAsync(CoreDispatcherPriority.Normal, () =>
        {
            textBlock1.Text = "Waiting ...";
            output.Text += args.Result.Text + "。\n";
        });
    }
}
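The sample starts recognition but never stops it. As a hedged sketch, the session could be cancelled and the recognizer released when the user navigates away from the page. The fragment assumes it lives in the same namespace as MainPage and reuses the contSpeechRecognizer field and the two handlers from the listing above.

```csharp
using System;
using Windows.Media.SpeechRecognition;
using Windows.UI.Xaml.Navigation;

// Additional part of the MainPage class from the listing above.
public sealed partial class MainPage
{
    protected async override void OnNavigatedFrom(NavigationEventArgs e)
    {
        if (contSpeechRecognizer == null)
        {
            return;
        }

        // Detach the handlers so no more UI updates are queued.
        contSpeechRecognizer.HypothesisGenerated -=
            ContSpeechRecognizer_HypothesisGenerated;
        contSpeechRecognizer.ContinuousRecognitionSession.ResultGenerated -=
            ContinuousRecognitionSession_ResultGenerated;

        // Cancel the session only if the recognizer is still active.
        if (contSpeechRecognizer.State != SpeechRecognizerState.Idle)
        {
            await contSpeechRecognizer.ContinuousRecognitionSession.CancelAsync();
        }

        contSpeechRecognizer.Dispose();
        contSpeechRecognizer = null;
    }
}
```

CancelAsync discards any recognition in progress; if the last partial utterance should still be delivered, StopAsync is the alternative to consider.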

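Run the app and start talking: interim text appears at the bottom, and finalized sentences accumulate at the top. Of course, it also works on Windows 10 Mobile.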

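Recognition quality actually depends heavily on the microphone. That is why the Bridgestone demo in the de:code 2016 keynote used a wired microphone: in that hall, a Bluetooth headset gave fairly poor results, even though it had worked reasonably well in a hotel room.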

So be sure to check the quality of your microphone. In that respect, Windows 10 Mobile may actually do better.
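When the microphone is less than ideal, one possible mitigation is to drop results the engine itself is unsure about. The sketch below assumes that keeping only High and Medium confidence results is acceptable for the app; ResultFilter and IsReliable are hypothetical names introduced for illustration.

```csharp
using Windows.Media.SpeechRecognition;

// Hypothetical helper, not part of the original sample.
public static class ResultFilter
{
    // True when the engine reports High or Medium confidence for the result.
    public static bool IsReliable(SpeechRecognitionResult result)
    {
        return result.Confidence == SpeechRecognitionConfidence.High
            || result.Confidence == SpeechRecognitionConfidence.Medium;
    }
}
```

In the ResultGenerated handler this could be used as if (!ResultFilter.IsReliable(args.Result)) return; before the text is appended to the screen.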
