POSTED BY: RENAUD LECOEUCHE, MSS Software Dev Engineer
We have been busy introducing new Application Programming Interfaces (APIs) in Microsoft Speech Server. I’d like to describe the process that goes into creating a new API. I can only skim the surface, because creating an API is a lengthy and complex process. Still, I believe this will give you some insight into our work, and into how you influence it.
I’ll focus on the Dialog Workflow API. This is the API I am most involved with, both from a historical point of view (having worked on it since Microsoft Speech Server 2004) and from a development point of view.
Some of the main sources of information when creating an API are the following:
User feedback and requests
Existing APIs (internal and external)
In this post, I’ll discuss the first one.
User feedback and requests
By far the most valuable source of information is your feedback. Let me take an example: many users commented that Microsoft Speech Server 2004 was great but that RunSpeech (the algorithm that decides the execution order of the speech controls on the page) was sometimes a problem. If you don’t do form filling, RunSpeech can get in the way. I believe that the problem was not with RunSpeech per se but with the fact that it was the only way to benefit from other classes in the API such as QuestionAnswer. In Microsoft Speech Server 2007, we decided to make the API more modular. Specifically, turn management (starting the synthesizer and recognizers at the right time) is handled by one set of classes: Statement and QuestionAnswer among others. Dialog flow management is handled in another layer, either by workflow activities such as IfElse or by FormFillingDialog. The big improvement is that you are free to use FormFillingDialog only when it makes sense in your application. If you don’t use it, you still get the full benefit of the turn management classes. Similarly, on the storage front, using SemanticItem objects is optional if you are not in a FormFillingDialog. The feedback we got led us to an improved design.
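To make the layering concrete, here is a minimal Python sketch of the idea. It is not the Speech Server API: the class names are borrowed from the text only as labels, and everything else (ScriptedIO, the run methods, max_attempts, hand_written_flow) is a hypothetical stand-in invented for illustration. It shows turn-level classes that work on their own, with form filling as an optional layer on top.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class ScriptedIO:
    """Hypothetical stand-in for the synthesizer and recognizer."""

    def __init__(self, answers: List[Optional[str]]):
        self.answers = list(answers)   # None simulates a failed recognition
        self.spoken: List[str] = []    # record of every prompt played

    def say(self, text: str) -> None:
        self.spoken.append(text)

    def listen(self) -> Optional[str]:
        return self.answers.pop(0) if self.answers else None


# --- Turn management layer (illustrative, not the real classes) ---

@dataclass
class Statement:
    prompt: str

    def run(self, io: ScriptedIO) -> None:
        io.say(self.prompt)            # play a prompt, expect no answer


@dataclass
class QuestionAnswer:
    prompt: str

    def run(self, io: ScriptedIO) -> Optional[str]:
        io.say(self.prompt)            # play a prompt, then listen once
        return io.listen()


# --- Dialog flow layer, option 1: ordinary control flow ---

def hand_written_flow(io: ScriptedIO) -> Optional[str]:
    Statement("Welcome.").run(io)
    answer = QuestionAnswer("Sales or support?").run(io)
    if answer == "sales":
        Statement("Connecting you to sales.").run(io)
    else:
        Statement("Connecting you to support.").run(io)
    return answer


# --- Dialog flow layer, option 2: optional form filling ---

@dataclass
class FormFillingDialog:
    questions: Dict[str, QuestionAnswer]
    slots: Dict[str, Optional[str]] = field(default_factory=dict)

    def run(self, io: ScriptedIO, max_attempts: int = 3) -> Dict[str, Optional[str]]:
        # Ask each question until its slot is filled or attempts run out.
        for name, question in self.questions.items():
            for _ in range(max_attempts):
                if self.slots.get(name) is not None:
                    break
                self.slots[name] = question.run(io)
        return self.slots
```

In this sketch, hand_written_flow uses only the turn-level classes, while FormFillingDialog reuses the same QuestionAnswer objects when slot filling fits the application.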
Feedback is not only important for high-level design; it is also valuable for smaller aspects of the API. There are too many examples to cite them all, so let me just give one: many of you needed to be able to change the calling party on an outbound call. We added that capability to the MakeCall activity.
I encourage you to file suggestions on the Connect web site so we can continue to improve the API. Can we add all your suggestions to the API? No, because there are other constraints that limit what we can do. But many suggestions will shape the product.
In my next post, I will speak about existing APIs and usability.