Build your first Alexa skill with Alexa.NET and Azure Functions – The certification


In the previous post we learned how to build our first Alexa skill using a set of familiar tools and platforms: C#, Visual Studio and Azure Functions.
At the end of the post, we had a very simple skill that worked perfectly through the simulator or on an Echo device linked to our developer account.

However, if you had tried to submit the skill for certification, it would have failed the validation tests, even though it was working just fine during the testing phase.

In this post we're going to cover these problems and understand how you can solve them.

The certification process

Skills, exactly like mobile apps, must go through a certification process before being released on the public Alexa Store. There are two steps involved in the certification of an Alexa skill:

  1. An automatic process, which checks the skill from a technical point of view. During this step, a tool will send multiple requests to the function to make sure that all the standard voice commands are processed properly.
  2. A manual process, which will make sure that the skill provides a flawless experience and complies with the content policies.

Before submitting a skill for certification, exactly like any mobile app, you first have to fill in all the Store metadata. This is a set of information (description, icon, sample commands, etc.) that will be displayed in the Alexa Store and that will help users understand if the skill is a good fit for them.

This information must be provided in the Distribution section of the Alexa Developer Console.

All the fields are self-explanatory and, if you have experience with mobile app submissions, you should feel at home.
Once you have filled in all the info, you can move to the next section, called Privacy & Compliance.

This section asks multiple questions about potentially sensitive content that your skill may include, like collecting personal information, allowing purchases, etc.
An important field is the last one, called Testing instructions. These are the instructions that the tester must follow in order to try your skill. The field is required. This means that you have to fill it in even if your skill doesn't require any special setup, like additional hardware or an account on a 3rd party service. In this case, you can use this field to help the tester understand the purpose of your skill and which commands they can try.

The last required step is Availability:

Here you can choose if you want to make the skill public or if you want to start distributing it to a select number of beta testers, identified by their Amazon account. Additionally, you can limit the distribution of a skill to a specific country.

The functional tests

Once you have completed the Distribution section, you can move to the Certification one. The first step is the validation page, which includes a button labeled Run to start the automatic process. This process only validates the basics, for example that you have properly filled in all the metadata for the Store.

Then comes the tricky part 😃 The next section is called Functional test and it performs a series of technical tests to make sure that the skill satisfies all the technical requirements.

As already mentioned at the beginning of the post, even if our skill works without problems with the simulator, this validation will fail. In the rest of the post we're going to explore what the biggest blockers are and how to solve them.

Validating the request

The first error you'll see is about the signature of the request:

The skill end-point is not validating the signatures for incoming requests and is accepting requests with an empty signature URL.

Let's take a look at the page in the Amazon documentation about hosting a skill on our own service. This is our scenario, since we're hosting the skill using an Azure Function and not AWS Lambda.

If you go through the checklist, you will see how we are meeting all the requirements except the last one:

The service must validate that incoming requests are coming from Alexa.

Our function doesn't include any logic to verify that the request is indeed coming from Alexa. If you take the input JSON of a request and you send it to the function from another source (for example, using a tool like Postman), it will be processed without errors. Amazon doesn't allow this. It requires every request to a skill to be validated before the skill responds to it.
The rest of the page contains all the technical details on how we can do that. Essentially, it comes down to two steps:

  1. Alexa signs all the HTTPS requests sent to the endpoint. We need to make sure that the signature is valid.
  2. To avoid "replay attacks", we need to check the timestamp of the request. A request must be discarded, even if the signature is valid, if the timestamp is older than 150 seconds.
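To make the second step more concrete, here is a minimal sketch of a timestamp tolerance check written only with the base class library. The names TimestampCheck and IsWithinTolerance are hypothetical and exist only for this illustration. Rejecting timestamps that are too far in the future, and not only too old, is an assumption of the sketch. In the skill itself we don't need to write this code, since Alexa.NET performs the check for us.

```csharp
using System;

public static class TimestampCheck
{
    // 150 seconds is the tolerance mentioned in the Amazon documentation.
    public static readonly TimeSpan Tolerance = TimeSpan.FromSeconds(150);

    // Returns true when the request timestamp is no more than the tolerance
    // away from the current time (assumption: checked in both directions).
    public static bool IsWithinTolerance(DateTimeOffset requestTimestamp, DateTimeOffset now)
    {
        TimeSpan difference = (now - requestTimestamp).Duration();
        return difference <= Tolerance;
    }

    public static void Main()
    {
        DateTimeOffset now = DateTimeOffset.UtcNow;
        Console.WriteLine(IsWithinTolerance(now.AddSeconds(-30), now));  // True
        Console.WriteLine(IsWithinTolerance(now.AddSeconds(-300), now)); // False
    }
}
```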

Luckily, Alexa.NET makes this verification process easy, thanks to a dedicated API that can check both conditions for us.
Let's see how to change the code of the function we started to build in the previous post to accommodate this requirement. Let's start by declaring a new method inside our function:

private static async Task<bool> ValidateRequest(HttpRequest request, ILogger log, SkillRequest skillRequest)
{
    // Header with the URL of the certificate chain used to sign the request
    request.Headers.TryGetValue("SignatureCertChainUrl", out var signatureChainUrl);
    if (string.IsNullOrWhiteSpace(signatureChainUrl))
    {
        log.LogError("Validation failed - Empty SignatureCertChainUrl header");
        return false;
    }

    Uri certUrl;
    try
    {
        certUrl = new Uri(signatureChainUrl);
    }
    catch
    {
        log.LogError($"Validation failed - SignatureCertChainUrl not valid: {signatureChainUrl}");
        return false;
    }

    // Header with the signature of the request body
    request.Headers.TryGetValue("Signature", out var signature);
    if (string.IsNullOrWhiteSpace(signature))
    {
        log.LogError("Validation failed - Empty Signature header");
        return false;
    }

    // Rewind the stream: the body has already been read to build the SkillRequest
    request.Body.Position = 0;
    var body = await request.ReadAsStringAsync();
    request.Body.Position = 0;

    if (string.IsNullOrWhiteSpace(body))
    {
        log.LogError("Validation failed - the JSON is empty");
        return false;
    }

    bool isTimestampValid = RequestVerification.RequestTimestampWithinTolerance(skillRequest);
    bool isSignatureValid = await RequestVerification.Verify(signature, certUrl, body);

    // The request is valid only if both checks succeed
    if (!isSignatureValid || !isTimestampValid)
    {
        log.LogError("Validation failed - RequestVerification failed");
        return false;
    }

    return true;
}

The method accepts, as input:

  • The original HttpRequest
  • The logger, so that we can log errors
  • The SkillRequest object retrieved from the body of the request

The first part of the code checks that all the information required to validate the signature is indeed present, based on the documentation we have read before:

  • The request must include a header called SignatureCertChainUrl, which is the URI from which to retrieve the certificate used by Amazon to sign the request
  • The request must include a header called Signature, which contains the Base64-encoded signature
  • The body of the request (which is the JSON payload) must not be empty

If the request passes these basic checks, we can use a couple of helpers provided by Alexa.NET to verify that this information is valid. Both are exposed by the class RequestVerification.

The first one is RequestTimestampWithinTolerance(), which accepts as input the SkillRequest object. It will return a boolean that will tell you if the timestamp of the request is within the 150 seconds tolerance or not.

The second one is Verify() and it requires the three parameters we have extracted before: the two headers and the body. This method downloads an X.509 certificate from the URL included in the first header, extracts the public key and uses it to check the signature stored in the second header against the hash of the full request body. If they match, it means that the request is valid.

The request can be considered valid, and we can move on to process it, only if both these methods return true. Per the instructions provided by Amazon, if any of these checks fails we need to return an HTTP response with status code 400 Bad Request.
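The idea behind the signature check can be sketched with the cryptography classes of the base class library. This is only an illustration of the general technique and not the actual implementation of Alexa.NET: the names SignatureSketch and IsSignatureValid are hypothetical, the hash algorithm and padding are assumptions, and a throwaway key pair generated locally stands in for the key of the Amazon certificate.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class SignatureSketch
{
    // Checks that 'signatureBase64' is a valid RSA signature of 'body' for the
    // given key. SHA-256 with PKCS#1 padding is an assumption of this sketch.
    public static bool IsSignatureValid(RSA publicKey, string signatureBase64, string body)
    {
        byte[] signature = Convert.FromBase64String(signatureBase64);
        byte[] data = Encoding.UTF8.GetBytes(body);
        return publicKey.VerifyData(data, signature, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
    }

    public static void Main()
    {
        // A locally generated key pair plays the role of the signing certificate.
        using (RSA rsa = RSA.Create())
        {
            string body = "{\"request\":{\"type\":\"LaunchRequest\"}}";
            byte[] signed = rsa.SignData(Encoding.UTF8.GetBytes(body), HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
            string header = Convert.ToBase64String(signed);

            Console.WriteLine(IsSignatureValid(rsa, header, body));       // True
            Console.WriteLine(IsSignatureValid(rsa, header, body + " ")); // False
        }
    }
}
```

Even a single extra character in the body makes the verification fail, which is why the function must pass the raw body to the verification code exactly as it was received.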

As such, this is how we must change the code of our Azure Function to satisfy this requirement:

[FunctionName("Alexa")]
public static async Task<IActionResult> Run(
    [HttpTrigger(AuthorizationLevel.Function, "post", Route = null)] HttpRequest req,
    ILogger log)
{
    string json = await req.ReadAsStringAsync();
    var skillRequest = JsonConvert.DeserializeObject<SkillRequest>(json);

    bool isValid = await ValidateRequest(req, log, skillRequest);
    if (!isValid)
    {
        return new BadRequestResult();
    }

    //handle the request
}

If the ValidateRequest() method we have just defined returns false, we return a new BadRequestResult object, which is translated into a 400 Bad Request HTTP status code. Otherwise, we go on and handle the request using the same code we have seen in the previous post.

Handling the built-in intents

Once we have fixed the security issues, the validation tool will report another series of errors, like:

The skill should respond appropriately when users say "cancel".

The skill must close when using the "exit" command without returning an error response.

The skill's help prompt does not keep the skill session open.

During the testing phase, Alexa will try to invoke the skill with a set of commands which we aren't handling right now and, as such, they will cause a failure.

To better understand the context, let's return to the Alexa Developer Console and take a look at the Intents section of the interaction model.

As you can see, we have a set of intents which are categorized under the Built-in Intents section and which all start with the AMAZON. prefix.
This is a set of built-in commands that every skill must handle, which help to make the user experience more consistent.

There are three intents which you are required to implement in order to pass the validation:

  • AMAZON.CancelIntent, which is invoked when the user tries to cancel the running operation.
  • AMAZON.HelpIntent, which is invoked when the user asks for help on how to use the skill.
  • AMAZON.StopIntent, which is invoked when the user wants to stop using the current skill.

The first step is to update the interaction model, by defining the sample utterances for each command. The approach is the same we have seen in the previous post for the custom intent we have built. Click on each of them, define one or more sample utterances, then save the model and rebuild it.

Your skill will receive these intents like regular ones. As such, the code we're going to write is the same we have seen in the previous post to handle the LastPosts intent. The only difference is that, this time, the unique identifier can't be customized, but we must use the one assigned by Alexa.

This is the updated code of our function:

[FunctionName("Alexa")]
public static async Task<IActionResult> Run(
    [HttpTrigger(AuthorizationLevel.Function, "post", Route = null)] HttpRequest req,
    ILogger log)
{
    //validate and parse the request

    if (requestType == typeof(LaunchRequest))
    {
        //handle the launch request
    }

    else if (requestType == typeof(IntentRequest))
    {
        var intentRequest = skillRequest.Request as IntentRequest;

        if (intentRequest.Intent.Name == "LastPosts")
        {
            //handle the last article request
        }
        else if (intentRequest.Intent.Name == "AMAZON.CancelIntent")
        {
            response = ResponseBuilder.Tell("Cancelling.");
        }
        else if (intentRequest.Intent.Name == "AMAZON.HelpIntent")
        {
            response = ResponseBuilder.Tell("You can ask which is the latest article or when the last article was published.");

            response.Response.ShouldEndSession = false;
        }
        else if (intentRequest.Intent.Name == "AMAZON.StopIntent")
        {
            response = ResponseBuilder.Tell("Bye");
        }
    }

    return new OkObjectResult(response);
}

As you can see, other than handling the intent identified by the keyword LastPosts, we also handle three new intents: AMAZON.CancelIntent, AMAZON.HelpIntent and AMAZON.StopIntent.

How to handle these intents is up to you, based on the experience you want to provide to the user. In my case, I simply return different responses based on the action, so that the user can understand what's happening.
The only difference concerns the HelpIntent. In this case, the user wants to use our skill but doesn't really know how to do it. As such, we set the ShouldEndSession property of the response to false so that they can continue to interact with the skill, maybe using one of the commands we have suggested.
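The routing logic can also be read as a simple table: each built-in intent maps to a sentence and to a flag that tells whether the session must end. The following sketch shows this idea without any dependency on Alexa.NET. The names BuiltInIntentSketch and Route, and the fallback sentence for unknown intents, are hypothetical and used only for illustration.

```csharp
using System;

public static class BuiltInIntentSketch
{
    // Maps an intent name to the text to speak and to whether the session should end.
    public static (string Speech, bool EndSession) Route(string intentName)
    {
        switch (intentName)
        {
            case "AMAZON.CancelIntent":
                return ("Cancelling.", true);
            case "AMAZON.StopIntent":
                return ("Bye", true);
            case "AMAZON.HelpIntent":
                // Keep the session open so that the user can try a suggested command.
                return ("You can ask which is the latest article or when the last article was published.", false);
            default:
                // Hypothetical fallback, not part of the original function.
                return ("Sorry, I didn't understand that.", true);
        }
    }

    public static void Main()
    {
        var help = Route("AMAZON.HelpIntent");
        Console.WriteLine(help.EndSession); // False

        var stop = Route("AMAZON.StopIntent");
        Console.WriteLine(stop.EndSession); // True
    }
}
```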

Handling the end of the session

Once you have added support in your function for the built-in intents, you will still face one last error:

The skill must close when using the "exit" command without returning an error response.

Whenever the user terminates a session, Alexa sends to your function a request of type SessionEndedRequest. We need to handle it properly, in order to avoid returning error codes.
However, unlike the built-in intents, we won't receive this information as an intent, but as a totally different request type. As such, in our code we need to handle this scenario in the same way we handle the LaunchRequest or the IntentRequest.

This is the updated code of our function:

[FunctionName("Alexa")]
public static async Task<IActionResult> Run(
    [HttpTrigger(AuthorizationLevel.Function, "post", Route = null)] HttpRequest req,
    ILogger log)
{
    //validate and parse the request

    if (requestType == typeof(LaunchRequest))
    {
        //handle the launch request
    }

    else if (requestType == typeof(IntentRequest))
    {
        //handle the various intents
    }
    else if (requestType == typeof(SessionEndedRequest))
    {
        log.LogInformation("Session ended");
        response = ResponseBuilder.Empty();
        response.Response.ShouldEndSession = true;
    }

    return new OkObjectResult(response);
}

We have added, at the bottom, a new if condition to check if the request type is SessionEndedRequest. If that's the case, we don't have to do anything special. We just need to make sure that the session is properly ended, by setting the ShouldEndSession property to true. Alexa doesn't expect to speak a response in this scenario, so we can safely return an empty response using the ResponseBuilder.Empty() method.

That's it! Now if we run the functional tests again, we should get a "green" and we can move on to the final phase, which is submitting the skill for manual certification.

Handling all the potential errors

Even if we don't get specific errors during the functional test, we always have to make sure that the skill gracefully handles all the potential failures, instead of returning exceptions or error codes.

The code of our function, for example, isn't perfect. When the user invokes the LastPosts intent, we are assuming that the download of the RSS feed of this blog will always be successful. However, the network can drop, or the blog could be temporarily unavailable and, as such, the RSS feed unreachable.

It's always a best practice to handle all these potential errors and make sure that Alexa returns a meaningful message to the user. For example, this is how we can improve the code of our function:

[FunctionName("Alexa")]
public static async Task<IActionResult> Run(
    [HttpTrigger(AuthorizationLevel.Function, "post", Route = null)] HttpRequest req,
    ILogger log)
{
    //parse the request

    if (requestType == typeof(LaunchRequest))
    {
        //handle the launch request
    }

    else if (requestType == typeof(IntentRequest))
    {
        var intentRequest = skillRequest.Request as IntentRequest;

        if (intentRequest.Intent.Name == "LastPosts")
        {
            string rss = "https://blogs.msdn.microsoft.com/appconsult/feed/";
            string output;
            try
            {
                List<string> news = await ParseFeed(rss);
                output = $"The title of the last article is {news.FirstOrDefault()}";
            }
            catch (Exception exc)
            {
                // Log the failure and fall back to a friendly message
                log.LogError(exc, "Unable to download or parse the RSS feed");
                output = "An error has occurred, please try again later";
            }

            response = ResponseBuilder.Tell(output);
        }
        
        //handle the other intents
    }
    else if (requestType == typeof(SessionEndedRequest))
    {
        //handle the session ended request
    }

    return new OkObjectResult(response);
}

We wrap the download of the RSS feed in a try / catch statement. In case of errors, we return a response telling the user that something went wrong. Otherwise, we parse the XML feed and we return the requested response to the user.
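The same pattern can be isolated in a small helper that is easy to try outside of the function. In this sketch the fetchTitles delegate stands in for the ParseFeed() method, and the names FeedSpeechSketch and BuildSpeechAsync are hypothetical. Treating an empty feed as an error is also an assumption of the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class FeedSpeechSketch
{
    public const string ErrorMessage = "An error has occurred, please try again later";

    // Builds the sentence to speak. Any failure while fetching the titles
    // results in a friendly error message instead of an exception.
    public static async Task<string> BuildSpeechAsync(Func<Task<List<string>>> fetchTitles)
    {
        try
        {
            List<string> titles = await fetchTitles();
            string latest = titles?.FirstOrDefault();
            return latest == null ? ErrorMessage : $"The title of the last article is {latest}";
        }
        catch (Exception)
        {
            return ErrorMessage;
        }
    }

    public static async Task Main()
    {
        string ok = await BuildSpeechAsync(() => Task.FromResult(new List<string> { "Hello Alexa" }));
        Console.WriteLine(ok); // The title of the last article is Hello Alexa

        string failed = await BuildSpeechAsync(() => throw new InvalidOperationException("feed unreachable"));
        Console.WriteLine(failed); // An error has occurred, please try again later
    }
}
```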

Detecting errors during the functional tests

The trick of using ngrok to make it easier to identify errors during the development of the skill also works great when we need to run the functional tests.
Functional tests, in fact, aren't directly connected to the certification process. Once your skill passes the tests, it isn't automatically submitted for manual review. This means that we can execute the functional tests with the Azure Function running locally on our machine instead of on Azure.

However, we won't really have the chance to do step-by-step debugging. Alexa sends multiple requests to the endpoint in a very short period of time, so stopping at a breakpoint would block these requests and cause the functional tests to fail. The good part is that we still have access to the Azure Function console, which allows us to quickly see potential errors or failed requests. Remember that everything that is logged using the integrated logger (the ILogger object which is included in the signature of the Run() method) is displayed in the console. We can easily log, for example, exception messages and stack traces.

Additionally, this approach makes the turnaround times much shorter. If the functional tests return an error, we can fix it and immediately run them again, without having to redeploy the whole function to Azure.

Submitting to certification

To submit the skill for certification just go to the final step, Submission, and press the Submit for review button.
However, before doing it, make sure to:

  1. Publish the latest version of your function to Azure
  2. Go back to the Endpoints section of the Build tab and replace the testing endpoint (based on ngrok) with the real one assigned by Azure
  3. Run the functional tests again, to make sure that the validation also completes successfully with the published version of the skill.

That's it! Once the skill has been submitted, we will get feedback from Amazon within 5 business days. During this time frame, we aren't allowed to submit any change to the interaction model. In case of failure, we will receive a detailed report with all the identified problems.

Wrapping up

In this post we have seen that building an Alexa skill which handles the custom intents we have defined isn't enough to publish it. We also need to pass a set of functional tests enforced by Amazon, which make sure that the skill satisfies all the security and technical requirements.

Thanks to Alexa.NET, fulfilling these requirements isn't a difficult task, but we still need to do some fine tuning to our Azure Function before we are ready to submit the skill for the manual review. With this post we conclude the series about the basic concepts you need to know to publish an Alexa skill based on C# and hosted on Azure. In the future we're going to see how to implement more complex scenarios, like handling multi-language support.

You can find the updated sample code used in this post on GitHub.

Happy coding!
