Monday, August 15, 2016

Polly Want a Cracker? A Simple Alexa Skill that Echoes Your Words


Doesn't it bug you that your skill service doesn't receive the text Alexa understood?  When you aren't using a custom skill, the card that appears in your Alexa app displays the understood spoken text.

Why doesn't Amazon pass the spoken text to a custom skill?  I think it keeps Alexa more secure.  If an Alexa owner added a skill that was easy to launch accidentally, Alexa could end up sending that service spoken dialog the owner never intended to share.  In a loud room, background conversation could also reach your skill: if someone launches the skill while someone else nearby is on the phone reciting a credit card number, that is data nobody wants sent to a random skill developer.

However, if you do need the understood spoken text in your skill, it is possible.  You can use AMAZON.LITERAL, a built-in slot type that exists only for compatibility with earlier ASK skills.  Here is an example of how that works:

Intent Schema
{
  "intents": [
     {
      "intent": "Repeater",
      "slots": [
        {
          "name": "A",
          "type": "AMAZON.LITERAL"
        },
        {
          "name": "B",
          "type": "AMAZON.LITERAL"
        },

        --- CLIPCLIPCLIP ---
        --- And so on... ---
        --- CLIPCLIPCLIP ---

        {
          "name": "Y",
          "type": "AMAZON.LITERAL"
        },
        {
          "name": "Z",
          "type": "AMAZON.LITERAL"
        }
      ]
    }
  ]
}


Sample Utterances
Repeater {a|A} {a|B} {a|C} {a|D} {a|E} {a|F} {a|G} {a|H} {a|I} {a|J} {a|K} {a|L} {a|M} {a|N} {a|O} {a|P} {a|Q} {a|R} {a|S} {a|T} {a|U} {a|V} {a|W} {a|X} {a|Y} {a|Z}


There is only one intent, and it matches the entire phrase.  You'll need to add the missing slots that I clipped.  The sample utterance uses the old LITERAL syntax, {sample phrase|SlotName}, so {a|A} means the sample word "a" fills the slot named A; the utterance matches the intent with all AMAZON.LITERAL slots.  One thing I noticed while testing: if I didn't include enough slots for the spoken words, Alexa wouldn't match my intent.  So I created a bunch of slots to catch a decently long phrase.  (If you'd rather not type all twenty-six by hand, see the generator sketch below.)
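
A few lines of Node can generate both the schema and the utterance line.  This is just a convenience sketch; it reproduces the files shown above:

// generate-schema.js - prints the 26-slot intent schema and the
// matching sample utterance line for the Repeater intent.
var slots = [];
var utterance = ["Repeater"];
for (var c = "A".charCodeAt(0); c <= "Z".charCodeAt(0); c++) {
    var name = String.fromCharCode(c);
    slots.push({ name: name, type: "AMAZON.LITERAL" });
    utterance.push("{a|" + name + "}"); // "a" is the sample word, name is the slot
}
console.log(JSON.stringify({ intents: [{ intent: "Repeater", slots: slots }] }, null, 2));
console.log(utterance.join(" "));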

Then I use a little JavaScript to loop over all the slots and rebuild the phrase that Alexa heard.

exports.handler = function (event, context) {
    // Log the incoming request so the slot values show up in CloudWatch.
    console.log(JSON.stringify(event.request));

    if(event.request.type === "LaunchRequest")
        context.succeed(buildResponse("Say something", {}, false));
    else if(event.request.type === "IntentRequest") {
        // Walk the slots named "A" through "Z" in order and stitch the
        // captured words back into the phrase Alexa heard.
        var words = [];
        for(var i=0; i <= "Z".charCodeAt(0)-"A".charCodeAt(0); i++ ) {
            var slot = event.request.intent.slots[String.fromCharCode("A".charCodeAt(0) + i)];
            if(slot && slot.value)
                words.push(slot.value);
        }
        context.succeed(buildResponse(words.join(" "), {}, false));
    }
    else
        context.succeed(buildResponse("", {}, true)); // e.g. SessionEndedRequest
};

// Wraps plain-text speech in the standard Alexa response envelope.
function buildResponse(output, attributes, shouldEndSession) {
    return {
        version: "1.0",
        sessionAttributes: attributes,
        response: {
            outputSpeech: {
                type: "PlainText",
                text: output
            },
            reprompt: {
                outputSpeech: {
                    type: "PlainText",
                    text: output
                }
            },
            shouldEndSession: shouldEndSession
        }
    };
}
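
If you want to sanity-check the slot loop without deploying, you can invoke the handler directly with a hand-built event.  A minimal sketch, assuming the code above is saved as index.js (the slot values here are made up):

// test.js - call the Lambda handler locally with a fake IntentRequest.
var handler = require("./index").handler;

var event = {
    request: {
        type: "IntentRequest",
        intent: {
            name: "Repeater",
            slots: {
                A: { name: "A", value: "polly" },
                B: { name: "B", value: "want" },
                C: { name: "C", value: "a" },
                D: { name: "D", value: "cracker" }
            }
        }
    }
};

// Minimal stand-in for the Lambda context object.
handler(event, { succeed: function (response) {
    console.log(JSON.stringify(response, null, 2));
}});

Running node test.js should print a response whose outputSpeech.text is "polly want a cracker".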

That should do it.  Your skill will now receive the complete spoken phrase in the request slots, and this example echoes it right back to you.

How might this be useful?  Honestly, not very, in the general case, since you'd need to do your own natural language understanding to make sense of the phrase in your skill logic.  And if you have other intents in your schema, Alexa tries really hard to match the spoken dialog to one of those before it would fall back to this one.
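
If you did want to act on the raw phrase, even crude keyword matching can stand in for real language understanding in a toy skill.  A purely illustrative sketch (the command names are invented):

// Toy "intent matching" over the reassembled phrase - illustrative only.
function matchCommand(phrase) {
    var text = phrase.toLowerCase();
    if (text.indexOf("cracker") !== -1) return "FEED_PARROT";
    if (text.indexOf("hello") !== -1) return "GREET";
    return "UNKNOWN";
}

console.log(matchCommand("Polly want a cracker")); // FEED_PARROT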
