This post is an introduction to Amazon Alexa. It is a blog-post version of my original presentation on Google Slides. Both this post and the Google Slides presentation are for educational purposes.
Let’s get started!
What is Alexa?
What is a VUI?
VUI stands for Voice User Interface, meaning that a user can interact with the software using their voice. It is similar to a GUI (Graphical User Interface) in that both are front ends. However, a VUI is not in competition with a GUI, and it is not meant to replace a screen. A VUI is just another way for a user to interact with software.
For Alexa, the software works with intents and utterances. Intents are functions or actions that a user is able to invoke in an Alexa skill through an utterance. Since there are many different ways to express the same intent (or request), utterances are a set of likely spoken phrases for one intent. (In an upcoming blog post, there will be code to illustrate this.)
With a VUI, there are different ways of expressing emotion. SSML is Speech Synthesis Markup Language. SSML gives a developer control over how Alexa should sound. SSML can make Alexa whisper, pause, or recite a number as a set of digits. Another way of expressing emotion is to use speechcons. Speechcons are like emoticons. Instead of a smiley face emoticon, a developer can use a speechcon to make Alexa say, “Hurrah!”
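To make the intent/utterance relationship a little more concrete before the full skill walkthrough, here is a minimal Python sketch. The intent names, the sample phrases, and the `find_intent` helper are all made up for illustration, and the exact-match lookup is only a stand-in: a real Alexa skill declares its sample utterances in an interaction model and Alexa does the matching for you.

```python
from typing import Optional

# Hypothetical intents, each with a set of likely spoken phrases (utterances).
SAMPLE_UTTERANCES = {
    "GetFactIntent": ["tell me a space fact", "give me a fact", "a fact"],
    "HelpIntent": ["help", "what can you do"],
}


def find_intent(spoken: str) -> Optional[str]:
    """Return the intent whose sample utterances contain the spoken phrase."""
    normalized = spoken.strip().lower()
    for intent, utterances in SAMPLE_UTTERANCES.items():
        if normalized in utterances:
            return intent
    return None  # no sample utterance matched


print(find_intent("Give me a fact"))
```

The point of the sketch is only that several different phrases lead to one intent name.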
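As a rough sketch of what that markup can look like, the Python helpers below build an SSML string that whispers, pauses, reads digits, and uses a speechcon. The helper function names are my own, chosen for illustration. The tags themselves (`amazon:effect`, `break`, `say-as`) are ones Alexa's SSML supports, but which speechcon words are available depends on the locale, so treat "hurrah" as an assumption.

```python
from xml.sax.saxutils import escape


def whisper(text: str) -> str:
    # Alexa-specific effect that makes the enclosed text whispered.
    return f'<amazon:effect name="whispered">{escape(text)}</amazon:effect>'


def pause(seconds: int) -> str:
    # Silence of the given length.
    return f'<break time="{seconds}s"/>'


def digits(number: str) -> str:
    # Read the number one digit at a time.
    return f'<say-as interpret-as="digits">{escape(number)}</say-as>'


def speechcon(word: str) -> str:
    # Speechcons are spoken through the "interjection" interpretation.
    return f'<say-as interpret-as="interjection">{escape(word)}</say-as>'


def speak(*parts: str) -> str:
    # Every SSML response is wrapped in a single <speak> element.
    return "<speak>" + " ".join(parts) + "</speak>"


ssml = speak(
    speechcon("hurrah"),
    pause(1),
    whisper("Here is a secret."),
    "The code is",
    digits("1234"),
)
print(ssml)
```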
How Do You Use Alexa? (Utterance Syntax)
An utterance is what a user would say to interact with Alexa. The utterance maps to an intent, which invokes a function on an Alexa skill. What is an example of a spoken utterance?
“Alexa, start Space Facts.”
There are 3 parts to an utterance: 1. the wake word, 2. the phrase, and 3. the skill name. Here is a breakdown of the utterance above:
- Alexa: wake word
- start: phrase
- Space Facts: skill name/function invocation
The typical wake word is “Alexa”, and it is highly recommended to keep it as Alexa unless someone the user lives with has the name Alexa. If someone in the user’s household has the name Alexa, the user can change the wake word to “Amazon” or “Computer”. So the utterance would be “Amazon, start Space Facts”.
The phrase is the verb or word that causes an action. For example, “Alexa, start Space Facts” should start an intent that will invoke the Space Facts skill. Similarly, “Alexa, stop” will stop the current action, and “Alexa, help” will give some information about the skill.
The phrase is where the developer will concentrate when writing the different utterances for the same intent. Utterances using “start”, “open”, or “begin” will typically map to the same intent. A developer will want to cover the many ways a user might say this command.
Lastly, there is the skill name or function invocation. An utterance using a skill name could be “Alexa, stop Space Facts”. Depending on the skill, there can be different functions a user can invoke that the skill will listen for.
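The three-part structure can be sketched as a toy parser in Python. This is only an illustration of the syntax described here, not how Alexa actually processes speech: the `parse_utterance` function, the `Utterance` type, and the intent names in `PHRASE_TO_INTENT` are all hypothetical.

```python
from typing import NamedTuple

# Wake words mentioned in this post.
WAKE_WORDS = {"alexa", "amazon", "computer"}

# Hypothetical mapping: several phrases can lead to the same intent.
PHRASE_TO_INTENT = {
    "start": "LaunchIntent",
    "open": "LaunchIntent",
    "begin": "LaunchIntent",
    "stop": "StopIntent",
    "help": "HelpIntent",
}


class Utterance(NamedTuple):
    wake_word: str
    phrase: str
    skill_name: str  # empty when the user names no skill, e.g. "Alexa, stop"


def parse_utterance(spoken: str) -> Utterance:
    """Split '<wake word>, <phrase> <skill name>' into its three parts."""
    wake, _, rest = spoken.strip().rstrip(".!?").partition(",")
    if wake.strip().lower() not in WAKE_WORDS:
        raise ValueError(f"unknown wake word: {wake!r}")
    phrase, _, skill = rest.strip().partition(" ")
    return Utterance(wake.strip(), phrase.lower(), skill.strip())


parsed = parse_utterance("Alexa, start Space Facts.")
print(parsed)
print(PHRASE_TO_INTENT[parsed.phrase])
```

Swapping “start” for “open” or “begin” gives a different phrase but the same intent, which is the reason a developer lists several utterances per intent.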
Alexa Skill Request Lifecycle
Once a user says an utterance, what happens?
Once a user says an utterance such as “Alexa, start Space Facts”, the Amazon Echo device picks it up and interfaces with the Alexa skill. The skill sends the utterance to the cloud (AWS), which does the fancy footwork. The cloud is the skill’s back end, where the code and functionality live. The code maps the utterance to an intent, which runs code that returns a corresponding response as JSON to the Alexa skill. The Alexa skill then outputs the response to the user via the Amazon Echo device. In this example, if the user said, “Alexa, start Space Facts”, Alexa’s response will probably be a space fact such as “A year on Mercury is just 88 days long”.
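Here is a minimal Python sketch of the back-end half of that lifecycle: a function that receives a request as a dictionary and returns the JSON-style response. The `handle_request` name and the help text are my own, and the request and response shown are a heavily trimmed-down subset of the JSON Alexa really exchanges, so read it as an outline and not a deployable handler.

```python
import json

SPACE_FACT = "A year on Mercury is just 88 days long."


def handle_request(event: dict) -> dict:
    """Pick a reply based on the request type and wrap it in a response."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        # Sent when the user opens the skill, e.g. "Alexa, start Space Facts".
        text = SPACE_FACT
    elif (
        request["type"] == "IntentRequest"
        and request["intent"]["name"] == "AMAZON.HelpIntent"
    ):
        text = "Say start Space Facts to hear a fact."
    else:
        text = "Sorry, I did not understand that."

    # Simplified response body: plain-text speech, then end the session.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }


launch = {"request": {"type": "LaunchRequest"}}
print(json.dumps(handle_request(launch), indent=2))
```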
That’s it for my introduction to Amazon Alexa. I will be writing an upcoming blog post on how to make a simple Alexa skill, and I will link it here. I will also update this blog post with some graphics. Please stay tuned!