Developer to Developer: Getting Started with Voice Interfaces and PC Skills with Alexa, Part 1
Updated July 15, 2019
Now, unless you’ve been living under a rock for the last few years, it’s hard to ignore the fact that Amazon’s Alexa service is the de facto standard for IoT in the smart home. By the way, did you know that as of Q2 2019, Amazon’s Alexa voice service works with over 60,000 different smart home devices?
Figure 1. Here Are 4 of the Over 60,000 Different Devices That Are Compatible with Amazon’s Alexa Voice Service. Image Credit: Amazon.com
That’s an absolutely staggering number of different consumer electronic devices that all respond to the 3-syllable phrase, “A·LEX·A”. Now, of course, one of the main reasons why Amazon has market dominance is that Amazon allows developers like you and me to enhance the capabilities of the Alexa service with “apps” (Amazon calls them “skills”) that you can create yourself. So, the purpose of this article is to show developers with ZERO experience how to get started creating a skill for the Alexa service. We’re going to cover some of the basics of Natural Language Processing such as “utterances” and “intents”. This article also covers how to use Lambda functions inside AWS (Amazon Web Services) with a sprinkling of JavaScript (via Node.js) to make the core functions of your skill. In this article, I’m going to assume that you’ve never created an Alexa Skill before, so if you’ve already created a skill, then most of this material will seem redundant.
First Things First: The Terminology Used for Creating an Alexa Skill
When creating a Skill for Alexa, there will be a lot of terms thrown at you that you may (or may not) understand. So here are some of the most important ones:
- Alexa – Ok, this is an obvious one. This is Amazon’s voice service. It receives the vocal speech input from a user as an “utterance”, and converts the utterance into “intents” and “slots”. More about those later on.
- Alexa Skill – A Skill is a 3rd party app/service that conforms to a specific API that allows Alexa to understand and respond to new things. Please note that the capabilities of a skill are restricted to the languages that Alexa can understand. So what does that mean? It means that you can create an Alexa Skill that teaches Alexa how to order pizza for you in English, but you can’t create one that lets Alexa order a pizza for you in Ancient Hebrew, because Alexa doesn’t understand that language.
- ARN – An ARN (Amazon Resource Name) is basically a URL for a service. The major difference between a regular URL and an ARN is that a URL can be the address of any public web server in the world, while an ARN is the address of a service hosted on Amazon’s infrastructure. Every ARN that I’ve seen so far starts with “arn:aws:”. So the ARN for an AWS Lambda service will start with "arn:aws:lambda:" and the ARN for an AWS Cognito service will start with "arn:aws:cognito".
- AWS – This is the Amazon Web Services acronym. Although it’s pretty obvious for most developers, we needed to cover this one since the process necessary to build a PC Skill for Alexa uses several of the services within AWS.
- Cognito – Cognito is one of the services that are a part of AWS, and it gives you the ability to create "user pools" and "identity pools". User pools are user directories that provide sign-up and sign-in options for your app users. Identity pools provide AWS credentials to grant your users access to other AWS services.
- Intent – In NLP (Natural Language Processing), the intent is what an NLP system has determined the user wants, based upon their utterance (the spoken words). Multiple utterances can all have the same intent. For example, consider the phrases, “Book me a flight to Tokyo”, “I want to travel to Tokyo”, and “Can you find me a flight to Tokyo sometime in March?” All of those statements have the same intent, which is that the user wants to travel to Tokyo.
- Lambda – Lambda is another one of the services within AWS. Lambda is Amazon’s service for "serverless computing" with the flexibility to pick (almost) any programming language that you like. At first glance, the term "serverless computing" sounds ironic because (trust me) there’s definitely a server involved. However, when you’re using Lambda, you’re not paying for all the resources (hard drive, CPU, RAM, OS, etc.) necessary to get your Lambda service working. With Lambda, all you need to do is write your code in the language that you want, and it can become a cloud-hosted service. Lambda currently supports Java, Go, PowerShell, Node.js, C#, Python, and Ruby. Best of all, Amazon makes Lambda very attractive for developers because you get 1M free requests per month. This is perfect for IoT and the Smart Home because in such cases, you really just need a simple function hosted in the cloud to perform a specific task for you.
- Slot – In NLP (Natural Language Processing), a slot is essentially a variable that an NLP system fills in after the intent is derived from the utterance. A single utterance can have one or more slots. For example, consider the phrase, “Can you find me a flight to Tokyo sometime in March?” If the intent is to make a travel reservation, then the slots are Tokyo (the destination) and March (the timeframe).
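To make intents and slots a little more concrete, here's a minimal sketch of what your code might do with the Tokyo example once Alexa has already turned the utterance into an intent plus slots. The request object below is shaped like the JSON that Alexa sends for an IntentRequest, but the intent name ("BookFlightIntent"), the slot names ("destination" and "timeframe"), and the parseIntent helper are all made up for illustration.

```javascript
// Hypothetical request body, shaped like the JSON Alexa sends for an IntentRequest.
// "BookFlightIntent", "destination", and "timeframe" are invented names for this example.
const sampleRequest = {
  request: {
    type: "IntentRequest",
    intent: {
      name: "BookFlightIntent",
      slots: {
        destination: { name: "destination", value: "Tokyo" },
        timeframe: { name: "timeframe", value: "March" }
      }
    }
  }
};

// Illustrative helper: returns the intent name and a flat { slotName: value } map.
function parseIntent(body) {
  const intent = body.request.intent;
  const slots = {};
  for (const [slotName, slot] of Object.entries(intent.slots || {})) {
    slots[slotName] = slot.value;
  }
  return { intent: intent.name, slots };
}

const parsed = parseIntent(sampleRequest);
console.log(parsed.intent, parsed.slots);
```

Notice that your code never sees the raw audio. All three of the Tokyo utterances above would arrive with the same intent name, and only the slot values would differ.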
Next Step: Complete the 6-Step "Fact Skill" Tutorial
Ok, the Amazon Alexa Developer Team created a great 6-step tutorial on how to create a basic Alexa Skill, the equivalent of "Hello World". The "Fact Skill" tutorial guides you through the following steps:
- Setting Up Your Alexa Skill in the Alexa Developer Portal
- Setting Up a Lambda Function in AWS
- Connecting Your Voice User Interface To Your Lambda Function
- Testing Your Lambda Function and Alexa Skill
- (Optional) Customizing Your Alexa Skill
- (Optional) Getting Your Skill Certified and Published
Figure 2. This is a basic sequence diagram showing the process and the actors involved in invoking the "Fact Skill". My Skill in this example is called "Bruce Space Facts" which gives back a random fact about space when spoken to by the user.
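If you'd like a rough idea of what the Lambda side of that sequence looks like before you start the tutorial, here's a minimal sketch in Node.js. This is not the code from Amazon's tutorial (which uses the Alexa Skills Kit SDK). It's a hand-rolled handler that assumes the intent is named "GetNewFactIntent", and the fact list and the pickFact, buildSpeechResponse, and route helpers are hypothetical names chosen for this example.

```javascript
// Hypothetical list of facts for a skill like "Bruce Space Facts".
const SPACE_FACTS = [
  "A year on Mercury is just 88 days long.",
  "Jupiter is the largest planet in the solar system.",
  "Light from the Sun takes about 8 minutes to reach Earth."
];

// Pick one fact; the random source is injectable so the function is easy to test.
function pickFact(random = Math.random) {
  return SPACE_FACTS[Math.floor(random() * SPACE_FACTS.length)];
}

// Wrap plain text in the JSON response envelope that Alexa expects back.
function buildSpeechResponse(text) {
  return {
    version: "1.0",
    response: {
      outputSpeech: { type: "PlainText", text: text },
      shouldEndSession: true
    }
  };
}

// Decide what to say based on the request type and (assumed) intent name.
function route(event, random = Math.random) {
  const request = event.request;
  const wantsFact =
    request.type === "LaunchRequest" ||
    (request.type === "IntentRequest" &&
      request.intent.name === "GetNewFactIntent");
  if (wantsFact) {
    return buildSpeechResponse("Here's your space fact: " + pickFact(random));
  }
  return buildSpeechResponse("Sorry, I don't know how to help with that.");
}

// Lambda calls the exported handler with the JSON request from Alexa.
async function handler(event) {
  return route(event);
}

if (typeof exports !== "undefined") {
  exports.handler = handler;
}
```

Keeping the routing logic in a plain synchronous function means you can try it out on your own PC with a fake event object, long before you deploy anything to AWS.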
Now that you've learned all the vocabulary necessary to create a "Fact Skill", be sure to go to the link above and complete the 6-step tutorial. It will definitely help you to understand all the components necessary to make the simplest Alexa Skill of all (a skill that gives a random fact). Also, it's good to know that you can create an Alexa Skill without owning an Alexa-compatible device! Amazon provides everything necessary to test your Alexa Skill directly from the browser on your PC. All this is covered in the tutorial, so complete the tutorial, and come back later for the next step in this series!