Anyone who has used Siri, Google Now, Cortana, S-Voice, or Echo has seen how much speech recognition has improved over the past decade. Much of this improvement comes from cloud-based recognizers that apply “deep learning” to big data.
Although it’s often out of the spotlight, there has also been a lot of progress in speech recognition for embedded systems. In fact, most of the major speech engines deploy a combination of embedded and cloud-based recognition. This is most noticeable in commands like “Hey Siri,” “OK Google,” “Hey Cortana,” “Hi Galaxy,” and “Alexa.” All of these systems use an embedded “trigger” phrase, recognized entirely on-device, to open the cloud connection and ready the cloud recognizer for the speech that follows.
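To make the division of labor concrete, here is a minimal sketch of that gating pattern: a lightweight, always-on detector runs locally on audio frames, and only when it fires is the rest of the utterance handed to the cloud recognizer. All names here (`TriggerDetector`, `run_pipeline`, `cloud_recognize`, and the scoring function) are hypothetical illustrations, not the API of any of the products named above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class TriggerDetector:
    """Illustrative stand-in for a small on-device acoustic model.

    `score` rates how closely an audio frame matches the trigger
    phrase (0.0-1.0); real systems use a compact neural or
    keyword-spotting model here so it can run continuously.
    """
    score: Callable[[bytes], float]
    threshold: float = 0.8

    def detect(self, frame: bytes) -> bool:
        return self.score(frame) >= self.threshold

def run_pipeline(
    frames: List[bytes],
    detector: TriggerDetector,
    cloud_recognize: Callable[[List[bytes]], str],
) -> Optional[str]:
    """Gate the expensive cloud recognizer behind the embedded trigger.

    Audio frames stream in continuously; nothing leaves the device
    until the trigger fires, after which the remaining audio is
    forwarded to `cloud_recognize` (a placeholder for the network
    call a real system would make).
    """
    for i, frame in enumerate(frames):
        if detector.detect(frame):
            # Trigger phrase heard: open the "cloud connection" and
            # ship the rest of the utterance for full recognition.
            return cloud_recognize(frames[i + 1:])
    return None  # trigger never fired; no audio left the device
```

For example, with a toy scorer that only fires on the frame `b"hey"`, the frames after the trigger are the only ones the (fake) cloud recognizer ever sees:

```python
det = TriggerDetector(score=lambda f: 1.0 if f == b"hey" else 0.0)
result = run_pipeline(
    [b"noise", b"hey", b"turn", b"on"],
    det,
    lambda fs: b" ".join(fs).decode(),
)
# result == "turn on"
```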
The goal is to develop an embedded speech recognition system that can recognize a large vocabulary of words with low latency and a small memory footprint.