Building a custom wake word detector and a voice command engine
Inspired by existing virtual assistants like Alexa, Google Home or Echo, I wanted to create a similar application that was easily customizable and extensible. Of course, I know that there is a lot of work and research behind these products, but a more basic virtual-assistant-style application is still doable.
From a high-level perspective, there are several requirements:
- It must accept a custom wake word (the wake word is the word or phrase that signals to the assistant that we are about to give it a command, for example: ‘hey Alexa’)
- It must be able to trigger different types of workflows through voice commands
- It shouldn’t need many resources to run the whole application
For the first part, we will have to create and train a model that takes as input a fixed-length sound representation (either the raw time series or the output of a spectral method) and determines whether it is the wake word we want. Another approach would be to use an encoder to extract a latent representation (or embedding) of the recording and compare it, using a similarity metric, to a precomputed reference value for the wake word. The latter method would need a bigger model and a larger dataset to develop, and because of this we are going…
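To make the two ideas above more concrete, here is a minimal sketch in plain Python. It is not the implementation used in this project. It only illustrates (1) forcing a recording to a fixed length by trimming or zero-padding, and (2) comparing an embedding to a stored reference with cosine similarity. The function names, the 0.8 threshold, and the tiny hand-written vectors are all assumptions chosen for illustration; a real encoder would produce much longer embeddings.

```python
import math


def to_fixed_length(samples, length):
    """Trim or zero-pad a sequence of samples to exactly `length` values."""
    clipped = list(samples[:length])
    return clipped + [0.0] * (length - len(clipped))


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as "no similarity"
    return dot / (norm_a * norm_b)


def matches_wake_word(embedding, reference, threshold=0.8):
    """Hypothetical check: does the embedding look like the reference?

    The threshold is an arbitrary placeholder and would need tuning
    on real recordings.
    """
    return cosine_similarity(embedding, reference) >= threshold


# Stand-in values; a real encoder would produce these embeddings.
reference = [1.0, 0.0]
candidate = [1.0, 0.1]
print(to_fixed_length([0.2, -0.1], 4))
print(matches_wake_word(candidate, reference))
```

The fixed-length step matters for both approaches, since a classifier and an encoder alike usually expect inputs of a single, known size.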