Hello all! Firstly, I would like to apologize for this rather extended gap between posts. I have been busy with applications for graduate study and internships as part of the current curriculum.
Getting down to business. Today we’ll be looking at something interesting– how to build our own voice command recognition system. This will entail:
- Speak to your laptop to activate the agent
- Only certain commands will activate the agent
- When these commands are recognized, the agent can execute certain actions (This won’t be covered here for obvious reasons– feel free to create whatever you would like using these!)
The really good thing about what we’re going to do today is that we rely on very few external libraries and don’t need to write long, arduous code. By the end, we’ll have a program that performs a seemingly complex task with very simple code (written in Python!).
To install pocketsphinx on Ubuntu, just run the following in terminal:
sudo apt-get install pocketsphinx
That’s all there is!
Deciding on the voice commands
We are going to have our program recognize the following commands:
- HELLO : Example Usage: Saying HELLO should generate a random templated response from a response list
- DATE : Example Usage: Saying DATE should give you the current system datetime
- GOODBYE : Example Usage: Saying GOODBYE should quit the program
- NAME : Example Usage: Saying NAME should display your name
- SNAKE : Example Usage: Saying SNAKE should start a game– a Python program named
However, for our implementation and for the purpose of instruction, we are just going to print the command every time it is recognized. This is to signify that our implementation is working. As an exercise, one could actually extend this to meet the example usage.
Also, feel free to have your own commands here. Now for Sphinx to recognize these commands internally, we are required to create a dictionary and a language model specifying them in a particular way. For this, we will use the CMU Lexicon Tool.
Generating the Language Model & Dictionary
Doing this is very simple. Just create a words.txt file with the following data:
HELLO
DATE
GOODBYE
NAME
SNAKE
Upload this file to the CMU Lexicon Tool webpage. From the files it generates, download the .dic and .lm files. In the current directory, place the .dic file in a directory called dic and the .lm file in a directory called lm. The language model file, 2698.lm (for me, this is the name; it will be different for you), looks like this:
Language model created by QuickLM on Tue Jan 2 12:55:32 EST 2018
Copyright (c) 1996-2010 Carnegie Mellon University and Alexander I. Rudnicky

The model is in standard ARPA format, designed by Doug Paul while he was at MITRE.

The code that was used to produce this language model is available in Open Source.
Please visit http://www.speech.cs.cmu.edu/tools/ for more information

The (fixed) discount mass is 0.5. The backoffs are computed using the ratio method.
This model based on a corpus of 5 sentences and 7 words

\data\
ngram 1=7
ngram 2=10
ngram 3=5

\1-grams:
-0.7782 </s> -0.3010
-0.7782 <s> -0.2218
-1.4771 DATE -0.2218
-1.4771 GOODBYE -0.2218
-1.4771 HELLO -0.2218
-1.4771 NAME -0.2218
-1.4771 SNAKE -0.2218

\2-grams:
-1.0000 <s> DATE 0.0000
-1.0000 <s> GOODBYE 0.0000
-1.0000 <s> HELLO 0.0000
-1.0000 <s> NAME 0.0000
-1.0000 <s> SNAKE 0.0000
-0.3010 DATE </s> -0.3010
-0.3010 GOODBYE </s> -0.3010
-0.3010 HELLO </s> -0.3010
-0.3010 NAME </s> -0.3010
-0.3010 SNAKE </s> -0.3010

\3-grams:
-0.3010 <s> DATE </s>
-0.3010 <s> GOODBYE </s>
-0.3010 <s> HELLO </s>
-0.3010 <s> NAME </s>
-0.3010 <s> SNAKE </s>

\end\
The dictionary file, 2698.dic, looks like this:
DATE D EY T
GOODBYE G UH D B AY
HELLO HH AH L OW
HELLO(2) HH EH L OW
NAME N EY M
SNAKE S N EY K
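Since the dictionary is plain text with one pronunciation per line, it is easy to sanity-check from Python before handing it to pocketsphinx. The sketch below is not part of the program we build in this post; parse_dic is a hypothetical helper, and it assumes the layout shown above (a word, whitespace, then its phonemes, with alternate pronunciations marked as WORD(2)).

```python
import re


def parse_dic(text):
    """Map each word to a list of pronunciations (each a list of phonemes)."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # Drop the "(2)" style suffix that marks an alternate pronunciation.
        word = re.sub(r"\(\d+\)$", "", parts[0])
        entries.setdefault(word, []).append(parts[1:])
    return entries


sample = """DATE D EY T
HELLO HH AH L OW
HELLO(2) HH EH L OW
"""
pronunciations = parse_dic(sample)
```

With a mapping like this you can confirm that every word in words.txt has at least one entry in the generated dictionary.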
We also need a pre-trained English model, and we will be using the en-us model provided by CMU Sphinx itself. Download all of its files from GitHub and place them in the current directory in a folder called en-us inside a folder called model. So in the current directory, along with the dic and lm directories, we now also have model/en-us/ containing all the files downloaded from GitHub.
The model/en-us directory will contain the following files:
feat.params mdef means noisedict README sendump transition_matrices variances
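A missing model file tends to surface only as an error when pocketsphinx starts, so it can be worth checking the directory up front. This is a small sketch of such a check, not something the rest of the post depends on; missing_model_files is a hypothetical helper, and the file names are simply the ones listed above.

```python
import os

# File names taken from the listing above.
EXPECTED_FILES = [
    "feat.params", "mdef", "means", "noisedict",
    "README", "sendump", "transition_matrices", "variances",
]


def missing_model_files(model_dir):
    """Return the expected file names that are not present in model_dir."""
    return [
        name for name in EXPECTED_FILES
        if not os.path.isfile(os.path.join(model_dir, name))
    ]
```

Calling missing_model_files("model/en-us") from the project directory should return an empty list once everything is in place.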
Coding the voice assistant in Python
We first start off with our module imports:
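A minimal sketch of the imports, assuming Python 3 and only the standard-library modules that the rest of this post relies on:

```python
import os          # building paths and removing the log file
import re          # finding the recognized command in the log
import subprocess  # launching pocketsphinx_continuous
import time        # pausing while the log fills up
```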
Next we will add the locations of our dictionary and language model to our program:
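Something along these lines would do. The variable names are my own choice for illustration, and the sketch assumes the project lives in ~/speech-rec with the dic, lm, model and temp directories inside it; the 2698 file names will be different for you.

```python
import os

# Assumption: the project directory is ~/speech-rec.
BASE_DIR = os.path.expanduser("~/speech-rec")

DIC_PATH = os.path.join(BASE_DIR, "dic", "2698.dic")  # your file name will differ
LM_PATH = os.path.join(BASE_DIR, "lm", "2698.lm")     # your file name will differ
HMM_PATH = os.path.join(BASE_DIR, "model", "en-us")   # pre-trained acoustic model
TEMP_DIR = os.path.join(BASE_DIR, "temp")             # log directory
LOG_PATH = os.path.join(TEMP_DIR, "output.log")
```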
The temp path is the log directory, from which we will read what pocketsphinx outputs. So make sure that you have read/write access to this directory.
Started from the command line, the whole thing would be a command like this:
pocketsphinx_continuous -inmic yes -dict ~/speech-rec/dic/2698.dic -lm ~/speech-rec/lm/2698.lm -hmm ~/speech-rec/model/en-us/ -logfn ~/speech-rec/temp/output.log -backtrace yes
This assumes that the directory where I am writing all this is ~/speech-rec, as the paths in the command show.
So we will use the subprocess module to execute this program in Python. Firstly, we create a string specifying this command for simplicity:
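One way to build that string, as a sketch: BASE_DIR and COMMAND are names I picked for illustration, and the flags are exactly the ones from the terminal command above. The home directory is expanded in Python because the ~ shortcut is only understood by the shell.

```python
import os

# Assumption: the project directory is ~/speech-rec.
BASE_DIR = os.path.expanduser("~/speech-rec")

# Same flags as the terminal command, assembled piece by piece.
COMMAND = (
    "pocketsphinx_continuous -inmic yes"
    " -dict {base}/dic/2698.dic"
    " -lm {base}/lm/2698.lm"
    " -hmm {base}/model/en-us/"
    " -logfn {base}/temp/output.log"
    " -backtrace yes"
).format(base=BASE_DIR)
```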
Next, we need a function that checks which command was recognized. If you run the pocketsphinx command in the terminal (without the -logfn flag), you’ll see that one of the [INFO] statements contains the output of the model. As an example, this is how this statement looks:
INFO: pocketsphinx.c(1180): HELLO (-7143)
To get to this we’ll use regex. You could also use grep on the log output document or just an old-fashioned search on the same. This is completely up to you. My implementation of this regex-based search is shown below in the form of the function check. You can create your own or reuse this.
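Here is one possible version of such a function, offered as a sketch. The pattern assumes the log line looks exactly like the example above (source file name, line number in parentheses, the recognized word, then a score in parentheses), and COMMANDS and PATTERN are names chosen for illustration.

```python
import re

COMMANDS = {"HELLO", "DATE", "GOODBYE", "NAME", "SNAKE"}

# Assumed layout: "INFO: pocketsphinx.c(<line>): <WORD> (<score>)"
PATTERN = re.compile(r"INFO: pocketsphinx\.c\(\d+\): (\w+) \(-?\d+\)")


def check(line):
    """Return the recognized command found in a log line, or None."""
    match = PATTERN.search(line)
    if match and match.group(1) in COMMANDS:
        return match.group(1)
    return None
```

Restricting the result to the known commands means any other INFO line that happens to match the pattern is simply ignored.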
Next we use the Popen function of the subprocess module to run the command in a separate process. Note that all of this has only been tested on Ubuntu 16.04 and may not work on another OS. After starting the process, we wait briefly so that the log file is created and has some relevant data in it. Here goes:
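A minimal sketch of this step, assuming the command is a single string like the one used in the terminal. The helper name start_recognizer and the two-second default delay are my own choices; tune the delay to your machine.

```python
import shlex
import subprocess
import time


def start_recognizer(command, warmup_seconds=2.0):
    """Start the command as a child process and wait briefly for log output."""
    # shlex.split turns the command string into the argument list Popen expects.
    process = subprocess.Popen(shlex.split(command))
    time.sleep(warmup_seconds)
    return process
```

The returned object is what we later call terminate() on, so keep a reference to it.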
Next we keep reading the log file, running our check function on it and printing the output, all inside an infinite loop. One more thing to keep in mind: we need to maintain an index checkpoint so that we don’t re-read commands already written to the file but always start from where we last stopped. This matters because pocketsphinx writes to the log continuously and only ever appends to it. The idx variable keeps track of this checkpoint. We will also terminate the child process at the end of the program.
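The checkpoint idea can be sketched as follows. Both helpers are hypothetical names of mine: read_new_lines returns only the lines added since the last read, and watch is the polling loop around it. The max_polls parameter exists only so the loop can be tried out without running forever.

```python
import time


def read_new_lines(log_path, idx):
    """Return the log lines after position idx, plus the updated idx."""
    with open(log_path, errors="replace") as log_file:
        lines = log_file.readlines()
    return lines[idx:], len(lines)


def watch(log_path, handle, poll_seconds=0.5, max_polls=None):
    """Call handle(line) for every new log line; loop forever if max_polls is None."""
    idx = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        new_lines, idx = read_new_lines(log_path, idx)
        for line in new_lines:
            handle(line)
        time.sleep(poll_seconds)
        polls += 1
```

Re-reading the whole file on every poll is wasteful for large logs, but it keeps the checkpoint logic easy to follow for a short session like ours.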
One more thing to remember: since we have not implemented a clean-up mechanism for output.log, you currently have to delete it every time you run the command. As a simple clean-up mechanism, we can tie the GOODBYE command to both deleting the output.log file and exiting the program. This would look like this:
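A sketch of that clean-up, assuming you kept the Popen object from when the recognizer was started. The helper name shut_down is hypothetical; in the main loop you would call it when check returns "GOODBYE" and then break out of the loop.

```python
import os


def shut_down(process, log_path):
    """Stop the recognizer and delete its log so the next run starts clean."""
    process.terminate()
    process.wait()  # make sure the child has exited before removing its log
    if os.path.exists(log_path):
        os.remove(log_path)
```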
This does the trick!
This is the complete program that we have written so far:
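Putting the pieces together, the whole thing might look like the sketch below. It is written under the assumptions used throughout this post: Python 3, the project in ~/speech-rec, and log lines in the format shown earlier. All names in it are my own choices for illustration.

```python
import os
import re
import shlex
import shutil
import subprocess
import time

BASE_DIR = os.path.expanduser("~/speech-rec")  # assumed project directory
LOG_PATH = os.path.join(BASE_DIR, "temp", "output.log")
COMMAND = (
    "pocketsphinx_continuous -inmic yes"
    " -dict {base}/dic/2698.dic -lm {base}/lm/2698.lm"
    " -hmm {base}/model/en-us/ -logfn {log} -backtrace yes"
).format(base=BASE_DIR, log=LOG_PATH)

COMMANDS = {"HELLO", "DATE", "GOODBYE", "NAME", "SNAKE"}
PATTERN = re.compile(r"INFO: pocketsphinx\.c\(\d+\): (\w+) \(-?\d+\)")


def check(line):
    """Return the recognized command found in a log line, or None."""
    match = PATTERN.search(line)
    if match and match.group(1) in COMMANDS:
        return match.group(1)
    return None


def main():
    os.makedirs(os.path.dirname(LOG_PATH), exist_ok=True)
    process = subprocess.Popen(shlex.split(COMMAND))
    time.sleep(2)  # give pocketsphinx time to start writing the log
    idx = 0        # checkpoint: number of log lines already processed
    try:
        while True:
            new_lines = []
            if os.path.exists(LOG_PATH):
                with open(LOG_PATH, errors="replace") as log_file:
                    lines = log_file.readlines()
                new_lines, idx = lines[idx:], len(lines)
            for line in new_lines:
                command = check(line)
                if command:
                    print(command)
                if command == "GOODBYE":
                    return
            time.sleep(0.5)
    finally:
        # Runs on GOODBYE and on Ctrl+C: stop the child and remove the log.
        process.terminate()
        process.wait()
        if os.path.exists(LOG_PATH):
            os.remove(LOG_PATH)


if __name__ == "__main__":
    if shutil.which("pocketsphinx_continuous") is None:
        print("pocketsphinx_continuous was not found on PATH")
    else:
        main()
```

To extend it towards the example usages listed at the start, replace the print call with whatever action you want for each command.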
Hopefully you learnt something interesting and new today. Try and implement as many functions as you possibly can. Reach out to me if you get interesting results! Cheers :)