Building a speech recognition agent using CMU Sphinx


alt text

Hello all! Firstly, I would like to apologize for this rather extended gap between posts. I have been busy with applications for graduate study and internships as part of the current curriculum.

Getting down to business. Today we’ll be looking at something interesting– how to build our own voice command recognition system. This will entail:

  1. Speak to your laptop to activate the agent
  2. Only certain commands will activate the agent
  3. When these commands are recognized, the agent can execute certain actions (This won’t be covered here for obvious reasons– feel free to create whatever you would like using these!)

The really good thing about what we’re going to do today is that the reliance on external libraries is going to be minimal and is not going to require long and arduous code writing. At the end of it, we’ll have a program capable of performing a seemingly complex task with supremely simple code (written in Python!).

Installation

To install pocketsphinx on Ubuntu, just run the following in terminal:

sudo apt-get install pocketsphinx

That’s all there is!

Deciding on the voice commands

We are going to have our program recognize the following commands:

  • HELLO : Example Usage: Saying HELLO should generate a random templated response from a response list
  • DATE : Example Usage: Saying DATE should give you the current system datetime
  • GOODBYE : Example Usage: Saying GOODBYE should quit the program
  • NAME : Example Usage: Saying NAME should display your name
  • SNAKE : Example Usage: Saying SNAKE should start a game– a Python program named snake.py

However, for our implementation and for the purpose of instruction, we are just going to print the command every time it is recognized. This is to signify that our implementation is working. As an exercise, one could actually extend this to meet the example usage.

Also, feel free to have your own commands here. Now for Sphinx to recognize these commands internally, we are required to create a dictionary and a language model specifying them in a particular way. For this, we will use the CMU Lexicon Tool.

Generating the Language Model & Dictionary

Doing this is very simple. Just create a words.txt file with the following data:

HELLO
DATE
GOODBYE
NAME
SNAKE

Upload this file to the CMU Lexicon Tool webpage. Download the .dic and .lm files from all the ones generated. In the current directory, place the .dic file in a directory called dic and the .lm file in a directory called lm. The 2698.lm (for me, this is the name– it will be different for you), looks like this:

Language model created by QuickLM on Tue Jan  2 12:55:32 EST 2018
Copyright (c) 1996-2010 Carnegie Mellon University and Alexander I. Rudnicky

The model is in standard ARPA format, designed by Doug Paul while he was at MITRE.

The code that was used to produce this language model is available in Open Source.
Please visit http://www.speech.cs.cmu.edu/tools/ for more information

The (fixed) discount mass is 0.5. The backoffs are computed using the ratio method.
This model based on a corpus of 5 sentences and 7 words

\data\
ngram 1=7
ngram 2=10
ngram 3=5

\1-grams:
-0.7782 </s> -0.3010
-0.7782 <s> -0.2218
-1.4771 DATE -0.2218
-1.4771 GOODBYE -0.2218
-1.4771 HELLO -0.2218
-1.4771 NAME -0.2218
-1.4771 SNAKE -0.2218

\2-grams:
-1.0000 <s> DATE 0.0000
-1.0000 <s> GOODBYE 0.0000
-1.0000 <s> HELLO 0.0000
-1.0000 <s> NAME 0.0000
-1.0000 <s> SNAKE 0.0000
-0.3010 DATE </s> -0.3010
-0.3010 GOODBYE </s> -0.3010
-0.3010 HELLO </s> -0.3010
-0.3010 NAME </s> -0.3010
-0.3010 SNAKE </s> -0.3010

\3-grams:
-0.3010 <s> DATE </s>
-0.3010 <s> GOODBYE </s>
-0.3010 <s> HELLO </s>
-0.3010 <s> NAME </s>
-0.3010 <s> SNAKE </s>

\end\

The dict file 2698.dic looks like this:

DATE	D EY T
GOODBYE	G UH D B AY
HELLO	HH AH L OW
HELLO(2)	HH EH L OW
NAME	N EY M
SNAKE	S N EY K

We also need a pre-trained English model and we will be using the en-us model provided by CMU Sphinx itself. Download all the files present here from Github and place them in the current directory in a folder en-us inside a folder called model. So in the current directory along with the dic and lm directories we now also have model/en-us/ containing all the files previously downloaded from Github.

The model/en-us directory will contain the following files:

feat.params
mdef
means
noisedict
README
sendump
transition_matrices
variances

Coding the voice assistant in Python

We first start off with our module imports:

import os
import re
from subprocess import Popen
from time import sleep

Next we will add the locations of our dictionary and language model to our program:

DIR_PATH = os.path.dirname(os.path.realpath(__file__))
DIC_PATH = DIR_PATH + "/dic/2698.dic"
LM_PATH = DIR_PATH + "/lm/2698.lm"
MODEL_PATH = DIR_PATH + "/model/en-us"
TEMP_PATH = DIR_PATH + "/temp/output.log"

The temp path is for the log directory from where we will read what the pocketsphinx is saying. So make sure that you have read/write access in this directory.

Now the way we would start this from the command line would comprise of a command like this -

pocketsphinx_continuous -inmic yes -dict ~/speech-rec/dic/2698.dic -lm ~/speech-rec/lm/2698.lm -hmm ~/speech-rec/model/en-us/ -logfn ~/speech-rec/temp/output.log -backtrace yes

Assuming the name of my directory where I am writing all this is called speech-rec. So we will use the subprocess module to execute this program in Python. Firstly, we create a string specifying this command for simplicity:

launchSphinx = "pocketsphinx_continuous " + "-inmic yes " + "-dict " + DIC_PATH + " -lm " + LM_PATH + " -hmm " + MODEL_PATH + " -logfn " + TEMP_PATH + " -backtrace yes"

Next,we need to have a function that will check for what command was recognized. If you run the pocketsphinx command in the terminal (without the logfn flag) you’ll see that in one of the [INFO] statements we see the output of the model. As an example, this is how this statement looks:

INFO: pocketsphinx.c(1180): HELLO (-7143)

To get to this we’ll use regex. You could also use grep on the log output document or just an old fashioned search on the same. This is completely up to you. My implementation for this regex based search is shown below in the form of the function check. You can create your own or reuse this.

def check(input):
    output = ''
    a='((?:[a-z][a-z]+))'
    b='.*?'
    c='((?:[a-z][a-z]+))'
    d='.*?'
    e='((?:[a-z][a-z]+))'
                                
    regex = re.compile(a+b+c+d+e,re.IGNORECASE|re.DOTALL)
    match = regex.search(input)
    if match:
        if match.group(1) == 'INFO' and match.group(2) == 'pocketsphinx':
            output=match.group(3)
        else:
            output="n/a"
    else:
        output="n/a"
    return str(output)

Next we use the Popen function of the subprocess module to run the command in another terminal. Moreover, all of this has only been tested on Ubuntu 16.04 and won’t possibly work on another OS. After this we give a small delay for the logs to start being created and so that they get some relevant data. Here goes:

proc = Popen("gnome-terminal -e '" + launchSphinx + "'", shell = True)
sleep(5)

Next we keep reading the log file, running our check function and printing the output, all inside an infinite loop. One more thing to keep in mind– we need to maintain an index checkpoint so that we don’t read previously written commands to file but always start from where we last were. This is important to implement because the logs are continuously being written to by pocketsphinx and that too, in a non-destructive manner. This checkpoint control will be ensured using the idx variable. Moreover, we will terminate the created child processes at the end of the program.

 
idx = 0 

while True:
    with open(TEMP_PATH) as f:
        for i, line in enumerate(f):
            if line.startswith("INFO: pocketsphinx.c") and (i>idx):
                cmd = check(line)
                print cmd
        idx = i
            
proc.terminate()
proc.kill()

One more thing to remember, since we have not implemented a clean-up mechanism for the output.log, you will have to currently delete it everytime you run the command. As a good clean-up mechanism we can tie the GOODBYE command to delete both the output.log file as well as exit the program. This would look like this:

 
idx = 0 
flag = 0
while True:
    with open(TEMP_PATH) as f:
        for i, line in enumerate(f):
            if line.startswith("INFO: pocketsphinx.c") and (i>idx):
                cmd = check(line)
                print cmd
                if cmd == 'GOODBYE':
                    os.system('rm -rf ' + TEMP_PATH)
                    flag = 1
                    break
        idx = i
        if flag == 1:
            break
    if flag == 1:
        break
            
proc.terminate()
proc.kill()

This does the trick!

This is the complete program that we have written so far:

 
import os
import re
from subprocess import Popen
from time import sleep

DIR_PATH = os.path.dirname(os.path.realpath(__file__))
DIC_PATH = DIR_PATH + "/dic/2698.dic"
LM_PATH = DIR_PATH + "/lm/2698.lm"
MODEL_PATH = DIR_PATH + "/model/en-us"
TEMP_PATH = DIR_PATH + "/temp/output.log"

launchSphinx = "pocketsphinx_continuous " + "-inmic yes " + "-dict " + DIC_PATH + " -lm " + LM_PATH + " -hmm " + MODEL_PATH + " -logfn " + TEMP_PATH + " -backtrace yes"

def check(input):
    output = ''
    a='((?:[a-z][a-z]+))'
    b='.*?'
    c='((?:[a-z][a-z]+))'
    d='.*?'
    e='((?:[a-z][a-z]+))'
                                
    regex = re.compile(a+b+c+d+e,re.IGNORECASE|re.DOTALL)
    match = regex.search(input)
    if match:
        if match.group(1) == 'INFO' and match.group(2) == 'pocketsphinx':
            output=match.group(3)
        else:
            output="n/a"
    else:
        output="n/a"
    return str(output)
                                                        
proc = Popen("gnome-terminal -e '" + launchSphinx + "'", shell = True)

sleep(5) 

idx = 0 
flag = 0

while True:
    with open(TEMP_PATH) as f:
        for i, line in enumerate(f):
            if line.startswith("INFO: pocketsphinx.c") and (i>idx):
                cmd = check(line)
                print cmd
                if cmd == 'GOODBYE':
                    os.system('rm -rf ' + TEMP_PATH)
                    flag = 1
                    break
        idx = i
        if flag == 1:
            break
    if flag == 1:
        break
        
            
proc.terminate()
proc.kill()
     

Hopefully you learnt something interesting and new today. Try and implement as many functions as you possibly can. Reach out to me if you get interesting results! Cheers :)