KEYWORD SPOTTING SYSTEM

The system, developed at the Faculty of Information Technology, Brno University of Technology, is intended mainly for searching for keywords in long speech recordings. Once a list of speech audio files and a list of keywords to be detected have been specified, the system automatically selects the parts of the speech where the keywords are pronounced.
The supported audio formats are:


Figure 1. The list of files and the list of keywords


Figure 2. The selected speech parts where keywords were detected

Solution description:

  • Keywords are modeled with triphone Hidden Markov Models
  • Czech SpeechDat-E speech database is used for model training
  • A modified Viterbi algorithm [1] is used for keyword detection
  • An optimal threshold for each keyword is precalculated
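
To illustrate the first point above, here is a minimal sketch of how a keyword's phone sequence could be expanded into context-dependent triphone labels, one HMM per label. The "left-centre+right" label notation, the "sil" padding at the word boundaries, and the function name to_triphones are assumptions made for illustration; the page does not describe how the system builds its keyword models.

```python
def to_triphones(phones, boundary="sil"):
    """Expand a phone sequence into triphone labels 'left-centre+right'.

    `boundary` is the (assumed) context used before the first and after
    the last phone of the keyword.
    """
    labels = []
    for i, centre in enumerate(phones):
        left = phones[i - 1] if i > 0 else boundary
        right = phones[i + 1] if i < len(phones) - 1 else boundary
        labels.append(f"{left}-{centre}+{right}")
    return labels


# Hypothetical phone sequence for a keyword.
print(to_triphones(["b", "r", "n", "o"]))
# → ['sil-b+r', 'b-r+n', 'r-n+o', 'n-o+sil']
```
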
Threshold estimation:
 Each threshold depends linearly on the states contained in the keyword. The contribution of each state was obtained by minimizing a global criterion (false alarms and false acceptances) and solving a system of linear equations.
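
The linear dependence can be sketched as follows. This is not the system's actual procedure. It assumes that a target threshold is already known for a few training keywords (for example from the criterion minimization) and that the per-state contributions are then fitted by least squares through the normal equations. The state names, the target values, and all function names are hypothetical.

```python
from collections import Counter


def keyword_threshold(states, contribution):
    """Threshold of a keyword = sum of the contributions of its states."""
    return sum(contribution[s] for s in states)


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes a square, non-singular system.
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_contributions(keywords, targets, state_names):
    """Least-squares fit of state contributions: (X^T X) w = X^T t."""
    # X[k][j] = how many times state j occurs in keyword k
    counts = [Counter(states) for states in keywords]
    x = [[cnt[s] for s in state_names] for cnt in counts]
    k = len(state_names)
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)]
           for i in range(k)]
    xtt = [sum(row[i] * t for row, t in zip(x, targets)) for i in range(k)]
    return dict(zip(state_names, solve_linear_system(xtx, xtt)))


# Toy data: three keywords over three states with assumed target thresholds.
training = [["s1", "s2"], ["s2", "s3"], ["s1", "s3"]]
targets = [3.0, 5.0, 4.0]
w = fit_contributions(training, targets, ["s1", "s2", "s3"])
new_threshold = keyword_threshold(["s1", "s2", "s3"], w)
```

With per-state contributions in hand, the threshold for a keyword that was not in the training set follows directly from its state sequence, which is what makes precalculation possible.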

Download:

A public evaluation version is available at kws.zip. Compared to the full version, it has the following limitations:
  • Only three audio records can be loaded.
  • The maximum length of one record is one minute.

References:

[1] J. Junkawitsch, L. Neubauer, H. Höge, and G. Ruske. A new keyword spotting algorithm with pre-calculated optimal thresholds.