The system, developed at Brno University of Technology,
Faculty of Information Technology, is designed mainly for searching for keywords
in long speech recordings. After a list of speech audio files and a list of
keywords to be detected are specified, the system automatically selects
the parts of speech where the keywords are pronounced.
The supported audio formats are:
- Microsoft® waveform 8 kHz
- A-law 8 kHz
- Raw 8 kHz, 8 or 16 bits
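As an illustration of what these formats contain, the following minimal Python sketch decodes headerless 16-bit audio and A-law audio into linear samples. It is not part of the system. The function names are hypothetical, the little-endian byte order for raw files is an assumption (the page does not state it), and the A-law expansion follows the usual ITU-T G.711 reference scheme.

```python
import struct

SAMPLE_RATE_HZ = 8000  # all supported formats use 8 kHz sampling


def decode_raw_pcm16(data: bytes) -> list:
    """Decode headerless 16-bit signed PCM (little-endian is an assumption)."""
    count = len(data) // 2  # ignore a trailing odd byte, if any
    return list(struct.unpack(f"<{count}h", data[: 2 * count]))


def decode_alaw_byte(value: int) -> int:
    """Expand one G.711 A-law byte to a 16-bit linear sample."""
    value ^= 0x55                      # undo the even-bit inversion
    magnitude = (value & 0x0F) << 4    # 4-bit mantissa
    segment = (value & 0x70) >> 4      # 3-bit segment (exponent)
    if segment == 0:
        magnitude += 8
    else:
        magnitude = (magnitude + 0x108) << (segment - 1)
    # In A-law, a set sign bit marks a positive sample.
    return magnitude if value & 0x80 else -magnitude


def decode_alaw(data: bytes) -> list:
    """Decode a headerless A-law byte stream."""
    return [decode_alaw_byte(b) for b in data]


def duration_seconds(num_samples: int) -> float:
    """Length of a recording at the 8 kHz sampling rate."""
    return num_samples / SAMPLE_RATE_HZ
```

For example, `duration_seconds(len(decode_alaw(data)))` gives the length of an A-law recording in seconds, since each A-law byte holds exactly one sample.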
Figure 1. The list of files and the list of keywords
Figure 2. The selected speech parts where keywords were detected
Solution description:
- Keywords are modeled with triphone Hidden Markov Models
- The Czech SpeechDat-E speech database is used for model training
- A modified Viterbi algorithm [1] is used for keyword detection
- An optimal threshold for each keyword is precalculated
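To make the detection step more concrete, here is a simplified Python sketch of one common way to adapt Viterbi decoding to keyword spotting: a path through the keyword states may start at any frame, and paths are compared by their length-normalized cost. This is an illustration under stated assumptions, not the system's implementation and not necessarily the exact algorithm of [1]. The function name `spot_keyword`, the cost matrix layout, and the "lower is better" cost convention (for example, negative log-likelihoods) are all hypothetical.

```python
def spot_keyword(costs, threshold):
    """Report (start_frame, end_frame, score) for frames where the keyword may end.

    costs[t][s] is the local cost of frame t in keyword state s (lower is
    better). States form a left-to-right chain. A detection is reported
    when the length-normalized path cost drops below the threshold.
    """
    n_states = len(costs[0])
    inf = float("inf")
    # For each state: (accumulated cost, path length, start frame).
    prev = [(inf, 0, -1)] * n_states
    hits = []
    for t, frame in enumerate(costs):
        cur = []
        for s in range(n_states):
            candidates = [prev[s]]                # stay in the same state
            if s > 0:
                candidates.append(prev[s - 1])    # advance from the previous state
            else:
                candidates.append((0.0, 0, t))    # start a new path at frame t
            # Pick the predecessor with the best length-normalized cost.
            best = min(candidates, key=lambda c: (c[0] + frame[s]) / (c[1] + 1))
            cur.append((best[0] + frame[s], best[1] + 1, best[2]))
        total, length, start = cur[-1]            # paths ending in the last state
        if total / length < threshold:
            hits.append((start, t, total / length))
        prev = cur
    return hits


# Toy input with two keyword states: frames 1-2 match state 0 and
# frames 3-4 match state 1; everything else matches poorly.
toy_costs = [[5, 5], [0.1, 5], [0.2, 5], [5, 0.1], [5, 0.1], [5, 5]]
detections = spot_keyword(toy_costs, threshold=1.0)
```

Because the score is normalized by path length, a single per-keyword threshold can be applied regardless of how long the keyword lasts in a given recording, which is what makes precalculated thresholds usable.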
Threshold estimation:
Each threshold depends linearly on the states contained in the keyword.
The contribution of each state was obtained by minimizing a global criterion (false alarms and false acceptances) and solving a system of linear equations.
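The linear relation can be illustrated with a small Python sketch. It assumes that a keyword's threshold is the sum of one contribution per state occurrence, with no constant offset, and that the optimal thresholds of a few training keywords are already known. The state names, the numbers, and both function names are made up for illustration, and the criterion minimization that yields the training thresholds is not shown.

```python
def solve_linear_system(a, b):
    """Solve a square system a * x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("system is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def keyword_threshold(state_counts, contributions):
    """Threshold as a linear function of the states the keyword contains."""
    return sum(n * contributions[s] for s, n in state_counts.items())


# Hypothetical training data: how often each state occurs in each
# training keyword, and that keyword's optimal threshold.
states = ["a", "b", "c"]
occurrences = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
optimal_thresholds = [3.0, 5.0, 4.0]

contributions = dict(zip(states, solve_linear_system(occurrences, optimal_thresholds)))
new_threshold = keyword_threshold({"a": 2, "c": 1}, contributions)
```

Once the per-state contributions are known, the threshold of a new keyword follows from its state sequence alone, without further tuning on recordings of that keyword.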
Download:
A public evaluation version is available at
kws.zip.
Compared to the full version, it has the following limitations:
- Only three audio records can be loaded.
- The maximum length of one record is one minute.
References:
[1] J. Junkawitch, L. Neubauer, H. Hoge, and G. Ruske.
A new keyword spotting algorithm with pre-calculated optimal thresholds.