Software

Speech search

Source code of three systems for speech search is available here:

Neural Network Trainer TNet

Overview:

A tool for parallel training of neural networks for phoneme-state classification.
Its fast implementation relies on multithreaded data parallelization with POSIX threads
and/or node parallelization on CUDA GPGPUs.
The training algorithm is stochastic gradient descent.

The toolkit supports various NN techniques, such as convolutional networks with shared weights,
recurrent networks, RBM pre-training by Contrastive Divergence, and sequence-based MPE training.
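The multithreaded data-parallel SGD described above can be illustrated with a small Python sketch. It is not TNet code and covers only the thread-based data parallelization, not the CUDA path: it assumes a linear model with squared loss in place of a neural network, and the names `shard_gradient` and `sgd_data_parallel` are hypothetical. Each mini-batch is split into one shard per worker thread, every thread computes a partial gradient, and the partial gradients are summed and averaged before a single weight update. In CPython the GIL prevents a real speed-up here; the point is the structure of the computation.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_gradient(w, shard):
    """Sum of gradients of 0.5 * (w.x - y)^2 over one shard of examples."""
    g = [0.0] * len(w)
    for x, y in shard:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            g[i] += err * xi
    return g


def sgd_data_parallel(data, dim, lr=0.1, batch_size=4, epochs=300, n_workers=2):
    """Mini-batch SGD where each batch gradient is computed by several threads.

    Hypothetical illustration only: a linear model stands in for the network.
    """
    w = [0.0] * dim
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(epochs):
            for start in range(0, len(data), batch_size):
                batch = data[start:start + batch_size]
                # Split the mini-batch into one shard per worker thread.
                shards = [batch[k::n_workers] for k in range(n_workers)]
                # Each thread computes a partial gradient on its shard;
                # list() waits until all threads have finished.
                partial = list(pool.map(lambda s: shard_gradient(w, s), shards))
                # Accumulate the partial gradients and take one averaged step.
                for i in range(dim):
                    w[i] -= lr * sum(g[i] for g in partial) / len(batch)
    return w


# Toy data generated from y = 2*x1 - x2.
points = [(1, 0), (0, 1), (1, 1), (2, 1), (1, 2), (-1, 1), (0.5, -1), (2, -1)]
data = [(p, 2 * p[0] - p[1]) for p in points]
print(sgd_data_parallel(data, dim=2))  # close to [2.0, -1.0]
```

Because the shards are merged before the update, the result is the same as single-threaded mini-batch SGD; only the gradient computation is distributed.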

Lattice Spoken Term Detection toolkit (LatticeSTD)

A toolkit for experiments with lattice-based spoken term detection. It lets you define a set of terms and search for them in lattices.

Specifically, it:
* Searches a lattice for a defined sequence of links and outputs the label assigned to this sequence.
* Calculates the confidence (posterior probability) of each sequence found.
* Filters overlapping detections.
* Handles substitutions, deletions and insertions (useful for phone lattices).
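A posterior-probability confidence for a link sequence is conventionally computed with the forward-backward algorithm over the lattice. The Python sketch below illustrates that general technique under stated assumptions and is not LatticeSTD's code: the lattice is a hypothetical list of `(src, dst, label, prob)` links whose node ids are topologically ordered (every link goes from a smaller to a larger id), and `forward_backward` and `search_term` are illustrative names. A match along consecutive links gets the posterior alpha(first node) * (product of link probabilities) * beta(last node) / (total lattice probability).

```python
from collections import defaultdict


def forward_backward(links, start, end):
    """Forward (alpha) and backward (beta) scores of every lattice node.

    Assumes every link (src, dst, label, prob) satisfies src < dst, so
    sorting by node id gives a topological order.
    """
    alpha = defaultdict(float)
    alpha[start] = 1.0
    for src, dst, _, p in sorted(links, key=lambda link: link[0]):
        alpha[dst] += alpha[src] * p
    beta = defaultdict(float)
    beta[end] = 1.0
    for src, dst, _, p in sorted(links, key=lambda link: link[1], reverse=True):
        beta[src] += p * beta[dst]
    return alpha, beta


def search_term(links, labels, start, end):
    """Return (first_node, last_node, posterior) for each link path matching labels."""
    alpha, beta = forward_backward(links, start, end)
    total = alpha[end]
    out_links = defaultdict(list)
    for link in links:
        out_links[link[0]].append(link)
    hits = []

    def extend(first, node, k, score):
        # All labels matched: turn the path score into a posterior.
        if k == len(labels):
            hits.append((first, node, alpha[first] * score * beta[node] / total))
            return
        for _, dst, label, p in out_links[node]:
            if label == labels[k]:
                extend(first, dst, k + 1, score * p)

    for src, dst, label, p in links:
        if label == labels[0]:
            extend(src, dst, 1, p)
    return hits


lattice = [(0, 1, "a", 0.6), (0, 1, "b", 0.4), (1, 2, "c", 0.5),
           (1, 2, "a", 0.5), (2, 3, "c", 1.0)]
print(search_term(lattice, ["a", "c"], start=0, end=3))
# [(0, 2, 0.3), (1, 3, 0.5)]
```

This sketch reports every matching link path separately; merging paths with the same span, filtering overlapping detections, and tolerating substitutions, deletions and insertions are left out.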

KWSViewer - Interactive viewer for Keyword spotting output

Abstract

This tool loads the output of a keyword-spotting (KWS) system and a reference file in HTK MLF format and shows the detections in a table. You can also use it to replay detections and to tune and visualize scores, hits, misses and false alarms using the sliders in the right-hand panel.
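Tuning a decision threshold on detection scores changes how many detections count as hits and false alarms, and therefore how many reference occurrences are missed. A minimal Python sketch of that bookkeeping, assuming each detection has already been aligned to the reference and marked as a hit or a false alarm, and that each hit covers a distinct reference occurrence (the function `count_at_threshold` is hypothetical, not part of KWSViewer):

```python
def count_at_threshold(detections, n_reference, threshold):
    """Count hits, misses and false alarms among detections scoring >= threshold.

    detections: list of (score, is_hit) pairs, where is_hit says whether the
    detection matches a reference occurrence (assumed one-to-one).
    n_reference: number of keyword occurrences in the reference.
    """
    accepted = [is_hit for score, is_hit in detections if score >= threshold]
    hits = sum(accepted)
    false_alarms = len(accepted) - hits
    misses = n_reference - hits
    return hits, misses, false_alarms


dets = [(0.9, True), (0.7, False), (0.6, True), (0.2, False)]
for thr in (0.1, 0.5, 0.8):
    print(thr, count_at_threshold(dets, n_reference=3, threshold=thr))
# 0.1 (2, 1, 2)
# 0.5 (2, 1, 1)
# 0.8 (1, 2, 0)
```

Raising the threshold trades false alarms for misses, which is the trade-off a score slider makes visible.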

Installation - Windows

Download kwsviewer_v1.5_win32.zip (6 MB). No installation is needed; you can run kwsviewer.exe directly.

Joint Factor Analysis Matlab Demo

This set of Matlab functions and data by Ondrej Glembek (glembek@fit.vutbr.cz) is a simple tutorial on Joint Factor Analysis (JFA) as investigated at the JHU 2008 workshop (http://www.clsp.jhu.edu/workshops/ws08/groups/rsrovc/).
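For orientation, JFA is usually written as a decomposition of the speaker- and session-dependent GMM mean supervector. The notation below is the common one from Kenny's JFA papers and may not match the variable names used in the Matlab demo:

```latex
\mathbf{M} = \mathbf{m} + \mathbf{V}\mathbf{y} + \mathbf{U}\mathbf{x} + \mathbf{D}\mathbf{z},
\qquad
\mathbf{y} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),\;
\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),\;
\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})
```

Here m is the speaker- and session-independent (UBM) mean supervector; V (eigenvoices) and the diagonal matrix D model speaker variability, and U (eigenchannels) models session variability. The factors y and z are shared by all recordings of a speaker, while x is estimated separately for each session.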

The tutorial is based on Patrick Kenny's paper:

Kenny, P., "Joint factor analysis of speaker and session variability: Theory and algorithms," Technical report CRIM-06/08-13, CRIM, Montreal, 2005, http://www.crim.ca/perso/patrick.kenny/

especially on the simplified version of the training in:

Web-based demo for Language Identification

A web-based demonstration of our phonotactic language identification system. Try it at http://speech.fit.vutbr.cz/lid-demo/
The following languages can be detected: Arabic, English, Farsi, French, German, Hindi, Japanese, Korean, Mandarin, Spanish, Tamil, Vietnamese, Czech, Polish and Russian.
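Phonotactic language identification typically decodes speech into a phone string with a phone recognizer and then scores that string with a phone n-gram language model for each target language, choosing the best-scoring language. The Python sketch below shows only the scoring step, using add-one-smoothed phone bigrams on toy data; the model order, smoothing, phone inventories and function names are assumptions for illustration, not a description of the demo's models.

```python
import math
from collections import Counter


def train_bigram(phone_strings):
    """Count phone bigrams and their left contexts, with sentence markers."""
    bigrams, contexts = Counter(), Counter()
    for phones in phone_strings:
        padded = ["<s>"] + phones + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts


def log_likelihood(phones, model, vocab_size):
    """Add-one-smoothed bigram log-probability of a phone string."""
    bigrams, contexts = model
    padded = ["<s>"] + phones + ["</s>"]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
        for prev, cur in zip(padded, padded[1:])
    )


def identify(phones, models, vocab_size):
    """Return the language whose phone bigram model scores the string highest."""
    return max(models, key=lambda lang: log_likelihood(phones, models[lang], vocab_size))


# Toy "languages" with made-up phone inventories.
models = {
    "lang_A": train_bigram([["a", "b", "a", "b"], ["b", "a"]]),
    "lang_B": train_bigram([["c", "d", "c", "d"], ["d", "c"]]),
}
vocab_size = 5  # possible next symbols: a, b, c, d, </s>
print(identify(["a", "b", "a"], models, vocab_size))  # lang_A
```

All models share one vocabulary size so that their smoothed probabilities are comparable across languages.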

Software from the Speech Processing Group

HMM toolkit STK

This distribution includes SERest, a tool for embedded training of HMMs, together with supporting scripts. Key features of SERest include re-estimation of linear transformations (MLLT, LDA, HLDA) within the training process and the use of recognition networks for training.

Phoneme recognizer based on long temporal context

This phoneme recognizer was developed at the Faculty of Information Technology, Brno University of Technology, and has been successfully applied to tasks including language identification [4], indexing and search of audio records, and keyword spotting [5]. The distribution is intended mainly for research. Its outputs can serve as a baseline for subsequent processing, for example phonotactic language modeling.

Lattice Search Engine (LSE)

This package contains several tools; the three main ones are:
- indexing HTK lattices
- sorting the index
- searching the sorted index for single words or phrases

Some features of these tools have not been used for a long time and may contain bugs.
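The index, sort and search steps listed above can be illustrated with a small Python sketch of a sorted inverted index. It is not LSE code and does not read HTK lattice files: it assumes each lattice has already been reduced to hypothetical `(word, start, end, posterior)` entries, finds a word by binary search over the sorted index, and treats a phrase as words that follow each other in time within the same lattice. Multiplying word posteriors into a phrase score is a simplifying assumption.

```python
from bisect import bisect_left


def build_index(lattices):
    """Flatten {lattice_id: [(word, start, end, posterior), ...]} and sort by word."""
    index = [
        (word, lat_id, start, end, post)
        for lat_id, entries in lattices.items()
        for word, start, end, post in entries
    ]
    index.sort()
    return index


def lookup(index, word):
    """Binary-search the sorted index for all entries of one word."""
    i = bisect_left(index, (word,))
    hits = []
    while i < len(index) and index[i][0] == word:
        hits.append(index[i])
        i += 1
    return hits


def search_phrase(index, words, max_gap=0.0):
    """Find word sequences where each word starts within max_gap after the previous ends."""
    candidates = [(lat, start, end, post) for _, lat, start, end, post in lookup(index, words[0])]
    for word in words[1:]:
        extended = []
        for lat, start, end, score in candidates:
            for _, lat2, s2, e2, post in lookup(index, word):
                if lat2 == lat and 0.0 <= s2 - end <= max_gap:
                    extended.append((lat, start, e2, score * post))
        candidates = extended
    return candidates


lattices = {
    "utt1": [("hello", 0.0, 0.5, 0.9), ("world", 0.5, 1.0, 0.8)],
    "utt2": [("world", 0.0, 0.4, 0.7), ("hello", 0.4, 0.9, 0.6)],
}
index = build_index(lattices)
print(lookup(index, "world"))
print(search_phrase(index, ["hello", "world"]))
```

Sorting once lets every word lookup run in logarithmic time; with real lattice times, a small positive `max_gap` would tolerate boundary jitter between adjacent words.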