Automatic Language Identification
Description
The task of language identification (LID) is to detect the language in which a particular speech segment was spoken. This technology is used, for example, to route calls in call centers and on emergency numbers. Within AMI, LID can be used, for example, to detect the language of back-channel speech in meetings and route it to the appropriate speech recognizer. Security is another major application domain for LID.
The system created by members of Speech@FIT - Pavel Matejka, Lukas Burget and Petr Schwarz - combines two approaches known as acoustic and phonotactic:
- Acoustic LID determines the language directly from features derived from the speech signal. This approach can, for example, separate French from English well: in French, the nasal cavity is open more frequently, which is directly reflected in the speech features. For the NIST evaluation, Speech@FIT researchers improved on existing technology by adding discriminative training of the acoustic models, which improves the separation among languages.
- In phonotactic LID, speech is first transcribed by a phoneme recognizer into strings or graphs (lattices) of phonemes. "Language" models are then trained on these to capture the statistics of phoneme pairs and triples. In this way, German and English can, for example, be separated based on the different statistics of "und" and "and". The Speech@FIT group pioneered the use of so-called "anti-models" for this task, which can also improve the discrimination among target languages.
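The phonotactic idea can be illustrated with a minimal sketch: one phoneme trigram model per language, with the test utterance assigned to the language whose model gives it the highest log-probability. This is not the Speech@FIT system. The phoneme symbols, the toy training strings, the add-one smoothing, and the names PhonotacticModel and identify are all assumptions made for illustration. The sketch works on plain phoneme strings only and covers neither lattices nor anti-models.

```python
import math
from collections import Counter


def ngrams(phonemes, n):
    """Return all contiguous n-grams of a phoneme sequence as tuples."""
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]


class PhonotacticModel:
    """Hypothetical add-one smoothed phoneme n-gram model for one language."""

    def __init__(self, order=3):
        self.order = order
        self.ngram_counts = Counter()    # counts of full n-grams
        self.context_counts = Counter()  # counts of (n-1)-phoneme histories
        self.vocab = set()

    def _pad(self, phonemes):
        # Start/end markers let the model see utterance boundaries.
        return ["<s>"] * (self.order - 1) + list(phonemes) + ["</s>"]

    def train(self, utterances):
        for phonemes in utterances:
            padded = self._pad(phonemes)
            self.vocab.update(padded)
            for gram in ngrams(padded, self.order):
                self.ngram_counts[gram] += 1
                self.context_counts[gram[:-1]] += 1

    def log_prob(self, phonemes, vocab_size):
        """Log-probability of a phoneme string under add-one smoothing."""
        total = 0.0
        for gram in ngrams(self._pad(phonemes), self.order):
            numerator = self.ngram_counts[gram] + 1
            denominator = self.context_counts[gram[:-1]] + vocab_size
            total += math.log(numerator / denominator)
        return total


def identify(models, phonemes):
    """Return the best-scoring language and the per-language scores."""
    # Use a shared symbol inventory so that scores are comparable.
    vocab_size = len(set().union(*(m.vocab for m in models.values())))
    scores = {lang: m.log_prob(phonemes, vocab_size)
              for lang, m in models.items()}
    return max(scores, key=scores.get), scores


# Toy phoneme transcripts (made-up symbols, not real recognizer output).
training = {
    "german": [["u", "n", "t"], ["d", "a", "s"], ["u", "n", "t"]],
    "english": [["ae", "n", "d"], ["dh", "ax"], ["ae", "n", "d"]],
}
models = {}
for lang, utterances in training.items():
    models[lang] = PhonotacticModel(order=3)
    models[lang].train(utterances)

best, scores = identify(models, ["u", "n", "t"])
print(best, scores)
```

In a real system the transcripts would come from a phoneme recognizer run on speech, and the smoothing and decision rule would be chosen and tuned on held-out data.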
