Success of the Speech@FIT group in automatic language recognition

The Speech@FIT group from Brno University of Technology (an AMI partner) has achieved an important success in recent weeks. In the NIST evaluation NIST-LRE-2005, its system for automatic language identification (LID) placed second in the primary condition (30-second speech segments) and first in two secondary conditions (10-second and 3-second segments), in a tough competition of 12 academic and industrial laboratories from all over the world.

The task of LID is to detect the language in which a particular speech segment was spoken. This technology is used, for example, for routing calls in call centers and on emergency numbers. Within AMI, LID can be used, for example, to detect the language of back-channel speech in meetings and to route it to the appropriate speech recognizer. Security is another major application domain for LID.

The system created by members of Speech@FIT - Pavel Matejka, Lukas Burget and Petr Schwarz - combines two approaches known as acoustic and phonotactic:

  • Acoustic LID determines the language directly from features derived from the speech signal. This approach can, for example, separate French from English well: in French, the nasal cavity is open more frequently, which is directly reflected in the speech features. For the NIST evaluation, Speech@FIT researchers improved the existing technology by adding discriminative training of the acoustic models, which improves the separation among languages.
  • In phonotactic LID, speech is first transcribed by a phoneme recognizer into strings or graphs (lattices) of phonemes. On these, "language" models are trained to capture the statistics of pairs and triples of phonemes. In this way, German and English can, for example, be separated based on the different statistics of "und" and "and". The Speech@FIT group pioneered the use of so-called "anti-models" for this task, which also improve the discrimination among target languages.
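
The phonotactic idea can be illustrated with a small Python sketch. It is not the Speech@FIT system: it assumes a phoneme recognizer has already produced plain phoneme strings (no lattices), uses letters as stand-ins for phonemes, applies simple add-one smoothing, and leaves out anti-models entirely. The names PhonotacticModel and identify, as well as the toy training words, are hypothetical.

```python
import math
from collections import Counter


def ngrams(phonemes, n):
    """Return all length-n windows over a padded phoneme sequence."""
    padded = ["<s>"] * (n - 1) + list(phonemes) + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


class PhonotacticModel:
    """Add-one smoothed phoneme n-gram model for one language (illustrative)."""

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = Counter()    # counts of full n-grams
        self.context_counts = Counter()  # counts of the (n-1)-phoneme contexts
        self.vocab = {"</s>"}

    def train(self, sequences):
        for seq in sequences:
            self.vocab.update(seq)
            for gram in ngrams(seq, self.n):
                self.ngram_counts[gram] += 1
                self.context_counts[gram[:-1]] += 1

    def log_likelihood(self, seq):
        v = len(self.vocab) + 1  # one extra slot for unseen phonemes
        total = 0.0
        for gram in ngrams(seq, self.n):
            num = self.ngram_counts[gram] + 1
            den = self.context_counts[gram[:-1]] + v
            total += math.log(num / den)
        return total


def identify(models, seq):
    """Return the language whose model scores the sequence highest."""
    return max(models, key=lambda lang: models[lang].log_likelihood(seq))


# Toy data: each character stands in for one recognized phoneme (assumption).
models = {"german": PhonotacticModel(), "english": PhonotacticModel()}
models["german"].train(["und", "hund", "rund", "bund"])
models["english"].train(["and", "hand", "band", "land"])

for segment in ("und", "and"):
    print(segment, "->", identify(models, segment))
```

On this toy data the trigram statistics alone are enough to assign "und" to German and "and" to English. A real system would train on phoneme recognizer output for many hours of speech per language rather than on a handful of words.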

NIST (National Institute of Standards and Technology), a US Government agency, regularly organizes series of benchmark tests of different speech technologies, such as speaker recognition, speech recognition and machine translation. Before an evaluation, only the scoring methodology is known; in some cases, a "development" data set is made available to participating labs for testing their technologies. The evaluation itself runs in a precisely defined period (usually 2 weeks). At its beginning, all participants receive the evaluation data, and they have to send their results back to NIST by its end. NIST evaluates the results, and all participants are then invited to a workshop where these results and the details of the participants' systems are discussed.

This year, Speech@FIT already participated in the NIST meeting recognition evaluation as a member of the AMI team coordinated by the University of Sheffield. NIST-LRE-2005 was its first independent participation. We are proud of this success and wish to thank everyone who supported Pavel, Lukas and Petr during the sleepless days of 24/10-7/11/2005.

More info:

Newspapers

  • MF DNES - 22.12.2005
  • ROVNOST - 22.12.2005
  • PRÁVO - 22.12.2005

Radio

  • Rádio Student 107 FM - the Vědník programme, 22.12.2005, 21:00 [mp3: part 1, part 2]
  • Český rozhlas Brno - January 2006