The system supports the following languages: Arabic, English, Farsi, French, German, Hindi, Japanese, Korean, Mandarin, Spanish, Tamil and Vietnamese, trained on the CallFriend corpus, and Czech, Polish and Russian, trained on SpeechDat East.

The system is based on the phonotactic approach: speech is first converted into a string of phones (basic speech units), and the language is then determined by a model built on statistics of three-phone sequences (phone trigrams). For more technical details, see our Eurospeech 2005 paper and the references therein.
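To make the phonotactic idea concrete, here is a minimal sketch in Python, not the system's actual implementation. It assumes the phone strings have already been produced by a phone recognizer (not shown), trains one phone trigram model per language with add-one smoothing (an assumption; the smoothing and scoring back-end of the real system are not described here), and picks the language whose model gives the highest log-likelihood. The class and function names, the boundary symbols and the toy languages `lang_A` and `lang_B` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical utterance boundary symbols (not from the original system).
BOS, EOS = "<s>", "</s>"


def trigrams(phones):
    """Return the phone trigrams of one utterance, padded with boundary symbols."""
    padded = [BOS, BOS] + list(phones) + [EOS]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


class PhoneTrigramModel:
    """Add-one smoothed phone trigram model for a single language (a sketch)."""

    def __init__(self, utterances, phone_set):
        self.tri = Counter()   # counts of (h1, h2, next_phone)
        self.hist = Counter()  # counts of the two-phone history (h1, h2)
        for utt in utterances:
            for t in trigrams(utt):
                self.tri[t] += 1
                self.hist[t[:2]] += 1
        # Possible "next" symbols: every phone plus the end-of-utterance marker.
        self.v = len(phone_set) + 1

    def log_prob(self, phones):
        """Log-likelihood of a phone string: sum of log P(phone | previous two)."""
        return sum(
            math.log((self.tri[t] + 1) / (self.hist[t[:2]] + self.v))
            for t in trigrams(phones)
        )


def identify(phones, models):
    """Return the language whose trigram model scores the phone string highest."""
    return max(models, key=lambda lang: models[lang].log_prob(phones))


# Toy usage with hypothetical training data for two made-up languages.
PHONES = {"a", "b", "k", "t"}
models = {
    "lang_A": PhoneTrigramModel([["a", "b", "a", "b"], ["a", "b", "a"]], PHONES),
    "lang_B": PhoneTrigramModel([["k", "t", "k", "t"], ["t", "k", "t"]], PHONES),
}
print(identify(["a", "b", "a", "b", "a"], models))
```

Because each language gets its own model and the decision is a simple arg-max over likelihoods, adding a language only requires training one more trigram model on phone strings from that language; the phone recognizer itself can stay language-independent.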

Language identification work at Speech@FIT is supported by the Ministry of Education of the Czech Republic under project No. MSM0021630528 "Security-Oriented Research in Information Technology", by the EC projects AMIDA and MOBIO, by the Grant Agency of the Czech Republic under project No. GA102/08/0707, and by the Czech Ministry of Defense.

The hardware for this work was partly supported by CESNET under project No. 162/2005 "Advancing the automatic language recognition using streamed audio media".

