Welcome to the Speech@FIT language identification web demo. Just upload a WAV file containing speech (up to 4 MB) and see whether the language is identified correctly.

The system supports the following languages: Arabic, English, Farsi, French, German, Hindi, Japanese, Korean, Mandarin, Spanish, Tamil and Vietnamese, trained on the CallFriend corpus, and Czech, Polish and Russian, trained on SpeechDat East.

The system is based on a phonotactic approach: speech is first converted into a string of phones (basic speech units), and a model that uses statistics of 3-phone sequences then determines the language. For more technical information, have a look at our Eurospeech 2005 paper and the references therein.
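
To make the idea of scoring 3-phone sequences concrete, here is a minimal sketch in Python. It is an illustration under simplifying assumptions, not the Speech@FIT implementation: the phone strings are assumed to come from some phone recognizer that is not shown, the phone symbols and language names (`lang_a`, `lang_b`) are made up, and add-one smoothing is used only to keep the example short. Each language gets a trigram model estimated from its training phone strings, and an unknown phone string is assigned to the language whose model gives it the highest log-likelihood.

```python
import math
from collections import Counter


def trigrams(phones):
    """Return all overlapping 3-phone sequences, with start/end padding."""
    padded = ["<s>", "<s>"] + list(phones) + ["</s>"]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


class TrigramModel:
    """Counts of 3-phone sequences for one language (toy model)."""

    def __init__(self, training_strings):
        self.tri = Counter()   # counts of (p1, p2, p3)
        self.bi = Counter()    # counts of the history (p1, p2)
        self.vocab = set()     # symbols seen in the predicted position
        for phones in training_strings:
            for t in trigrams(phones):
                self.tri[t] += 1
                self.bi[t[:2]] += 1
                self.vocab.add(t[2])

    def log_prob(self, phones, vocab_size):
        """Log-likelihood of a phone string with add-one smoothing."""
        total = 0.0
        for t in trigrams(phones):
            numerator = self.tri[t] + 1
            denominator = self.bi[t[:2]] + vocab_size
            total += math.log(numerator / denominator)
        return total


def identify(phones, models):
    """Return (best_language, scores) for a phone string."""
    # Use one shared vocabulary so scores are comparable across languages.
    vocab = set(phones) | {"</s>"}
    for model in models.values():
        vocab |= model.vocab
    scores = {lang: m.log_prob(phones, len(vocab)) for lang, m in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical training data: made-up phone strings for two made-up languages.
models = {
    "lang_a": TrigramModel([["a", "b", "a", "b", "a", "b"]]),
    "lang_b": TrigramModel([["k", "o", "k", "o", "k", "o"]]),
}

best, scores = identify(["a", "b", "a"], models)
print(best, scores)
```

A real system would differ in several respects, for example in how the phone string is obtained and in how the n-gram statistics are smoothed and scored; the sketch only shows the basic principle of comparing per-language trigram likelihoods.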

Language identification work at Speech@FIT is supported by the Ministry of Education of the Czech Republic under project No. MSM0021630528 "Security-Oriented Research in Information Technology", the EC projects AMIDA and MOBIO, the Grant Agency of the Czech Republic under project No. GA102/08/0707, and the Czech Ministry of Defense.

The hardware for this work was partly supported by CESNET under project No. 162/2005 "Advancing the automatic language recognition using streamed audio media".

In 2018, we decided to discontinue the Language Identification web demo.

If you want to try Language Identification on your files, contact us at speech-web@fit.vutbr.cz.