BUT Speech@FIT

BUT Speech@FIT was formed in 1997 at the Faculty of Electrical Engineering and Computer Science at BUT, and joined the Department of Computer Graphics and Multimedia of FIT when FIT was created in January 2002. The group is advised by Prof. Hermansky, managed by Dr. Jan "Honza" Cernocky, and its research director is Dr. Lukas Burget.

The main expertise of the group is in speaker and language identification, speech recognition, and keyword spotting. Among its main achievements are the best phone recognition system in the world and consistently excellent results in the NIST Language Recognition Evaluation and the NIST Speaker Recognition Evaluation. The group is also known for its work in feature extraction and acoustic modeling for LVCSR (posterior features, discriminative training and transforms). BUT Speech@FIT researchers are regularly invited to prestigious events, such as the Johns Hopkins University summer workshops.

BUT Speech@FIT has a significant track record in EC-sponsored projects, ranging from speech corpora collection (SpeechDat-E, SpeeCon), through audiovisual meeting recognition and processing (M4, AMI, AMIDA), to mobile biometric identification (MOBIO) and recognition of rare events, both at the level of basic research (DIRAC) and in industrial security applications (CareTaker). The group is funded by the US Government (IARPA and DARPA programs) and by local research agencies (Czech Ministries of Education, Trade and Commerce, Defense and Interior, and the Technology Agency of the Czech Republic).

BUT Speech@FIT cooperates extensively with international and local industrial partners. It has generated two spin-offs: Phonexia Ltd. (2006), which delivers speech analytics solutions to customers in the commercial and security/defense sectors, and ReplayWell Ltd. (2011), which commercializes BUT's lecture indexing and browsing technology.

BUT Speech@FIT is active in open-source software development: its STK toolkit, PHNREC phone recognizer and SNet/TNet neural net training software are used in several labs worldwide. BUT is also involved in the development of KALDI, a new-generation speech toolkit.

The group has at its disposal the equipment needed for serious experiments in speech recognition: more than 500 CPUs, including 3 IBM Blade centers, all running Linux; file servers with a total capacity of more than 100 terabytes; and speech and language databases.

The group is also a well-known event organizer. After MLMI 2007 (4th Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms), BUT organized the NIST Speaker Recognition Evaluation workshop and Odyssey “The Speaker and Language Recognition Workshop” (both in 2010), and it regularly hosts research workshops. Honza Cernocky served as co-chair of IEEE ICASSP 2011 in Prague. BUT Speech@FIT was selected to host IEEE ASRU in 2013.
