This Wednesday, January 23rd, the stay of Shakti P. Rath (Ph.D. IIT Madras) at BUT Speech@FIT came to an end. Shakti was with the group for two years, from January 2011, supported by the SoMoPro program co-funded by the EC (Marie Curie actions) and the South-Moravian region.
Shakti came with a sound knowledge of normalization and adaptation techniques for automatic speech recognition (ASR), especially vocal tract length normalization (VTLN). In Brno, he worked on four important ASR topics:
Regional-feature maximum likelihood linear transforms (R-FMLLR), which compensate for inter-speaker variability by means of class-specific feature-space transformations.
Speaker adaptation of deep neural networks (DNNs). This work was done in cooperation with Daniel Povey of Johns Hopkins University (Maryland, USA), mainly during Shakti’s visit to JHU’s Center for Language and Speech Processing.
Factorized FMLLR (QR-FMLLR), which allows the number of adaptation parameters to be adjusted according to the amount of available training data.
Regional vocal tract length normalization (R-VTLN), which he began working on; it applies approaches similar to those in R-FMLLR to train a multiple-parameter VTLN with acoustic-class-specific frequency warping.
Shakti’s work significantly advances the field of acoustic modeling and adaptation for speech recognition systems. During his stay in Brno, Shakti published his results at the prestigious Interspeech conference in Portland (Oregon, USA), and several other publications are in the pipeline.