Modeling Spontaneous Speech Variability in Professional Dictation

H. Schramm, X. Aubert, B. Bakker, C. Meyer, and H. Ney (Germany)


Automatic speech recognition, spontaneous speech, pronunciation modeling, speaking rate modeling, speaking rate compensation, filled-pause modeling


In this paper we present a technique for improved acoustic and pronunciation modeling of speech variabilities of different origin. For refined representation of the different speech variability classes the method applies class-specific acoustic and pronunciation modeling and recombines the specific models using a lexicon-based word-level model combination technique. A theoretical framework for the word-level model combination is provided that incorporates alternative pronunciations and acoustic models in a weighted sum of acoustic probabilities. This technique may in general be used to model various speech varieties. In a first step, however, we applied it to rate-of-speech and filled-pause related variability only. On a highly spontaneous real-life medical dictation task, we observed a 12% relative improvement of the word error rate.
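As a rough illustration of what a weighted sum of acoustic probabilities at the word level might look like, the following minimal sketch combines the log-likelihoods of several variants of one word (for example, a fast-speech and a slow-speech pronunciation, each scored by its own acoustic model). The function name `combine_word_log_likelihood`, the input layout, and the weight normalization are assumptions made for this example only; the abstract does not specify the exact formulation used in the paper.

```python
import math


def combine_word_log_likelihood(variants):
    """Combine per-variant scores for one word into a single log-likelihood.

    `variants` is a list of (weight, log_likelihood) pairs, one pair per
    pronunciation/acoustic-model variant of the word (hypothetical layout).
    Returns log(sum_k w_k * p_k) with the weights normalized to sum to 1.
    """
    total_weight = sum(weight for weight, _ in variants)
    if total_weight <= 0:
        raise ValueError("at least one variant needs a positive weight")

    # Work in the log domain: log(w_k) + log(p_k) for each usable variant.
    terms = [
        math.log(weight / total_weight) + log_likelihood
        for weight, log_likelihood in variants
        if weight > 0
    ]

    # Log-sum-exp with the maximum factored out for numerical stability.
    peak = max(terms)
    return peak + math.log(sum(math.exp(term - peak) for term in terms))


if __name__ == "__main__":
    # Two illustrative variants with made-up weights and likelihoods.
    fast_and_slow = [(0.5, math.log(0.2)), (0.5, math.log(0.4))]
    print(combine_word_log_likelihood(fast_and_slow))
```

Summing over variants, rather than keeping only the best-scoring one, lets every class-specific model contribute to the word score in proportion to its weight. Whether the paper uses the full sum or a maximum approximation during decoding is not stated in the abstract.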

