On the Relation between Statistical Properties of Spectrographic Masks and Recognition Accuracy

J.F. Gemmeke, B. Cranen, and L. ten Bosch (The Netherlands)


Speech Processing, Time-Frequency Signal Analysis, Robustness, Missing Data Techniques


Missing Data Techniques (MDT) can significantly improve the accuracy of automatic speech recognition (ASR) for speech corrupted by background noise. The increase in recognition accuracy obtained using MDT is largely dependent on the estimation of spectrographic masks used to distinguish speech from noise. We present an analysis technique which enables us to compare two mask estimation techniques. By contrasting a sound-class independent and a sound-class dependent distance measure, we show that we can directly relate differences between masks to their difference in recognition accuracy using the sound-class dependent distance measure. Experiments on AURORA 2 using an oracle mask and an estimated mask show that modifying the estimated mask in order to reduce the statistical differences with the oracle mask leads to an increase in word recognition accuracy.
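
To make the idea of comparing two masks concrete, the following is a minimal sketch and not the distance measures used in the paper, which the abstract does not specify. It assumes binary masks of shape (frequency bands, frames), an oracle mask obtained by thresholding the local SNR (a common convention in missing data work), and a hypothetical per-frame sound-class label array. All function names, the 0 dB threshold, and the simple cell-disagreement rate are illustrative assumptions.

```python
import numpy as np


def oracle_mask(clean_power, noise_power, snr_threshold_db=0.0):
    """Assumed oracle mask: True where the local SNR exceeds a threshold."""
    eps = 1e-12  # avoids division by zero and log of zero
    snr_db = 10.0 * np.log10((clean_power + eps) / (noise_power + eps))
    return snr_db > snr_threshold_db


def mask_disagreement(mask_a, mask_b):
    """Sound-class independent: fraction of time-frequency cells that differ."""
    return float(np.mean(np.asarray(mask_a) != np.asarray(mask_b)))


def mask_disagreement_per_class(mask_a, mask_b, frame_labels):
    """Sound-class dependent: the same fraction, computed per frame label."""
    differs = np.asarray(mask_a) != np.asarray(mask_b)  # (bands, frames)
    frame_labels = np.asarray(frame_labels)
    result = {}
    for label in np.unique(frame_labels):
        frames = frame_labels == label  # boolean selector over frames
        result[str(label)] = float(np.mean(differs[:, frames]))
    return result


# Toy example: 2 frequency bands, 4 frames, hypothetical class labels.
oracle = np.array([[1, 0, 1, 0], [1, 1, 0, 0]], dtype=bool)
estimated = np.array([[1, 1, 1, 0], [0, 1, 0, 0]], dtype=bool)
labels = np.array(["vowel", "vowel", "silence", "silence"])

overall = mask_disagreement(oracle, estimated)
per_class = mask_disagreement_per_class(oracle, estimated, labels)
```

In this toy case the overall disagreement hides the fact that all differing cells fall in frames of one class, which is the kind of distinction a sound-class dependent measure is meant to expose.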
