Distant Speech Recognition based on Position Dependent Cepstral Mean Normalization

L. Wang, N. Kitaoka, and S. Nakagawa (Japan)


distant speech recognition, speaker position estimation, position dependent CMN


In a distant environment, channel distortion may dramatically degrade speech recognition performance. In this paper, we propose a robust speech recognition method based on position dependent Cepstral Mean Normalization (CMN). First, the system measures a priori the transmission characteristics for speaker positions at a set of grid points in the room. In the recognition stage, the system estimates the speaker position in 3-D space from the time delay of arrival (TDOA) between distinct microphone pairs. The system then selects the a priori transmission characteristics corresponding to the estimated position, applies a channel distortion compensation method to the speech, and recognizes it. The proposed method also compensates for the mismatch between the cepstral means of utterances spoken by a human and those emitted from a loudspeaker. Our experiments showed that the proposed method efficiently improved the performance of a speech recognition system in a distant environment and also compensated well for the mismatch between human and loudspeaker voices.
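
The recognition-stage steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`tdoa_samples`, `nearest_grid_index`, `position_dependent_cmn`), the array layouts, the use of plain cross-correlation for the delay, and the nearest-grid-point selection rule are all assumptions made for illustration. The 3-D position solver that turns several TDOAs into a position and the human/loudspeaker mismatch compensation are not shown, because the abstract does not specify them.

```python
import numpy as np


def tdoa_samples(sig_a, sig_b):
    """Delay of sig_a relative to sig_b, in samples (assumed estimator).

    Uses the peak of the full cross-correlation; a positive value means
    sig_a arrives later than sig_b.
    """
    corr = np.correlate(sig_a, sig_b, mode="full")
    # Index 0 of `corr` corresponds to lag -(len(sig_b) - 1).
    return int(np.argmax(corr)) - (len(sig_b) - 1)


def nearest_grid_index(position, grid_points):
    """Index of the grid point closest to the estimated 3-D position."""
    distances = np.linalg.norm(grid_points - position, axis=1)
    return int(np.argmin(distances))


def position_dependent_cmn(cepstra, position, grid_points, grid_means):
    """Subtract the cepstral mean stored for the nearest grid point.

    cepstra:     (frames, coeffs) cepstral features of the utterance
    position:    (3,) estimated speaker position
    grid_points: (points, 3) positions measured a priori
    grid_means:  (points, coeffs) cepstral mean measured at each point
    """
    idx = nearest_grid_index(position, grid_points)
    return cepstra - grid_means[idx]
```

In this sketch the a priori measurement stage reduces to filling `grid_means` with one cepstral mean vector per grid point, so compensation at recognition time is a single lookup and subtraction that does not need to wait for the end of the utterance.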
