Analog Auditory Perception Model for Robust Speaker Recognition

Y. Deng and R. Xu (USA)


Auditory model, Robust speech feature extraction, Speaker recognition, Analog VLSI, Speaker recognition hardware.


An auditory perception model for noise-robust speech feature extraction is presented that abstracts the effective signal processing performed in the human ear. The auditory effects taken into account include insensitivity to low-frequency signals, Mel-scale and multi-scale frequency resolution, static nonlinear compression, and adaptive compression. Unlike the widely used discrete-time digital signal processing methods, the model assumes continuous-time filtering and rectification, which makes it amenable to real-time, low-power analog VLSI implementation. A custom chip in 0.5 µm CMOS technology implements the general form of the model with digitally programmable filter parameters and consumes 9 mW. Experiments on the YOHO speaker identification database demonstrate consistent robustness of the new features to noise of various statistics. They yield significant improvements in text-independent speaker recognition accuracy over models trained identically on Mel-frequency cepstral coefficient (MFCC) features.
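The abstract lists the stages of the model but gives no filter topology, compression exponents, or time constants. The following is a minimal discrete-time sketch of that processing chain. The chip itself is continuous-time analog, so this only approximates it. It assumes a first-order high-pass for low-frequency insensitivity, Mel-spaced second-order band-pass filters (RBJ cookbook biquads), half-wave rectification followed by one-pole smoothing, a cube-root static compression, and a divisive gain driven by a running mean for adaptive compression. All function names, the Mel formula, and every parameter value are hypothetical choices, not the authors' design. Multi-scale resolution is not modelled.

```python
import math


def hz_to_mel(f):
    # Common Mel mapping 2595*log10(1 + f/700); assumed, not taken from the paper.
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_centers(n_bands, f_lo, f_hi):
    """Centre frequencies equally spaced on the Mel scale, strictly between f_lo and f_hi."""
    lo, hi = hz_to_mel(f_lo), hz_to_mel(f_hi)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (k + 1)) for k in range(n_bands)]


def one_pole_highpass(x, fc, fs):
    """First-order high-pass: attenuates content well below fc."""
    rc, dt = 1.0 / (2.0 * math.pi * fc), 1.0 / fs
    k = rc / (rc + dt)
    y, px, py = [], 0.0, 0.0
    for xn in x:
        py = k * (py + xn - px)  # y[n] = k * (y[n-1] + x[n] - x[n-1])
        px = xn
        y.append(py)
    return y


def one_pole_lowpass(x, fc, fs):
    """First-order low-pass, used to smooth the rectified signal into an envelope."""
    k = 1.0 - math.exp(-2.0 * math.pi * fc / fs)
    y, s = [], 0.0
    for xn in x:
        s += k * (xn - s)
        y.append(s)
    return y


def bandpass_biquad(x, fc, q, fs):
    """Second-order band-pass (RBJ cookbook, 0 dB peak gain) centred at fc."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0  # b1 is zero for this band-pass
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def adaptive_compress(x, tau, fs, gamma=0.5, eps=1e-3):
    """Divisive gain control: divide by a running mean raised to gamma (0 < gamma < 1)."""
    k = 1.0 - math.exp(-1.0 / (tau * fs))
    m, y = 0.0, []
    for v in x:
        m += k * (v - m)  # slowly tracks the recent signal level
        y.append(v / (eps + m) ** gamma)
    return y


def auditory_features(x, fs, n_bands=8, f_lo=100.0, f_hi=4000.0, q=4.0):
    """One value per band: the chain's output averaged over the whole signal."""
    x = one_pole_highpass(x, f_lo, fs)  # insensitivity to low frequencies
    feats = []
    for fc in mel_centers(n_bands, f_lo, f_hi):
        y = bandpass_biquad(x, fc, q, fs)
        rect = [max(0.0, v) for v in y]  # half-wave rectification
        env = one_pole_lowpass(rect, 30.0, fs)
        static = [v ** (1.0 / 3.0) for v in env]  # static compression (cube root)
        adapted = adaptive_compress(static, tau=0.05, fs=fs)
        feats.append(sum(adapted) / len(adapted))
    return feats
```

For a 1 kHz tone sampled at 16 kHz, the largest value appears in the band whose Mel-spaced centre lies nearest 1 kHz. A practical front end would compute these values frame by frame (for example every 10 ms) and might decorrelate them, as MFCC pipelines do with a DCT. The single whole-signal average only keeps the sketch short.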
