A Comparison between Spiking and Differentiable Recurrent Neural Networks on Spoken Digit Recognition

A. Graves, N. Beringer, and J. Schmidhuber (Switzerland)


Speech Recognition, LSTM, RNN, SNN, Timewarping


In this paper we demonstrate that Long Short-Term Memory (LSTM) is a differentiable recurrent neural net (RNN) capable of robustly categorizing timewarped speech data. We measure its performance on a spoken digit identification task, where the data was spike-encoded in such a way that classifying the utterances became a difficult challenge in non-linear timewarping. We find that LSTM gives greatly superior results to an SNN found in the literature, and conclude that the architecture has a place in domains that require the learning of large timewarped datasets, such as automatic speech recognition.
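To make the abstract's claim concrete, the sketch below shows how a differentiable RNN of the kind described, an LSTM, processes a variable-length (i.e. time-warped) feature sequence and produces digit-class probabilities. This is a generic, minimal NumPy forward pass under assumed shapes and random weights, not the authors' trained model or exact gating variant; all names (`lstm_step`, `classify`) and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    # All four gate pre-activations computed jointly, then split:
    # input gate i, forget gate f, cell candidate g, output gate o.
    z = W @ x + U @ h + b
    H = h.shape[0]
    i = sigmoid(z[:H])
    f = sigmoid(z[H:2 * H])
    g = np.tanh(z[2 * H:3 * H])
    o = sigmoid(z[3 * H:])
    c = f * c + i * g      # additive cell update: the key to LSTM's long memory
    h = o * np.tanh(c)     # gated hidden state
    return h, c

def classify(seq, W, U, b, W_out, b_out):
    # The loop length follows the utterance, so sequences of any
    # duration (arbitrary timewarping) are handled uniformly.
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x in seq:
        h, c = lstm_step(x, h, c, W, U, b)
    logits = W_out @ h + b_out
    e = np.exp(logits - logits.max())
    return e / e.sum()     # softmax over the 10 digit classes

# Toy usage with random weights and a random 30-frame feature sequence
# standing in for spike-encoded speech (D features, H hidden, C classes).
D, H, C = 8, 16, 10
W = rng.normal(0, 0.1, (4 * H, D))
U = rng.normal(0, 0.1, (4 * H, H))
b = np.zeros(4 * H)
W_out = rng.normal(0, 0.1, (C, H))
b_out = np.zeros(C)
seq = rng.random((30, D))
probs = classify(seq, W, U, b, W_out, b_out)
```

Because every operation above is differentiable, the same forward pass supports gradient-based training, which is what distinguishes this architecture from the spiking network (SNN) it is compared against.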