Long short-term memory recurrent neural network architectures for large scale acoustic modeling

TLDR

LSTM is a recurrent neural network architecture designed to model long‑range dependencies in temporal sequences more accurately than conventional RNNs. This paper explores LSTM RNN architectures for large‑scale acoustic modeling in speech recognition and introduces the first distributed training of LSTM RNNs, using asynchronous stochastic gradient descent on a large cluster of machines. A two‑layer deep LSTM RNN in which each LSTM layer has a linear recurrent projection layer can exceed state‑of‑the‑art speech recognition performance. It uses model parameters more effectively than the other architectures considered, converges quickly, and outperforms a deep feed‑forward network with an order of magnitude more parameters. Index terms: LSTM, RNN, speech recognition, acoustic modeling.

Abstract

Long Short-Term Memory (LSTM) is a specific recurrent neural network (RNN) architecture that was designed to model temporal sequences and their long-range dependencies more accurately than conventional RNNs. In this paper, we explore LSTM RNN architectures for large scale acoustic modeling in speech recognition. We recently showed that LSTM RNNs are more effective than DNNs and conventional RNNs for acoustic modeling, considering moderately-sized models trained on a single machine. Here, we introduce the first distributed training of LSTM RNNs using asynchronous stochastic gradient descent optimization on a large cluster of machines. We show that a two-layer deep LSTM RNN where each LSTM layer has a linear recurrent projection layer can exceed state-of-the-art speech recognition performance. This architecture makes more effective use of model parameters than the others considered, converges quickly, and outperforms a deep feed forward neural network having an order of magnitude more parameters. Index Terms: Long Short-Term Memory, LSTM, recurrent neural network, RNN, speech recognition, acoustic modeling.
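
The parameter-efficiency claim can be illustrated with a rough weight count. The sketch below is an assumption-laden illustration, not the paper's own accounting: it assumes an LSTM layer with four input-weight blocks (three gates plus the cell input), diagonal peephole connections, and no biases, and it assumes the projected variant feeds a lower-dimensional linear projection of the cell output back into the recurrence and on to the output layer. The function names and all layer sizes are hypothetical.

```python
def lstm_params(n_in: int, n_cell: int, n_out: int) -> int:
    """Approximate weight count of one LSTM layer plus an output layer (biases ignored)."""
    input_weights = 4 * n_cell * n_in        # input, forget, output gates + cell input
    recurrent_weights = 4 * n_cell * n_cell  # recurrence over the full cell output
    peepholes = 3 * n_cell                   # diagonal peephole connections (assumed)
    output_weights = n_cell * n_out          # cell output -> network output
    return input_weights + recurrent_weights + peepholes + output_weights


def lstmp_params(n_in: int, n_cell: int, n_proj: int, n_out: int) -> int:
    """Same count with a linear recurrent projection of size n_proj (biases ignored)."""
    input_weights = 4 * n_cell * n_in
    recurrent_weights = 4 * n_cell * n_proj  # recurrence now sees the projected output
    peepholes = 3 * n_cell
    projection_weights = n_cell * n_proj     # cell output -> projection
    output_weights = n_proj * n_out          # projection -> network output
    return (input_weights + recurrent_weights + peepholes
            + projection_weights + output_weights)


if __name__ == "__main__":
    # Hypothetical sizes chosen only to show the trend, not taken from the paper.
    n_in, n_cell, n_proj, n_out = 40, 1024, 256, 8000
    print("LSTM  weights:", lstm_params(n_in, n_cell, n_out))
    print("LSTMP weights:", lstmp_params(n_in, n_cell, n_proj, n_out))
```

Under these assumptions, the projection shrinks both the recurrent weight matrix and the output layer, so the number of memory cells can grow without the parameter count growing quadratically with it.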
