Generating Text with Recurrent Neural Networks

Year: 2011 | Citations: 1.2K | References: 21

TLDR

Recurrent Neural Networks (RNNs) are powerful sequence models, but they were difficult to train until advances in Hessian-free (HF) optimization made them practical for challenging sequence problems. This paper demonstrates RNNs trained with the HF optimizer on character-level language modeling tasks. Because the standard RNN architecture is not ideally suited to such tasks, the authors introduce a multiplicative (gated) RNN variant in which the current input character determines the transition matrix from one hidden state to the next. After five days of HF training on 8 high-end GPUs, the multiplicative RNN surpassed the best previous single method for character-level language modeling, a hierarchical non-parametric sequence model. To the authors' knowledge, this was the largest RNN application at the time of publication (2011).

Abstract

Recurrent Neural Networks (RNNs) are very powerful sequence models that do not enjoy widespread use because it is extremely difficult to train them properly. Fortunately, recent advances in Hessian-free optimization have been able to overcome the difficulties associated with training RNNs, making it possible to apply them successfully to challenging sequence problems. In this paper we demonstrate the power of RNNs trained with the new Hessian-Free optimizer (HF) by applying them to character-level language modeling tasks. The standard RNN architecture, while effective, is not ideally suited for such tasks, so we introduce a new RNN variant that uses multiplicative (or gated) connections which allow the current input character to determine the transition matrix from one hidden state vector to the next. After training the multiplicative RNN with the HF optimizer for five days on 8 high-end Graphics Processing Units, we were able to surpass the performance of the best previous single method for character-level language modeling – a hierarchical non-parametric sequence model. To our knowledge this represents the largest recurrent neural network application to date.
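
The idea that the current input character determines the hidden-to-hidden transition can be illustrated with a small forward-pass sketch. The abstract gives no equations, so this assumes a factored form in which the effective transition matrix is W_hf · diag(W_fx x) · W_fh. The parameter names, the toy sizes, the omission of bias terms, and the random initialization are all illustrative assumptions. Training with the HF optimizer is not shown.

```python
import math
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix; the scale is an arbitrary illustrative choice."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def init_params(vocab_size, hidden_size, factor_size, seed=0):
    """Hypothetical parameter set for a factored multiplicative RNN (no biases)."""
    rng = random.Random(seed)
    return {
        "W_fx": rand_matrix(factor_size, vocab_size, rng),   # input -> factor gains
        "W_fh": rand_matrix(factor_size, hidden_size, rng),  # hidden -> factors
        "W_hf": rand_matrix(hidden_size, factor_size, rng),  # factors -> hidden
        "W_hx": rand_matrix(hidden_size, vocab_size, rng),   # input -> hidden
        "W_oh": rand_matrix(vocab_size, hidden_size, rng),   # hidden -> output logits
    }


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def mrnn_step(params, h_prev, char_index):
    """One forward step: returns the new hidden state and next-character probabilities."""
    vocab_size = len(params["W_fx"][0])
    x = one_hot(char_index, vocab_size)

    # Each factor gets a gain chosen by the current character.
    gain = matvec(params["W_fx"], x)
    proj = matvec(params["W_fh"], h_prev)
    f = [g * p for g, p in zip(gain, proj)]

    # Equivalent to applying W_hf * diag(W_fx x) * W_fh to h_prev,
    # so the transition matrix changes with the input character.
    rec = matvec(params["W_hf"], f)
    inp = matvec(params["W_hx"], x)
    h = [math.tanh(r + i) for r, i in zip(rec, inp)]

    # Numerically stable softmax over the character vocabulary.
    logits = matvec(params["W_oh"], h)
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return h, probs


# Toy usage: a 4-character vocabulary, run over the index sequence 0, 2, 1.
params = init_params(vocab_size=4, hidden_size=5, factor_size=3)
h = [0.0] * 5
for c in [0, 2, 1]:
    h, probs = mrnn_step(params, h, c)
```

Factoring the transition this way avoids storing a full hidden-by-hidden matrix for every character: only the per-factor gains depend on the input, while the two projection matrices are shared across characters.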
