Neural Machine Translation by Jointly Learning to Align and Translate

TLDR

Neural machine translation, unlike traditional statistical machine translation, typically uses an encoder-decoder network: the encoder maps a source sentence to a fixed-length vector, and the decoder generates a translation from that vector. The authors conjecture that this fixed-length vector is a bottleneck and address it by letting the model automatically soft-search for the parts of the source sentence relevant to each target word. They extend the basic encoder-decoder architecture so that the decoder attends to parts of the source sentence without explicitly segmenting it, in effect performing soft alignment during decoding. This approach yields English-to-French translation quality comparable to the existing state-of-the-art phrase-based system, and the learned soft alignments agree well with human intuition.

Abstract

Neural machine translation is a recently proposed approach to machine translation. Unlike traditional statistical machine translation, neural machine translation aims at building a single neural network that can be jointly tuned to maximize translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and consist of an encoder that encodes a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.
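
The (soft-)search described above can be illustrated with a minimal sketch. The abstract does not specify how relevance is scored, so the additive scoring form used here, along with the names `soft_search`, `annotations`, `decoder_state`, `W`, `U`, and `v` and the toy values, are assumptions for illustration only. The idea shown is that, for one target word, every source position receives a weight, the weights sum to one, and the decoder reads a weighted average of the source representations instead of a single fixed-length vector.

```python
import math


def softmax(scores):
    """Normalize raw scores into non-negative weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, vector):
    return [dot(row, vector) for row in matrix]


def soft_search(decoder_state, annotations, W, U, v):
    """Return (weights, context) for a single target position.

    Assumed scoring form (hypothetical parameter names):
        score_j = v . tanh(W @ decoder_state + U @ annotation_j)
    """
    projected_state = matvec(W, decoder_state)
    scores = []
    for annotation in annotations:
        projected_annotation = matvec(U, annotation)
        hidden = [math.tanh(a + b)
                  for a, b in zip(projected_state, projected_annotation)]
        scores.append(dot(v, hidden))

    # Soft alignment: one weight per source position, no hard segmentation.
    weights = softmax(scores)

    # Context vector: weighted average of the source annotations.
    dim = len(annotations[0])
    context = [sum(w * a[k] for w, a in zip(weights, annotations))
               for k in range(dim)]
    return weights, context


# Toy inputs (made up): three source positions with 2-dimensional annotations.
annotations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.5, -0.5]
W = [[1.0, 0.0], [0.0, 1.0]]
U = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]

weights, context = soft_search(decoder_state, annotations, W, U, v)
print("alignment weights:", weights)
print("context vector:", context)
```

Because the weights are recomputed for every target word, the context changes from step to step, which is what removes the dependence on one fixed-length summary of the whole sentence. In a trained model the parameters would be learned jointly with the rest of the network rather than set by hand as in this toy example.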
