Biasing Attention-Based Recurrent Neural Networks Using External Alignment Information

Abstract

This work explores extending attention-based neural models to include alignment information as input. We modify the attention component to depend on the current source position. The attention model is then used as a lexical model together with an additional alignment model to generate translations. The attention model is trained using external alignment information, and it is applied in decoding by performing beam search over the lexical and alignment hypotheses. The alignment model is used to score these alignment candidates. We demonstrate that the attention layer is capable of using the alignment information to improve over the baseline attention model that uses no such alignments. Our experiments are performed on two tasks: WMT 2016 English→Romanian and WMT 2017 German→English.
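As a rough illustration of what conditioning attention on an aligned source position can look like, the sketch below shifts the attention energies toward a given aligned position before normalizing them. This is not the formulation used in this work: the distance-based penalty, the function name `biased_attention`, and the `strength` parameter are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(energies, aligned_pos, strength=1.0):
    """Hypothetical biasing scheme (an assumption, not the paper's method):
    subtract a penalty proportional to the distance from the externally
    provided aligned source position, then normalize with softmax."""
    biased = [e - strength * abs(j - aligned_pos) for j, e in enumerate(energies)]
    return softmax(biased)


# Toy attention energies over four source positions.
energies = [0.2, 0.1, 0.4, 0.3]
plain = softmax(energies)
biased = biased_attention(energies, aligned_pos=1, strength=2.0)
```

With `strength=0.0` the function reduces to the unbiased softmax, so the bias can be read as an optional signal layered on top of a standard attention distribution. In decoding, each alignment hypothesis in the beam would supply its own `aligned_pos`.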
