Publication | Open Access
Learning Deep Transformer Models for Machine Translation
Citations: 612
References: 22
Year: 2019
Venue: Unknown
Topics: Natural Language Processing, Large Language Models, Transformer System, Machine Learning, Engineering, Computational Linguistics, Deep Transformer Model, Large Language Model, Pre-trained Models, Computer Science, Layer Normalization, Language Studies, Deep Learning, Language Models, Linguistics, Machine Translation, Neural Machine Translation
Transformer is the state-of-the-art model for machine translation. Efforts to improve it follow two lines: widening the network (Transformer-Big) or deepening the language representation. The second line is harder because deep models are difficult to train. This study pursues the deep approach. The authors propose a deep Transformer that relies on (1) proper use of layer normalization and (2) a novel way of passing a combination of previous layers' outputs to the next layer. On WMT’16 English-German and NIST OpenMT’12 Chinese-English, the deep model (30/25-layer encoder) gains 0.4-2.4 BLEU points over the shallow Transformer-Big/Base baselines (6-layer encoder). It is also 1.6× smaller and 3× faster to train than Transformer-Big.
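The description above credits part of the gain to "proper use of layer normalization". Here is a minimal sketch of what that can mean. It assumes the phrase refers to the pre-norm arrangement, where each sublayer's input is normalized, instead of the post-norm arrangement used in the original Transformer, where the sum after the residual connection is normalized. The helper names (`layer_norm`, `post_norm_block`, `pre_norm_block`, `pre_norm_stack`) and the toy sublayer are hypothetical. Learned gain and bias are omitted, and plain Python lists stand in for tensors.

```python
import math
from typing import Callable, List

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize x to zero mean and (roughly) unit variance; no learned gain/bias."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum used for the residual connection."""
    return [u + v for u, v in zip(a, b)]


def post_norm_block(x: Vector, sublayer: Sublayer) -> Vector:
    """Post-norm residual unit: y = LN(x + F(x)). Every output is renormalized."""
    return layer_norm(add(x, sublayer(x)))


def pre_norm_block(x: Vector, sublayer: Sublayer) -> Vector:
    """Pre-norm residual unit: y = x + F(LN(x)). The identity path skips normalization."""
    return add(x, sublayer(layer_norm(x)))


def pre_norm_stack(x: Vector, sublayers: List[Sublayer]) -> Vector:
    """Apply pre-norm units in sequence, then one final LN (a common convention, assumed here)."""
    for f in sublayers:
        x = pre_norm_block(x, f)
    return layer_norm(x)


if __name__ == "__main__":
    # Hypothetical stand-in for an attention or feed-forward sublayer.
    def half(v: Vector) -> Vector:
        return [0.5 * e for e in v]

    x = [1.0, 2.0, 3.0]
    print(post_norm_block(x, half))  # renormalized output with mean 0
    print(pre_norm_block(x, half))   # x plus a scaled, normalized update
    print(pre_norm_stack(x, [half] * 30))  # a 30-unit pre-norm stack
```

In the pre-norm form, unrolling L units gives y_L = x_0 + Σ_l F_l(LN(y_{l-1})), so the input, and the gradient flowing back to it, keeps a direct identity path through every layer. In post-norm, every step passes through LN. This is one common explanation of why very deep post-norm stacks are harder to train.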
Transformer is the state-of-the-art model in recent machine translation evaluations. Two strands of research are promising for improving models of this kind. The first uses wide networks (a.k.a. Transformer-Big) and has been the de facto standard for developing Transformer systems. The second uses deeper language representations but faces the difficulty of training deep networks. Here, we continue the line of research on the latter. We claim that a truly deep Transformer model can surpass its Transformer-Big counterpart through 1) proper use of layer normalization and 2) a novel way of passing the combination of previous layers to the next. On the WMT’16 English-German and NIST OpenMT’12 Chinese-English tasks, our deep system (30/25-layer encoder) outperforms the shallow Transformer-Big/Base baseline (6-layer encoder) by 0.4-2.4 BLEU points. As a further bonus, the deep model is 1.6× smaller and 3× faster to train than Transformer-Big.
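The second ingredient, passing a combination of previous layers to the next, can be pictured as a learned weighted sum over all earlier outputs. The paper calls this a dynamic linear combination of layers. The sketch below illustrates the idea under stated assumptions and is not the authors' exact formulation. It assumes one learned scalar per (layer, earlier output) pair. It layer-normalizes each earlier output before weighting and treats each layer as a plain residual unit y = x + F(x). The names `combine`, `uniform_weights`, and `dlcl_stack`, and the uniform-average initialization, are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize x to zero mean and (roughly) unit variance; no learned gain/bias."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def combine(history: List[Vector], weights: List[float]) -> Vector:
    """Return sum_k weights[k] * LN(history[k]) over all earlier outputs y_0..y_l."""
    if len(history) != len(weights):
        raise ValueError("need exactly one weight per earlier output")
    out = [0.0] * len(history[0])
    for w, y in zip(weights, history):
        for i, v in enumerate(layer_norm(y)):
            out[i] += w * v
    return out


def uniform_weights(num_layers: int) -> List[List[float]]:
    """Hypothetical init: layer l+1 averages its l+1 predecessors equally."""
    return [[1.0 / (l + 1)] * (l + 1) for l in range(num_layers)]


def dlcl_stack(x0: Vector, layers: List[Layer], weights: List[List[float]]) -> Vector:
    """Feed each layer a weighted combination of the embedding and all earlier outputs.

    weights[l] holds l + 1 scalars (learned in a real model) forming the input of layer l + 1.
    """
    if len(weights) != len(layers):
        raise ValueError("need one weight row per layer")
    history = [x0]  # y_0: the embedding output
    for layer, w in zip(layers, weights):
        x = combine(history, w)  # input to this layer
        y = [a + b for a, b in zip(x, layer(x))]  # assumed residual unit: y = x + F(x)
        history.append(y)
    return history[-1]


if __name__ == "__main__":
    # Hypothetical stand-in for a Transformer layer.
    def half(v: Vector) -> Vector:
        return [0.5 * e for e in v]

    layers = [half] * 4
    print(dlcl_stack([1.0, 2.0, 3.0], layers, uniform_weights(len(layers))))
```

Setting each weights[l] to zeros with a single 1 for the latest output recovers a plain sequential stack in which each layer sees only its (normalized) immediate predecessor. Learned weights instead let upper layers draw directly on lower ones, including the embedding.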