Publication | Open Access
Beyond English-Centric Multilingual Machine Translation
Citations: 467 | References: 61 | Year: 2020
LLM Fine-tuning, Engineering, Machine Learning, Large Language Model, Corpus Linguistics, Text Mining, Natural Language Processing, Data Science, Computational Linguistics, Language Studies, Single Model, Machine Translation, Computer-assisted Translation, Dense Scaling, Computer Science, Deep Learning, Multilingual Machine Translation, Neural Machine Translation, Speech Translation, Linguistics
Massively multilingual machine translation has shown promise, but existing models are largely English-centric: they rely on data translated to or from English and so fail to reflect translation needs worldwide. The authors aim to build and release a true many-to-many multilingual translation model covering 100 languages, with open-source scripts for reproducibility. They construct a large mined dataset with supervised data for thousands of language directions and train a model that combines dense scaling with language-specific sparse parameters to increase capacity. The resulting model gains more than 10 BLEU when translating directly between non-English languages while remaining competitive with the best single systems of WMT.
Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages. However, much of this work is English-Centric, training only on data that was translated from or to English. While this is supported by large sources of training data, it does not reflect translation needs worldwide. In this work, we create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages. We build and open-source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining. Then, we explore how to effectively increase model capacity through a combination of dense scaling and language-specific sparse parameters to create high-quality models. Our focus on non-English-Centric models brings gains of more than 10 BLEU when directly translating between non-English directions while remaining competitive with the best single systems of WMT. We open-source our scripts so that others may reproduce the data, evaluation, and final M2M-100 model.
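To make the scale of "any pair of 100 languages" concrete, the sketch below counts directed translation pairs under an English-centric setup versus a full many-to-many setup. It is an illustration only: the function names and the placeholder language codes are hypothetical, and the many-to-many count is an upper bound on possible directions. The abstract states only that supervised data covers "thousands" of directions, not all of them.

```python
from itertools import permutations


def many_to_many_directions(languages):
    """All ordered (source, target) pairs of distinct languages."""
    return list(permutations(languages, 2))


def english_centric_directions(languages, pivot="en"):
    """Only the ordered pairs that have the pivot as source or target."""
    return [(src, tgt) for src, tgt in permutations(languages, 2)
            if pivot in (src, tgt)]


# Placeholder codes (an assumption): "en" plus 99 invented identifiers.
languages = ["en"] + [f"lang{i:02d}" for i in range(1, 100)]

english_centric = english_centric_directions(languages)
many_to_many = many_to_many_directions(languages)

print(len(english_centric))  # 198  = 2 * 99
print(len(many_to_many))     # 9900 = 100 * 99
```

With 100 languages, an English-centric setup can supply data for at most 198 directions, while up to 9,900 directions are possible in principle. The remaining directions are the ones the abstract calls non-English directions.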