Publication | Open Access
Investigating the Translation Performance of a Large Multilingual Language Model: the Case of BLOOM
Year: 2023
The NLP community recently saw the release of a new large open-access multilingual language model, BLOOM (BigScience et al., 2022), covering 46 languages. We focus on BLOOM's multilingual ability by evaluating its machine translation performance across several datasets (WMT, Flores-101 and DiaBLa) and language pairs (high- and low-resourced). Our results show that 0-shot performance suffers from overgeneration and from generating in the wrong language, but that this is greatly improved in the few-shot setting, with very good results for a number of language pairs. We study several aspects, including prompt design, model sizes, cross-lingual transfer and the use of discursive context.
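To make the 0-shot versus few-shot distinction concrete, here is a minimal sketch of how a translation prompt might be assembled and how overgenerated output might be trimmed. The prompt template, the function names `build_prompt` and `truncate_overgeneration`, and the cut-at-first-line rule are all illustrative assumptions, not necessarily the setup used in the paper.

```python
def build_prompt(source, src_lang, tgt_lang, examples=()):
    """Build a translation prompt (hypothetical template, for illustration).

    `examples` is a sequence of (source, target) pairs. An empty sequence
    gives a 0-shot prompt; one or more pairs give a few-shot prompt.
    """
    lines = []
    for ex_src, ex_tgt in examples:
        # Each demonstration shows the model the expected output language.
        lines.append(f"{src_lang}: {ex_src} = {tgt_lang}: {ex_tgt}")
    # The final line is left open for the model to complete.
    lines.append(f"{src_lang}: {source} = {tgt_lang}:")
    return "\n".join(lines)


def truncate_overgeneration(generated):
    """Keep only the first non-empty line of the model's continuation.

    Assumption: anything after the first line is overgeneration
    (for example, the model inventing further source/target pairs).
    """
    for line in generated.splitlines():
        if line.strip():
            return line.strip()
    return ""


if __name__ == "__main__":
    zero_shot = build_prompt("Bonjour", "French", "English")
    few_shot = build_prompt(
        "Bonjour", "French", "English", examples=[("Merci", "Thank you")]
    )
    print(zero_shot)
    print(few_shot)
    print(truncate_overgeneration(" Hello\nFrench: Merci = English: Thanks"))
```

In this sketch the few-shot demonstrations serve two purposes: they indicate the target language and they show the model where a translation ends, which are the two 0-shot failure modes the abstract names.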