Publication | Open Access
Findings of the WMT 2020 Shared Task on Parallel Corpus Filtering and Alignment
Citations: 41
References: 37
Year: 2020
Shared Task, Engineering, Cross-lingual Representation, Corpus Linguistics, Language Processing, Text Mining, Natural Language Processing, Applied Linguistics, Information Retrieval, Data Science, Computational Linguistics, WMT Shared Task, Corpus Analysis, Language Studies, Machine Translation, WMT 2020, NLP Task, Linguistics, Sentence Alignment, Neural Machine Translation, Speech Processing, Text Processing, Speech Translation, Parallel Corpus Filtering
Following the two preceding WMT Shared Tasks on Parallel Corpus Filtering (Koehn et al., 2018, 2019), we again posed the challenge of assigning sentence-level quality scores to very noisy corpora of sentence pairs crawled from the web, with the goal of sub-selecting the highest-quality data for training machine translation systems. This year, the task tackled the low-resource conditions of Pashto–English and Khmer–English and also included the challenge of sentence alignment from document pairs.
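The sub-selection step described above can be illustrated with a minimal sketch. It assumes each sentence pair already has a quality score and that the selection stops at a fixed budget of English-side tokens counted by whitespace; the function name `subselect`, the `token_budget` parameter, and the stopping rule are hypothetical choices for illustration, not the task's official tooling.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (source sentence, English sentence)


def subselect(pairs: List[Pair], scores: List[float], token_budget: int) -> List[Pair]:
    """Keep the highest-scoring pairs until the English token budget is used up.

    Assumption: tokens are counted by whitespace on the English side only.
    """
    if len(pairs) != len(scores):
        raise ValueError("pairs and scores must have the same length")

    # Rank pairs from highest to lowest quality score.
    ranked = sorted(zip(scores, pairs), key=lambda item: item[0], reverse=True)

    selected: List[Pair] = []
    used = 0
    for _score, (src, tgt) in ranked:
        n_tokens = len(tgt.split())
        # Stop at the first pair that would exceed the budget.
        if used + n_tokens > token_budget:
            break
        selected.append((src, tgt))
        used += n_tokens
    return selected


# Toy data; the source-side strings are placeholders, not real sentences.
example_pairs = [("src-a", "one two"), ("src-b", "three four five"), ("src-c", "six")]
example_scores = [0.2, 0.9, 0.5]
subset = subselect(example_pairs, example_scores, token_budget=4)
```

With a scheme like this, the scoring method only needs to rank pairs well; the absolute score values do not matter, because only the ordering decides what is selected.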