Publication | Open Access
WCC-JC: A Web-Crawled Corpus for Japanese-Chinese Neural Machine Translation
Year: 2022 | Citations: 15 | References: 15
Engineering, Cross-lingual Representation, Multilingualism, Subtitle Data, Corpus Linguistics, Text Mining, Natural Language Processing, Language Documentation, Computational Linguistics, Corpus Analysis, Language Studies, Machine Translation, Computer-assisted Translation, East Asian Languages, Neural Machine Translation, Web-crawled Corpus, Japanese-Chinese Bilingual Corpora, Speech Translation, Linguistics
Few Japanese-Chinese bilingual corpora are currently large enough to serve as training data for neural machine translation (NMT). In particular, few corpora include spoken language such as daily conversation. In this research, we construct a Japanese-Chinese bilingual corpus of substantial scale by crawling subtitle data for movies and TV series from websites. We computed BLEU scores for translation models trained on the constructed WCC-JC (Web Crawled Corpus, Japanese and Chinese) and on the comparison corpora. We also manually evaluated the output of the translation model trained on WCC-JC to confirm the corpus's quality and effectiveness.
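For readers unfamiliar with the BLEU metric mentioned above, the following is a minimal sketch of corpus-level BLEU, not the evaluation script used by the authors. It assumes a single reference per sentence, character-level tokenization for Chinese output (an assumption made here for illustration; the abstract does not state the tokenization used), n-grams up to length 4, and no smoothing. The function names `ngram_counts` and `corpus_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) with one reference per hypothesis, no smoothing."""
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = 0
    ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp, n)
            ref_ngrams = ngram_counts(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += sum(hyp_ngrams.values())
    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    # Character-level tokens for a toy Chinese sentence pair (illustrative data only).
    hyps = [list("我喜欢看电影")]
    refs = [list("我喜欢看电视")]
    print(f"BLEU = {corpus_bleu(hyps, refs):.2f}")
```

In practice, scores are only comparable across corpora when every model is evaluated on the same test set with the same tokenization, so a standard tool with a fixed configuration is usually preferable to a hand-written scorer like this one.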