Publication | Open Access
Fully Character-Level Neural Machine Translation without Explicit Segmentation
Year: 2017 | Citations: 413 | References: 18
Engineering, Machine Learning, Cross-lingual Representation, Large Language Model, Language Processing, Natural Language Processing, Large Language Models, Computational Linguistics, Word Segmentation (Natural Language Processing), Language Studies, Machine Translation, Computer-assisted Translation, Explicit Segmentation, Pre-trained Models, Source Character Sequence, Computer Science, Deep Learning, Neural Machine Translation, Speech Translation, Linguistics
Most machine translation systems rely on explicit word segmentation to extract tokens. The study introduces a character-level NMT model that translates a source character sequence into a target character sequence without any segmentation. The encoder is a character-level convolutional network with max-pooling, which shortens the source representation so that training speed is comparable to subword-level models, and a single encoder can be shared across multiple source languages in a many-to-one setting. The character-to-character model outperforms a baseline with a subword-level encoder on WMT'15 DE-EN and CS-EN and performs comparably on FI-EN and RU-EN. In the multilingual setting, the character-level encoder outperforms the subword-level encoder on all language pairs, and on CS-EN, FI-EN, and RU-EN it even surpasses models trained on that single language pair alone, in both BLEU and human judgment.
Most existing machine translation systems operate at the level of words, relying on explicit segmentation to extract tokens. We introduce a neural machine translation (NMT) model that maps a source character sequence to a target character sequence without any segmentation. We employ a character-level convolutional network with max-pooling at the encoder to reduce the length of source representation, allowing the model to be trained at a speed comparable to subword-level models while capturing local regularities. Our character-to-character model outperforms a recently proposed baseline with a subword-level encoder on WMT’15 DE-EN and CS-EN, and gives comparable performance on FI-EN and RU-EN. We then demonstrate that it is possible to share a single character-level encoder across multiple languages by training a model on a many-to-one translation task. In this multilingual setting, the character-level encoder significantly outperforms the subword-level encoder on all the language pairs. We observe that on CS-EN, FI-EN and RU-EN, the quality of the multilingual character-level translation even surpasses the models specifically trained on that language pair alone, both in terms of the BLEU score and human judgment.
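The length-reduction idea in the abstract (convolve over characters, then max-pool over time) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names (`embed`, `conv1d_same`, `max_pool`, `encode`), the random weights, and the default values for embedding size, filter count, filter width, and pooling stride are all illustrative assumptions, and the sketch omits the rest of the model the abstract describes.

```python
import random


def embed(text, dim, seed=0):
    """Map each character to a fixed pseudo-random vector.

    Stand-in for learned character embeddings (assumption for illustration).
    """
    rng = random.Random(seed)
    table = {}
    out = []
    for ch in text:
        if ch not in table:
            table[ch] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        out.append(table[ch])
    return out


def conv1d_same(seq, filters, width):
    """Apply every filter at every position with zero padding,
    so the output has the same length as the input."""
    dim = len(seq[0])
    left = (width - 1) // 2
    right = width - 1 - left
    padded = [[0.0] * dim] * left + seq + [[0.0] * dim] * right
    out = []
    for t in range(len(seq)):
        window = padded[t:t + width]
        feats = []
        for f in filters:  # each filter has shape [width][dim]
            s = sum(f[k][d] * window[k][d]
                    for k in range(width) for d in range(dim))
            feats.append(max(0.0, s))  # ReLU non-linearity
        out.append(feats)
    return out


def max_pool(seq, stride):
    """Non-overlapping max-pooling over time: each output vector is the
    element-wise maximum over `stride` consecutive positions."""
    pooled = []
    for start in range(0, len(seq), stride):
        chunk = seq[start:start + stride]
        pooled.append([max(col) for col in zip(*chunk)])
    return pooled


def encode(text, dim=8, n_filters=4, width=3, stride=5, seed=0):
    """Character sequence -> shorter sequence of feature vectors.

    All hyperparameters here are hypothetical defaults, not values
    taken from the abstract.
    """
    if not text:
        return []
    rng = random.Random(seed + 1)
    filters = [[[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                for _ in range(width)]
               for _ in range(n_filters)]
    chars = embed(text, dim, seed)
    conv = conv1d_same(chars, filters, width)
    return max_pool(conv, stride)


source = "Most machine translation systems"
segments = encode(source)
print(len(source), "characters ->", len(segments), "pooled positions")
```

With a pooling stride of `s`, a source of `n` characters is reduced to about `n / s` positions, which is the sense in which pooling shortens the source representation that any later layers would have to process.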