Is Machine Translation Ripe for Cross-Lingual Sentiment Classification?

TLDR

Recent advances in machine translation make it possible to build NLP applications in low-resource settings: labeled data from a resource-rich language is translated, and a classifier is trained on the translated text. This approach can be viewed as a domain-adaptation problem, and prior work has reported positive results with it. In this opinion piece, the authors step back to make general claims about cross-lingual adaptation, supported by a series of experiments. They argue that domain mismatch is not caused by MT errors, so accuracy degrades even with perfect MT, and that cross-lingual adaptation is qualitatively different from monolingual adaptation, so new adaptation algorithms ought to be considered.

Abstract

Recent advances in Machine Translation (MT) have brought forth a new paradigm for building NLP applications in low-resource scenarios. To build a sentiment classifier for a language with no labeled resources, one can translate labeled data from another language, then train a classifier on the translated text. This can be viewed as a domain adaptation problem, where labeled translations and test data have some mismatch. Various prior works have achieved positive results using this approach. In this opinion piece, we take a step back and make some general statements about cross-lingual adaptation problems. First, we claim that domain mismatch is not caused by MT errors, and accuracy degradation will occur even in the case of perfect MT. Second, we argue that the cross-lingual adaptation problem is qualitatively different from other (monolingual) adaptation problems in NLP; thus new adaptation algorithms ought to be considered. This paper will describe a series of carefully designed experiments that led us to these conclusions.
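
The translate-then-train pipeline described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `translate` function is a hypothetical word-for-word lexicon lookup standing in for a real MT system, the four training sentences are invented toy data, and the classifier is a plain multinomial Naive Bayes with add-one smoothing rather than whatever model the authors used.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stand-in for an MT system: a tiny English-to-Spanish lexicon.
TOY_LEXICON = {
    "good": "bueno", "great": "genial", "bad": "malo",
    "awful": "horrible", "movie": "película", "plot": "trama",
}


def translate(text):
    """Toy word-by-word 'translation'; a real pipeline would call an MT system."""
    return " ".join(TOY_LEXICON.get(tok, tok) for tok in text.lower().split())


def train_naive_bayes(docs, labels):
    """Collect per-label word counts for a multinomial Naive Bayes model."""
    word_counts = defaultdict(Counter)
    label_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, doc):
    """Return the label with the highest smoothed log-probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in doc.lower().split():
            if tok in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Labeled data exists only in the source language (invented toy examples).
source_docs = ["good movie", "great plot", "bad movie", "awful plot"]
source_labels = ["pos", "pos", "neg", "neg"]

# Step 1: translate the labeled data. Step 2: train on the translations.
translated_docs = [translate(d) for d in source_docs]
model = train_naive_bayes(translated_docs, source_labels)
```

The trained model is then applied directly to target-language text, for example `predict(model, "película genial")`. Note that this toy setup cannot exhibit the mismatch the abstract is concerned with: native target-language reviews may use vocabulary and express sentiment in ways that never occur in translated training text, and the abstract argues this holds even when the translation itself is perfect.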
