Publication | Open Access

Multi-Source Cross-Lingual Model Transfer: Learning What to Share

108 Citations | 48 References | 2019

TLDR

Modern NLP has benefited greatly from neural networks, but the lack of annotated data limits their use in many languages; cross-lingual transfer learning can build models for low-resource languages by leveraging data from other languages. This work targets the multilingual transfer setting, where labeled data from multiple source languages is used to boost target-language performance. The model uses adversarial networks to learn language-invariant features and a mixture-of-experts to exploit target-source similarity, and, when coupled with unsupervised multilingual embeddings, it can operate in a zero-resource setting. The approach learns what to share across languages and achieves significant performance gains over prior art on multiple text classification and sequence tagging tasks, including a large-scale industry dataset.

Abstract

Modern NLP applications have enjoyed a great boost utilizing neural network models. Such deep neural models, however, are not applicable to most human languages due to the lack of annotated training data for various NLP tasks. Cross-lingual transfer learning (CLTL) is a viable method for building NLP models for a low-resource target language by leveraging labeled data from other (source) languages. In this work, we focus on the multilingual transfer setting where training data in multiple source languages is leveraged to further boost target language performance. Unlike most existing methods that rely only on language-invariant features for CLTL, our approach coherently utilizes both language-invariant and language-specific features at the instance level. Our model leverages adversarial networks to learn language-invariant features, and mixture-of-experts models to dynamically exploit the similarity between the target language and each individual source language. This enables our model to learn effectively what to share between various languages in the multilingual setup. Moreover, when coupled with unsupervised multilingual embeddings, our model can operate in a zero-resource setting where neither target language training data nor cross-lingual resources are available. Our model achieves significant performance gains over prior art, as shown in an extensive set of experiments over multiple text classification and sequence tagging tasks including a large-scale industry dataset.
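
The sketch below illustrates, in minimal PyTorch, the two mechanisms the abstract describes: a shared encoder trained adversarially against a language discriminator (language-invariant features) and a per-instance mixture-of-experts over source-language-specific encoders (language-specific features). All names here (MultiSourceCLTL, GradientReversal, lambd, layer sizes) are hypothetical illustrations under assumed design choices, not the authors' released implementation.

```python
# Minimal sketch, assuming: one expert per source language, a gradient-reversal
# layer for the adversarial objective, and a softmax gate whose weights act as
# the instance-level target-source similarity. Hypothetical names throughout.

import torch
import torch.nn as nn


class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; flips the gradient sign on the backward
    pass, so the shared encoder is trained to fool the language discriminator."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class MultiSourceCLTL(nn.Module):
    def __init__(self, emb_dim, hidden_dim, num_classes, num_source_langs, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        # Shared encoder: intended to produce language-invariant features.
        self.shared = nn.Sequential(nn.Linear(emb_dim, hidden_dim), nn.ReLU())
        # Adversary: tries to identify the source language from shared features.
        self.lang_disc = nn.Linear(hidden_dim, num_source_langs)
        # One expert per source language captures language-specific features.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(emb_dim, hidden_dim), nn.ReLU())
            for _ in range(num_source_langs)
        )
        # Gate: per-instance weights over experts.
        self.gate = nn.Linear(emb_dim, num_source_langs)
        # Task classifier sees shared + gated expert features.
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):
        shared = self.shared(x)                                   # (batch, H)
        lang_logits = self.lang_disc(
            GradientReversal.apply(shared, self.lambd))           # adversarial branch
        gate_w = torch.softmax(self.gate(x), dim=-1)              # (batch, E)
        expert_out = torch.stack([e(x) for e in self.experts], 1) # (batch, E, H)
        private = (gate_w.unsqueeze(-1) * expert_out).sum(dim=1)  # (batch, H)
        logits = self.classifier(torch.cat([shared, private], dim=-1))
        return logits, lang_logits
```

In this reading, the task loss on the classifier logits and a cross-entropy loss on the language-discriminator logits would be optimized jointly; the gradient-reversal layer is one common way to realize the adversarial term, and the gate's softmax weights decide, per instance, how much each source-language expert contributes.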
