Abstract

Word embedding has opened new and exciting avenues for understanding and processing languages. Simple yet effective word embedding models rapidly became a dominant building block for Natural Language Processing (NLP) applications because they encode linguistic similarities and syntactic regularities between words. However, ignoring the morphological structure of words degrades their performance when they are applied to languages with complex morphology, such as Arabic. In this paper, we investigate enhancing Arabic word embedding by incorporating morphological annotations into the embedding model. We further tune the generated word vectors to their lemma forms using linear compositionality to generate lemma-based embeddings. To assess the effectiveness of our model, we evaluate it on Arabic analogy, sentiment analysis, and subjectivity analysis tasks. Our results show improvements over existing state-of-the-art methods for Arabic word embedding.
