
Publication | Open Access

Generalizing Word Embeddings using Bag of Subwords

Citations: 49 | References: 16 | Year: 2018

Abstract

We approach the problem of generalizing pretrained word embeddings beyond fixed-size vocabularies without using additional contextual information. We propose a subword-level word vector generation model that views words as bags of character n-grams. The model is simple, fast to train, and provides good vectors for rare or unseen words. Experiments show that our model achieves state-of-the-art performance on an English word similarity task and on joint prediction of part-of-speech tags and morphosyntactic attributes in 23 languages, suggesting that the model captures the relationship between words' textual representations and their embeddings.
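The abstract describes the model only at a high level. The following is a minimal sketch of the bag-of-character-n-grams idea under stated assumptions, not the authors' implementation. We assume three things. First, a word vector is the mean of the vectors of its character n-grams, using lengths 3 to 6 with `<` and `>` as boundary markers. Second, the n-gram vectors are trained by plain SGD so that the composed vector regresses onto a pretrained embedding under squared L2 loss. Third, at inference, n-grams never seen during training are skipped. The names `char_ngrams` and `BagOfSubwords`, as well as the parameters `lr` and `epochs`, are hypothetical.

```python
import random


def char_ngrams(word, n_min=3, n_max=6):
    """Bag of character n-grams of `word`; '<' and '>' mark word boundaries (assumed)."""
    padded = f"<{word}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


class BagOfSubwords:
    """Hypothetical model: word vector = mean of its n-gram vectors (assumed composition)."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.table = {}  # n-gram string -> vector (list of floats)

    def _lookup(self, gram):
        # Lazily create a small random vector for an n-gram first seen in training.
        if gram not in self.table:
            self.table[gram] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.dim)]
        return self.table[gram]

    def embed(self, word):
        # Inference: average only n-grams seen in training (assumption); else zeros.
        known = [self.table[g] for g in char_ngrams(word) if g in self.table]
        if not known:
            return [0.0] * self.dim
        return [sum(col) / len(known) for col in zip(*known)]

    def fit(self, targets, lr=0.5, epochs=300):
        """SGD on ||mean(n-gram vectors) - pretrained vector||^2 for each word (assumed loss)."""
        words = list(targets)
        for _ in range(epochs):
            self.rng.shuffle(words)
            for word in words:
                grams = char_ngrams(word)
                if not grams:
                    continue
                vecs = [self._lookup(g) for g in grams]
                pred = [sum(col) / len(vecs) for col in zip(*vecs)]
                err = [p - t for p, t in zip(pred, targets[word])]
                # Gradient w.r.t. each n-gram occurrence is 2 * err / k (k = bag size).
                scale = 2.0 * lr / len(vecs)
                for vec in vecs:
                    for d in range(self.dim):
                        vec[d] -= scale * err[d]

    def loss(self, targets):
        """Mean squared L2 distance between composed and target vectors."""
        total = 0.0
        for word, target in targets.items():
            pred = self.embed(word)
            total += sum((p - t) ** 2 for p, t in zip(pred, target))
        return total / len(targets)
```

Because an unseen word such as "catty" shares n-grams like `<ca`, `cat` and `<cat` with trained words, it receives a nonzero vector without any contextual information. This is the out-of-vocabulary generalization the abstract describes. Averaging, rather than summing, keeps vector scale roughly independent of word length. That choice is part of this sketch's assumptions.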
