Word Representations: A Simple and General Method for Semi-Supervised Learning

TLDR

Adding unsupervised word representations as extra features improves existing supervised NLP systems. This study evaluates Brown clusters, Collobert & Weston embeddings, and HLBL embeddings on named entity recognition (NER) and chunking, using near state-of-the-art supervised baselines. Each of the three representations improves baseline accuracy, and combining them yields further gains. The word features and code are available at http://metaoptimize.com/projects/wordreprs/.

Abstract

If we take an existing supervised NLP system, a simple and general way to improve accuracy is to use unsupervised word representations as extra word features. We evaluate Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking. We use near state-of-the-art supervised baselines, and find that each of the three word representations improves the accuracy of these baselines. We find further improvements by combining different word representations. You can download our word features, for off-the-shelf use in existing NLP systems, as well as our code, here: http://metaoptimize.com/projects/wordreprs/
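
To make the idea of "extra word features" concrete, here is a minimal sketch of how a per-token feature dictionary could be augmented with cluster and embedding features. It is not the authors' released code. The lookup tables, the function name `word_features`, the prefix lengths, and the `scale` parameter are all hypothetical and chosen for illustration. Real Brown cluster paths and embeddings would be loaded from files such as those at the URL above.

```python
from typing import Dict, List, Sequence

# Hypothetical toy lookup tables (assumption: real tables are loaded from disk).
# A Brown cluster is identified by a bit-string path in a binary hierarchy.
BROWN_PATHS: Dict[str, str] = {
    "london": "110100",
    "paris": "110101",
    "walked": "0111",
}
# Dense embeddings map each word to a short real-valued vector.
EMBEDDINGS: Dict[str, List[float]] = {
    "london": [0.12, -0.40],
    "paris": [0.10, -0.35],
    "walked": [-0.50, 0.22],
}


def word_features(
    word: str,
    prefix_lengths: Sequence[int] = (2, 4),  # illustrative values only
    scale: float = 1.0,                      # hypothetical embedding scaling factor
) -> Dict[str, float]:
    """Return baseline features plus unsupervised word-representation features."""
    key = word.lower()
    # Baseline supervised feature: the word identity itself.
    feats: Dict[str, float] = {"word=" + key: 1.0}

    # Cluster features: prefixes of the bit-string path give coarser or finer classes.
    path = BROWN_PATHS.get(key)
    if path is not None:
        for n in prefix_lengths:
            feats[f"brown{n}=" + path[:n]] = 1.0

    # Embedding features: one real-valued feature per dimension.
    vec = EMBEDDINGS.get(key)
    if vec is not None:
        for i, value in enumerate(vec):
            feats[f"emb{i}"] = scale * value

    return feats
```

Under these toy tables, `word_features("London")` and `word_features("Paris")` share the `brown4=1101` feature, so a supervised tagger that has seen one word labeled as a location gets evidence about the other. A word missing from both tables falls back to its baseline features only.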
