A high-performance semi-supervised learning method for text chunking

TLDR

Whether semi-supervised learning can improve classifier accuracy remains uncertain, especially for NLP tasks, where prior methods have shown mixed results. This study introduces a novel semi-supervised structural learning approach for text chunking. The method learns from thousands of automatically generated auxiliary classification problems on unlabeled data to uncover a shared predictive structure, which is then used to improve the target task. It surpasses the previous best results on CoNLL'00 syntactic chunking and CoNLL'03 named entity chunking (English and German).

Abstract

An important question in machine learning is whether unlabeled data can be used to build a more accurate classifier (semi-supervised learning). Although a number of semi-supervised methods have been proposed, their effectiveness on NLP tasks is not always clear. This paper presents a novel semi-supervised method that employs a learning paradigm we call structural learning. The idea is to find "what good classifiers are like" by learning from thousands of automatically generated auxiliary classification problems on unlabeled data. Doing so uncovers the common predictive structure shared by the multiple classification problems, which can then be used to improve performance on the target problem. The method outperforms the previous best results on CoNLL'00 syntactic chunking and CoNLL'03 named entity chunking (English and German).
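The idea of extracting a shared predictive structure from many auxiliary problems can be illustrated with a minimal sketch. The abstract does not spell out the algorithm, so everything below is an assumption made for illustration: the auxiliary predictors are ridge-regression linear models, the shared structure is taken to be the top singular vectors of their stacked weight matrix, and the names `learn_shared_structure`, `augment_features`, `h`, and `ridge` are hypothetical. The auxiliary labels are assumed to be derived automatically from the unlabeled data itself (for example, whether a particular frequent word occurs at a given position).

```python
import numpy as np


def learn_shared_structure(X_unlabeled, aux_labels, h, ridge=1.0):
    """Estimate a low-dimensional structure shared by many auxiliary predictors.

    X_unlabeled: (n, d) feature matrix built from unlabeled text.
    aux_labels:  (n, m) matrix of +1/-1 labels, one column per auxiliary
                 problem, generated automatically from the unlabeled data.
    h:           number of shared dimensions to keep (h <= min(d, m)).
    Returns theta with shape (h, d) and orthonormal rows.

    Assumption: ridge regression stands in for whatever linear learner
    the actual method uses.
    """
    d = X_unlabeled.shape[1]
    # Fit all m auxiliary linear predictors at once; W has shape (d, m).
    gram = X_unlabeled.T @ X_unlabeled + ridge * np.eye(d)
    W = np.linalg.solve(gram, X_unlabeled.T @ aux_labels)
    # Directions along which the auxiliary predictors vary most are
    # treated as the common predictive structure.
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :h].T


def augment_features(X, theta):
    """Append the h shared-structure features to the original features."""
    return np.hstack([X, X @ theta.T])


# Tiny synthetic demonstration (random data, not a real chunking corpus).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 20))          # stand-in for unlabeled features
aux_demo = np.sign(rng.normal(size=(200, 30)))  # stand-in for auxiliary labels
theta_demo = learn_shared_structure(X_demo, aux_demo, h=5)
X_target = augment_features(X_demo, theta_demo)  # features for the target task
```

In this sketch, a classifier for the target problem (chunking) would be trained on the augmented features from the small labeled set, so that information gathered from unlabeled data enters through `theta`.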
