Publication | Open Access
Semi-supervised conditional random fields for improved sequence segmentation and labeling
Citations: 130
References: 18
Year: 2006
Unknown Venue
Structured Prediction, Engineering, Machine Learning, Improved Sequence Segmentation, Automatic Annotation Tool, Corpus Linguistics, Text Mining, Natural Language Processing, Data Science, Pattern Recognition, Text Segmentation, Computational Linguistics, Language Studies, Named-entity Recognition, Semi-supervised Learning, Conditional Entropy, Supervised Training, Machine Translation, Knowledge Discovery, Computer Science, Information Extraction, Conditional Random Fields, POS Tagging, Automatic Annotation
The authors develop a semi‑supervised training procedure for conditional random fields that combines labeled and unlabeled data to improve sequence segmentation and labeling. The method extends minimum‑entropy regularization to structured prediction, creating a training objective that mixes unlabeled conditional entropy with labeled likelihood and can iteratively improve an initial supervised model. Applying the algorithm to gene and protein mention detection in biomedical text, the authors demonstrate that incorporating unlabeled data improves supervised CRF performance.
We present a new semi-supervised training procedure for conditional random fields (CRFs) that can be used to train sequence segmentors and labelers from a combination of labeled and unlabeled training data. Our approach is based on extending the minimum entropy regularization framework to the structured prediction case, yielding a training objective that combines unlabeled conditional entropy with labeled conditional likelihood. Although the training objective is no longer concave, it can still be used to improve an initial model (e.g. obtained from supervised training) by iterative ascent. We apply our new training algorithm to the problem of identifying gene and protein mentions in biological texts, and show that incorporating unlabeled data improves the performance of the supervised CRF in this case.
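The objective described in the abstract can be illustrated with a minimal sketch. The abstract gives no formula, so everything below is an assumption made for illustration: the function names, the dictionary-based weights `emit` and `trans`, the trade-off weight `gamma`, and the brute-force enumeration of label sequences (only practical for toy inputs; a real CRF would use forward-backward dynamic programming). Any parameter prior or penalty term is left out. The sketch computes the labeled conditional log-likelihood plus `gamma` times the negative conditional entropy on unlabeled sequences, a quantity one would then try to increase by iterative ascent from a supervised starting point.

```python
import itertools
import math


def score(x, y, emit, trans):
    """Unnormalized log-score of label sequence y for observation sequence x.

    emit maps (observation, label) to a weight, trans maps
    (previous_label, label) to a weight; missing entries count as 0.
    """
    s = sum(emit.get((x[t], y[t]), 0.0) for t in range(len(x)))
    s += sum(trans.get((y[t - 1], y[t]), 0.0) for t in range(1, len(x)))
    return s


def posterior(x, labels, emit, trans):
    """Return p(y | x) for every label sequence y by explicit enumeration."""
    seqs = list(itertools.product(labels, repeat=len(x)))
    scores = [score(x, y, emit, trans) for y in seqs]
    top = max(scores)
    # Log-sum-exp for a numerically stable log partition function.
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return {y: math.exp(s - log_z) for y, s in zip(seqs, scores)}


def objective(labeled, unlabeled, labels, emit, trans, gamma):
    """Labeled log-likelihood plus gamma times negative unlabeled entropy."""
    log_lik = 0.0
    for x, y in labeled:
        log_lik += math.log(posterior(x, labels, emit, trans)[tuple(y)])

    neg_entropy = 0.0
    for x in unlabeled:
        for p in posterior(x, labels, emit, trans).values():
            if p > 0.0:
                neg_entropy += p * math.log(p)

    return log_lik + gamma * neg_entropy


if __name__ == "__main__":
    labels = ["O", "GENE"]
    labeled = [(["the", "p53"], ["O", "GENE"])]
    unlabeled = [["a", "brca1"]]
    emit = {("p53", "GENE"): 1.0, ("the", "O"): 1.0}
    trans = {("O", "GENE"): 0.5}
    print(objective(labeled, unlabeled, labels, emit, trans, gamma=0.5))
```

With `gamma = 0` the function reduces to ordinary supervised conditional log-likelihood. Raising `gamma` rewards parameters that make confident (low-entropy) predictions on the unlabeled sequences, which is also why the combined objective is no longer concave.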