Publication | Open Access
On proper unit selection in active learning
Citations: 22
References: 15
Year: 2009
Unknown Venue
Artificial Intelligence, Multiple Instance Learning, Engineering, Machine Learning, Text Mining, Natural Language Processing, Information Retrieval, Data Science, Data Mining, Computational Linguistics, Entity Recognition, Language Studies, Named-entity Recognition, Semi-supervised Learning, Missed Cluster Effect, Statistics, Supervised Learning, Machine Translation, Instance-based Learning, Computational Learning Theory, NLP Task, Knowledge Discovery, Statistical Learning Theory, Active Learning, Proper Unit Selection, Linguistics
Active learning is an effective method for creating training sets cheaply, but it is a biased sampling process and, in many applications, fails to explore large regions of the instance space. This can result in a missed cluster effect, which significantly lowers recall and slows down learning for infrequent classes. We show that missed clusters can be avoided in sequence classification tasks by using sentences as natural multi-instance units for labeling. Co-selection of other tokens within sentences provides an implicit exploratory component: for the task of named entity recognition on two corpora, we found that entity classes co-occur with sufficient frequency within sentences.
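
The idea of sentence-level selection can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how token scores are aggregated, so the max-based `sentence_utility`, the function names, the toy `pool`, and the 0.5 threshold are all assumptions made for illustration. The sketch ranks sentences by their most uncertain token and then lists the tokens that are labeled only because they share a sentence with that token, which is the co-selection the abstract refers to.

```python
from typing import List, Tuple

# Hypothetical data model: a sentence is a list of (token, uncertainty) pairs,
# where uncertainty is whatever score the current model assigns (assumed in [0, 1]).
Sentence = List[Tuple[str, float]]


def sentence_utility(sentence: Sentence) -> float:
    """Score a non-empty sentence by its most uncertain token (an assumed aggregate)."""
    return max(uncertainty for _, uncertainty in sentence)


def select_sentences(pool: List[Sentence], batch_size: int) -> List[int]:
    """Return the indices of the batch_size highest-scoring sentences."""
    ranked = sorted(range(len(pool)),
                    key=lambda i: sentence_utility(pool[i]),
                    reverse=True)
    return ranked[:batch_size]


def co_selected_tokens(pool: List[Sentence], chosen: List[int],
                       threshold: float) -> List[str]:
    """Tokens below the threshold that get labeled anyway because their
    sentence was chosen; these supply the implicit exploration."""
    return [token
            for i in chosen
            for token, uncertainty in pool[i]
            if uncertainty < threshold]


# Toy pool with made-up scores, for illustration only.
pool: List[Sentence] = [
    [("Alice", 0.9), ("visited", 0.1), ("Paris", 0.2)],
    [("It", 0.05), ("rained", 0.1)],
    [("Acme", 0.7), ("hired", 0.1), ("Bob", 0.3)],
]

chosen = select_sentences(pool, batch_size=2)
extra = co_selected_tokens(pool, chosen, threshold=0.5)
print(chosen)
print(extra)
```

In this toy pool, a token-level selector would request labels only for "Alice" and "Acme". Selecting whole sentences also hands "Paris" and "Bob" to the annotator, so entity mentions the model was not uncertain about still enter the training set.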