Active Learning from Crowds

Abstract

Obtaining labels can be expensive or time-consuming, but unlabeled data is often abundant and easier to obtain. Most learning tasks can be made more efficient, in terms of labeling cost, by intelligently choosing specific unlabeled instances to be labeled by an oracle. The general problem of optimally choosing these instances is known as active learning. As it is usually set in the context of supervised learning, active learning relies on a single oracle playing the role of a teacher. We focus on the multiple-annotator scenario, where an oracle who knows the ground truth no longer exists; instead, multiple labelers with varying expertise are available for querying. This paradigm poses new challenges for active learning: we can now ask both which data sample should be labeled next and which annotator should be queried to benefit our learning model the most. In this paper, we employ a probabilistic model for learning from multiple annotators that can also learn the annotators' expertise, even when that expertise is not consistently accurate across the task domain. We then focus on providing a criterion and formulation that allow us to select both a sample and the annotator(s) from whom to query its label.
