Publication | Closed Access
Semi-Supervised Consensus Labeling for Crowdsourcing
80 Citations | 11 References | Year: 2011
Unknown Venue
Keywords: Artificial Intelligence, Data Annotation, Individual Crowd Workers, Machine Learning, Engineering, Text Mining, Natural Language Processing, Computational Social Science, Data Science, Semi-supervised Learning, Annotation Cost, Semi-supervised Consensus Labeling, Knowledge Discovery, Computer Science, Crowdsourcing, Crowd Computing, Annotation Tool, Annotation Accuracy, Automatic Annotation
Because individual crowd workers vary widely in annotation accuracy, we often ask multiple crowd workers to label each example and then infer a single consensus label from their votes. Simple majority vote computes consensus by weighting every worker's vote equally, whereas weighted voting assigns greater weight to more accurate workers, where accuracy is estimated from inter-annotator agreement (unsupervised), from agreement with known expert labels (supervised), or both. In this paper, we investigate the trade-off between the annotation cost of increasing expert supervision and the resulting gain in consensus accuracy. To maximize the benefit of supervision, we propose a semi-supervised approach that infers consensus labels using both labeled and unlabeled examples. We compare our semi-supervised approach with several existing unsupervised and supervised baselines, evaluating on both synthetic data and Amazon Mechanical Turk data. Results show that (a) a very modest amount of supervision can provide significant benefit, and (b) our semi-supervised approach matches the consensus accuracy of full supervision with a large amount of labeled data while using much less supervision.
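To make the abstract's terms concrete, here is a minimal sketch of majority vote, accuracy-weighted voting, and a semi-supervised combination of the two. It is a generic illustration, not the paper's algorithm, and it rests on several assumptions: binary labels in {0, 1}, log-odds vote weights log(acc / (1 - acc)), Laplace-smoothed accuracy estimates, and an EM-style loop that re-estimates each worker's accuracy against expert (gold) labels where available and against the current consensus elsewhere. All function and variable names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def majority_vote(votes):
    """votes: {example: [(worker, label), ...]} -> {example: label}.
    Every vote counts equally; ties go to the smaller label."""
    consensus = {}
    for example, example_votes in votes.items():
        counts = Counter(label for _, label in example_votes)
        # max() returns the first maximal item, so sorting makes ties deterministic
        consensus[example] = max(sorted(counts), key=counts.__getitem__)
    return consensus


def estimate_accuracy(votes, reference, smoothing=1.0):
    """Fraction of each worker's votes that agree with `reference`
    (gold labels, consensus labels, or a mix), Laplace-smoothed into (0, 1)."""
    agree = defaultdict(float)
    total = defaultdict(float)
    for example, ref_label in reference.items():
        for worker, label in votes.get(example, []):
            agree[worker] += label == ref_label
            total[worker] += 1
    workers = {w for example_votes in votes.values() for w, _ in example_votes}
    return {w: (agree[w] + smoothing) / (total[w] + 2 * smoothing) for w in workers}


def weighted_vote(votes, accuracy):
    """Binary weighted vote: each vote counts log(acc / (1 - acc)).
    Workers below 0.5 accuracy get negative weight, effectively flipping their vote.
    A score of exactly zero resolves to label 0."""
    consensus = {}
    for example, example_votes in votes.items():
        score = 0.0
        for worker, label in example_votes:
            acc = accuracy[worker]
            weight = math.log(acc / (1 - acc))
            score += weight if label == 1 else -weight
        consensus[example] = 1 if score > 0 else 0
    return consensus


def semi_supervised_consensus(votes, gold, iterations=10):
    """Start from majority vote, pin gold labels, then alternate between
    estimating worker accuracy and re-voting until the labels stop changing."""
    consensus = majority_vote(votes)
    consensus.update(gold)
    for _ in range(iterations):
        accuracy = estimate_accuracy(votes, consensus)
        updated = weighted_vote(votes, accuracy)
        updated.update(gold)  # expert labels always override inferred ones
        if updated == consensus:
            break
        consensus = updated
    return consensus


votes = {
    "q1": [("a", 1), ("b", 0), ("c", 0)],
    "q2": [("a", 1), ("b", 1), ("c", 0)],
    "q3": [("a", 0), ("b", 1), ("c", 1)],
}
gold = {"q2": 1}
print(semi_supervised_consensus(votes, gold))
```

Passing an empty `gold` dictionary gives a purely unsupervised variant, while calling `estimate_accuracy(votes, gold)` followed by `weighted_vote` gives a purely supervised one; the semi-supervised loop sits between those extremes by using both sources of evidence.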