Publication | Closed Access
Vox Populi: Collecting High-Quality Labels from a Crowd
Citations: 147
References: 10
Year: 2009
Unknown Venue
With the emergence of search engines and crowd-sourcing websites, machine learning practitioners are faced with datasets that are labeled by a large heterogeneous set of teachers. These datasets test the limits of our existing learning theory, which largely assumes that data is sampled i.i.d. from a fixed distribution. In many cases, the number of teachers actually scales with the number of examples, with each teacher providing just a handful of labels, precluding any statistically reliable assessment of an individual teacher’s quality. In this paper, we study the problem of pruning low-quality teachers in a crowd, in order to improve the label quality of our training set. Despite the hurdles mentioned above, we show that this is in fact achievable with a simple and efficient algorithm, which does not require that each example be repeatedly labeled by multiple teachers. We provide a theoretical analysis of our algorithm and back our findings with empirical evidence.