Active Sampling for Class Probability Estimation and Ranking

Year: 2008
Citations: 190
References: 31

Abstract

In many cost-sensitive environments class probability estimates are used by decision makers to evaluate the expected utility from a set of alternatives. Supervised learning can be used to build class probability estimates; however, it often is very costly to obtain training data with class labels. Active learning acquires data incrementally, at each phase identifying especially useful additional data for labeling, and can be used to economize on examples needed for learning. We outline the critical features of an active learner and present a sampling-based active learning method for estimating class probabilities and class-based rankings. BOOTSTRAP-LV identifies particularly informative new data for learning based on the variance in probability estimates, and uses weighted sampling to account for a potential example's informative value for the rest of the input space. We show empirically that the method reduces the number of data items that must be obtained and labeled, across a wide variety of domains. We investigate the contribution of the components of the algorithm and show that each provides valuable information to help identify informative examples. We also compare BOOTSTRAP-LV with UNCERTAINTY SAMPLING, an existing active learning method designed to maximize classification accuracy. The results show that BOOTSTRAP-LV uses fewer examples to exhibit a certain estimation accuracy and provide insights into the behavior of the algorithms. Finally, we experiment with another new active sampling algorithm drawing from both UNCERTAINTY SAMPLING and BOOTSTRAP-LV and show that it is significantly more competitive with BOOTSTRAP-LV compared to UNCERTAINTY SAMPLING. The analysis suggests more general implications for improving existing active sampling algorithms for classification.
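The two ingredients the abstract names, variance in probability estimates and weighted sampling, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the base learner, the number of bootstrap models, or how the variance is normalized, so the k-NN estimator `fit_knn`, the default `n_boot=10`, and the use of raw variance as the sampling weight are all assumptions made for illustration, as are the function names.

```python
import random
from statistics import pvariance


def fit_knn(train, k=3):
    """Hypothetical base learner (an assumption, not from the paper):
    k-NN estimate of P(class = 1 | x) on 1-D inputs, Laplace-smoothed."""
    def predict(x):
        nearest = sorted(train, key=lambda ex: abs(ex[0] - x))[:k]
        positives = sum(label for _, label in nearest)
        return (positives + 1) / (len(nearest) + 2)
    return predict


def variance_scores(labeled, pool, n_boot=10, rng=None, fit=fit_knn):
    """Score each unlabeled x by the variance of its estimated class
    probability across models trained on bootstrap resamples."""
    rng = rng or random.Random(0)
    models = []
    for _ in range(n_boot):
        # Bootstrap resample: draw len(labeled) examples with replacement.
        resample = [rng.choice(labeled) for _ in labeled]
        models.append(fit(resample))
    return [pvariance([model(x) for model in models]) for x in pool]


def weighted_sample(pool, scores, batch_size, rng=None):
    """Draw a batch without replacement, with probability proportional
    to each example's score, rather than taking only the top scores."""
    rng = rng or random.Random(0)
    remaining = list(pool)
    weights = [s + 1e-12 for s in scores]  # keep every weight positive
    chosen = []
    for _ in range(min(batch_size, len(remaining))):
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(pick))
        weights.pop(pick)
    return chosen


# Toy usage: (feature, label) pairs and an unlabeled pool of features.
labeled = [(0.0, 0), (0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1), (1.0, 1)]
pool = [0.05, 0.5, 0.95]
scores = variance_scores(labeled, pool, n_boot=20, rng=random.Random(1))
batch = weighted_sample(pool, scores, batch_size=2, rng=random.Random(2))
```

Sampling in proportion to the score, instead of always labeling the highest-variance examples, spreads each batch across the input space, which is one reading of the abstract's remark about an example's value for the rest of the input space.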
