PAC Subset Selection in Stochastic Multi-armed Bandits

Abstract

We consider the problem of selecting, from among the arms of a stochastic n-armed bandit, a subset of size m of those arms with the highest expected rewards, based on efficiently sampling the arms. This “sub-set selection ” problem finds application in a variety of areas. In the authors ’ previ-ous work (Kalyanakrishnan &amp; Stone, 2010), this problem is framed under a PAC setting (denoted “Explore-m”), and corresponding sampling algorithms are analyzed. Whereas the formal analysis therein is restricted to the worst case sample complexity of algo-rithms, in this paper, we design and ana-lyze an algorithm (“LUCB”) with improved expected sample complexity. Interestingly LUCB bears a close resemblance to the well-known UCB algorithm for regret minimiza-tion. The expected sample complexity bound we show for LUCB is novel even for single-arm selection (Explore-1). We also give a lower bound on the worst case sample com-plexity of PAC algorithms for Explore-m. 1.

References

Page 1

	Year	Citations

Page 1