Concepedia

Publication | Open Access

Best Arm Identification in Multi-Armed Bandits

372

Citations

7

References

2010

Year

TLDR

The problem of identifying the best arm in a stochastic multi‑armed bandit game is framed by defining regret as the gap between the optimal arm’s mean reward and that of the chosen arm, generalizing the classic 1/Δ² sample requirement for distinguishing two distributions. The authors aim to develop efficient strategies for best‑arm identification in stochastic bandits. They introduce a highly exploring UCB policy and a parameter‑free successive‑rejects algorithm for best‑arm selection. Both algorithms achieve near‑optimal regret, decreasing exponentially at the best possible rate up to logarithmic factors, with the successive‑rejects method being parameter‑free and sample‑complexity bounded by O(∑_{i≠*}1/Δ_i²) up to log factors.

Abstract

We consider the problem of finding the best arm in a stochastic multi-armed bandit game. The regret of a forecaster is here defined by the gap between the mean reward of the optimal arm and the mean reward of the ultimately chosen arm. We propose a highly exploring UCB policy and a new algorithm based on successive rejects. We show that these algorithms are essentially optimal since their regret decreases exponentially at a rate which is, up to a logarithmic factor, the best possible. However, while the UCB policy needs the tuning of a parameter depending on the unobservable hardness of the task, the successive rejects policy benefits from being parameter-free, and also independent of the scaling of the rewards. As a by-product of our analysis, we show that identifying the best arm (when it is unique) requires a number of samples of order (up to a log(K) factor) Σ i 1/Δ2i, where the sum is on the suboptimal arms andΔi represents the difference between the mean reward of the best arm and the one of arm i. This generalizes the well-known fact that one needs of order of 1/Δ2 samples to differentiate the means of two distributions with gap Δ.

References

YearCitations

Page 1