Publication | Open Access
Thompson Sampling: An Asymptotically Optimal Finite Time Analysis
34
Citations
10
References
2012
Year
Mathematical ProgrammingContextual BanditMechanism DesignEngineeringStochastic OptimizationStochastic GameGame TheoryRobbins Lower BoundBusinessSampling TheoryThompson SamplingSequential Decision MakingProbability TheoryCumulative RegretGamesDecision TheoryStatisticsExploration V Exploitation
The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933. In this paper we answer it positively for the case of Bernoulli rewards by providing the first finite-time analysis that matches the asymptotic rate given in the Lai and Robbins lower bound for the cumulative regret. The proof is accompanied by a numerical comparison with other optimal policies, experiments that have been lacking in the literature until now for the Bernoulli case.
| Year | Citations | |
|---|---|---|
Page 1
Page 1