Publication | Closed Access
EXP3 with drift detection for the switching bandit problem
37
Citations
12
References
2015
Year
Unknown Venue
Mathematical ProgrammingEngineeringMachine LearningStochastic OptimizationStochastic GameMulti-armed BanditGame TheoryTime PeriodCumulative RewardBusinessSequential Decision MakingProbability TheoryComputer ScienceDrift DetectionCombinatorial OptimizationMechanism DesignExploration V ExploitationOperations Research
The multi-armed bandit is a model of exploration and exploitation, where one must select, within a finite set of arms, the one which maximizes the cumulative reward up to the time horizon T. For the adversarial multi-armed bandit problem, where the sequence of rewards is chosen by an oblivious adversary, the notion of best arm during the time horizon is too restrictive for applications such as ad-serving, where the best ad could change during time range. In this paper, we consider a variant of the adversarial multi-armed bandit problem, where the time horizon is divided into unknown time periods within which rewards are drawn from stochastic distributions. During each time period, there is an optimal arm which may be different from the optimal arm at the previous time period. We present an algorithm taking advantage of the constant exploration of EXP3 to detect when the best arm changes. Its analysis shows that on a run divided into N periods where the best arm changes, the proposed algorithms achieves a regret in O(N √T log T).
| Year | Citations | |
|---|---|---|
Page 1
Page 1