Publication | Closed Access
{Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits}
121
Citations
14
References
2015
Year
Unknown Venue
A stochastic combinatorial semi-bandit is an on-line learning problem where at each step a learn-ing agent chooses a subset of ground items sub-ject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we close the problem of computationally and sample efficient learning in stochastic combinatorial semi-bandits. In partic-ular, we analyze a UCB-like algorithm for solv-ing the problem, which is known to be computa-tionally efficient; and prove O(KL(1/∆) log n) and O( KLn log n) upper bounds on its n-step regret, where L is the number of ground items, K is the maximum number of chosen items, and ∆ is the gap between the expected returns of the optimal and best suboptimal solutions. The gap-dependent bound is tight up to a constant factor and the gap-free bound is tight up to a polyloga-rithmic factor. 1
| Year | Citations | |
|---|---|---|
Page 1
Page 1