Publication | Closed Access
The Epoch-Greedy algorithm for contextual multi-armed bandits
328
Citations
9
References
2007
Year
Unknown Venue
The complexity term S in a standard supervised learning sample‑complexity bound underlies the analysis. The authors introduce Epoch‑Greedy, an algorithm for contextual multi‑armed bandits with side information. Epoch‑Greedy alternates exploration and exploitation epochs, leveraging contextual side information to guide action selection. Epoch‑Greedy achieves horizon‑independent performance, with regret bounded by O(T^{2/3}S^{1/3}) or better, driven by the hypothesis‑class sample‑complexity bound.
We present Epoch-Greedy, an algorithm for contextual multi-armed bandits (also known as bandits with side information). Epoch-Greedy has the following properties: 1. No knowledge of a time horizon T is necessary. 2. The regret incurred by Epoch-Greedy is controlled by a sample complexity bound for a hypothesis class. 3. The regret scales as O(T2/3S1/3) or better (sometimes, much better). Here S is the complexity term in a sample complexity bound for standard supervised learning.
| Year | Citations | |
|---|---|---|
Page 1
Page 1