Publication | Closed Access
Contextual bandits with linear Payoff functions
577
Citations
16
References
2011
Year
Unknown Venue
In this paper, we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions. For T rounds, K actions, and d(√ dimensional feature vectors, we prove an O Td ln 3) (KT ln(T)/δ) regret bound that holds with probability 1 − δ for the simplest known (both conceptually and computationally) efficient upper confidence bound algorithm for this problem. We also prove a lower bound of Ω ( √ Td) for this setting, matching the upper bound up to logarithmic factors. 1
| Year | Citations | |
|---|---|---|
2002 | 5.7K | |
2010 | 2.4K | |
1985 | 2.4K | |
2002 | 2.2K | |
2008 | 618 | |
1995 | 555 | |
2010 | 454 | |
2007 | 328 | |
1979 | 109 | |
2008 | 106 |
Page 1
Page 1