Concepedia

Publication | Closed Access

Contextual bandits with linear Payoff functions

577

Citations

16

References

2011

Year

Abstract

In this paper, we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions. For T rounds, K actions, and d(√ dimensional feature vectors, we prove an O Td ln 3) (KT ln(T)/δ) regret bound that holds with probability 1 − δ for the simplest known (both conceptually and computationally) efficient upper confidence bound algorithm for this problem. We also prove a lower bound of Ω ( √ Td) for this setting, matching the upper bound up to logarithmic factors. 1

References

YearCitations

2002

5.7K

2010

2.4K

1985

2.4K

2002

2.2K

2008

618

1995

555

2010

454

2007

328

1979

109

2008

106

Page 1