Contextual Bandits with Linear Payoff Functions

Abstract

In this paper, we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions. For $T$ rounds, $K$ actions, and $d$-dimensional feature vectors, we prove an $O\left(\sqrt{Td \ln^3(KT \ln(T)/\delta)}\right)$ regret bound that holds with probability $1 - \delta$ for the simplest known (both conceptually and computationally) efficient upper confidence bound algorithm for this problem. We also prove a lower bound of $\Omega(\sqrt{Td})$ for this setting, matching the upper bound up to logarithmic factors.
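To make the setting concrete, the following is a minimal sketch of a generic linear upper confidence bound rule, not necessarily the exact variant analyzed in the paper. It assumes a single unknown weight vector shared by all actions, a ridge-regression estimate of that vector, and a fixed exploration width; the names LinUCB, alpha, and simulate, the noise level, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


class LinUCB:
    """Hypothetical minimal linear UCB learner with one shared weight vector."""

    def __init__(self, d, alpha=1.0):
        self.alpha = alpha    # exploration width (assumed tuning parameter)
        self.A = np.eye(d)    # ridge-regularized Gram matrix of played features
        self.b = np.zeros(d)  # sum of reward-weighted played features

    def select(self, X):
        """Pick an action given X of shape (K, d), one feature row per action."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b  # ridge-regression estimate of the weights
        # Confidence width sqrt(x^T A^{-1} x) for every action at once.
        width = np.sqrt(np.einsum("kd,de,ke->k", X, A_inv, X))
        return int(np.argmax(X @ theta + self.alpha * width))

    def update(self, x, reward):
        """Record the feature vector that was played and its observed reward."""
        self.A += np.outer(x, x)
        self.b += reward * x


def simulate(T=1000, K=5, d=4, seed=0):
    """Run the learner on synthetic linear payoffs and return cumulative regret."""
    rng = np.random.default_rng(seed)
    theta_star = rng.normal(size=d)
    theta_star /= np.linalg.norm(theta_star)  # assumed unit-norm true weights
    learner = LinUCB(d)
    regret = 0.0
    for _ in range(T):
        X = rng.normal(size=(K, d))
        X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm features
        a = learner.select(X)
        means = X @ theta_star
        reward = means[a] + 0.1 * rng.normal()  # assumed Gaussian noise
        learner.update(X[a], reward)
        regret += means.max() - means[a]
    return regret
```

Calling simulate() with a larger T should show cumulative regret growing much more slowly than T, which is the qualitative behavior the bound describes. Setting alpha to 0 reduces the rule to greedy selection on the current estimate.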
