The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information


Publication | Closed Access

Citations: 452

Year: 2007

John Langford, Tong Zhang

Rare & Special e-Zone (The Hong Kong University of Science and Technology)

Abstract

We present Epoch-Greedy, an algorithm for multi-armed bandits with observable side information. Epoch-Greedy has the following properties: no knowledge of a time horizon is necessary; the regret incurred by Epoch-Greedy is controlled by a sample complexity bound for a hypothesis class; and the regret scales as $O(T^{2/3} S^{1/3})$ or better (sometimes, much better), where $T$ is the number of time steps and $S$ is the complexity term in a sample complexity bound for standard supervised learning.
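The epoch structure can be sketched in a few lines. This is a minimal, non-authoritative sketch that assumes one common reading of the algorithm. Each epoch takes one exploration step with a uniformly random arm. It then picks the policy from a finite class that maximizes an importance-weighted empirical reward on the exploration data, and plays that policy greedily for a run of exploitation steps. The function name `epoch_greedy`, its parameters, and the `exploit_steps` schedule are hypothetical illustrations. The paper derives the exploitation length from the sample complexity bound. Here it is simply passed in as a function of the epoch index.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Context = Tuple[int, ...]
Policy = Callable[[Context], int]


def iw_value(policy: Policy, data: List[Tuple[Context, int, float]], num_arms: int) -> float:
    # Importance-weighted reward sum: the uniformly random arm matches the
    # policy's choice with probability 1/K, so each match is reweighted by K.
    # Dividing by len(data) would give an unbiased estimate of the policy's mean reward.
    return sum(num_arms * r for x, a, r in data if policy(x) == a)


def epoch_greedy(
    draw_context: Callable[[random.Random], Context],
    draw_reward: Callable[[Context, int, random.Random], float],
    policies: Sequence[Policy],
    num_arms: int,
    horizon: int,
    exploit_steps: Callable[[int], int],  # hypothetical schedule: epoch index -> exploitation length
    seed: int = 0,
) -> Tuple[float, int]:
    """Sketch of Epoch-Greedy over a finite policy class (illustrative only)."""
    rng = random.Random(seed)
    explore_data: List[Tuple[Context, int, float]] = []
    total_reward = 0.0
    best = 0  # index of the most recently learned policy
    t = 0
    epoch = 0
    while t < horizon:
        epoch += 1
        # Exploration: one step with a uniformly random arm.
        x = draw_context(rng)
        a = rng.randrange(num_arms)
        r = draw_reward(x, a, rng)
        explore_data.append((x, a, r))
        total_reward += r
        t += 1
        # Learning: empirical maximizer of the importance-weighted reward.
        best = max(range(len(policies)),
                   key=lambda i: iw_value(policies[i], explore_data, num_arms))
        # Exploitation: follow the learned policy for the scheduled number of steps.
        for _ in range(exploit_steps(epoch)):
            if t >= horizon:
                break
            x = draw_context(rng)
            total_reward += draw_reward(x, policies[best](x), rng)
            t += 1
    return total_reward, best
```

As a usage example, take two arms, a binary context, and a reward of 1 when the arm equals the context. The candidate policies are "always 0", "always 1", "identity", and "flip". With a schedule such as `math.isqrt`, the learned index should settle on the identity policy. Note that the horizon is never needed to decide how long to explore, which matches the first property listed above. It only stops the simulation.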
