Publication | Closed Access
Fairness in learning: classic and contextual bandits
Citations: 121
References: 18
Year: 2016
Topics: Engineering, Contextual Bandits, Game Theory, Algorithmic Learning, Learning Algorithm, Fairness (Computer Systems), Language Studies, Decision Theory, Cognitive Science, Online Algorithm, Probability Theory, Computer Science, Fairness (Language Acquisition), Cumulative Regret Bound, Imperfect Information Game, Exploration vs. Exploitation, Chained Confidence Intervals, Contextual Bandit, Algorithmic Fairness
We introduce the study of fairness in multi-armed bandit problems. Our fairness definition demands that, given a pool of applicants, a worse applicant is never favored over a better one, despite a learning algorithm's uncertainty over the true payoffs. In the classic stochastic bandits problem we provide a provably fair algorithm based on chained confidence intervals, and prove a cumulative regret bound with a cubic dependence on the number of arms. We further show that any fair algorithm must have such a dependence, providing a strong separation between fair and unfair learning that extends to the general contextual case. In the general contextual case, we prove a tight connection between fairness and the KWIK (Knows What It Knows) learning model: a KWIK algorithm for a class of functions can be transformed into a provably fair contextual bandit algorithm and vice versa. This tight connection allows us to provide a provably fair algorithm for the linear contextual bandit problem with a polynomial dependence on the dimension, and to show (for a different class of functions) a worst-case exponential gap in regret between fair and non-fair learning algorithms.
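The abstract's fairness constraint and chained-confidence-interval idea can be sketched in a few lines: an arm may be favored over another only if its confidence interval is provably above the other's, so all arms "chained" to the current best arm through overlapping intervals must be treated alike. The sketch below is illustrative only, under assumed Hoeffding-style intervals and uniform tie-breaking; the function names and parameters are not from the paper.

```python
import math
import random

def confidence_interval(pulls, total_reward, t, delta=0.05):
    """Hoeffding-style interval for an arm's mean reward in [0, 1] (illustrative)."""
    if pulls == 0:
        return 0.0, 1.0  # no data yet: the interval spans the whole range
    mean = total_reward / pulls
    width = math.sqrt(math.log(2 * t / delta) / (2 * pulls))
    return max(0.0, mean - width), min(1.0, mean + width)

def chained_arms(intervals):
    """Arms linked, directly or transitively, to the top arm by overlapping intervals."""
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][1], reverse=True)
    chain, lowest = [order[0]], intervals[order[0]][0]
    for i in order[1:]:
        lo, hi = intervals[i]
        if hi >= lowest:      # interval overlaps the chain built so far
            chain.append(i)
            lowest = min(lowest, lo)
        else:
            break             # every remaining arm is provably worse
    return chain

# A fair round: play uniformly at random among the chained arms, so that no
# arm that might still be best is favored over another.
intervals = [(0.80, 0.90), (0.65, 0.82), (0.50, 0.70), (0.05, 0.20)]
candidates = chained_arms(intervals)  # arms 0, 1, 2 chain together; arm 3 is excluded
arm = random.choice(candidates)
```

The transitive chaining is what drives the cubic dependence on the number of arms mentioned in the abstract: a fair algorithm cannot discard an arm until its interval separates from the entire chain, not just from the single best arm.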
| Year | Citations |
|---|---|
| 2002 | 5.7K |
| 2012 | 3.3K |
| 1985 | 2.4K |
| 2016 | 1K |
| 2013 | 992 |
| 2010 | 760 |
| 2011 | 577 |
| 2011 | 383 |
| 2014 | 313 |
| 2011 | 270 |