Fairness in learning: classic and contextual bandits

Abstract

We introduce the study of fairness in multi-armed bandit problems. Our fairness definition demands that, given a pool of applicants, a worse applicant is never favored over a better one, despite a learning algorithm's uncertainty over the true payoffs. In the classic stochastic bandits problem we provide a provably fair algorithm based on chained confidence intervals, and prove a cumulative regret bound with a cubic dependence on the number of arms. We further show that any fair algorithm must have such a dependence, providing a strong separation between fair and unfair learning that extends to the general contextual case. In the general contextual case, we prove a tight connection between fairness and the KWIK (Knows What It Knows) learning model: a KWIK algorithm for a class of functions can be transformed into a provably fair contextual bandit algorithm and vice versa. This tight connection allows us to provide a provably fair algorithm for the linear contextual bandit problem with a polynomial dependence on the dimension, and to show (for a different class of functions) a worst-case exponential gap in regret between fair and non-fair learning algorithms.
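The chained-confidence-interval idea for the classic stochastic setting can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pseudocode. Rewards are assumed to be Bernoulli, so they are bounded in [0, 1]. The Hoeffding-style radius `sqrt(log(2kt²/δ) / (2n))` is an assumed choice whose constants may differ from the paper's. The function names `confidence_interval`, `chained_arms`, and `fair_bandit` are hypothetical.

Each round, the sketch finds the arm with the highest upper confidence bound and collects every arm whose interval links to it through a chain of overlapping intervals. It then plays one arm drawn uniformly from that set. Arms the algorithm cannot yet tell apart are therefore played with equal probability. Arms outside the chain are played with probability zero, and their upper bounds lie below the lower bound of every chained arm.

```python
import math
import random
from typing import List, Sequence, Tuple


def confidence_interval(mean: float, pulls: int, t: int, k: int,
                        delta: float) -> Tuple[float, float]:
    """Hoeffding-style interval for a [0, 1]-bounded arm.

    Assumption: this radius is illustrative; the paper's constants may differ.
    """
    if pulls == 0:
        # No data yet: the arm cannot be distinguished from any other arm.
        return (-math.inf, math.inf)
    radius = math.sqrt(math.log(2 * k * t * t / delta) / (2 * pulls))
    return (mean - radius, mean + radius)


def chained_arms(intervals: Sequence[Tuple[float, float]]) -> List[int]:
    """Arms whose intervals transitively overlap the top (highest-upper) arm's."""
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][1], reverse=True)
    chain = [order[0]]
    chain_low = intervals[order[0]][0]  # lowest point covered by the chain so far
    for i in order[1:]:
        low, high = intervals[i]
        if high >= chain_low:
            # This interval overlaps the (contiguous) union of the chain.
            chain.append(i)
            chain_low = min(chain_low, low)
        else:
            # Upper bounds are visited in decreasing order, so no later arm can join.
            break
    return sorted(chain)


def fair_bandit(arm_means: Sequence[float], horizon: int,
                delta: float = 0.05, seed: int = 0) -> List[int]:
    """Simulate hypothetical Bernoulli arms; return the arm played in each round."""
    rng = random.Random(seed)
    k = len(arm_means)
    pulls = [0] * k
    totals = [0.0] * k
    history: List[int] = []
    for t in range(1, horizon + 1):
        intervals = [
            confidence_interval(totals[i] / pulls[i] if pulls[i] else 0.0,
                                pulls[i], t, k, delta)
            for i in range(k)
        ]
        # Uniform choice over the chain: no chained arm is favoured over another.
        arm = rng.choice(chained_arms(intervals))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
        history.append(arm)
    return history


if __name__ == "__main__":
    plays = fair_bandit([0.2, 0.5, 0.8], horizon=20_000)
    # Fraction of the last 5,000 rounds spent on each arm.
    print([plays[-5000:].count(i) / 5000 for i in range(3)])
```

The early `break` in `chained_arms` is safe because arms are visited in decreasing order of upper bound. Once one arm's upper bound falls below the chain's lowest point, no later arm can reach the chain, and the chain's lowest point never moves again. Until every arm has been pulled at least once, all intervals are unbounded, so the sketch plays uniformly at random. This forced uniform exploration is one intuition for why fairness makes learning slower.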
