A survey on datasets for fairness-aware machine learning

Abstract

As decision-making increasingly relies on Machine Learning (ML) and (big) data, the issue of fairness in data-driven Artificial Intelligence (AI) systems is receiving growing attention from both research and industry. A wide variety of fairness-aware machine learning solutions have been proposed, involving fairness-related interventions in the data, the learning algorithms, and/or the model outputs. However, a vital part of proposing new approaches is evaluating them empirically on benchmark datasets that represent realistic and diverse settings. Therefore, in this paper, we survey real-world datasets used for fairness-aware machine learning. We focus on tabular data, as it is the most common data representation for fairness-aware machine learning. We start our analysis by using a Bayesian network to identify relationships between the different attributes, particularly with respect to the protected attributes and the class attribute. For a deeper understanding of bias in the datasets, we then investigate the most interesting of these relationships using exploratory analysis.
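The kind of exploratory check described above, comparing the class distribution across the values of a protected attribute, can be sketched as follows. This is a minimal illustration and not the survey's own procedure: the function name `positive_rate_by_group`, the column names `sex` and `income`, and the toy records are all assumptions made for the example.

```python
from collections import defaultdict


def positive_rate_by_group(rows, protected, label, positive):
    """Return the share of positive-class rows for each protected-attribute value."""
    counts = defaultdict(lambda: [0, 0])  # group value -> [positives, total]
    for row in rows:
        counts[row[protected]][1] += 1
        if row[label] == positive:
            counts[row[protected]][0] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}


# Hypothetical toy records (assumption): not taken from any surveyed dataset.
toy_rows = [
    {"sex": "female", "income": ">50K"},
    {"sex": "female", "income": "<=50K"},
    {"sex": "female", "income": "<=50K"},
    {"sex": "female", "income": "<=50K"},
    {"sex": "male", "income": ">50K"},
    {"sex": "male", "income": ">50K"},
    {"sex": "male", "income": "<=50K"},
    {"sex": "male", "income": "<=50K"},
]

rates = positive_rate_by_group(toy_rows, "sex", "income", ">50K")

# Difference in positive-class rates between the two groups; a value far
# from zero suggests the class attribute is associated with the protected one.
gap = rates["male"] - rates["female"]
print(rates, gap)
```

A gap like this is only a first descriptive signal; it says nothing about why the rates differ, which is where a structural view of the attributes, such as a Bayesian network, becomes useful.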
