Publication | Open Access
A survey on datasets for fairness‐aware machine learning
Year: 2022
Citations: 40
References: 101
Artificial Intelligence; Engineering; Machine Learning; Big Data Analytics; Fairness In Natural Language Processing; Data Science; Data Mining; Bias; Data Resources; Fairness (Computer Systems); Language Studies; Statistics; Algorithmic Bias; Knowledge Discovery; Data Privacy; Bayesian Network; Computer Science; Ethical Issues; Fairness (Language Acquisition); Automated Decision-making; Bias Detection; Dataset Bias; Fairness‐aware Machine Learning; Algorithmic Fairness; Artificial Intelligence Ethics; Big Data
Machine‑learning fairness has attracted growing attention as decision‑making increasingly relies on data, yet evaluating new methods requires benchmark datasets that represent realistic and diverse settings. This paper surveys real‑world datasets used for fairness‑aware machine learning, focusing on tabular data. The authors model the relationships between attributes, in particular between the protected attributes and the class attribute, with Bayesian networks, and then use exploratory analysis to examine bias in each dataset. The article is categorized under: Commercial, Legal, and Ethical Issues > Fairness in Data Mining; Fundamental Concepts of Data and Knowledge > Data Concepts; Technologies > Data Preprocessing.
Abstract

As decision‐making increasingly relies on machine learning (ML) and (big) data, the issue of fairness in data‐driven artificial intelligence systems is receiving increasing attention from both research and industry. A large variety of fairness‐aware ML solutions have been proposed which involve fairness‐related interventions in the data, learning algorithms, and/or model outputs. However, a vital part of proposing new approaches is evaluating them empirically on benchmark datasets that represent realistic and diverse settings. Therefore, in this paper, we give an overview of real‐world datasets used for fairness‐aware ML. We focus on tabular data as the most common data representation for fairness‐aware ML. We start our analysis by identifying relationships between the different attributes, particularly with respect to the protected attributes and the class attribute, using a Bayesian network. For a deeper understanding of bias in the datasets, we investigate interesting relationships using exploratory analysis.

This article is categorized under:
Commercial, Legal, and Ethical Issues > Fairness in Data Mining
Fundamental Concepts of Data and Knowledge > Data Concepts
Technologies > Data Preprocessing
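
As a rough illustration of the kind of exploratory check the abstract describes (relating a protected attribute to the class attribute), the following sketch computes the positive-class rate per protected group and the difference between two groups, often called the statistical parity difference. It is not the authors' code: the function names, the attribute names `sex` and `income`, and the eight toy rows are all assumptions made up for this example, and no Bayesian network is learned here.

```python
from collections import defaultdict


def positive_rates(records, protected, label, positive):
    """Return the share of positive-class rows for each protected-group value."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        group = row[protected]
        totals[group] += 1
        if row[label] == positive:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def statistical_parity_difference(rates, unprivileged, privileged):
    """Positive rate of the unprivileged group minus that of the privileged group."""
    return rates[unprivileged] - rates[privileged]


# Toy rows invented for illustration only; they come from no real dataset.
toy = [
    {"sex": "female", "income": "high"},
    {"sex": "female", "income": "low"},
    {"sex": "female", "income": "low"},
    {"sex": "female", "income": "low"},
    {"sex": "male", "income": "high"},
    {"sex": "male", "income": "high"},
    {"sex": "male", "income": "low"},
    {"sex": "male", "income": "low"},
]

rates = positive_rates(toy, protected="sex", label="income", positive="high")
spd = statistical_parity_difference(rates, unprivileged="female", privileged="male")
print(rates)
print(spd)
```

A value of zero would mean both groups receive the positive class at the same rate; a negative value means the group passed as `unprivileged` receives it less often. Which group counts as privileged is a modeling choice that depends on the dataset and its context.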