Publication | Open Access
Bringing the People Back In: Contesting Benchmark Machine Learning Datasets
Citations: 52 | References: 18 | Year: 2020
Artificial Intelligence, Engineering, Machine Learning, Sociotechnical Systems, Computational Social Science, Data Science, Data Mining, Pattern Recognition, Bias, Dataset Construction, Benchmark Datasets, People Back, Algorithmic Bias, Machine Learning Model, Knowledge Discovery, Computer Science, Automated Decision-making, Bias Detection, Dataset Bias, Algorithmic Fairness
In response to algorithmic unfairness embedded in sociotechnical systems, significant attention has focused on the contents of machine learning datasets, revealing biases toward white, cisgender, male, and Western data subjects. In contrast, comparatively little attention has been paid to the histories, values, and norms embedded in such datasets. In this work, we outline a research program (a genealogy of machine learning data) for investigating how and why these datasets have been created, what and whose values influence the choice of data to collect, and the contextual and contingent conditions of their creation. We describe the ways in which benchmark datasets in machine learning operate as infrastructure and pose four research questions for these datasets. This interrogation forces us to "bring the people back in" by helping us understand the labor embedded in dataset construction, thereby opening new avenues of contestation for other researchers who encounter the data.