Publication | Closed Access
“Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI
Citations: 606
References: 85
Year: 2021
Unknown Venue
Artificial Intelligence, Engineering, Machine Learning, Big Data Analytics, AI Models, AI Safety, Model Work, Responsible AI, Data Science, Data Resources, Management, Human-Artificial Intelligence Collaboration, Data Management, Statistics, Trustworthy Artificial Intelligence, Predictive Analytics, Data Quality, Data Cascades, Computer Science, Data-centric AI, High-stakes AI, Health Data Science, Dataset Bias, Trustworthy AI, Data-driven Methods, Safe Artificial Intelligence, Data Modeling
AI models are increasingly applied in high-stakes domains like health and conservation. Data quality carries elevated significance in high-stakes AI because of its heightened downstream impact on predictions such as cancer detection, wildlife poaching, and loan allocations. Paradoxically, data is the most undervalued and de-glamorised aspect of AI. In this paper, we report on data practices in high-stakes AI, from interviews with 53 AI practitioners in India, East and West African countries, and the USA. We define, identify, and present empirical evidence on Data Cascades (compounding events causing negative, downstream effects from data issues), which are triggered by conventional AI/ML practices that undervalue data quality. Data cascades are pervasive (92% prevalence), invisible, and delayed, but often avoidable. We discuss HCI opportunities in designing and incentivizing data excellence as a first-class citizen of AI, resulting in safer and more robust systems for all.