Interpreting Interpretability: Understanding Data Scientists' Use of Interpretability Tools for Machine Learning

TLDR

Machine learning models are now routinely deployed in domains such as criminal justice and healthcare, yet the interpretability tools designed to aid practitioners have received little systematic evaluation. This study investigates how data scientists use two existing interpretability tools, the InterpretML implementation of GAMs and the SHAP Python package, to understand and evaluate machine learning models. It combines a contextual inquiry (N=11) with a survey (N=197) of data scientists to observe how they use these tools and to identify common challenges. Results show that data scientists over-trust and misuse interpretability tools, and that few participants could accurately describe the visualizations the tools produce. The study also identifies qualitative themes in data scientists' mental models of these tools and draws implications for researchers and tool designers.

Abstract

Machine learning (ML) models are now routinely deployed in domains ranging from criminal justice to healthcare. With this newfound ubiquity, ML has moved beyond academia and grown into an engineering discipline. To that end, interpretability tools have been designed to help data scientists and machine learning practitioners better understand how ML models work. However, there has been little evaluation of the extent to which these tools achieve this goal. We study data scientists' use of two existing interpretability tools, the InterpretML implementation of GAMs and the SHAP Python package. We conduct a contextual inquiry (N=11) and a survey (N=197) of data scientists to observe how they use interpretability tools to uncover common issues that arise when building and evaluating ML models. Our results indicate that data scientists over-trust and misuse interpretability tools. Furthermore, few of our participants were able to accurately describe the visualizations output by these tools. We highlight qualitative themes for data scientists' mental models of interpretability tools. We conclude with implications for researchers and tool designers, and contextualize our findings in the social science literature.
