Publication | Open Access
Faithful and Customizable Explanations of Black Box Models
254 Citations · 16 References · Year: 2019 · Venue: unknown
Keywords: Artificial Intelligence · Black Box Models · Interactive Machine Learning · Machine Learning · Data Science · Engineering · Automated Reasoning · Explanation-based Learning · Predictive Analytics · Management · Subspace Explanations · Different Feature Subspaces · Computer Science · Interpretability · Deep Learning · Explainable AI
Predictive models increasingly aid experts, so understanding their behavior in feature subspaces is essential for trust. The authors introduce MUSE, a model‑agnostic framework that explains black‑box models by revealing their behavior in feature‑defined subspaces. MUSE learns a small set of compact decision rules that jointly optimize fidelity to the original model, unambiguity, and interpretability. Experiments and user studies show that MUSE generates customizable, compact, and accurate explanations that outperform state‑of‑the‑art baselines.
As predictive models increasingly assist human experts (e.g., doctors) in day-to-day decision making, it is crucial for experts to be able to explore and understand how such models behave in different feature subspaces in order to know if and when to trust them. To this end, we propose Model Understanding through Subspace Explanations (MUSE), a novel model-agnostic framework which facilitates understanding of a given black box model by explaining how it behaves in subspaces characterized by certain features of interest. Our framework provides end users (e.g., doctors) with the flexibility of customizing the model explanations by allowing them to input the features of interest. The construction of explanations is guided by a novel objective function that we propose to simultaneously optimize for fidelity to the original model, unambiguity, and interpretability of the explanation. More specifically, our objective allows us to learn, with optimality guarantees, a small number of compact decision sets, each of which captures the behavior of a given black box model in unambiguous, well-defined regions of the feature space. Experimental evaluation with real-world datasets and user studies demonstrates that our approach can generate customizable, highly compact, easy-to-understand, yet accurate explanations of various kinds of predictive models, outperforming state-of-the-art baselines.
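To make the fidelity/unambiguity/interpretability trade-off concrete, the sketch below scores candidate two-level rules (a subspace descriptor over user-chosen features of interest, paired with an inner condition and a label) and selects a small set greedily. Everything here is an illustrative assumption rather than the authors' method: the rule representation, the `fidelity`, `ambiguity`, and `score` helpers, the weights `lam1`–`lam3`, and the greedy loop are hypothetical stand-ins. The paper's actual objective has more terms and is optimized with approximate methods that come with optimality guarantees.

```python
import numpy as np

# Hypothetical rule representation: (q, s, c), where q is a predicate over
# the user-chosen features of interest (the subspace descriptor), s is an
# inner condition, and c is the class label the rule assigns.

def covers(rule, x):
    q, s, _ = rule
    return q(x) and s(x)

def explanation_predict(rules, x, default):
    # First matching rule wins; uncovered points fall back to a default label.
    for q, s, c in rules:
        if q(x) and s(x):
            return c
    return default

def fidelity(rules, X, y_bb, default):
    # Agreement between the explanation and the black box's labels y_bb.
    preds = np.array([explanation_predict(rules, x, default) for x in X])
    return float(np.mean(preds == np.asarray(y_bb)))

def ambiguity(rules, X):
    # Number of points covered by more than one rule (lower is better).
    if not rules:
        return 0
    hits = np.array([[covers(r, x) for r in rules] for x in X])
    return int((hits.sum(axis=1) > 1).sum())

def score(rules, X, y_bb, default, lam1=1.0, lam2=0.5, lam3=0.05):
    # Simplified stand-in for MUSE's multi-term objective: reward fidelity,
    # penalize overlap (unambiguity) and rule count (interpretability proxy).
    return (lam1 * fidelity(rules, X, y_bb, default)
            - lam2 * ambiguity(rules, X) / max(len(X), 1)
            - lam3 * len(rules))

def greedy_select(candidates, X, y_bb, default, max_rules=5):
    # Plain greedy selection, for illustration only; the paper instead uses
    # approximate optimization procedures with formal guarantees.
    chosen = []
    while len(chosen) < max_rules:
        remaining = [r for r in candidates if r not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda r: score(chosen + [r], X, y_bb, default))
        if score(chosen + [best], X, y_bb, default) <= score(chosen, X, y_bb, default):
            break
        chosen.append(best)
    return chosen

# Toy usage with hypothetical predicates: mimic black-box labels y_bb.
X = np.array([[0.2, 1.0], [0.8, 0.1], [0.9, 0.9], [0.1, 0.4]])
y_bb = np.array([0, 1, 1, 0])
candidates = [
    (lambda x: x[0] > 0.5, lambda x: x[1] < 0.5, 1),    # subspace x0 > 0.5
    (lambda x: x[0] > 0.5, lambda x: x[1] >= 0.5, 1),
    (lambda x: x[0] <= 0.5, lambda x: True, 0),         # subspace x0 <= 0.5
]
rules = greedy_select(candidates, X, y_bb, default=0)
print(len(rules), fidelity(rules, X, y_bb, default=0))  # e.g. 2 rules, fidelity 1.0
```

A first-match rule list keeps the sketch's predictions well defined even when subspaces overlap; the paper instead discourages overlap directly through the unambiguity terms of its objective.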