Counterfactual Explanations for Machine Learning: A Review.

TLDR

Machine learning is widely used in deployed decision systems, yet its decision processes are often opaque to human stakeholders. This paper reviews and categorizes research on counterfactual explanations, which describe how a model's output would have differed had its input been changed in a particular way. Because these explanations connect to established legal doctrine in many countries, they are appealing for high-impact domains such as finance and healthcare. The authors design a rubric of desirable properties for counterfactual explanation algorithms and evaluate all currently proposed algorithms against it, which allows systematic comparison and introduces the field's major research themes. The paper also identifies gaps and promising research directions.

Abstract

Machine learning plays a role in many deployed decision systems, often in ways that are difficult or impossible for human stakeholders to understand. Explaining, in a human-understandable way, the relationship between the input and output of machine learning models is essential to the development of trustworthy machine-learning-based systems. A burgeoning body of research seeks to define the goals and methods of explainability in machine learning. In this paper, we seek to review and categorize research on counterfactual explanations, a specific class of explanation that describes what could have happened had the input to a model been changed in a particular way. Modern approaches to counterfactual explainability in machine learning draw connections to established legal doctrine in many countries, making them appealing for fielded systems in high-impact areas such as finance and healthcare. Thus, we design a rubric with desirable properties of counterfactual explanation algorithms and comprehensively evaluate all currently proposed algorithms against that rubric. Our rubric provides easy comparison and comprehension of the advantages and disadvantages of different approaches and serves as an introduction to major research themes in this field. We also identify gaps and discuss promising research directions in the space of counterfactual explainability.
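
As a rough illustration of what a counterfactual explanation is, and not a method taken from the paper, the sketch below finds the smallest change (in Euclidean distance) to an input that flips the decision of a toy linear classifier. The weights, bias, feature meanings, and function names are all hypothetical and chosen only for this example.

```python
# Toy linear classifier: approve when w.x + b >= 0.
# WEIGHTS and BIAS are made-up values, not from any real model.
WEIGHTS = [0.8, -0.5]  # hypothetical features: normalised income, normalised debt
BIAS = -0.3


def score(x):
    """Linear decision score w.x + b."""
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def predict(x):
    """True means 'approved', False means 'rejected'."""
    return score(x) >= 0.0


def nearest_counterfactual(x, margin=1e-6):
    """Smallest L2 change to x that flips the linear model's decision.

    For a linear model the closest point with the opposite outcome lies
    along the weight vector, just past the decision boundary.
    """
    s = score(x)
    norm_sq = sum(w * w for w in WEIGHTS)
    # Aim for a score slightly on the other side of zero.
    target = margin if s < 0 else -margin
    step = (target - s) / norm_sq
    return [xi + step * w for xi, w in zip(x, WEIGHTS)]


applicant = [0.2, 0.4]                 # rejected by the toy model
cf = nearest_counterfactual(applicant)  # "had income been higher and debt lower..."
changes = [c - a for a, c in zip(applicant, cf)]
print(predict(applicant), predict(cf), changes)
```

The per-feature differences in `changes` are the explanation: they state how the input would have needed to differ for the outcome to change. This closed form only holds for linear models, and it ignores properties such as feature actionability and plausibility of the resulting point.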
