Publication | Open Access
Explanations can be manipulated and geometry is to blame
Citations: 145 | References: 19 | Year: 2019
Artificial Intelligence, Cognitive Science, Engineering, Machine Learning, Geometry, Explanation-based Learning, Geometric Reasoning, Explanation Methods, Explainable AI, Interpretability, Computer Science, Neural Networks, Causality, Upper Bound, Causal Inference, Plausible Reasoning
Explanation methods aim to make neural networks more trustworthy and interpretable. The paper demonstrates that explanations can be manipulated arbitrarily by imperceptible input changes that leave the network's output approximately unchanged. The authors link this susceptibility theoretically to geometric properties of neural networks, derive an upper bound on it, and propose mechanisms to improve the robustness of explanations.
Explanation methods aim to make neural networks more trustworthy and interpretable. In this paper, we demonstrate a property of explanation methods which is disconcerting for both of these purposes. Namely, we show that explanations can be manipulated arbitrarily by applying visually hardly perceptible perturbations to the input that keep the network's output approximately constant. We establish theoretically that this phenomenon can be related to certain geometrical properties of neural networks. This allows us to derive an upper bound on the susceptibility of explanations to manipulations. Based on this result, we propose effective mechanisms to enhance the robustness of explanations.
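The effect described in the abstract can be illustrated on a toy problem. The sketch below is not the paper's network, attack, or bound. It assumes a hypothetical two-input model built from two softplus units, uses the plain input gradient as the explanation, and applies a small hand-picked perturbation `delta`. The sharpness parameter `beta` is likewise an assumption, used here only to vary how strongly the model's output surface curves. Under these assumptions, the sharply curved model keeps its output nearly fixed while its gradient explanation changes substantially, and the smoother model shows almost no change for the same perturbation.

```python
import math


def softplus(z, beta):
    # Numerically stable softplus: (1 / beta) * log(1 + exp(beta * z)).
    t = beta * z
    return (max(t, 0.0) + math.log1p(math.exp(-abs(t)))) / beta


def sigmoid(t):
    # Numerically stable logistic function (the derivative of softplus in t).
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def model(x, beta):
    # Hypothetical toy model: g(x) = softplus(x1 + x2) + softplus(x1 - x2).
    x1, x2 = x
    return softplus(x1 + x2, beta) + softplus(x1 - x2, beta)


def gradient_explanation(x, beta):
    # Input-gradient explanation of the toy model, derived by hand.
    x1, x2 = x
    a = sigmoid(beta * (x1 + x2))
    b = sigmoid(beta * (x1 - x2))
    return (a + b, a - b)


def manipulation_effect(x, delta, beta):
    # Returns (absolute output change, Euclidean change of the explanation).
    x_adv = (x[0] + delta[0], x[1] + delta[1])
    out_change = abs(model(x_adv, beta) - model(x, beta))
    h = gradient_explanation(x, beta)
    h_adv = gradient_explanation(x_adv, beta)
    expl_change = math.hypot(h_adv[0] - h[0], h_adv[1] - h[1])
    return out_change, expl_change


x = (0.01, 0.0)        # original input (illustrative values)
delta = (0.0, 0.02)    # small hand-picked perturbation, not an optimized attack

sharp = manipulation_effect(x, delta, beta=500.0)  # nearly piecewise-linear
smooth = manipulation_effect(x, delta, beta=1.0)   # low curvature

print("sharp  (beta=500): output change %.4f, explanation change %.4f" % sharp)
print("smooth (beta=1):   output change %.4f, explanation change %.4f" % smooth)
```

With these values the perturbation has norm 0.02. In the sharp model, the explanation moves by more than 1 in Euclidean norm while the output shifts by about 0.01. In the smooth model, both changes stay small. A real manipulation would optimize the perturbation toward a chosen target explanation rather than fix it by hand.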