Publication | Open Access
Grad-CAM++: Generalized Gradient-Based Visual Explanations for Deep Convolutional Networks
Citations: 2.8K, References: 35, Year: 2018
Unknown Venue
Convolutional Neural Network, Machine Vision, Machine Learning, Image Analysis, Engineering, Pattern Recognition, CNN Architecture, Visual Explanation, Feature Learning, Explanation-based Learning, Visual Grounding, Vision Language Model, Visual Question Answering, Computer Science, Deep Convolutional Networks, Deep Learning, Explainable AI, Computer Vision
CNNs have achieved remarkable success in vision tasks, yet their opaque nature has spurred interest in explainable deep learning. This work proposes Grad-CAM++ to generate more accurate visual explanations for CNN predictions, improving object localization and handling multiple instances of a class in a single image. Grad-CAM++ computes class-specific weights as a weighted sum of the positive partial derivatives of the target class score with respect to the final convolutional feature maps, then uses these weights to combine the feature maps into a heatmap. Experiments on standard datasets show that Grad-CAM++ yields better visual explanations than the original Grad-CAM, as confirmed by both subjective and objective evaluations.
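The weighting described above can be written compactly. The notation here is an assumption, since this summary defines no symbols: $A^k$ is the $k$-th feature map of the final convolutional layer, $Y^c$ is the score for class $c$, and $\alpha_{ij}^{kc}$ are the per-pixel coefficients of the weighted sum, whose exact form is not given in this text. The outer ReLU on the heatmap follows the Grad-CAM convention of keeping only features with a positive influence on the class.

```latex
w_k^c = \sum_{i}\sum_{j} \alpha_{ij}^{kc}\,
        \mathrm{ReLU}\!\left(\frac{\partial Y^c}{\partial A_{ij}^k}\right),
\qquad
L_{ij}^c = \mathrm{ReLU}\!\left(\sum_{k} w_k^c\, A_{ij}^k\right)
```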
Over the last decade, Convolutional Neural Network (CNN) models have been highly successful in solving complex vision-based problems. However, deep models are perceived as "black box" methods because their internal functioning is poorly understood. There has been significant recent interest in developing explainable deep learning models, and this paper is an effort in this direction. Building on a recently proposed method called Grad-CAM, we propose Grad-CAM++, which provides better visual explanations of CNN model predictions than Grad-CAM, both in localizing objects and in explaining occurrences of multiple objects of a class in a single image. We provide a mathematical explanation for Grad-CAM++, which uses a weighted combination of the positive partial derivatives of a specific class score with respect to the last convolutional layer's feature maps as weights to generate a visual explanation for the class label under consideration. Our extensive experiments and evaluations, both subjective and objective, on standard datasets show that Grad-CAM++ provides better visual explanations than Grad-CAM for a given CNN architecture.
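As a rough illustration of the weighting step, the following NumPy sketch turns feature maps and their gradients into a heatmap. It is not the authors' code. The function name `grad_cam_pp_heatmap` and its arguments are hypothetical, and the closed-form pixel coefficients assume the class score is passed through an exponential, so that higher-order derivatives reduce to powers of the first-order gradient. That assumption is not stated in the text above. The final scaling to [0, 1] is only a display convenience.

```python
import numpy as np


def grad_cam_pp_heatmap(activations, gradients):
    """Sketch of a Grad-CAM++-style heatmap (hypothetical helper).

    activations: array of shape (K, H, W), the last conv layer's feature maps.
    gradients:   array of shape (K, H, W), the gradient of the class score
                 with respect to those feature maps.
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    grads_sq = gradients ** 2
    grads_cu = grads_sq * gradients

    # Sum each feature map over its spatial positions, shape (K, 1, 1).
    act_sum = activations.sum(axis=(1, 2), keepdims=True)

    # Assumed closed form for the per-pixel coefficients (exponential output).
    denom = 2.0 * grads_sq + act_sum * grads_cu
    denom = np.where(denom != 0.0, denom, 1.0)  # guard against division by zero
    alphas = grads_sq / denom

    # Weighted sum of the positive partial derivatives, one weight per map.
    positive_grads = np.maximum(gradients, 0.0)
    weights = (alphas * positive_grads).sum(axis=(1, 2))  # shape (K,)

    # Combine the feature maps with those weights and keep positive evidence.
    cam = np.tensordot(weights, activations, axes=1)  # shape (H, W)
    cam = np.maximum(cam, 0.0)

    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Usage with random stand-ins for real network activations and gradients.
rng = np.random.default_rng(0)
feature_maps = rng.random((8, 7, 7))
score_grads = rng.standard_normal((8, 7, 7))
heatmap = grad_cam_pp_heatmap(feature_maps, score_grads)
print(heatmap.shape)
```

In practice the two input arrays would come from a forward and backward pass through a trained CNN, and the heatmap would be upsampled to the input image size before overlaying it.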