Axiom-based Grad-CAM: Towards Accurate Visualization and Explanation of CNNs

Ruigang Fu, Qingyong Hu, Xiaohu Dong, Yulan Guo, Yinghui Gao, Biao Li

TL;DR

This work addresses the lack of theoretical grounding in CAM-based CNN visualizations by introducing two axioms, Sensitivity and Conservation, and proposing XGrad-CAM, an axiom-informed visualization that generalizes to arbitrary CNNs. It derives an approximate, gradient-weighted scheme for feature maps and provides a Guided variant for richer details, demonstrating improved alignment with the axioms. Through experiments on VGG-16 and multiple benchmarks, XGrad-CAM shows enhanced axiom satisfaction and competitive localization and class-discrimination, while offering substantial efficiency advantages over Ablation-CAM. The approach offers a principled framework for interpreting CNN decisions with practical impact for visualization and debugging.

Abstract

The visualization and interpretation of Convolutional Neural Networks (CNNs) have attracted increasing attention in recent years as a means of better understanding and using these models. In particular, several Class Activation Mapping (CAM) methods have been proposed to discover the connection between a CNN's decision and image regions. Although these methods produce reasonable visualizations, their main limitation is the lack of clear and sufficient theoretical support. In this paper, we introduce two axioms, Conservation and Sensitivity, to the visualization paradigm of the CAM methods. We also propose a dedicated Axiom-based Grad-CAM (XGrad-CAM) designed to satisfy these axioms as closely as possible. Experiments demonstrate that XGrad-CAM is an enhanced version of Grad-CAM in terms of conservation and sensitivity. It achieves better visualization performance than Grad-CAM, while also being class-discriminative and easy to implement compared with Grad-CAM++ and Ablation-CAM. The code is available at https://github.com/Fu0511/XGrad-CAM.
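
To make the Conservation axiom concrete, the following is a minimal sketch on made-up numbers, not the authors' implementation. It assumes Conservation means that the explanation map sums to the class score, and that XGrad-CAM weights each gradient by its location's share of the channel's total activation; this page does not state either formula, so both are assumptions. The toy feature maps, gradients, and all function names are hypothetical, and the class score is taken to be linear in the features with no bias, which is the idealized case where Conservation can hold exactly.

```python
# Toy comparison of Grad-CAM and (assumed) XGrad-CAM channel weights.
# Each channel is a flattened list of non-negative activations, as after a ReLU.
# All numbers are invented for illustration.

def grad_cam_weights(grads):
    """Grad-CAM: the weight of channel k is the spatial mean of its gradient."""
    return [sum(g) / len(g) for g in grads]

def xgrad_cam_weights(feats, grads, eps=1e-7):
    """Assumed XGrad-CAM form: gradients weighted by normalized activations."""
    weights = []
    for f, g in zip(feats, grads):
        total = sum(f)  # total activation of this channel
        weights.append(sum(fi * gi for fi, gi in zip(f, g)) / (total + eps))
    return weights

def cam_map(feats, weights):
    """Weighted sum of feature maps, before any ReLU or upsampling."""
    n = len(feats[0])
    return [sum(w * f[i] for w, f in zip(weights, feats)) for i in range(n)]

feats = [[1.0, 2.0, 0.0, 1.0], [0.5, 0.0, 3.0, 0.5]]    # 2 channels, 4 locations
grads = [[0.2, -0.1, 0.4, 0.3], [1.0, 0.5, -0.2, 0.0]]  # d(score)/d(feature)

# Assumption: a bias-free linear score, so score = sum(gradient * feature).
score = sum(fi * gi for f, g in zip(feats, grads) for fi, gi in zip(f, g))

xgrad_map = cam_map(feats, xgrad_cam_weights(feats, grads))
grad_map = cam_map(feats, grad_cam_weights(grads))

# Conservation gap: how far the summed map is from the class score.
conservation_gap_xgrad = abs(sum(xgrad_map) - score)
conservation_gap_grad = abs(sum(grad_map) - score)

print(f"score={score:.3f}")
print(f"XGrad-CAM gap={conservation_gap_xgrad:.2e}")
print(f"Grad-CAM gap={conservation_gap_grad:.2e}")
```

In this idealized setting the activation-weighted map sums to the class score up to the small `eps` term, while the mean-gradient map does not. For real networks with biases and nonlinearities the match is only approximate, which is consistent with the abstract's wording that the axioms are satisfied "as much as possible".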

Paper Structure

This paper contains 18 sections, 23 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Visualizations produced by our XGrad-CAM and Guided XGrad-CAM. Both approaches are class-discriminative and highlight the object of interest. In addition, Guided XGrad-CAM provides more detail than XGrad-CAM.
  • Figure 2: (a) Normalized $\zeta({\bf F}^{l};k)$ is small in the last spatial layers of different CNN models, including AlexNet [Krizhevsky2012ImageNet], VGG-16 [Simonyan2014Very], VGG-19 [Simonyan2014Very], Inception_V3 [Szegedy2016Rethinking] and ResNet-101 [He2016Identity]; (b) Normalized $\epsilon({\bf F}^{l})$ is also small in the last spatial layers of different CNN models. The mean values are provided above the box-plots.
  • Figure 3: An overview of the XGrad-CAM scheme.
  • Figure 4: (a) A game of "What do you see" used to evaluate the class-discriminability of each CAM method. The subject must answer what is depicted in the visualization; (b) An example of an XGrad-CAM visualization and its corresponding perturbed image.
  • Figure 5: Example explanation maps generated by Grad-CAM [Selvaraju2017Grad], Grad-CAM++ [Aditya2017Grad], Ablation-CAM [ramaswamy2020ablation] and our XGrad-CAM.
  • ...and 3 more figures