Reliable or Deceptive? Investigating Gated Features for Smooth Visual Explanations in CNNs

Soham Mitra; Atri Sukul; Swalpa Kumar Roy; Pravendra Singh; Vinay Verma


TL;DR

The paper addresses the interpretability gap in CNN decisions by refining visual explanations with ScoreCAM++. ScoreCAM++ replaces ScoreCAM's normalization with a tanh-based function and applies tanh to the upsampled activation maps, which explicitly gates low-priority regions and magnifies high-priority ones. The resulting saliency map for class $c$ is $L_{ScoreCAM++}^c = \mathrm{ReLU}\left(\sum_k \alpha_k^c \cdot \tanh(A^k_l)\right)$, where $A^k_l$ is the $k$-th activation map of layer $l$ and $\alpha_k^c$ encodes the change in confidence for class $c$ when the input is masked with $H^k_l = s(\mathrm{Up}(A^k_l))$, with $s(\cdot) = \tanh(\cdot)$. Extensive experiments on Dogs/Cats and ImageNet across ResNet-18, VGG-19, and ViT show consistent improvements in Average Drop, Increase in Confidence, and Win Percentage over ScoreCAM and other baselines, supported by qualitative visualizations. Ablation studies confirm the benefit of tanh normalization and upsampled scaling, highlighting ScoreCAM++ as a practical, more trustworthy method for visual explanations in CNNs.
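To make the final combination step concrete, here is a minimal NumPy sketch of $L_{ScoreCAM++}^c = \mathrm{ReLU}\left(\sum_k \alpha_k^c \tanh(A^k_l)\right)$. It assumes the activation maps and the weights $\alpha_k^c$ have already been computed; the function name `scorecam_pp_map` and the toy arrays are hypothetical and only illustrate the formula, not the authors' code.

```python
import numpy as np

def scorecam_pp_map(activations: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Return ReLU(sum_k alpha[k] * tanh(activations[k])).

    activations: shape (K, h, w), the K activation maps A_l^k of layer l.
    alpha: shape (K,), precomputed weights alpha_k^c for class c (assumed given).
    """
    gated = np.tanh(activations)                        # tanh(A_l^k), elementwise
    combined = np.tensordot(alpha, gated, axes=(0, 0))  # sum over the K maps -> (h, w)
    return np.maximum(combined, 0.0)                    # ReLU keeps positive evidence only

# Toy example: two 2x2 maps; the second one gets a negative weight.
A = np.array([[[0.0, 1.0], [2.0, 0.0]],
              [[3.0, 0.0], [0.0, 0.0]]])
alpha = np.array([1.0, -0.5])
saliency = scorecam_pp_map(A, alpha)
```

Here the top-left pixel receives only negative evidence (from the second map), so the ReLU sets it to zero, while the other active pixels keep their tanh-squashed values. Using `np.tensordot` expresses the weighted sum over $k$ without an explicit loop.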

Abstract

Deep learning models have achieved remarkable success across diverse domains. However, the intricate nature of these models often impedes a clear understanding of their decision-making processes. This is where Explainable AI (XAI) becomes indispensable, offering intuitive explanations for model decisions. In this work, we propose a simple yet highly effective approach, ScoreCAM++, which modifies the promising ScoreCAM method to enhance visual explainability. Our approach alters the normalization function applied to the activation layer in ScoreCAM, yielding significantly better results than previous efforts. Additionally, we apply an activation function (tanh) to the upsampled activation layers to enhance interpretability by selectively gating lower-priority values within the activation layer. Through extensive experiments and qualitative comparisons, we demonstrate that ScoreCAM++ consistently achieves notably superior performance and fairness in interpreting the decision-making process compared to both ScoreCAM and previous methods.
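The change of normalization can be illustrated on a toy activation vector. The sketch below assumes ScoreCAM's normalization is the usual min-max rescaling to $[0, 1]$ and compares it with an elementwise tanh; the helper names are hypothetical.

```python
import numpy as np

def minmax_normalize(a: np.ndarray) -> np.ndarray:
    """ScoreCAM-style normalization (assumed min-max): rescale using the map's own min and max."""
    lo, hi = a.min(), a.max()
    if hi == lo:                      # constant map: avoid division by zero
        return np.zeros_like(a, dtype=float)
    return (a - lo) / (hi - lo)

def tanh_normalize(a: np.ndarray) -> np.ndarray:
    """ScoreCAM++-style squashing: elementwise tanh with a fixed scale."""
    return np.tanh(a)

# A toy 1-D "activation map" with weak and strong responses.
acts = np.array([0.0, 0.1, 0.5, 2.0, 5.0])
mm = minmax_normalize(acts)   # 0, 0.02, 0.1, 0.4, 1.0
th = tanh_normalize(acts)     # about 0, 0.100, 0.462, 0.964, 1.000
```

Min-max rescales each map relative to its own extremes, so a strong response of 2.0 receives only 0.4 because the maximum is 5.0. Tanh depends only on the value itself and saturates near 1, so every strong response receives close to full weight. Which of the two suppresses weak responses more depends on the scale of the activations; the sketch only shows the difference in shape.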


Paper Structure

This paper contains 20 sections, 14 equations, 6 figures, and 8 tables.

Introduction
Related Work
Background
Proposed Approach: ScoreCAM++
Experimental Results
Results on Dog and Cat Dataset
Results on ImageNet Dataset
Results Analysis
Ablation Study
Comparison among Different Activation Functions
Evaluation on Images with Multiple Objects
Importance of Scaling Upsampled Layer
Average Drop in Logit
Conclusion
Evaluation Metrics
...and 5 more sections
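The Evaluation Metrics section is not reproduced here. As a hedged sketch, the functions below use the widely cited Grad-CAM++ definitions of Average Drop and Increase in Confidence, where $Y_i^c$ is the class score on the full image and $O_i^c$ the score on the image masked by its explanation; the paper's exact variants may differ, and Win Percentage is not covered. Function names and numbers are hypothetical.

```python
import numpy as np

def average_drop(full_scores: np.ndarray, masked_scores: np.ndarray) -> float:
    """Average Drop (%): mean relative score loss after masking (lower is better).

    full_scores[i]: score Y_i^c for class c on the original image i.
    masked_scores[i]: score O_i^c on image i multiplied by its explanation map.
    """
    drop = np.maximum(0.0, full_scores - masked_scores) / full_scores
    return float(100.0 * drop.mean())

def increase_in_confidence(full_scores: np.ndarray, masked_scores: np.ndarray) -> float:
    """Increase in Confidence (%): share of images whose score rises after masking (higher is better)."""
    return float(100.0 * np.mean(masked_scores > full_scores))

# Hypothetical scores for four images.
Y = np.array([0.8, 0.5, 0.9, 0.4])
O = np.array([0.6, 0.6, 0.95, 0.1])
drop_pct = average_drop(Y, O)
inc_pct = increase_in_confidence(Y, O)
```

For these toy scores the relative drops are 0.25, 0, 0 and 0.75, so Average Drop is 25%, and two of the four masked images score higher than the originals, giving an Increase in Confidence of 50%.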

Figures (6)

Figure 1: Stepwise illustration of our proposed approach. Step 1: Acquiring activation maps from the last layer of the CNN model. Step 2: Upsampling the activation maps to the size of the image and applying the tanh function. Step 3: Performing pointwise multiplication of the maps obtained in Step 2 with the original image. Step 4: Passing the resulting images through the CNN model to obtain their scores. Step 5: Computing the sum of the maps from Step 2, weighted by their corresponding scores, to generate the final output.
Figure 2: Qualitative results obtained using VGG-19 architecture over the Cat and Dog Dataset for various methods. AugGradCAM++ refers to Augmented GradCAM++.
Figure 3: Qualitative results obtained using ResNet-18 architecture over the ImageNet Dataset for various methods. ScoreCAM++ provides better visual explanations than the baseline methods. AugGradCAM++ refers to Augmented GradCAM++.
Figure 4: Qualitative results obtained using VGG-19 architecture over the ImageNet Dataset for various methods. ScoreCAM++ provides better visual explanations than the baseline methods. AugGradCAM++ refers to Augmented GradCAM++.
Figure 5: Visual explanations generated by ScoreCAM++ using VGG-19 over ImageNet dataset with different types of activation functions.
...and 1 more figure
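The five steps in the Figure 1 caption can be sketched end to end in NumPy. This is a minimal illustration under several assumptions: a single-channel image, integer-factor nearest-neighbour upsampling, raw class scores, and an all-zero baseline image for measuring the change in confidence. `scorecam_pp`, `upsample_nearest` and `toy_model` are hypothetical names, and `toy_model` stands in for a real CNN.

```python
from typing import Callable

import numpy as np

def upsample_nearest(maps: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of (K, h, w) maps by an integer factor."""
    return maps.repeat(factor, axis=1).repeat(factor, axis=2)

def scorecam_pp(image: np.ndarray, activations: np.ndarray,
                model: Callable[[np.ndarray], np.ndarray], target_class: int) -> np.ndarray:
    """Sketch of the five steps in Figure 1 for a single-channel (H, W) image."""
    # Step 1: `activations` (K, h, w) are assumed to come from the last conv layer.
    factor = image.shape[0] // activations.shape[1]
    # Step 2: upsample each map to the image size and apply tanh -> H_k.
    H = np.tanh(upsample_nearest(activations, factor))
    # Baseline score on an all-zero image (an assumption made for this sketch).
    base = model(np.zeros_like(image))[target_class]
    # Steps 3-4: mask the image with each H_k and record the change in the class score.
    alpha = np.array([model(image * h)[target_class] - base for h in H])
    # Step 5: score-weighted sum of the Step-2 maps, followed by ReLU.
    return np.maximum(np.tensordot(alpha, H, axes=(0, 0)), 0.0)

def toy_model(x: np.ndarray) -> np.ndarray:
    """Hypothetical 2-class scorer: class 0 sums the left half, class 1 the right half."""
    half = x.shape[1] // 2
    return np.array([x[:, :half].sum(), x[:, half:].sum()])

image = np.ones((4, 4))
acts = np.array([[[1.0, 0.0], [1.0, 0.0]],   # map 0 fires on the left column
                 [[0.0, 1.0], [0.0, 1.0]]])  # map 1 fires on the right column
sal = scorecam_pp(image, acts, toy_model, target_class=0)
```

Only map 0 raises the class-0 score, so the saliency is positive on the left half and zero on the right. Because nearest-neighbour upsampling commutes with elementwise tanh and with weighted sums, this output equals the upsampled TL;DR map $\mathrm{ReLU}\left(\sum_k \alpha_k^c \tanh(A^k_l)\right)$; with bilinear upsampling, which many implementations use, the two orderings would no longer coincide exactly.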
