Reliable or Deceptive? Investigating Gated Features for Smooth Visual Explanations in CNNs
Soham Mitra, Atri Sukul, Swalpa Kumar Roy, Pravendra Singh, Vinay Verma
TL;DR
The paper addresses the interpretability gap in CNN decisions by refining ScoreCAM's visual explanations into a variant called ScoreCAM++. ScoreCAM++ replaces ScoreCAM's normalization with a tanh-based function and applies tanh to the upsampled activation maps, which explicitly gates low-priority regions and magnifies high-priority ones. The saliency map is formalized as $L_{\mathrm{ScoreCAM++}}^c = \mathrm{ReLU}\left(\sum_k \alpha_k^c \cdot \tanh(A_l^k)\right)$, where $H_l^k = s(\mathrm{Up}(A_l^k))$ with $s(\cdot) = \tanh(\cdot)$, and $\alpha_k^c$ encodes the change in confidence. Experiments on Dogs/Cats and ImageNet across ResNet-18, VGG-19, and ViT show consistent improvements in Average Drop, Increase in Confidence, and Win Percentage over ScoreCAM and other baselines, supported by qualitative visualizations. Ablation studies confirm the benefit of tanh normalization and upsampled scaling, positioning ScoreCAM++ as a practical, more trustworthy method for visual explanations in CNNs.
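The formula in the TL;DR can be made concrete with a small NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: `upsample_nearest`, `scorecam_pp_sketch`, and `toy_score` are hypothetical names, nearest-neighbour upsampling stands in for $Up(\cdot)$, and $\alpha_k^c$ is assumed to follow the original ScoreCAM definition (class score of the masked input minus the score of a baseline input).

```python
import numpy as np

def upsample_nearest(a, factor):
    """Nearest-neighbour upsampling by an integer factor (assumed stand-in for Up)."""
    return np.repeat(np.repeat(a, factor, axis=0), factor, axis=1)

def scorecam_pp_sketch(activations, image, baseline, score_fn, factor):
    """Toy version of L^c = ReLU(sum_k alpha_k^c * tanh(A^k_l)).

    activations: array of shape (K, h, w), the feature maps A^k_l
    image, baseline: arrays of shape (h*factor, w*factor), input X and baseline X_b
    score_fn: hypothetical callable returning the class-c score for an input
    """
    base_score = score_fn(baseline)
    saliency = np.zeros(activations.shape[1:])
    for a_k in activations:
        # H^k_l = s(Up(A^k_l)) with s = tanh
        h_k = np.tanh(upsample_nearest(a_k, factor))
        # Assumption: alpha is the change in class score, as in ScoreCAM
        alpha_k = score_fn(image * h_k) - base_score
        saliency += alpha_k * np.tanh(a_k)
    return np.maximum(saliency, 0.0)  # ReLU

# Usage with random data; the mean is only a placeholder for a CNN's class score.
rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 7, 7))
img = rng.random((28, 28))
toy_score = lambda x: float(x.mean())
cam = scorecam_pp_sketch(acts, img, np.zeros_like(img), toy_score, factor=4)
print(cam.shape)  # (7, 7)
```

In practice `score_fn` would wrap a forward pass of the trained network for the target class, and the resulting map would be upsampled to the input resolution for display.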
Abstract
Deep learning models have achieved remarkable success across diverse domains. However, the intricate nature of these models often impedes a clear understanding of their decision-making processes. This is where Explainable AI (XAI) becomes indispensable, offering intuitive explanations for model decisions. In this work, we propose a simple yet highly effective approach, ScoreCAM++, which modifies the promising ScoreCAM method for visual explainability. Our approach alters the normalization function within the activation layer used in ScoreCAM, yielding significantly improved results compared to previous efforts. Additionally, we apply an activation function to the upsampled activation layers to enhance interpretability. This improvement is achieved by selectively gating lower-priority values within the activation layer. Through extensive experiments and qualitative comparisons, we demonstrate that ScoreCAM++ consistently achieves notably superior performance and fairness in interpreting the decision-making process compared to both ScoreCAM and previous methods.
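One way to read the "gating" claim is to compare how the two normalizations treat a weakly activated map. The sketch below is only an illustration with made-up values: `minmax_normalize` follows the min-max scaling used for ScoreCAM masks, `tanh_normalize` is the tanh replacement described above, and the arrays `weak` and `strong` are hypothetical activation maps.

```python
import numpy as np

def minmax_normalize(a, eps=1e-8):
    """Min-max scaling to [0, 1]; eps avoids division by zero on flat maps."""
    return (a - a.min()) / (a.max() - a.min() + eps)

def tanh_normalize(a):
    """tanh squashing; small values stay small, large values saturate towards 1."""
    return np.tanh(a)

# Hypothetical activation maps: one weakly and one strongly activated.
weak = np.array([[0.01, 0.02], [0.03, 0.04]])
strong = np.array([[0.5, 1.0], [1.5, 2.0]])

for name, a in [("weak", weak), ("strong", strong)]:
    print(name, "min-max peak:", minmax_normalize(a).max(), "tanh peak:", tanh_normalize(a).max())
```

Min-max scaling stretches every map to the full range, so the weak map ends up with a peak near 1 just like the strong one. With tanh the weak map stays close to zero while the strong map approaches 1, which is consistent with the idea of suppressing lower-priority values.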
