SCAN: Visual Explanations with Self-Confidence and Analysis Networks

Gwanghee Lee; Sungyoon Jeong; Kyoungson Jhang

TL;DR

By providing a unified framework that is both architecturally universal and highly faithful, SCAN enhances model transparency and offers a more reliable tool for understanding the decision-making processes of complex neural networks.

Abstract

Explainable AI (XAI) has become essential in computer vision for making the decision-making processes of deep learning models transparent. However, current visual explanation methods face a critical trade-off between the high fidelity of architecture-specific methods and the broad applicability of universal ones. This trade-off often yields abstract or fragmented explanations and makes it difficult to compare explanatory power across diverse model families, such as convolutional neural networks (CNNs) and transformers. This paper introduces Self-Confidence and Analysis Networks (SCAN), a novel universal framework that overcomes these limitations for both CNN and transformer architectures. SCAN uses an autoencoder-based approach to reconstruct features from a model's intermediate layers. Guided by the Information Bottleneck principle, it generates a high-resolution Self-Confidence Map that identifies information-rich regions. Extensive experiments on diverse architectures and datasets demonstrate that SCAN consistently performs strongly on quantitative metrics including AUC-D, Negative AUC, Drop%, and Win%. Qualitatively, it produces clearer, more object-focused explanations than existing methods. By providing a unified framework that is both architecturally universal and highly faithful, SCAN enhances model transparency and offers a more reliable tool for understanding the decision-making processes of complex neural networks.
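The abstract names Drop% and Win% without defining them. The following is a minimal sketch that assumes the definitions popularized by Grad-CAM++: Average Drop% is the mean relative fall in the class score when the input is masked by its explanation map, and Win% counts, per image, which compared method causes the smallest drop. SCAN's exact evaluation protocol may differ, and the function names and toy scores are hypothetical.

```python
import numpy as np


def average_drop(full_scores, masked_scores):
    """Average Drop% (Grad-CAM++ convention, assumed): mean of max(0, Y - O) / Y, times 100.

    full_scores[i]:   class score Y_i on the unmodified image i (must be > 0)
    masked_scores[i]: class score O_i on image i masked by its explanation map
                      (building that masked input is not shown here)
    """
    y = np.asarray(full_scores, dtype=float)
    o = np.asarray(masked_scores, dtype=float)
    return float(np.mean(np.maximum(0.0, y - o) / y) * 100.0)


def win_percent(full_scores, masked_scores_by_method):
    """Win% per method (assumed definition): share of images on which that method
    has the smallest relative confidence drop among all compared methods."""
    y = np.asarray(full_scores, dtype=float)
    names = list(masked_scores_by_method)
    # drops has shape (num_methods, num_images)
    drops = np.stack([
        np.maximum(0.0, y - np.asarray(masked_scores_by_method[n], dtype=float)) / y
        for n in names
    ])
    winners = np.argmin(drops, axis=0)  # on ties, argmin credits the first method listed
    return {n: float(np.mean(winners == k) * 100.0) for k, n in enumerate(names)}


# Toy usage with made-up scores for two hypothetical methods.
full = [0.9, 0.8, 0.5, 0.6]
scan = [0.85, 0.8, 0.4, 0.7]
baseline = [0.5, 0.7, 0.45, 0.3]
print(round(average_drop(full, scan), 2))  # -> 6.39
print(win_percent(full, {"scan": scan, "baseline": baseline}))  # -> {'scan': 75.0, 'baseline': 25.0}
```

Lower Average Drop% is better, and the `max(0, ·)` clamp means an explanation that raises confidence contributes zero drop rather than a negative one. Negative AUC and AUC-D depend on perturbation schedules not described in this section, so they are left out of the sketch.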

Paper Structure

This paper contains 33 sections, 10 equations, 9 figures, and 9 tables.

Introduction
Related Works
Model-Agnostic Perturbation-Based Methods
Architecture-Specific Methods
The Research Gap: Fidelity vs. Universality
Methodology
Concepts
Gradient-masked Feature Map
Information Bottleneck Theory
Information Bottleneck Theory of SCAN
Loss functions with Information Bottleneck
Confidence Loss
Reconstruction Loss
Analysis Networks
Experiments

Figures (9)

Figure 1: SCAN process. Feature maps are extracted from the target model and reconstructed, and then important regions containing significant information are visualized using the self-confidence map.
Figure 2: Analysis networks for CNN and transformer models. The ResNet-based decoder is optimized for CNN model structures, while the transformer-based decoder is designed for transformer model structures.
Figure 3: Qualitative comparison of visual explanation methods for a ViT-b16 model trained on ImageNet. Compared to baselines such as Raw Attention, Rollout, and others, SCAN generates a more coherent and object-focused explanation.
Figure 4: Qualitative comparison of SCAN and other methods on ResNet50V2. While conventional methods generate abstract saliency maps, SCAN produces more distinct explanations with clear object boundaries.
Figure 5: Qualitative results across various models. SCAN consistently generates clear, object-focused explanations.
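Figure 1 describes the last step as visualizing information-rich regions with the self-confidence map. The colormap, resolution handling, and blending SCAN uses are not specified here, so the following is only a minimal NumPy sketch under our own assumptions: the map is a 2-D array of per-location scores; it is bilinearly resized if its grid differs from the image (for example, a 14×14 token grid for a ViT-B/16 at 224×224 input; a map already at image resolution skips this step); it is min-max normalized; and it is alpha-blended onto the image in a single color. `resize_bilinear` and `visualize_confidence` are hypothetical helpers, not part of SCAN.

```python
import numpy as np


def resize_bilinear(m, out_h, out_w):
    """Bilinear resize of a 2-D array using corner-aligned sample positions."""
    in_h, in_w = m.shape
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights, shape (out_h, 1)
    wx = (xs - x0)[None, :]  # horizontal interpolation weights, shape (1, out_w)
    top = m[y0][:, x0] * (1 - wx) + m[y0][:, x1] * wx
    bottom = m[y1][:, x0] * (1 - wx) + m[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def visualize_confidence(conf_map, image, alpha=0.5, color=(1.0, 0.0, 0.0)):
    """Return (heat, blended): the [0, 1]-normalized map at image size and an
    RGB overlay. `image` is assumed to be float RGB in [0, 1], shape (H, W, 3)."""
    conf_map = np.asarray(conf_map, dtype=float)
    h, w = image.shape[:2]
    if conf_map.shape != (h, w):
        conf_map = resize_bilinear(conf_map, h, w)
    lo, hi = conf_map.min(), conf_map.max()
    heat = (conf_map - lo) / (hi - lo + 1e-8)  # eps avoids division by zero for a flat map
    tint = heat[..., None] * np.asarray(color, dtype=float)  # (H, W, 3) colored layer
    blended = (1.0 - alpha) * image + alpha * tint
    return heat, blended


# Usage: a random 14x14 map (placeholder for a real confidence map) on a 224x224 image.
rng = np.random.default_rng(0)
heat, blended = visualize_confidence(rng.random((14, 14)), rng.random((224, 224, 3)))
print(heat.shape, blended.shape)  # -> (224, 224) (224, 224, 3)
```

Min-max normalization makes maps from different models comparable on screen, but it discards absolute confidence levels; keep the raw map if those matter for analysis.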