CAPE: CAM as a Probabilistic Ensemble for Enhanced DNN Interpretation

Townim Faisal Chowdhury; Kewen Liao; Vu Minh Hieu Phan; Minh-Son To; Yutong Xie; Kevin Hung; David Ross; Anton van den Hengel; Johan W. Verjans; Zhibin Liao

TL;DR

CAPE addresses an interpretability gap in CAM by reformulating CAM as a probabilistic ensemble. The resulting per-region contributions sum to the image-level prediction and are comparable across classes. It introduces two explanations, CAPE and μ-CAPE, built on three components: a bias-adjusted map $M' = M + \mathbf{b}$, region-wise softmax weighting, and distillation-based bootstrap training with temperatures $T$ and $T'$. The approach yields region-level contributions $\hat{\mathbf{P}}_{ijc}$ (for spatial location $(i, j)$ and class $c$) that support cross-class comparisons. It maintains competitive accuracy on CUB, ImageNet, and CMML, and its inference is efficient enough for practical deployment. The work demonstrates interpretability gains through probabilistic explanations. It also acknowledges training convergence challenges caused by softmax-based soft predictions and proposes mitigation strategies such as selective Kullback-Leibler divergence (KLD).
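The key property claimed above is that per-region contributions sum to the image-level prediction. A minimal pure-Python sketch can illustrate it. The sketch assumes that the region-wise softmax is a single softmax over every (location, class) pair of the bias-adjusted map $M' = M + \mathbf{b}$. This is our assumption, not a transcription of the released code. Under it, $\hat{\mathbf{P}}_{ijc}$ sums to 1 over all $i, j, c$, and the class probability is $\sum_{i,j}\hat{\mathbf{P}}_{ijc}$. The function names and the toy 2 x 2 x 2 map are hypothetical.

```python
import math

def cape_contributions(cam, bias):
    """Per-region, per-class contributions from a class activation map.

    cam[i][j][c] holds M_ijc (an H x W x C nested list); bias[c] holds b_c.
    ASSUMPTION: one softmax over all (i, j, c) of M' = M + b.
    """
    # Bias-adjusted map M' = M + b, with b broadcast over spatial locations.
    adjusted = [[[m + b for m, b in zip(cell, bias)] for cell in row] for row in cam]
    # Subtract the global maximum before exponentiating, for numerical stability.
    peak = max(v for row in adjusted for cell in row for v in cell)
    exps = [[[math.exp(v - peak) for v in cell] for cell in row] for row in adjusted]
    total = sum(v for row in exps for cell in row for v in cell)
    return [[[v / total for v in cell] for cell in row] for row in exps]

def class_probabilities(contrib):
    """Image-level prediction: sum each class's contributions over all regions."""
    num_classes = len(contrib[0][0])
    return [sum(cell[c] for row in contrib for cell in row) for c in range(num_classes)]

# Toy example: H = W = 2 spatial locations, C = 2 classes (values are made up).
cam = [[[2.0, 0.5], [1.0, 1.0]],
       [[0.0, 1.5], [0.5, 0.2]]]
bias = [0.1, -0.1]
contrib = cape_contributions(cam, bias)
probs = class_probabilities(contrib)
print([round(p, 3) for p in probs])
```

All regions and classes share one normalizer in this sketch. A value such as 0.05 therefore means the same thing in every class's map, which is what allows cross-class comparison. A CAM min-max normalized separately per class loses that common scale.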

Abstract

Deep Neural Networks (DNNs) are widely used for visual classification tasks, but their complex computation process and black-box nature hinder decision transparency and interpretability. Class activation maps (CAMs) and their recent variants visually explain the DNN decision-making process by displaying 'attention' heatmaps of the DNNs. However, a CAM explanation only offers relative attention information. On an attention heatmap, we can tell which image regions are more or less important than others. These regions cannot be meaningfully compared across classes, though, and the contribution of each region to the model's class prediction is not revealed. To address these challenges and ultimately improve DNN interpretation, we propose CAPE, a novel reformulation of CAM that provides a unified and probabilistically meaningful assessment of the contributions of image regions. We quantitatively and qualitatively compare CAPE with state-of-the-art CAM methods on the CUB and ImageNet benchmark datasets to demonstrate enhanced interpretability. We also evaluate it on a cytology imaging dataset depicting a challenging Chronic Myelomonocytic Leukemia (CMML) diagnosis problem. Code is available at: https://github.com/AIML-MED/CAPE.

Paper Structure (15 sections, 11 equations, 4 figures, 2 tables)

Introduction
Related Work
Methodology of Model Interpretation
Class Activation Maps (CAMs)
CAM as a Probabilistic Ensemble (CAPE)
Image Region Importance (Saliency)
CAPE Explanation
$\mu$-CAPE Explanation
Bootstrap Training
Experiments
Datasets and Implementation Details
Qualitative Analysis
Quantitative Analysis
Ablation Study on Classification Performance
Discussion and Conclusion

Figures (4)

Figure 1: Comparison between CAM and the proposed CAPE explanation methods on a fine-grained class difference analysis between the Siberian Husky (Husky) and Alaskan Malamute (Malamute) classes on ImageNet. We overlay the explanation values before up-sampling on top of the produced heatmaps. CAM explanations are computed independently for each class and highlight similar regions for similar object classes, which makes the explanation maps incomparable across classes. In contrast, CAPE explanation values (before up-sampling and min-max normalization) are probability values for each combination of spatial location (image region) and class. On the Diff graph, we color code the top-5, next-5 (top-6 to top-10), etc., for the positive values (i.e., more Husky) and the negative values (i.e., more Malamute). The green box shows an example analysis of the $+1.9\%$ class difference. It sums the color-coded regions to show how much of the difference they account for.
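The green-box analysis in Figure 1 relies on one property. If every region-class value is a probability from the same distribution, the per-region difference between two classes sums exactly to the difference in their image-level probabilities. Ranked regions can then be summed to see how much of the gap they account for. The minimal pure-Python sketch below uses a hand-made 2 x 2 x 2 probability map whose values are illustrative, not taken from the paper. `class_difference` and `top_k_regions` are hypothetical helper names.

```python
def class_difference(contrib, c_pos, c_neg):
    """D[i][j] = P[i][j][c_pos] - P[i][j][c_neg] for a probability map P."""
    return [[cell[c_pos] - cell[c_neg] for cell in row] for row in contrib]

def top_k_regions(diff, k, positive=True):
    """The k regions with the largest positive (or most negative) differences."""
    flat = [((i, j), v) for i, row in enumerate(diff) for j, v in enumerate(row)]
    kept = [item for item in flat if (item[1] > 0 if positive else item[1] < 0)]
    # Largest first for positive values; most negative first otherwise.
    kept.sort(key=lambda item: item[1], reverse=positive)
    return kept[:k]

# Illustrative 2 x 2 x 2 probability map whose entries sum to 1 (made-up values).
contrib = [[[0.30, 0.10], [0.05, 0.15]],
           [[0.20, 0.05], [0.10, 0.05]]]
diff = class_difference(contrib, c_pos=0, c_neg=1)
p_pos = sum(cell[0] for row in contrib for cell in row)
p_neg = sum(cell[1] for row in contrib for cell in row)
# The difference map sums to the gap between the two image-level probabilities.
gap = sum(v for row in diff for v in row)
top_pos = top_k_regions(diff, k=2)
top_neg = top_k_regions(diff, k=1, positive=False)
```

In this toy map, the two strongest positive regions sum to about 0.35. That exceeds the 0.30 gap because the negative region at (0, 1) offsets part of it. This is the kind of accounting the green box performs.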
Figure 2: Overview of the proposed CAPE classification layer with bootstrap training. AVG stands for averaging.
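Figure 2 pairs the CAPE layer with an averaging (AVG) path used for bootstrap training. The sketch below is a hedged reading of that diagram, not the authors' loss, and it rests on three assumptions:

- The AVG path spatially averages the bias-adjusted class map. For a linear head, this reproduces the usual global-average-pooled logits.
- $T$ softens the AVG-path targets and $T'$ softens the CAPE-path prediction.
- The two distributions are matched with a plain KL divergence.

The selective KLD variant mentioned in the TL;DR is not reproduced, and all function names are hypothetical. No gradients are computed; the sketch only evaluates the loss value.

```python
import math

def class_map(features, weights):
    """M_ijc = w_c . f_ij for an H x W grid of D-dim features and C class weights."""
    return [[[sum(f * w for f, w in zip(feat, w_c)) for w_c in weights]
             for feat in row] for row in features]

def avg_branch_logits(cam, bias):
    """Spatial mean of the map plus bias; equals GAP(features) . w_c + b_c for a linear head."""
    n = len(cam) * len(cam[0])
    return [sum(cell[c] for row in cam for cell in row) / n + b_c
            for c, b_c in enumerate(bias)]

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cape_branch_probs(cam, bias, temperature=1.0):
    """ASSUMED: joint softmax of (M + b) / T' over all (i, j, c), summed over regions."""
    flat = [cell[c] + b_c for row in cam for cell in row for c, b_c in enumerate(bias)]
    probs = softmax(flat, temperature)
    num_classes = len(bias)
    # Entries for class c sit at indices c, c + C, c + 2C, ... in the flat list.
    return [sum(probs[k] for k in range(c, len(probs), num_classes))
            for c in range(num_classes)]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_k = 0 contribute nothing."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

def bootstrap_loss(cam, bias, T=2.0, T_prime=1.0):
    """Hypothetical distillation term: AVG-path soft targets vs CAPE-path prediction."""
    teacher = softmax(avg_branch_logits(cam, bias), T)
    student = cape_branch_probs(cam, bias, T_prime)
    return kl_divergence(teacher, student)

features = [[[1.0, 0.0], [0.0, 1.0]],
            [[0.5, 0.5], [1.0, 1.0]]]   # H = W = 2, D = 2 (toy values)
weights = [[1.0, -1.0], [0.5, 0.5]]     # C = 2 class weight vectors
bias = [0.0, 0.2]
cam = class_map(features, weights)
avg = avg_branch_logits(cam, bias)
loss = bootstrap_loss(cam, bias, T=2.0, T_prime=1.0)
```

Averaging the map rather than the features is mathematically identical for a linear classifier. This is why an AVG path can supply standard-classifier soft targets without a second set of weights.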
Figure 3: Qualitative visualisation using ResNet-50. Each dataset has two rows showing the explanation maps of the top-2 predicted classes. Class confidence scores are on the left side of each explanation map. We select CAM, Smooth Grad-CAM++, Lift-CAM, and Score-CAM to represent different ways of visualizing the same vanilla classification layer. We show CAPE and $\mu$-CAPE (PF) explanations for the proposed CAPE model. Full comparisons are in Figs. 2 to 4 of the supplementary material. “SG-CAM++” denotes Smooth Grad-CAM++.
Figure 4: ResNet-50 training and validation classification accuracy recorded over the course of training on the CUB dataset.