CAPE: CAM as a Probabilistic Ensemble for Enhanced DNN Interpretation
Townim Faisal Chowdhury, Kewen Liao, Vu Minh Hieu Phan, Minh-Son To, Yutong Xie, Kevin Hung, David Ross, Anton van den Hengel, Johan W. Verjans, Zhibin Liao
TL;DR
CAPE addresses the interpretability gap of CAM by reformulating CAM as a probabilistic ensemble, producing per-region contributions that sum to the image-level prediction and are comparable across classes. It introduces CAPE and μ-CAPE, which build on a bias-adjusted map $M' = M + \mathbf{b}$, region-wise softmax weighting, and distillation-based bootstrap training with temperatures $T$ and $T'$. The approach yields region-level contributions $\hat{\mathbf{P}}_{ijc}$ that support cross-class comparisons while maintaining competitive accuracy on CUB, ImageNet, and CMML, and its inference is efficient enough for practical deployment. The work demonstrates interpretability gains through probabilistic explanations, acknowledges training convergence challenges caused by softmax-based soft predictions, and proposes strategies such as selective KL divergence (KLD) to mitigate them.
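The summing property described above can be illustrated with a minimal sketch. It assumes a single softmax taken jointly over all regions and classes of the bias-adjusted map $M' = M + \mathbf{b}$; this normalisation is an assumption chosen to reproduce the stated property, not necessarily the paper's exact formulation, and the function name `region_contributions` and the array shapes are hypothetical.

```python
import numpy as np

def region_contributions(cam, bias):
    """Illustrative per-region, per-class contributions.

    cam:  (H, W, C) class activation map M (one score per region and class).
    bias: (C,) classifier bias b.
    Assumption: one softmax over every (i, j, c) entry of M' = M + b.
    """
    adjusted = cam + bias                  # bias-adjusted map M'
    adjusted = adjusted - adjusted.max()   # shift for numerical stability
    weights = np.exp(adjusted)
    return weights / weights.sum()         # entries are non-negative and sum to 1

# Toy input: a 7x7 grid of regions and 5 classes (random values, not real data).
rng = np.random.default_rng(0)
cam = rng.normal(size=(7, 7, 5))
bias = rng.normal(size=5)

contrib = region_contributions(cam, bias)
class_probs = contrib.sum(axis=(0, 1))     # image-level probability per class
```

Under this assumption, every entry of `contrib` lives on the same probability scale, so a region's contribution to one class can be compared directly with its contribution to another, and summing a class's entries over all regions recovers that class's image-level probability.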
Abstract
Deep Neural Networks (DNNs) are widely used for visual classification tasks, but their complex computation process and black-box nature hinder decision transparency and interpretability. Class activation maps (CAMs) and recent variants provide ways to visually explain the DNN decision-making process by displaying 'attention' heatmaps of the DNNs. Nevertheless, the CAM explanation offers only relative attention information: on an attention heatmap, we can tell which image regions are more or less important than others. These regions cannot be meaningfully compared across classes, and the contribution of each region to the model's class prediction is not revealed. To address these challenges and ultimately improve DNN interpretation, we propose CAPE, a novel reformulation of CAM that provides a unified and probabilistically meaningful assessment of the contributions of image regions. We quantitatively and qualitatively compare CAPE with state-of-the-art CAM methods on the CUB and ImageNet benchmark datasets to demonstrate enhanced interpretability. We also test on a cytology imaging dataset depicting a challenging Chronic Myelomonocytic Leukemia (CMML) diagnosis problem. Code is available at: https://github.com/AIML-MED/CAPE.
