Towards Fine-Grained Interpretability: Counterfactual Explanations for Misclassification with Saliency Partition
Lintong Zhang, Kang Yin, Seong-Whan Lee
TL;DR
This work addresses the problem of explaining misclassifications in fine-grained visual tasks by proposing FG-VCE, a non-generative, end-to-end framework for fine-grained counterfactual explanations. It combines object-level and part-level insights through two components. A Saliency Partition module approximates per-feature Shapley contributions, and an iterative counterfactual generation process replaces the most contributive features with semantically similar candidates until the prediction aligns with the correct class, guided by the joint objective $L_{tot} = L_{sim} + L_{cls}$. The approach also defines invariant and dominant regions, $\Delta M_{Inv.}$ and $\Delta M_{Dom.}$, to contrast model behavior before and after the counterfactual changes, enabling clear contrastive explanations for misclassification. Experiments on CUB-200-2011 and Stanford Dogs with ResNet-50 and VGG-16 show that FG-VCE yields more granular, human-interpretable explanations and outperforms baselines on insertion/deletion metrics and the proposed Compact Activation Score $\xi$, highlighting its potential for fine-grained interpretability in real-world tasks.
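To make the idea of approximating per-feature Shapley contributions concrete, here is a minimal sketch using generic permutation sampling. This is a standard Monte Carlo estimator, not necessarily the approximation used in the Saliency Partition module. The names `shapley_estimate`, `value_fn`, and `toy_score` are hypothetical, and the additive toy score stands in for a real classifier's confidence on a masked feature map.

```python
import random
from typing import Callable, FrozenSet, List


def shapley_estimate(
    n_features: int,
    value_fn: Callable[[FrozenSet[int]], float],
    n_permutations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Estimate Shapley values by averaging marginal gains over random orderings."""
    rng = random.Random(seed)
    phi = [0.0] * n_features
    order = list(range(n_features))
    for _ in range(n_permutations):
        rng.shuffle(order)
        coalition: set = set()
        prev = value_fn(frozenset(coalition))
        for i in order:
            # Marginal contribution of feature i given the features added before it.
            coalition.add(i)
            cur = value_fn(frozenset(coalition))
            phi[i] += cur - prev
            prev = cur
    return [p / n_permutations for p in phi]


# Hypothetical stand-in for a model score: each feature region adds a fixed weight.
weights = [0.5, 0.1, 0.3, 0.0]


def toy_score(coalition: FrozenSet[int]) -> float:
    return sum(weights[i] for i in coalition)


phi = shapley_estimate(len(weights), toy_score)
# Rank regions from most to least contributive.
ranking = sorted(range(len(phi)), key=lambda i: phi[i], reverse=True)
print(phi, ranking)
```

For an additive score such as this one, the estimate recovers each region's weight, so the ranking simply orders regions by weight. With a real model, `value_fn` would evaluate the classifier with only the regions in the coalition left unmasked, and the number of sampled permutations trades accuracy for compute.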
Abstract
Attribution-based explanation techniques capture key patterns to enhance visual interpretability; however, these patterns often lack the granularity needed for insight in fine-grained tasks, particularly when the model misclassifies a sample. To address this limitation, we propose a fine-grained counterfactual explanation framework that provides interpretability at both the object level and the part level, addressing two fundamental questions: (1) which fine-grained features contribute to model misclassification, and (2) where dominant local features influence counterfactual adjustments. Our approach yields explainable counterfactuals in a non-generative manner by quantifying similarity and weighting component contributions within regions of interest between correctly classified and misclassified samples. Furthermore, we introduce a saliency partition module grounded in Shapley value contributions, isolating features with region-specific relevance. Extensive experiments demonstrate the superiority of our approach in capturing more granular, intuitively meaningful regions, surpassing existing fine-grained methods.
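The non-generative editing idea described above can be illustrated with a simplified greedy sketch. It assumes that features are small per-region vectors, that similarity is cosine similarity, and that the classifier is a toy rule; none of these details come from the paper, and the actual method is guided by a joint similarity and classification objective rather than this greedy loop. All names (`counterfactual_edit`, `predict`, `contributions`) are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def counterfactual_edit(
    query: List[Vector],
    candidates: List[Vector],
    contributions: Sequence[float],
    predict: Callable[[List[Vector]], int],
    target: int,
    max_edits: Optional[int] = None,
) -> Tuple[List[Vector], List[Tuple[int, int]], bool]:
    """Swap the most contributive query regions for their most similar candidates."""
    edited = [list(v) for v in query]
    swaps: List[Tuple[int, int]] = []
    # Visit query regions from most to least contributive.
    order = sorted(range(len(query)), key=lambda i: contributions[i], reverse=True)
    for i in order[:max_edits]:
        if predict(edited) == target:
            break  # stop as soon as the prediction matches the target class
        j = max(range(len(candidates)), key=lambda k: cosine(edited[i], candidates[k]))
        edited[i] = list(candidates[j])
        swaps.append((i, j))
    return edited, swaps, predict(edited) == target


# Toy classifier (assumption): class 0 if the first channel dominates, else class 1.
def predict(cells: List[Vector]) -> int:
    return 0 if sum(c[0] for c in cells) >= sum(c[1] for c in cells) else 1


query = [[1.0, 0.0], [0.9, 0.2], [0.2, 0.3]]       # sample predicted as class 0
candidates = [[0.1, 1.0], [0.3, 0.9], [0.6, 0.8]]  # regions from a class-1 sample
contributions = [0.6, 0.3, 0.1]                    # e.g. Shapley-style scores

edited, swaps, success = counterfactual_edit(query, candidates, contributions, predict, target=1)
print(swaps, success)
```

In this toy run the two most contributive regions are replaced before the prediction flips, and the least contributive region is left untouched. The recorded swaps indicate which regions had to change, which is the kind of information a contrastive explanation draws on.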
