
Towards Fine-Grained Interpretability: Counterfactual Explanations for Misclassification with Saliency Partition

Lintong Zhang, Kang Yin, Seong-Whan Lee

TL;DR

This work tackles the challenge of explaining misclassifications in fine-grained visual tasks by proposing FG-VCE, a non-generative, end-to-end framework that delivers fine-grained counterfactual explanations. It combines object-level and part-level insights through two components. The first is a Saliency Partition module that approximates the Shapley contribution of each feature. The second is an iterative counterfactual generation process that replaces the highest-contributing features with semantically similar candidates until the prediction matches the correct class. This process is guided by the joint objective $L_{tot} = L_{sim} + L_{cls}$. The approach also defines invariant and dominant regions, $\Delta M_{Inv.}$ and $\Delta M_{Dom.}$, which contrast model behavior before and after the counterfactual changes and thereby yield contrastive explanations for misclassification. Experiments on CUB-200-2011 and Stanford Dogs with ResNet-50 and VGG-16 show that FG-VCE produces more granular, human-interpretable explanations. It also outperforms baselines on the insertion/deletion metrics and on the proposed Compact Activation Score $\xi$, highlighting its potential for fine-grained interpretability in real-world tasks.
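
The Saliency Partition step can be illustrated with a standard Monte Carlo (permutation-sampling) Shapley estimator. This is a minimal sketch, not the authors' implementation. The following are all hypothetical: the value function, the zero baseline for removed feature points, the toy 4x4 feature map, and the names `shapley_monte_carlo` and `class_logit`. Each feature point's score is its marginal contribution to a class logit, averaged over random orderings of the points.

```python
import numpy as np


def shapley_monte_carlo(value_fn, n_features, n_permutations=200, seed=0):
    """Approximate Shapley values by permutation sampling.

    value_fn maps a boolean mask of shape (n_features,) (True = feature kept)
    to a scalar score, e.g. the logit of the class being explained.
    """
    rng = np.random.default_rng(seed)
    phi = np.zeros(n_features)
    for _ in range(n_permutations):
        order = rng.permutation(n_features)
        mask = np.zeros(n_features, dtype=bool)  # start from the empty coalition
        prev = value_fn(mask)
        for i in order:
            mask[i] = True                       # add feature point i
            curr = value_fn(mask)
            phi[i] += curr - prev                # marginal contribution of i
            prev = curr
    return phi / n_permutations


# Hypothetical toy setup: a 4x4 feature map with 8 channels (16 spatial points)
# and a linear head on globally average-pooled features. Removed points are
# replaced by a zero baseline.
rng = np.random.default_rng(1)
feats = rng.normal(size=(16, 8))
w = rng.normal(size=8)


def class_logit(mask):
    pooled = (feats * mask[:, None]).mean(axis=0)
    return float(pooled @ w)


phi = shapley_monte_carlo(class_logit, n_features=16, n_permutations=50)
saliency = phi.reshape(4, 4)  # per-position contribution map
```

The toy head is additive, so every permutation yields the same marginal contributions. The estimate therefore equals the exact Shapley value `feats @ w / 16`. A real CNN has a non-additive value function, and each of the `n_permutations` passes costs `n_features + 1` evaluations. That cost is why an approximation is needed in practice.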

Abstract

Attribution-based explanation techniques capture key patterns to enhance visual interpretability; however, these patterns often lack the granularity needed for insight in fine-grained tasks, particularly in cases of model misclassification, where explanations may be insufficiently detailed. To address this limitation, we propose a fine-grained counterfactual explanation framework that provides both object-level and part-level interpretability, addressing two fundamental questions: (1) which fine-grained features contribute to model misclassification, and (2) where dominant local features influence counterfactual adjustments. Our approach yields explainable counterfactuals in a non-generative manner by quantifying similarity and weighting component contributions within regions of interest between correctly classified and misclassified samples. Furthermore, we introduce a saliency partition module grounded in Shapley value contributions, isolating features with region-specific relevance. Extensive experiments demonstrate the superiority of our approach in capturing more granular, intuitively meaningful regions, surpassing existing fine-grained methods.
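
The non-generative counterfactual step can be sketched as a greedy loop over feature points. This is an illustrative sketch under stated assumptions, not the paper's algorithm. It assumes a linear classification head on average-pooled features, and it scores feature points by their contribution to the margin between the predicted and target classes. Each replacement is drawn from a correctly classified reference sample by minimizing a hypothetical instance of $L_{tot} = L_{sim} + L_{cls}$. Here $L_{sim}$ is taken as one minus cosine similarity and $L_{cls}$ as cross-entropy toward the target class. All function names and the toy data are hypothetical.

```python
import numpy as np


def logits(feats, W):
    # Hypothetical linear head on globally average-pooled features: (N, C) -> (K,)
    return W @ feats.mean(axis=0)


def cross_entropy(z, target):
    # Numerically stable -log softmax(z)[target]
    z = z - z.max()
    return float(np.log(np.exp(z).sum()) - z[target])


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def counterfactual_edit(query, reference, W, target, max_steps=None):
    """Greedily replace feature points of the misclassified `query` (N, C) with
    points from a correctly classified `reference` (M, C) until the head
    predicts `target`. Returns edited features and (query_idx, ref_idx) swaps."""
    edited = query.copy()
    n = len(query)
    max_steps = n if max_steps is None else min(max_steps, n)
    already_edited = np.zeros(n, dtype=bool)
    swaps = []
    for _ in range(max_steps):
        pred = int(logits(edited, W).argmax())
        if pred == target:
            break
        # Contribution of each point to the (pred - target) logit margin. With a
        # zero baseline and this additive head, it equals the exact Shapley value.
        contrib = edited @ (W[pred] - W[target]) / n
        contrib[already_edited] = -np.inf  # edit each point at most once
        i = int(contrib.argmax())
        # Hypothetical L_tot = L_sim + L_cls, evaluated for every candidate.
        best_j, best_loss = 0, np.inf
        for j, cand in enumerate(reference):
            trial = edited.copy()
            trial[i] = cand
            l_sim = 1.0 - cosine(query[i], cand)
            l_cls = cross_entropy(logits(trial, W), target)
            if l_sim + l_cls < best_loss:
                best_j, best_loss = j, l_sim + l_cls
        edited[i] = reference[best_j]
        already_edited[i] = True
        swaps.append((i, best_j))
    return edited, swaps


# Toy example: class 0 reads channel 0, class 1 reads channel 1.
W = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
query = np.array([[2.0, 0.5, 0.0, 0.0],      # misclassified sample (predicted 0)
                  [1.5, 0.2, 0.1, 0.0],
                  [0.3, 0.4, 1.0, 0.0],
                  [0.2, 0.1, 0.0, 1.0]])
reference = np.array([[0.1, 2.0, 0.0, 0.0],  # correctly classified class-1 sample
                      [0.0, 1.5, 0.5, 0.0],
                      [0.2, 0.3, 0.0, 1.0]])
edited, swaps = counterfactual_edit(query, reference, W, target=1)
```

In this toy case, swapping the single most contributive point for its best-scoring reference candidate is enough to flip the prediction. The recorded swaps are what a contrastive explanation would then localize on the image.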

Paper Structure

This paper contains 16 sections, 12 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Traditional attribution-based explanation techniques often produce similar visual explanations for both correctly classified and misclassified samples.
  • Figure 2: The proposed FG-VCE framework. The framework consists of three main stages: (1) feature extraction, in which the model processes both the misclassified sample and correctly classified reference samples to generate feature representations; (2) saliency partition, which computes the contribution of individual feature points using an approximation of the Shapley value; and (3) fine-grained contrastive counterfactual generation and explanation, where the most informative regions are selectively modified to produce a contrastive explanation that aligns with the target class while preserving semantic consistency.
  • Figure 3: Qualitative comparison of fine-grained attribution maps generated by PGD and our proposed method. The efficiency distribution of misclassified samples under both PGD and our method is presented on the right.
  • Figure 4: Evaluating the rationality of visual contrastive explanations by answering the question 'Why P (misclassified class) rather than Q (correct class)?' based on invariant regions and dominant regions in a fine-grained classification task.
  • Figure 5: The impact of refining Shapley operations on fine-grained saliency maps (invariant and dominant regions) when generating explanations for misclassifications.
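
Figures 4 and 5 explain a misclassification through invariant and dominant regions, $\Delta M_{Inv.}$ and $\Delta M_{Dom.}$. Their exact formulas are not given in this summary, so the sketch below uses a hypothetical reading. The saliency maps from before and after the counterfactual edit are each binarized at their top fraction of positions. Invariant regions are those salient in both maps, and dominant regions are those that become salient only after the edit. The threshold `frac` and the function names are assumptions for illustration only.

```python
import numpy as np


def top_fraction_mask(saliency, frac=0.2):
    """Boolean mask of the top `frac` most salient positions (hypothetical
    threshold; ties at the cutoff may select a few extra positions)."""
    k = max(1, int(round(frac * saliency.size)))
    thresh = np.sort(saliency.ravel())[-k]
    return saliency >= thresh


def contrast_regions(sal_before, sal_after, frac=0.2):
    """Hypothetical reading of Delta M_Inv / Delta M_Dom:
    invariant = salient both before and after the counterfactual edit,
    dominant  = salient only after the edit."""
    m_before = top_fraction_mask(sal_before, frac)
    m_after = top_fraction_mask(sal_after, frac)
    invariant = m_before & m_after
    dominant = m_after & ~m_before
    return invariant, dominant


# Toy 3x3 saliency maps before and after a counterfactual edit.
before = np.array([[0.9, 0.8, 0.1],
                   [0.2, 0.1, 0.0],
                   [0.0, 0.1, 0.3]])
after = np.array([[0.9, 0.1, 0.1],
                  [0.2, 0.1, 0.0],
                  [0.0, 0.1, 0.7]])
invariant, dominant = contrast_regions(before, after, frac=0.25)
```

Under this reading, the invariant regions hint at evidence the model relies on for both P and Q. The dominant regions point at the parts whose change moves the prediction toward Q.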