Comprehensive Attribution: Inherently Explainable Vision Model with Feature Detector

Xianren Zhang, Dongwon Lee, Suhang Wang

TL;DR

This work tackles the need for faithful explanations in deep vision models, focusing on inherently explainable attribution and two failure modes: incompleteness (discriminative features missing from the attribution) and interlocking (a cycle in which the selector picks noise and the predictor fits that noise). It introduces COMET, a selector-predictor-detector framework in which a pre-trained detector monitors the masked-out regions and a novel objective encourages complete feature selection by penalizing discriminative information left behind in those regions. The method optimizes $\mathcal{L}(x) = \mathcal{L}_P(x_s) - a\,\mathcal{L}_D(x_{1-s}) + b\,R(S(x))$, with $x_s = S(x) \odot x + (1 - S(x)) \odot q$ and $x_{1-s} = (1 - S(x)) \odot x + S(x) \odot q$, to enforce sufficiency and completeness. Empirical results on ImageNet-9, NICO++, and BAM show higher predictive accuracy than baselines and attribution maps with better coverage, localization, and fidelity, which matters for trustworthiness and model debugging in real-world vision tasks.
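The objective above can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes that $\mathcal{L}_P$ and $\mathcal{L}_D$ are cross-entropy losses, that $R$ is the mean mask value, and that $q$ is an all-zero baseline image. The function names and `toy_classifier` (a stand-in for both the predictor and the detector) are hypothetical.

```python
import numpy as np

def mask_inputs(x, s, q):
    """Build x_s (selected features) and x_{1-s} (masked-out complement)."""
    x_s = s * x + (1.0 - s) * q
    x_1ms = (1.0 - s) * x + s * q
    return x_s, x_1ms

def cross_entropy(probs, label):
    # Assumption: both L_P and L_D are cross-entropy against the true label.
    return -float(np.log(probs[label] + 1e-12))

def comet_style_loss(x, s, q, label, predictor, detector, a=1.0, b=0.1):
    """L(x) = L_P(x_s) - a * L_D(x_{1-s}) + b * R(S(x))."""
    x_s, x_1ms = mask_inputs(x, s, q)
    loss_p = cross_entropy(predictor(x_s), label)   # sufficiency term
    loss_d = cross_entropy(detector(x_1ms), label)  # completeness term
    reg = float(np.mean(np.abs(s)))                 # assumed regularizer R
    return loss_p - a * loss_d + b * reg

def toy_classifier(img):
    """Hypothetical 2-class model: softmax over left-half and right-half sums."""
    logits = np.array([img[:, :2].sum(), img[:, 2:].sum()])
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Toy 4x4 image whose class-0 evidence fills the two left columns.
x = np.zeros((4, 4))
x[:, :2] = 1.0
q = np.zeros_like(x)  # assumed baseline

s_complete = np.zeros_like(x)
s_complete[:, :2] = 1.0   # selects all discriminative pixels
s_partial = np.zeros_like(x)
s_partial[:, :1] = 1.0    # leaves half of them in the masked-out region

loss_complete = comet_style_loss(x, s_complete, q, 0, toy_classifier, toy_classifier)
loss_partial = comet_style_loss(x, s_partial, q, 0, toy_classifier, toy_classifier)
print(loss_complete, loss_partial)
```

In this toy setting the partial mask leaves evidence the detector can still classify, so the $-a\,\mathcal{L}_D$ term rewards it less and the complete mask receives the lower loss, even though it pays a larger regularization cost.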

Abstract

As deep vision models grow rapidly in popularity, there is a growing emphasis on explaining their predictions. Inherently explainable attribution methods aim to improve the understanding of model behavior by identifying the image regions that contribute significantly to predictions. This is achieved by cooperatively training a selector (which generates an attribution map identifying important features) and a predictor (which makes predictions using the identified features). Despite many advances, existing methods suffer from the incompleteness problem, where discriminative features are masked out, and the interlocking problem, where the non-optimized selector initially selects noise, causing the predictor to fit this noise and perpetuate the cycle. To address these problems, we introduce a new objective that discourages the presence of discriminative features in the masked-out regions, thus enhancing the comprehensiveness of feature selection. A pre-trained detector is introduced to detect discriminative features in the masked-out region. If the selector selects noise instead of discriminative features, the detector can observe this and break the interlocking situation by penalizing the selector. Extensive experiments show that our model achieves higher accuracy than the regular black-box model and produces attribution maps with high feature coverage, localization ability, fidelity, and robustness. Our code will be available at https://github.com/Zood123/COMET.

Paper Structure

This paper contains 14 sections, 1 theorem, 6 equations, 5 figures, and 6 tables.

Key Result

Proposition (Optimal Selector). Assume that the predictor and the detector can effectively predict the labels of their input, i.e., $f_P(x_s) = p(y \mid x_s)$ and $f_D(x_{1-s}) = p(y \mid x_{1-s})$. Then the optimal selector satisfies both the sufficiency and completeness requirements.
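
To see why the assumption matters, it can be substituted into the objective $\mathcal{L}(x) = \mathcal{L}_P(x_s) - a\,\mathcal{L}_D(x_{1-s}) + b\,R(S(x))$. The sketch below assumes, which this summary does not state, that $\mathcal{L}_P$ and $\mathcal{L}_D$ are cross-entropy losses against the true label $y$; the subscript $y$ denotes the probability assigned to that label.

$$
\begin{aligned}
\mathcal{L}(x) &= -\log f_P(x_s)_y + a \log f_D(x_{1-s})_y + b\,R(S(x)) \\
&= -\log p(y \mid x_s) + a \log p(y \mid x_{1-s}) + b\,R(S(x)).
\end{aligned}
$$

Under that reading, minimizing the first term pushes $p(y \mid x_s)$ up, so the selected region alone supports the label, while minimizing the second term pushes $p(y \mid x_{1-s})$ down, so the masked-out region carries little evidence for it. This is only the intuition behind sufficiency and completeness; the paper's formal definitions and proof are not reproduced here.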

Figures (5)

  • Figure 1: Attribution maps from different explanation approaches.
  • Figure 2: The framework consists of a selector $S$, predictor $f_P$, and a pre-trained detector $f_D$. The selector first identifies discriminative features and generates the attribution map $S(x)$. The $S(x)$ selects features $x_s$, which are then fed into the predictor. Meanwhile, the remaining parts of the image, represented as $x_{1-s}$, are fed into the feature detector. The selector is optimized to assist the predictor in accurately predicting the label while confusing the detector by not leaving any discriminative features in $x_{1-s}$.
  • Figure 3: Ablation study.
  • Figure 4: Test for different coefficients. The left two figures are PxAP for different $b$ and $t$. The right two figures are PxAP for different $a$.
  • Figure 5: Top (an8Flower): Attribution maps comparison; Bottom (BAM): Attribution maps when model trained on the object (left) or scene (right) labels.

Theorems & Definitions (3)

  • Definition
  • Proposition (Optimal Selector)
  • Proof