A Dual-Perspective Approach to Evaluating Feature Attribution Methods

Yawei Li, Yang Zhang, Kenji Kawaguchi, Ashkan Khakzar, Bernd Bischl, Mina Rezaei

TL;DR

This work proposes two new perspectives within the faithfulness paradigm, soundness and completeness, each capturing an intuitive property of an attribution. Both rest on a firm mathematical foundation and yield quantitative metrics that can be computed with efficient algorithms.

Abstract

Feature attribution methods attempt to explain neural network predictions by identifying relevant features. However, establishing a cohesive framework for assessing feature attribution remains a challenge. There are several views through which we can evaluate attributions. One principal lens is to observe the effect of perturbing attributed features on the model's behavior (i.e., faithfulness). While providing useful insights, existing faithfulness evaluations suffer from shortcomings that we reveal in this paper. In this work, we propose two new perspectives within the faithfulness paradigm that reveal intuitive properties: soundness and completeness. Soundness assesses the degree to which attributed features are truly predictive features, while completeness examines how well the resulting attribution reveals all the predictive features. The two perspectives are based on a firm mathematical foundation and provide quantitative metrics that are computable through efficient algorithms. We apply these metrics to mainstream attribution methods, offering a novel lens through which to analyze and compare feature attribution methods.

Paper Structure (33 sections, 2 theorems, 1 equation, 15 figures, 3 tables, 2 algorithms)

Key Result

Theorem 4.11

Given a set $\mathcal{A}_{\mathrm{inc}}\subseteq\mathcal{A}$ and $\mathcal{A}_{\mathrm{inc}}\cap\mathcal{I}\neq\emptyset$, suppose that $\mathcal{S}_v(\mathcal{A}_{\mathrm{inc}})=\{\mathcal{S} \subseteq \mathcal{A}_{\mathrm{inc}}: \rho(f(\mathcal{S})) = \rho(f(\mathcal{A}_{\mathrm{inc}}))\}$ is not

Figures (15)

  • Figure 1: Analysis of retraining-based metrics. Compared to (a) $\mathbb{D}_{\mathrm{P, Train}}^{(1)}$, (b) $\mathbb{D}_{\mathrm{P, Train}}^{(2)}$ introduces an additional class-related spurious correlation during perturbation, visible in the upper-right region of the sample. (c) Although both perturbation strategies remove the same informative features (the central portions of the images), the two retrained models reach different test accuracies (0.66 vs. 0.88), suggesting that the test accuracy of a retrained model does not accurately reflect the amount of information removed.
  • Figure 2: Analysis of evaluation on semi-natural datasets. (a) Designed semi-natural datasets and attribution maps from crafted attribution methods. (b) Each method excels on the dataset for which it has prior knowledge, but it underperforms on the other.
  • Figure 3: Evaluation on semi-natural datasets vs. real-world datasets. Evaluation results on a semi-natural dataset and on a real dataset can differ markedly. On the semi-natural dataset $\mathbb{D}_{\mathrm{S}}^{(1)}$, a "dummy" method, Rect, which simply uses prior information about $\mathbb{D}_{\mathrm{S}}^{(1)}$, performs best, yet it performs worst on CIFAR-100.
  • Figure 4: Graphical illustration of the relationship between two attributions ($\mathcal{A}$ and $\mathcal{A}^\prime$). (a) Although $\mathcal{A}$ and $\mathcal{A}^\prime$ have equal soundness ($1.0$ in this case), $\mathcal{A}^\prime$ has higher completeness. (b) Although $\mathcal{A}$ and $\mathcal{A}^\prime$ have equal completeness, $\mathcal{A}^\prime$ has higher soundness. (c) We compare $\mathcal{A}\cap\mathcal{I}$ with $\mathcal{A}$ and $\mathcal{I}$ to measure soundness and completeness.
  • Figure 5: Soundness evaluation. Computing the soundness of $\mathcal{A}$ in a single step is infeasible. Instead, we incrementally include a subset $\mathcal{A}_{\mathrm{inc}}$ in the input and compute its soundness. This process involves identifying the optimal set $\mathcal{A}^{*}$ and calculating $\frac{|\mathcal{A}^{*}|_{\eta}}{|\mathcal{A}_{\mathrm{inc}}|_{\eta}}$. Each $\mathcal{A}^{*}$ is associated with a specific predictive level (i.e., model performance). When comparing two attribution methods, we can fix the predictive level and evaluate soundness at that level.
  • ...and 10 more figures
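
The comparison in Figure 4(c) can be made concrete with a toy sketch. The ratio form used here (the weight of $\mathcal{A}\cap\mathcal{I}$ divided by the weight of $\mathcal{A}$ for soundness, and by the weight of $\mathcal{I}$ for completeness) is an assumption read off the Figure 4 and Figure 5 captions, not a quotation of Definitions 4.8 and 4.9. The helper `weighted_size` is a hypothetical stand-in for the operator $|\cdot|_g$, assumed here to sum a per-feature weight, and the predictive set $\mathcal{I}$ is assumed known, which holds only in a toy setting.

```python
from typing import Dict, Set


def weighted_size(features: Set[int], g: Dict[int, float]) -> float:
    """Hypothetical stand-in for |.|_g: sum the weight g over a feature set."""
    return sum(g[f] for f in features)


def soundness(attributed: Set[int], predictive: Set[int], g: Dict[int, float]) -> float:
    """Assumed ratio |A ∩ I|_g / |A|_g: share of attributed weight that is predictive."""
    denom = weighted_size(attributed, g)
    return weighted_size(attributed & predictive, g) / denom if denom else 0.0


def completeness(attributed: Set[int], predictive: Set[int], g: Dict[int, float]) -> float:
    """Assumed ratio |A ∩ I|_g / |I|_g: share of predictive weight that is attributed."""
    denom = weighted_size(predictive, g)
    return weighted_size(attributed & predictive, g) / denom if denom else 0.0


# Toy setup: six features with unit weight; features 0..3 are predictive (I).
g = {f: 1.0 for f in range(6)}
I = {0, 1, 2, 3}

# Case (a): both attributions lie inside I, but A_prime covers more of it.
A, A_prime = {0, 1}, {0, 1, 2}
case_a = [(soundness(s, I, g), completeness(s, I, g)) for s in (A, A_prime)]

# Case (b): both cover the same part of I, but B includes more non-predictive features.
B, B_prime = {0, 1, 4, 5}, {0, 1, 4}
case_b = [(soundness(s, I, g), completeness(s, I, g)) for s in (B, B_prime)]

print(case_a)
print(case_b)
```

In case (a) the two attributions share the same soundness while the larger one has higher completeness; in case (b) they share the same completeness while the one with fewer non-predictive features has higher soundness. On a real model $\mathcal{I}$ is not observable, which is why the Figure 5 caption describes an incremental procedure instead of a direct ratio like this one.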

Theorems & Definitions (11)

  • Definition 4.1: Predictive information measurement $\varphi$
  • Definition 4.2: Attribution method $\eta$
  • Definition 4.3: Optimality of attribution method
  • Definition 4.4: Predictive feature set $\mathcal{I}$
  • Definition 4.5: Attributed feature set $\mathcal{A}$
  • Definition 4.6: Optimality of attributed feature set $\mathcal{A}$
  • Definition 4.7: Operator $|\cdot|_g$
  • Definition 4.8: Soundness
  • Definition 4.9: Completeness
  • Theorem 4.11
  • ...and 1 more