A Dual-Perspective Approach to Evaluating Feature Attribution Methods

Yawei Li; Yang Zhang; Kenji Kawaguchi; Ashkan Khakzar; Bernd Bischl; Mina Rezaei

TL;DR

This work proposes two new perspectives within the faithfulness paradigm: soundness and completeness. Both rest on a firm mathematical foundation and yield quantitative metrics that can be computed with efficient algorithms.

Abstract

Feature attribution methods attempt to explain neural network predictions by identifying relevant features. However, establishing a cohesive framework for assessing feature attribution remains a challenge. There are several views through which we can evaluate attributions. One principal lens is to observe the effect of perturbing attributed features on the model's behavior (i.e., faithfulness). While providing useful insights, existing faithfulness evaluations suffer from shortcomings that we reveal in this paper. In this work, we propose two new perspectives within the faithfulness paradigm that reveal intuitive properties: soundness and completeness. Soundness assesses the degree to which attributed features are truly predictive features, while completeness examines how well the resulting attribution reveals all the predictive features. The two perspectives are based on a firm mathematical foundation and provide quantitative metrics that are computable through efficient algorithms. We apply these metrics to mainstream attribution methods, offering a novel lens through which to analyze and compare feature attribution methods.

Paper Structure (33 sections, 2 theorems, 1 equation, 15 figures, 3 tables, 2 algorithms)

Introduction
Related work
Feature attribution methods
Evaluation metrics for feature attribution methods
Analysis of prior evaluation metrics
Retraining-based evaluation metrics
Evaluation on semi-natural datasets
Order-based evaluation metrics
Method
Problem formulation
Soundness evaluation
Completeness evaluation
Experiments
Validation of the proposed metrics
Comparison with order-based metrics
...and 18 more sections

Key Result

Theorem 4.11

Given a set $\mathcal{A}_{\mathrm{inc}}\subseteq\mathcal{A}$ and $\mathcal{A}_{\mathrm{inc}}\cap\mathcal{I}\neq\emptyset$, suppose that $\mathcal{S}_v(\mathcal{A}_{\mathrm{inc}})=\{\mathcal{S} \subseteq \mathcal{A}_{\mathrm{inc}}: \rho(f(\mathcal{S})) = \rho(f(\mathcal{A}_{\mathrm{inc}}))\}$ is not

Figures (15)

Figure 1: Analysis of retraining-based metrics. Compared to (a) $\mathbb{D}_{\mathrm{P, Train}}^{(1)}$, (b) $\mathbb{D}_{\mathrm{P, Train}}^{(2)}$ introduces an additional class-related spurious correlation during perturbation, visible in the upper-right region of the sample. (c) Despite equivalent removal of informative features (central portions of images) using both perturbation strategies, the two retrained models demonstrate different test accuracy (0.66 vs. 0.88), suggesting that the test accuracy of the retrained model does not accurately reflect the quantity of information removal.
Figure 2: Analysis of evaluation on semi-natural datasets. (a) Designed semi-natural datasets and attribution maps from crafted attribution methods. (b) Each method excels on the dataset for which it has prior knowledge, but it underperforms on the other.
Figure 3: Evaluation on semi-natural datasets vs. on real-world datasets. Evaluation results on a semi-natural dataset and on a real dataset can differ markedly. On the semi-natural dataset $\mathbb{D}_{\mathrm{S}}^{(1)}$, a "dummy" method, Rect, which simply uses prior information about $\mathbb{D}_{\mathrm{S}}^{(1)}$, performs best, whereas it performs worst on CIFAR-100.
Figure 4: Graphical illustration of the relationship between two attributions ($\mathcal{A}$ and $\mathcal{A}^\prime$). (a) Although $\mathcal{A}$ and $\mathcal{A}^\prime$ have equal soundness ($1.0$ in this case), $\mathcal{A}^\prime$ has higher completeness. (b) Although $\mathcal{A}$ and $\mathcal{A}^\prime$ have equal completeness, $\mathcal{A}^\prime$ has higher soundness. (c) We compare $\mathcal{A}\cap\mathcal{I}$ with $\mathcal{A}$ and $\mathcal{I}$ to measure soundness and completeness.
Figure 5: Soundness evaluation. Computing the soundness of $\mathcal{A}$ in a single step is infeasible. Instead, we incrementally include a subset $\mathcal{A}_{\mathrm{inc}}$ in the input and compute its soundness. This process involves identifying the optimal set $\mathcal{A}^{*}$ and calculating $\frac{|\mathcal{A}^{*}|_{\eta}}{|\mathcal{A}_{\mathrm{inc}}|_{\eta}}$. A particular $\mathcal{A}^{*}$ is associated with a specific predictive level (i.e., model performance). When comparing two attribution methods, we can standardize the predictive level, which allows us to evaluate soundness at this fixed level.
...and 10 more figures
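
The comparison described in Figure 4(c) can be illustrated with a small toy sketch. It assumes the predictive feature set $\mathcal{I}$ is known, which is not the case for a real model, and it assumes the operator $|\cdot|_g$ sums a per-feature weight $g$ over a set. Weighting soundness by attribution values ($\eta$) follows the ratio in Figure 5; weighting completeness by predictive information ($\varphi$) is an assumption here. The function names are hypothetical and are not taken from the paper.

```python
def weighted_size(features, weight):
    """Assumed reading of |.|_g: sum the weight g over a feature set."""
    return sum(weight[f] for f in features)


def soundness(attributed, predictive, attribution):
    """Share of attribution mass that falls on truly predictive features."""
    total = weighted_size(attributed, attribution)
    if total == 0:
        return 0.0
    return weighted_size(attributed & predictive, attribution) / total


def completeness(attributed, predictive, information):
    """Share of predictive information covered by the attributed features."""
    total = weighted_size(predictive, information)
    if total == 0:
        return 0.0
    return weighted_size(attributed & predictive, information) / total


# Toy setup mirroring Figure 4(a): both attributions contain only
# predictive features, but the second one covers more of them.
predictive = {0, 1, 2, 3}              # assumed known ground truth
uniform = {f: 1.0 for f in range(6)}   # uniform weights for simplicity
a = {0, 1}
a_prime = {0, 1, 2}

print(soundness(a, predictive, uniform), soundness(a_prime, predictive, uniform))
print(completeness(a, predictive, uniform), completeness(a_prime, predictive, uniform))
```

With uniform weights both attributions have soundness 1.0, while completeness rises from 0.5 to 0.75, matching the situation in panel (a). Because $\mathcal{I}$ is unknown in practice, this is only a conceptual aid and not the evaluation procedure itself.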

Theorems & Definitions (11)

Definition 4.1: Predictive information measurement $\varphi$
Definition 4.2: Attribution method $\eta$
Definition 4.3: Optimality of attribution method
Definition 4.4: Predictive feature set $\mathcal{I}$
Definition 4.5: Attributed feature set $\mathcal{A}$
Definition 4.6: Optimality of attributed feature set $\mathcal{A}$
Definition 4.7: Operator $|\cdot|_g$
Definition 4.8: Soundness
Definition 4.9: Completeness
Theorem 4.11
...and 1 more
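
The set $\mathcal{S}_v(\mathcal{A}_{\mathrm{inc}})$ that appears in Theorem 4.11 can be enumerated by brute force on a toy example. The sketch below reads $\rho(f(\mathcal{S}))$ as the performance level of the model when only the features in $\mathcal{S}$ are present, which is an assumption, since the text above does not define $f$ or $\rho$. The names `model` and `level` are hypothetical stand-ins. Exhaustive enumeration is exponential in $|\mathcal{A}_{\mathrm{inc}}|$ and serves only as an illustration, not as the paper's algorithm.

```python
from itertools import combinations


def performance_preserving_subsets(a_inc, model, level):
    """Return every subset S of a_inc with level(model(S)) == level(model(a_inc))."""
    target = level(model(frozenset(a_inc)))
    items = sorted(a_inc)
    matches = []
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            subset = frozenset(combo)
            if level(model(subset)) == target:
                matches.append(subset)
    return matches


# Hypothetical toy model: features 0 and 1 are predictive, and the score
# is the number of predictive features present in the input.
PREDICTIVE = frozenset({0, 1})


def toy_model(subset):
    return len(subset & PREDICTIVE)


def toy_level(score):
    # Stand-in for rho: the score itself is used as the performance level.
    return score


s_v = performance_preserving_subsets({0, 1, 5}, toy_model, toy_level)
print(s_v)
```

In this toy case only the subsets that keep both predictive features reach the same level as the full set, so the non-predictive feature 5 can be dropped without changing performance.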
