Normalized AOPC: Fixing Misleading Faithfulness Metrics for Feature Attribution Explainability

Joakim Edin, Andreas Geert Motzfeldt, Casper L. Christensen, Tuukka Ruotsalo, Lars Maaløe, Maria Maistro

TL;DR

The paper argues that AOPC-based faithfulness scores are inherently model-dependent due to varying lower and upper AOPC limits across models and inputs, making cross-model comparisons unreliable. It introduces Normalized AOPC (NAOPC) with exact and beam-approximate variants to align these limits and enable meaningful cross-model evaluation of feature attribution methods. Empirical results across five datasets and multiple architectures show that NAOPC can substantially alter model rankings of faithfulness, challenging conclusions drawn from unnormalized AOPC. The authors provide two implementations (NAOPC_exact and NAOPC_beam), release a PyPI package, and discuss practical guidance on when normalization is necessary and how to manage computational costs.

Abstract

Deep neural network predictions are notoriously difficult to interpret. Feature attribution methods aim to explain these predictions by identifying the contribution of each input feature. Faithfulness, often evaluated using the area over the perturbation curve (AOPC), reflects feature attributions' accuracy in describing the internal mechanisms of deep neural networks. However, many studies rely on AOPC to compare faithfulness across different models, which we show can lead to false conclusions about models' faithfulness. Specifically, we find that AOPC is sensitive to variations in the model, resulting in unreliable cross-model comparisons. Moreover, AOPC scores are difficult to interpret in isolation without knowing the model-specific lower and upper limits. To address these issues, we propose a normalization approach, Normalized AOPC (NAOPC), enabling consistent cross-model evaluations and more meaningful interpretation of individual scores. Our experiments demonstrate that this normalization can radically change AOPC results, questioning the conclusions of earlier studies and offering a more robust framework for assessing feature attribution faithfulness. Our code is available at https://github.com/JoakimEdin/naopc.
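
To make the normalization idea concrete, the following is a minimal sketch, not the authors' implementation or the API of their released package. It assumes that NAOPC is a min-max rescaling of AOPC between the model- and input-specific lower and upper limits, and that those limits are found by exhaustive search over all feature removal orders (the "exact" variant). The names `toy_model`, `aopc`, `aopc_limits_exact`, and `naopc` are hypothetical, and features are perturbed by deleting tokens, which is an assumption made for brevity.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy "model": an additive scorer over tokens (illustration only).
WEIGHTS = {"great": 0.5, "food": 0.1, "terrible": -0.3}


def toy_model(tokens: Sequence[str]) -> float:
    return sum(WEIGHTS.get(t, 0.0) for t in tokens)


def aopc(model: Callable[[Sequence[str]], float],
         tokens: Sequence[str],
         order: Sequence[int]) -> float:
    """Mean drop in model output as features are removed cumulatively in `order`."""
    base = model(tokens)
    removed = set()
    drops: List[float] = []
    for idx in order:
        removed.add(idx)
        kept = [t for i, t in enumerate(tokens) if i not in removed]
        drops.append(base - model(kept))
    return sum(drops) / len(drops)


def aopc_limits_exact(model: Callable[[Sequence[str]], float],
                      tokens: Sequence[str]) -> Tuple[float, float]:
    """Lower and upper AOPC limits via exhaustive search over all removal orders.

    The cost grows factorially with the number of features, so this is only
    feasible for short inputs.
    """
    scores = [aopc(model, tokens, p) for p in permutations(range(len(tokens)))]
    return min(scores), max(scores)


def naopc(score: float, lower: float, upper: float) -> float:
    """Assumed min-max normalization of an AOPC score to the range [0, 1]."""
    if upper == lower:  # degenerate case: every removal order scores the same
        return 0.0
    return (score - lower) / (upper - lower)


tokens = ["great", "food", "terrible"]
lower, upper = aopc_limits_exact(toy_model, tokens)

# Removal order implied by a hypothetical attribution ranking (most important first).
best_first = aopc(toy_model, tokens, [0, 1, 2])
worst_first = aopc(toy_model, tokens, [2, 1, 0])
print(naopc(best_first, lower, upper), naopc(worst_first, lower, upper))
```

For this additive toy model, removing the highest-weight token first attains the upper limit and the reverse order attains the lower limit, so the two normalized scores sit at the ends of the [0, 1] range. A raw AOPC value only becomes comparable across models once it is placed between limits like these, which differ per model and per input.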

Paper Structure

This paper contains 30 sections, 8 equations, 10 figures, 5 tables, and 1 algorithm.

Figures (10)

  • Figure 1: Distributions of lower and upper AOPC limits across models on the Yelp$_{\text{short}}$ test set computed with exhaustive search. The substantially different distributions demonstrate that AOPC bounds are model-specific, making both cross-model comparisons and interpretation of individual scores unreliable without normalization.
  • Figure 2: Effect of normalization on faithfulness rankings across models and attribution methods. For both comprehensiveness (higher is better) and sufficiency (lower is better), NAOPC$_{\text{beam}}$ changes cross-model rankings but preserves within-model rankings.
  • Figure 3: Faithfulness ranking of model and feature attribution method pairs when evaluated on Yelp$_{\text{short}}$ using AOPC, NAOPC$_{\text{exact}}$, and NAOPC$_{\text{beam}}$. Normalization changes the cross-model comparisons, and NAOPC$_{\text{beam}}$ accurately approximates NAOPC$_{\text{exact}}$.
  • Figure 4: Lower and upper AOPC limits calculated with NAOPC$_{\text{beam}}$ using different beam sizes. RoBERTa$_{\text{Yelp}}$ and BERT$_{\text{Yelp}}$ (a,b) stabilize at $B=5$, while BERT$_{\text{AG-News}}$ (c) requires $B=1000$ for stable results.
  • Figure 5: Distributions of lower and upper AOPC limits for various models on the SST2$_{\text{short}}$ test set. Each distribution reflects the range of possible AOPC scores for a given model, influenced by individual input examples. The inter-model variations demonstrate the need for normalization when comparing AOPC scores across different models.
  • ...and 5 more figures