The Directed Prediction Change - Efficient and Trustworthy Fidelity Assessment for Local Feature Attribution Methods

Kevin Iselborn, David Dembinsky, Adriano Lucieri, Andreas Dengel

TL;DR

This work tackles the challenge of trustworthy fidelity assessment for local feature attribution (FA) methods in high-stakes domains by identifying a fundamental flaw in using Prediction Change (PC) alone to evaluate local explanations. It introduces Directed Prediction Change (DPC), a directional extension that incorporates both attribution and perturbation directions within Guided Perturbation, yielding a deterministic and computationally efficient fidelity metric that measures the same property as local Infidelity. Across two datasets (HELOC and ISIC), two models (linear and nonlinear), seven FA algorithms, and $4\,744$ configurations, DPC shifts the evaluation toward local FA methods and demonstrates strong determinism and substantial runtime savings (median speedup of about $9.91\times$, and up to about $20\times$ when combined with PC). The approach facilitates holistic hyperparameter tuning and trustworthy comparisons, with practical implications for deploying XAI in medicine and finance, while acknowledging limitations on complex, high-dimensional data and outlining directions for improvement and extension in VXAI. Overall, DPC provides a principled, efficient, and deterministic framework for evaluating local FA methods that complements existing fidelity metrics and supports scalable, trustworthy explainability in high-risk settings.

Abstract

The utility of an explanation method critically depends on its fidelity to the underlying machine learning model. Especially in high-stakes medical settings, clinicians and regulators require explanations that faithfully reflect the model's decision process. Existing fidelity metrics such as Infidelity rely on Monte Carlo approximation, which demands numerous model evaluations and introduces uncertainty due to random sampling. This work proposes a novel metric for evaluating the fidelity of local feature attribution methods by modifying the existing Prediction Change (PC) metric within the Guided Perturbation Experiment. By incorporating the direction of both perturbation and attribution, the proposed Directed Prediction Change (DPC) metric achieves an almost tenfold speedup and eliminates randomness, resulting in a deterministic and trustworthy evaluation procedure that measures the same property as local Infidelity. DPC is evaluated on two datasets (skin lesion images and financial tabular data), two black-box models, seven explanation algorithms, and a wide range of hyperparameters. Across $4\,744$ distinct explanations, the results demonstrate that DPC, together with PC, enables a holistic and computationally efficient evaluation of both baseline-oriented and local feature attribution methods, while providing deterministic and reproducible outcomes.
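
To make the Monte Carlo concern concrete, the following is a minimal sketch of a sampled Infidelity estimate, i.e. the mean squared gap between the change an attribution predicts for a random perturbation and the change the model actually shows. The toy `model`, its weights, the Gaussian perturbation scale `sigma`, and all function names are assumptions chosen for illustration; none of them come from the paper's experimental setup.

```python
import numpy as np

# Hypothetical weights of a toy nonlinear scoring function (illustration only).
W = np.array([1.0, -2.0, 0.5])

def model(x):
    """Toy scoring function: tanh of a linear projection."""
    return np.tanh(x @ W)

def gradient_attribution(x):
    """Analytic gradient of the toy model, used as a local attribution."""
    return (1.0 - np.tanh(x @ W) ** 2) * W

def infidelity_mc(x, attribution, n_samples, seed, sigma=0.5):
    """Monte Carlo Infidelity estimate with Gaussian perturbations I."""
    rng = np.random.default_rng(seed)
    perturbations = rng.normal(0.0, sigma, size=(n_samples, x.size))
    predicted = perturbations @ attribution          # I^T * attribution
    actual = model(x) - model(x - perturbations)     # f(x) - f(x - I)
    return float(np.mean((predicted - actual) ** 2))

x = np.array([0.3, -0.1, 0.8])
attr = gradient_attribution(x)

# The estimate depends on the random seed; more samples reduce the spread
# but cost proportionally more model evaluations.
estimates_small = [infidelity_mc(x, attr, 10, seed) for seed in range(20)]
estimates_large = [infidelity_mc(x, attr, 10_000, seed) for seed in range(20)]
print("std over seeds, 10 samples:    ", np.std(estimates_small))
print("std over seeds, 10000 samples: ", np.std(estimates_large))
```

In this sketch the spread across seeds shrinks only as the sample count grows, which is the trade-off between reliability and model evaluations that a deterministic metric avoids.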

Paper Structure

This paper contains 38 sections, 1 theorem, 14 equations, 10 figures, 4 tables.

Key Result

Theorem 1

Let $x$ be the considered data point, $x' \in \mathbb{R}^d$ be a baseline, $\mathcal{A}$ a FA method with consistent rankings on all perturbation paths between $x$ and $x'$, and $s_f^y$ the scoring function for model $f$ and class $y$. If $\mathcal{A}$ satisfies Sensitivity-$N$ for $s_f^y$ on all pe

Figures (10)

  • Figure 1: (a) The Monte Carlo sampling used by Infidelity exhibits high variance, requiring many samples for reliable FA evaluation (i.e., low standard deviation). (b) The proposed DPC metric achieves an almost tenfold speedup while providing a deterministic, and hence trustworthy, evaluation.
  • Figure 2: Overview of Prediction Change (PC) results for all considered models. The PC ranks baseline-oriented FA methods higher than other methods, whereas the Gradient (an optimal local FA method for linear models) performs comparably to random attribution.
  • Figure 3: Prediction Change (PC) evaluation of Integrated Gradients for nonlinear models when baseline-oriented methods are converted into pseudo-local variants (multiply_by_inputs=False) following Ancona et al. (2019). As with the Sensitivity-$N$ property, the pseudo-local variants are consistently rated lower by PC than their baseline-oriented counterparts.
  • Figure 4: For a perturbation step $t$ in which feature $i$ of data point $x$ is replaced with a baseline value $b$, four relevant evaluation scenarios arise. Cases (i) and (ii) represent situations where the Prediction Change correctly ranks the local FA method, whereas (iii) and (iv) illustrate failure cases.
  • Figure 5: Directed Prediction Change (DPC) evaluation, analogous to the PC evaluation in Figure 3. DPC improves the scoring of local FA methods, with a weaker effect on image data than on tabular data.
  • ...and 5 more figures
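
The effect described in the captions of Figures 2 and 4 can be sketched on a toy linear model. The code below is not the paper's PC or DPC definition, which this summary does not spell out; it is one simple reading of a guided perturbation experiment in which features are replaced by a baseline in order of decreasing attribution and the drop in the score is recorded. The weights `w`, the data point `x`, the baseline `b`, and the helper `pc_curve` are all hypothetical.

```python
import numpy as np

# Hypothetical linear model, data point, and baseline (illustration only).
w = np.array([2.0, -1.0, 0.5, 0.1])
x = np.array([0.1, 3.0, 1.0, -2.0])
b = np.zeros(4)

def f(z):
    """Toy linear scoring function."""
    return float(z @ w)

def pc_curve(attribution):
    """Replace features with the baseline, highest attribution first,
    and record the cumulative score drop f(x) - f(z) after each step."""
    order = np.argsort(-attribution)
    z = x.copy()
    drops = []
    for i in order:
        z[i] = b[i]
        drops.append(f(x) - f(z))
    return drops

gradient = w                   # local attribution (exact for a linear model)
grad_times_input = w * (x - b) # baseline-oriented attribution

print("PC curve, gradient:        ", pc_curve(gradient))
print("PC curve, gradient * input:", pc_curve(grad_times_input))

# Direction-aware view: the change a local attribution predicts for moving
# feature i from x_i to b_i is attribution_i * (b_i - x_i).
predicted_change = gradient * (b - x)
actual_change = np.array([f(np.where(np.arange(4) == i, b, x)) - f(x)
                          for i in range(4)])
print("predicted per-feature change:", predicted_change)
print("actual per-feature change:   ", actual_change)
```

In this toy setup the undirected curve rewards the baseline-oriented ordering with a larger first drop, even though the gradient predicts every single-feature change exactly once the perturbation direction `b - x` is taken into account. Feature 3 shows the mismatch: its gradient is positive, but replacing `x[3] = -2.0` with the baseline raises the feature, so the score rises rather than drops.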

Theorems & Definitions (1)

  • Theorem 1