Reliable Explanations or Random Noise? A Reliability Metric for XAI

Poushali Sengupta, Sabita Maharjan, Frank Eliassen, Shashi Raj Pandey, Yan Zhang

TL;DR

The paper asks whether feature-attribution explanations can themselves be trusted, and introduces the Explanation Reliability Index (ERI), an axiomatic framework that quantifies how stable attributions remain under non-adversarial input perturbations, feature redundancy, model evolution, and distributional shift. ERI decomposes stability into five components (ERI-S, ERI-R, ERI-M, ERI-D, ERI-T) and aggregates them into a scalar reliability score, supported by Lipschitz-type guarantees and temporal bounds. The paper also introduces ERI-Bench, a standardized benchmark that stress-tests explanation reliability across data modalities, and gives theoretical results showing that popular explainers often violate the reliability axioms, while dependence-aware methods such as MCIR are more reliable. Experiments on EEG, HAR, energy-load forecasting, and CIFAR-10 reveal substantial reliability gaps in common explainers. The work positions ERI as a complement to faithfulness and causality evaluation, supporting safer deployment of XAI in high-stakes applications.

Abstract

In recent years, explaining decisions made by complex machine learning models has become essential in high-stakes domains such as energy systems, healthcare, finance, and autonomous systems. However, the reliability of these explanations, namely, whether they remain stable and consistent under realistic, non-adversarial changes, remains largely unmeasured. Widely used methods such as SHAP and Integrated Gradients (IG) are well-motivated by axiomatic notions of attribution, yet their explanations can vary substantially even under system-level conditions, including small input perturbations, correlated representations, and minor model updates. Such variability undermines explanation reliability, as reliable explanations should remain consistent across equivalent input representations and small, performance-preserving model changes. We introduce the Explanation Reliability Index (ERI), a family of metrics that quantifies explanation stability under four reliability axioms: robustness to small input perturbations, consistency under feature redundancy, smoothness across model evolution, and resilience to mild distributional shifts. For each axiom, we derive formal guarantees, including Lipschitz-type bounds and temporal stability results. We further propose ERI-T, a dedicated measure of temporal reliability for sequential models, and introduce ERI-Bench, a benchmark designed to systematically stress-test explanation reliability across synthetic and real-world datasets. Experimental results reveal widespread reliability failures in popular explanation methods, showing that explanations can be unstable under realistic deployment conditions. By exposing and quantifying these instabilities, ERI enables principled assessment of explanation reliability and supports more trustworthy explainable AI (XAI) systems.
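The first axiom in the abstract, robustness to small input perturbations, can be illustrated with a small sketch. This is not the paper's ERI-S definition, which is not reproduced on this page. It assumes drift is the mean Euclidean distance between the attribution of an input and the attributions of noisy copies, and it assumes a hypothetical mapping `1 / (1 + drift)` from drift to a score in (0, 1]. Both toy explainers and all function names are illustrative.

```python
import math
import random


def l2(u, v):
    """Euclidean distance between two equal-length attribution vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def perturbation_drift(explain, x, eps=0.01, n_samples=100, seed=0):
    """Mean attribution change under uniform noise in [-eps, eps] per feature."""
    rng = random.Random(seed)
    base = explain(x)
    total = 0.0
    for _ in range(n_samples):
        x_pert = [xi + rng.uniform(-eps, eps) for xi in x]
        total += l2(base, explain(x_pert))
    return total / n_samples


def reliability_score(drift):
    """Assumed (hypothetical) mapping from drift to (0, 1]; 1 means no drift."""
    return 1.0 / (1.0 + drift)


def smooth_explainer(x):
    """Toy explainer: weight times input, so attributions change linearly with x."""
    weights = [0.5, -1.0, 2.0]
    return [w * xi for w, xi in zip(weights, x)]


def brittle_explainer(x):
    """Toy explainer whose sign depends on a rapidly oscillating function of x,
    so small perturbations can flip each attribution between +1 and -1."""
    return [1.0 if math.sin(1000.0 * xi) > 0 else -1.0 for xi in x]


x = [0.3, 0.7, 1.1]
smooth_drift = perturbation_drift(smooth_explainer, x)
brittle_drift = perturbation_drift(brittle_explainer, x)
print("smooth :", smooth_drift, reliability_score(smooth_drift))
print("brittle:", brittle_drift, reliability_score(brittle_drift))
```

Under these assumptions the smooth explainer scores close to 1 and the brittle one scores much lower, which is the kind of gap a perturbation-stability metric is meant to expose.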

Paper Structure

This paper contains 183 sections, 43 theorems, 402 equations, 19 figures, 13 tables.

Key Result

Theorem 1

Assume the predictive model $f:\mathbb{R}^d\to\mathbb{R}^k$ is locally $L_f(x)$-Lipschitz in a neighborhood of $x$, i.e., $\|f(x)-f(x+\delta)\|\le L_f(x)\,\|\delta\|$ for all $\|\delta\|\le\epsilon$, and the explanation map $E:\ldots$
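The theorem statement is cut off above, so the following sketch illustrates only the stated assumption on $f$, not the bound itself. It estimates a local Lipschitz constant empirically by sampling perturbations with $\|\delta\|\le\epsilon$ and taking the largest observed ratio $\|f(x)-f(x+\delta)\|/\|\delta\|$. The function names and the toy linear model are hypothetical, and a sampled maximum can only under-estimate the true constant.

```python
import math
import random


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(a * a for a in v))


def local_lipschitz_estimate(f, x, eps=0.1, n_samples=1000, seed=0):
    """Largest observed ||f(x) - f(x + delta)|| / ||delta|| over sampled
    perturbations with eps/2 <= ||delta|| <= eps (a lower bound on L_f(x))."""
    rng = random.Random(seed)
    fx = f(x)
    best = 0.0
    for _ in range(n_samples):
        # Random direction, rescaled to a radius between eps/2 and eps.
        direction = [rng.gauss(0.0, 1.0) for _ in x]
        radius = eps * (0.5 + 0.5 * rng.random())
        scale = radius / norm(direction)
        delta = [scale * d for d in direction]
        f_pert = f([xi + di for xi, di in zip(x, delta)])
        ratio = norm([a - b for a, b in zip(fx, f_pert)]) / norm(delta)
        best = max(best, ratio)
    return best


def linear_model(x):
    """Toy model f(x) = diag(3, 4) x; its exact Lipschitz constant is 4."""
    return [3.0 * x[0], 4.0 * x[1]]


estimate = local_lipschitz_estimate(linear_model, [1.0, 2.0])
print("estimated local Lipschitz constant:", estimate)
```

For the toy linear model the estimate approaches 4 from below as more directions are sampled.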

Figures (19)

  • Figure 1: Synthetic ERI-R collapse curves under increasing feature redundancy.
  • Figure 2: CIFAR-10 reliability diagnostics for IG: (a) perturbation robustness under bounded noise, (b) attribution map illustrating gradient saturation, and (c) deletion-curve instability.
  • Figure 3: Robustness histogram of IG under noise on EEG (ERI-T).
  • Figure 4: Schematic illustration of ERI-R (redundancy-based reliability). As input redundancy increases ($\alpha \to 1$), reliable explainers should yield nearly identical attributions ($D_{\mathrm{R}} \to 0$).
  • Figure 5: Temporal smoothness of explanations captured by ERI-T. Abrupt attribution changes without corresponding input events indicate low temporal reliability.
  • ...and 14 more figures
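The redundancy behaviour described for Figure 4 can be sketched on a toy linear model. The construction here is an assumption, not the paper's definition: a redundant copy of feature $j$ is built as $\alpha x_j + (1-\alpha)\cdot\text{noise}$, the augmented model splits the weight of feature $j$ equally between the feature and its copy, attributions are weight times input, and a hypothetical collapse step adds the copy's attribution back onto feature $j$. The drift $D_{\mathrm{R}}$ is then the Euclidean distance between the original and collapsed attributions.

```python
import math


def weight_times_input(weights, x):
    """Toy attribution for a linear model: weight times input, per feature."""
    return [w * xi for w, xi in zip(weights, x)]


def collapse(attr, j, dup):
    """Hypothetical collapse step: fold the attribution at index `dup`
    into index `j`, then drop the `dup` entry."""
    merged = list(attr)
    merged[j] += merged[dup]
    del merged[dup]
    return merged


def redundancy_drift(weights, x, j, alpha, noise):
    """Distance between original attributions and collapsed attributions
    after adding a partially redundant copy of feature j."""
    base = weight_times_input(weights, x)
    x_dup = alpha * x[j] + (1.0 - alpha) * noise
    # Augmented model: the weight of feature j is shared with its copy.
    aug_weights = list(weights) + [0.5 * weights[j]]
    aug_weights[j] = 0.5 * weights[j]
    aug_attr = weight_times_input(aug_weights, list(x) + [x_dup])
    collapsed = collapse(aug_attr, j, len(x))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(base, collapsed)))


weights = [1.0, -2.0, 0.5]
x = [0.4, 1.5, -0.8]
alphas = (0.0, 0.5, 0.9, 1.0)
drifts = [redundancy_drift(weights, x, j=1, alpha=a, noise=0.3) for a in alphas]
for a, d in zip(alphas, drifts):
    print(f"alpha={a:.1f}  D_R={d:.4f}")
```

In this toy setting the drift equals $1.2\,(1-\alpha)$, so it shrinks to zero as the copy becomes fully redundant, matching the qualitative behaviour the caption describes.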

Theorems & Definitions (150)

  • Definition 1: Collapse Operator
  • Definition 2: Collapsed Explanation
  • Definition 3: Explanation Drift
  • Definition 4: Explanation Reliability Index (ERI)
  • Definition 5: ERI-S: Perturbation Stability
  • Definition 6: Redundancy Drift (ERI-R)
  • Definition 7: ERI-M: Model-Evolution Consistency
  • Definition 8: ERI-D: Distributional Robustness
  • Definition 9: ERI-T: Temporal Reliability
  • Theorem 1: Lipschitz Stability Bound
  • ...and 140 more
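Temporal reliability (ERI-T) is described on this page only qualitatively: abrupt attribution changes without a corresponding input event indicate low reliability. A minimal sketch of that idea, assuming Euclidean distances and hypothetical thresholds `input_tol` and `attr_tol`, flags time steps where the attribution jumps while the input barely moves. It is an illustration of the intuition, not the paper's ERI-T formula.

```python
import math


def dist(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def temporal_drift(attrs):
    """Mean change between consecutive attribution vectors."""
    steps = [dist(u, v) for u, v in zip(attrs, attrs[1:])]
    return sum(steps) / len(steps)


def unexplained_jumps(inputs, attrs, input_tol, attr_tol):
    """Time steps where the attribution changes by more than attr_tol
    although the input changed by at most input_tol."""
    flagged = []
    for t in range(1, len(inputs)):
        attr_change = dist(attrs[t], attrs[t - 1])
        input_change = dist(inputs[t], inputs[t - 1])
        if attr_change > attr_tol and input_change <= input_tol:
            flagged.append(t)
    return flagged


# Toy sequence: the input has one real event, between t=2 and t=3.
inputs = [[0.0], [0.01], [0.02], [1.0], [1.01]]
attrs = [[0.5, 0.5], [0.5, 0.5], [0.9, 0.1], [0.2, 0.8], [0.2, 0.8]]

flagged = unexplained_jumps(inputs, attrs, input_tol=0.1, attr_tol=0.3)
print("mean temporal drift:", temporal_drift(attrs))
print("unexplained jumps at t =", flagged)
```

Here the jump at t=3 coincides with a real input event and is not flagged, while the jump at t=2 happens with an almost unchanged input and is flagged.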