Reliable Explanations or Random Noise? A Reliability Metric for XAI

Poushali Sengupta; Sabita Maharjan; Frank Eliassen; Shashi Raj Pandey; Yan Zhang

TL;DR

The paper asks whether explanations themselves can be trusted and introduces the Explanation Reliability Index (ERI), an axiomatic framework that quantifies how stable attributions remain under non-adversarial input perturbations, feature redundancy, model evolution, and distributional shift. ERI decomposes stability into five components (ERI-S, ERI-R, ERI-M, ERI-D, and the temporal measure ERI-T) and aggregates them into a scalar reliability score, supported by Lipschitz-type guarantees and temporal bounds. The paper also introduces ERI-Bench, a standardized benchmark for stress-testing explanation reliability across data modalities, and presents theoretical results showing that popular explainers often fail the reliability axioms, while dependence-aware methods such as MCIR are more reliable. Experiments on EEG, HAR, energy-load forecasting, and CIFAR-10 reveal substantial reliability gaps in common explainers. The work positions ERI as a complement to faithfulness and causality criteria, supporting safer and more trustworthy deployment of XAI in high-stakes applications.

Abstract

In recent years, explaining decisions made by complex machine learning models has become essential in high-stakes domains such as energy systems, healthcare, finance, and autonomous systems. However, the reliability of these explanations, namely, whether they remain stable and consistent under realistic, non-adversarial changes, remains largely unmeasured. Widely used methods such as SHAP and Integrated Gradients (IG) are well-motivated by axiomatic notions of attribution, yet their explanations can vary substantially even under system-level conditions, including small input perturbations, correlated representations, and minor model updates. Such variability undermines explanation reliability, as reliable explanations should remain consistent across equivalent input representations and small, performance-preserving model changes. We introduce the Explanation Reliability Index (ERI), a family of metrics that quantifies explanation stability under four reliability axioms: robustness to small input perturbations, consistency under feature redundancy, smoothness across model evolution, and resilience to mild distributional shifts. For each axiom, we derive formal guarantees, including Lipschitz-type bounds and temporal stability results. We further propose ERI-T, a dedicated measure of temporal reliability for sequential models, and introduce ERI-Bench, a benchmark designed to systematically stress-test explanation reliability across synthetic and real-world datasets. Experimental results reveal widespread reliability failures in popular explanation methods, showing that explanations can be unstable under realistic deployment conditions. By exposing and quantifying these instabilities, ERI enables principled assessment of explanation reliability and supports more trustworthy explainable AI (XAI) systems.
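As a rough illustration of the first axiom (robustness to small input perturbations), the sketch below estimates how far an attribution vector drifts when the input is jittered with small Gaussian noise. It is not the paper's ERI-S definition, which is not reproduced on this page: the Euclidean drift, the Gaussian noise model, the `1 / (1 + drift)` score mapping, and every function name here are illustrative assumptions, and the explainer is a toy gradient-times-input rule for a linear model.

```python
import numpy as np


def gradient_times_input(w, x):
    """Toy explainer for a linear model f(x) = w @ x: attribution_i = w_i * x_i."""
    return w * x


def perturbation_drift(explain, x, sigma=0.01, n_samples=200, seed=0):
    """Mean L2 distance between E(x) and E(x + delta), with delta ~ N(0, sigma^2 I).

    Assumption: Euclidean distance and Gaussian noise stand in for whatever
    drift measure and perturbation family a real reliability metric would use.
    """
    rng = np.random.default_rng(seed)
    base = explain(x)
    drifts = []
    for _ in range(n_samples):
        delta = rng.normal(0.0, sigma, size=x.shape)
        drifts.append(np.linalg.norm(explain(x + delta) - base))
    return float(np.mean(drifts))


def stability_score(drift):
    """Hypothetical mapping from a drift in [0, inf) to a score in (0, 1]."""
    return 1.0 / (1.0 + drift)


w = np.array([2.0, -1.0, 0.5])
x = np.array([1.0, 3.0, -2.0])


def explain(z):
    return gradient_times_input(w, z)


small_noise_drift = perturbation_drift(explain, x, sigma=0.01)
large_noise_drift = perturbation_drift(explain, x, sigma=1.0)

print("score under small noise:", stability_score(small_noise_drift))
print("score under large noise:", stability_score(large_noise_drift))
```

Under these assumptions a score near 1 means the attributions barely move under the chosen noise level, and the score falls toward 0 as drift grows. Swapping in a real explainer only requires passing a different `explain` callable.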

Paper Structure (183 sections, 43 theorems, 402 equations, 19 figures, 13 tables)

Introduction
Related Work
Axioms of Reliable Explanations
The Explanation Reliability Index (ERI)
Theoretical Guarantees
Experiments
Datasets, Explainers & ERI Metrics
Results and Discussion
Representative Reliability Visualizations
CIFAR-10: IG Reliability Under CNN
Additional Reliability Diagnostics
Discussion and Limitations
Conclusion
MCIR: Dependence-Aware Explanation Method
Energy-Domain Example
...and 168 more sections

Key Result

Theorem 1

Assume the predictive model $f:\mathbb{R}^d\to\mathbb{R}^k$ is locally $L_f(x)$-Lipschitz in a neighborhood of $x$, i.e., $\|f(x)-f(x+\delta)\|\le L_f(x)\,\|\delta\|$ for all $\|\delta\|\le\epsilon$, and the explanation map $E$ ...

Figures (19)

Figure 1: Synthetic ERI-R collapse curves under increasing feature redundancy.
Figure 2: CIFAR-10 reliability diagnostics for IG: (a) perturbation robustness under bounded noise, (b) attribution map illustrating gradient saturation, and (c) deletion-curve instability.
Figure 3: Robustness histogram of IG under noise on EEG (ERI-T).
Figure 4: Schematic illustration of ERI-R (redundancy-based reliability). As input redundancy increases ($\alpha \to 1$), reliable explainers should yield nearly identical attributions ($D_{\mathrm{R}} \to 0$).
Figure 5: Temporal smoothness of explanations captured by ERI-T. Abrupt attribution changes without corresponding input events indicate low temporal reliability.
...and 14 more figures

Theorems & Definitions (150)

Definition 1: Collapse Operator
Definition 2: Collapsed Explanation
Definition 3: Explanation Drift
Definition 4: Explanation Reliability Index (ERI)
Definition 5: ERI-S: Perturbation Stability
Definition 6: Redundancy Drift (ERI-R)
Definition 7: ERI-M: Model-Evolution Consistency
Definition 8: ERI-D: Distributional Robustness
Definition 9: ERI-T: Temporal Reliability
Theorem 1: Lipschitz Stability Bound
...and 140 more
