How Reliable and Stable are Explanations of XAI Methods?

José Ribeiro; Lucas Cardoso; Vitor Santos; Eduardo Carvalho; Níkolas Carneiro; Ronnie Alves

TL;DR

The paper investigates the reliability and stability of explanations produced by XAI methods under data perturbations, using a diabetes dataset and four learners (LGBM, MLP, DT, KNN). It centers on the eXirt method based on Item Response Theory, leveraging item parameters $a_i$ (discrimination), $b_i$ (difficulty), $c_i$ (guessing) and ability $\theta_j$ to derive an Item Characteristic Curve (ICC) and a reliability interpretation via $P(U_{ij}=1|\theta_j)= c_i + (1-c_i)\frac{1}{1+ e^{-a_i(\theta_j - b_i)}}$. The study benchmarks six XAI methods (Dalex, Eli5, Lofo, Shap, Skater, eXirt) for stability of feature relevance rankings under perturbations and analyzes results with Spearman correlations and bump charts. Key findings show SHAP as the most stable across models, while eXirt can identify the most reliable explanations (e.g., unperturbed LGBM) and reveal perturbation sensitivities in non-IRT methods, motivating future work to derive a single confidence score from the IRT parameters and extend the framework to broader tasks.
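The ICC formula above can be evaluated directly. The following is a minimal sketch, not the eXirt implementation: the function name `icc_3pl` and the parameter values are assumptions chosen only for illustration.

```python
import math


def icc_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the three-parameter logistic model.

    theta: ability of the respondent (theta_j)
    a: discrimination of the item (a_i)
    b: difficulty of the item (b_i)
    c: guessing parameter of the item (c_i), the lower asymptote of the curve
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


if __name__ == "__main__":
    # Hypothetical item parameters, not taken from the paper.
    a, b, c = 1.5, 0.0, 0.2
    for theta in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"theta={theta:+.1f}  P(correct)={icc_3pl(theta, a, b, c):.3f}")
```

When $\theta_j = b_i$ the logistic term equals $1/2$, so the probability is $c_i + (1 - c_i)/2$. For very low ability the curve approaches $c_i$, and for very high ability it approaches 1.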

Abstract

Black box models are increasingly used in people's daily lives. Alongside this growth, Explainable Artificial Intelligence (XAI) methods have emerged to generate additional explanations of how a model arrives at its predictions. Methods such as Dalex, Eli5, eXirt, Lofo and Shap offer different, model-agnostic approaches to explaining black box models. Their emergence raises questions such as "How Reliable and Stable are XAI Methods?". To shed light on this question, this research builds a pipeline that runs experiments on the diabetes dataset with four machine learning models (LGBM, MLP, DT and KNN), applies different levels of perturbation to the test data, and then generates explanations from the eXirt method regarding the confidence of the models, together with feature relevance ranks from all the XAI methods mentioned, in order to measure their stability under perturbation. The results show that eXirt was able to identify the most reliable models among those used. They also show that current XAI methods are sensitive to perturbations, with the exception of one specific method.
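One way to quantify the stability of a feature relevance rank is to compare the rank obtained on the original test data with the rank obtained after perturbation using Spearman's rank correlation, the statistic named in the TL;DR. The sketch below is a hedged illustration only: the helper names (`ranks`, `spearman`) and the relevance scores are hypothetical and do not come from the paper's pipeline or data.

```python
import math


def ranks(scores):
    """Rank scores from 1 (most relevant) downward, averaging ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    sx = math.sqrt(sum((p - mx) ** 2 for p in rx))
    sy = math.sqrt(sum((q - my) ** 2 for q in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when all ranks are tied")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical relevance scores for five features (illustrative values only).
    original = [0.40, 0.25, 0.20, 0.10, 0.05]
    perturbed = [0.35, 0.15, 0.30, 0.12, 0.08]
    print(f"rank stability (Spearman): {spearman(original, perturbed):.2f}")
```

A value close to 1 means the perturbation left the feature ordering almost unchanged; lower values indicate that the explanation is sensitive to the perturbation.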

Paper Structure (14 sections, 1 equation, 6 figures, 2 tables)

Introduction
Related works
Background
Explainable Artificial Intelligence
Item Response Theory
Estimation of Item Parameters.
Estimation of ability.
Model confidence from IRT.
Methodology
Results and discussion
Are the most reliable models, according to the eXirt method, those being less affected by data perturbations?
Do existing XAI methods generate stable explanations even after data perturbations?
Conclusion and Future Works
Disclosure of Interests.

Figures (6)

Figure 1: The Item Characteristic Curve (ICC). The letters $a$, $b$ and $c$ represent the discrimination, difficulty and guessing parameters, respectively.
Figure 2: Representation of the pipeline of the experiments carried out.
Figure 3: Summary of the Friedman-Nemenyi statistical test.
Figure 4: Summary of the global ICC of the LGBM and MLP models.
Figure 5: Summary of the global ICC of the KNN and DT models.
...and 1 more figures
