F-Fidelity: A Robust Framework for Faithfulness Evaluation of Explainable AI

Xu Zheng, Farhad Shirani, Zhuomin Chen, Chaohao Lin, Wei Cheng, Wenbo Guo, Dongsheng Luo

TL;DR

This work tackles the challenge of faithfully evaluating explainable AI methods by addressing two key problems: information leakage when using explainer outputs and the out-of-distribution (OOD) effects of perturbation-based evaluation. It introduces Fine-tuned Fidelity (F-Fidelity), a robust framework that combines explanation-agnostic fine-tuning with a controlled stochastic removal process to keep perturbed inputs in-distribution, enabling reliable ranking of explanations across modalities. The method defines FFid$^+$ and FFid$^-$ metrics and demonstrates superior correlation with ground-truth explanation quality compared to Fidelity, ROAR, and R-Fidelity across image, time-series, and NLP tasks; theory shows FFid$^+$ can reveal the true size of ground-truth explanations. Practically, F-Fidelity is most beneficial in domains with high perturbation sensitivity, provides guidelines for parameter choices, and offers a scalable evaluation path for comparing diverse explainers in real-world settings.

Abstract

Recent research has developed a number of eXplainable AI (XAI) techniques, such as gradient-based approaches, input perturbation-based methods, and black-box explanation methods. While these XAI techniques can extract meaningful insights from deep learning models, how to properly evaluate them remains an open problem. The most widely used approach is to perturb or even remove what the XAI method considers to be the most important features in an input and observe the changes in the output prediction. This approach, although straightforward, suffers from the Out-of-Distribution (OOD) problem, as the perturbed samples may no longer follow the original data distribution. A recent method, RemOve And Retrain (ROAR), solves the OOD issue by retraining the model with perturbed samples guided by explanations. However, using a model retrained based on XAI methods to evaluate these explainers may cause information leakage and thus lead to unfair comparisons. We propose Fine-tuned Fidelity (F-Fidelity), a robust evaluation framework for XAI, which utilizes i) an explanation-agnostic fine-tuning strategy, thus mitigating the information leakage issue, and ii) a random masking operation that ensures that the removal step does not generate an OOD input. We also design controlled experiments with state-of-the-art (SOTA) explainers and their degraded versions to verify the correctness of our framework. We conduct experiments on multiple data modalities, such as images, time series, and natural language. The results demonstrate that F-Fidelity significantly improves upon prior evaluation metrics in recovering the ground-truth ranking of the explainers. Furthermore, we show both theoretically and empirically that, given a faithful explainer, the F-Fidelity metric can be used to compute the sparsity of influential input components, i.e., to extract the true explanation size.
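
The two ingredients named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that features are removed by zeroing, that fine-tuning masks up to a fraction $\beta$ of randomly chosen features, and that evaluation drops each of the top-$s$ explanation features with probability $\alpha^+$ while capping removals at $\beta d$. The names `random_mask`, `stochastic_removal`, `ffid_plus`, and `predict` are hypothetical, and the exact FFid$^+$ definition is not given in this summary.

```python
import numpy as np

def random_mask(x, beta, rng):
    """Explanation-agnostic masking (assumed form): zero out at most
    floor(beta * d) features chosen uniformly at random."""
    d = x.shape[0]
    k = rng.integers(0, int(beta * d) + 1)      # how many features to drop
    idx = rng.choice(d, size=k, replace=False)  # which features to drop
    out = x.copy()
    out[idx] = 0.0
    return out

def stochastic_removal(x, top_idx, alpha, beta, rng):
    """Drop each explanation feature with probability alpha, then cap the
    number removed at floor(beta * d) so the input stays within the masking
    range seen during fine-tuning (an assumption of this sketch)."""
    d = x.shape[0]
    drop = top_idx[rng.random(len(top_idx)) < alpha]
    cap = int(beta * d)
    if len(drop) > cap:
        drop = rng.choice(drop, size=cap, replace=False)
    out = x.copy()
    out[drop] = 0.0
    return out

def ffid_plus(predict, x, y, scores, s, alpha, beta, n_samples, rng):
    """Assumed fidelity-style score: accuracy on the intact input minus the
    mean accuracy after stochastically removing the top-s features.
    `predict` stands in for the fine-tuned model and returns a class label."""
    top_idx = np.argsort(-scores)[:s]           # s highest-scoring features
    base = float(predict(x) == y)
    hits = [
        float(predict(stochastic_removal(x, top_idx, alpha, beta, rng)) == y)
        for _ in range(n_samples)
    ]
    return float(base - np.mean(hits))
```

In this sketch the fine-tuning step would train the model on `random_mask` outputs, and `ffid_plus` would then be evaluated on that fine-tuned model. Under these assumptions, an explanation that covers the features the model relies on should score higher than an uninformative one.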

Paper Structure

This paper contains 34 sections, 1 theorem, 10 equations, 5 figures, 26 tables, 1 algorithm.

Key Result

Theorem 1

For the classification task described above, and a given pre-trained classifier $f(\cdot)$, consider a Shapley-value-based explainer $\psi(\cdot)$. For $\alpha_{orig}^+\in [0,1]$ and $\beta \in [0,\alpha^+]$, let [...]. Then, $e(s)$ is monotonically increasing for $s\in [0,c_1]$ and monotonically decreasing for $s\in [\max(\frac{\beta}{\alpha_{orig}^+}td, c_1),td]$.

Figures (5)

  • Figure 1: Effects of random masking fine-tuning on model behavior. Higher fine-tuning $\beta$ values improve robustness to perturbations but reduce agreement with the original model's predictions.
  • Figure 2: Case study of a sample from Tiny-ImageNet. We use GradCAM to obtain the explanation and produce degraded explanations by adding random noise to it, with noise levels $[0.2,0.4,0.6,0.8,1.0]$. We visualize the Fidelity score and correlation at different sparsity levels.
  • Figure 3: The ablation study between ground truth explanation size $\gamma$ and $FFid^+$ in F-Fidelity.
  • Figure 4: The relationship between ground truth explanation size $\gamma$ and $RFid^+$ in F-Fidelity. We compare different $\alpha^+$ and $\beta$ in the evaluation stage. We observe that the ground truth can be inferred from the $RFid^+$.
  • Figure 5: Ablation study on sampling number $N$, $\beta$, and $\alpha= \alpha^+=\alpha^-$. We report the Macro Spearman rank correlations.
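
Figure 5 reports macro Spearman rank correlations. As a rough illustration of how such a number could be computed, the sketch below assumes that "macro" means averaging the per-sparsity correlations between a metric's scores and the known ordering of explainers by noise level. This summary does not spell out the definition. The function names and the input layout are hypothetical, and ties are not handled.

```python
import numpy as np

def ranks(v):
    """Rank values from 0 (smallest) to n-1 (largest); assumes no ties."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v)
    r = np.empty(len(v), dtype=float)
    r[order] = np.arange(len(v))
    return r

def spearman(a, b):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    return float(np.corrcoef(ranks(a), ranks(b))[0, 1])

def macro_spearman(metric_scores, noise_levels):
    """Average the per-row Spearman correlation.

    metric_scores: one row per sparsity level, one column per explainer.
    noise_levels:  noise added to each explainer; lower noise is treated as
                   the better explainer, so scores are compared against the
                   negated noise levels.
    """
    quality = -np.asarray(noise_levels, dtype=float)
    return float(np.mean([spearman(row, quality) for row in metric_scores]))
```

A metric that orders the degraded explainers exactly by their noise level at every sparsity would obtain a value of 1 under this definition.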

Theorems & Definitions (1)

  • Theorem 1