A comprehensive study on fidelity metrics for XAI

Miquel Miró-Nicolau, Antoni Jaume-i-Capó, Gabriel Moyà-Alcover

TL;DR

Problem: Fidelity metrics for XAI lack ground truth and show inconsistent results across methods. Approach: The authors introduce a ground-truth verification framework using a transparent model (decision tree) as an objective benchmark and evaluate four fidelity metrics on two synthetic 52k-image datasets (AIXI-Shape and TXUXIv3). Findings: None of the metrics reliably matched the true fidelity, and performance degraded with higher OOD content. Impact: The study advocates developing new fidelity metrics and adopting the proposed benchmark to enable reliable evaluation in XAI research.

Abstract

The use of eXplainable Artificial Intelligence (XAI) systems has introduced a set of challenges that need resolution. Herein, we focus on how to correctly select an XAI method, an open question within the field. The inherent difficulty of this task is due to the lack of a ground truth. Several authors have proposed metrics to approximate the fidelity of different XAI methods. These metrics lack verification and have concerning disagreements. In this study, we proposed a novel methodology to verify fidelity metrics, using a well-known transparent model, namely a decision tree. This model allowed us to obtain explanations with perfect fidelity. Our proposal constitutes the first objective benchmark for these metrics, facilitating a comparison of existing proposals and surpassing existing methods. We applied our benchmark to assess the existing fidelity metrics in two different experiments, each using public datasets comprising 52,000 images. The images in these datasets were 128 by 128 pixels in size and were synthetic, which simplified the training process. All metric values indicated a lack of fidelity, with the best one showing a 30% deviation from the values expected for a perfect explanation. Our experimentation led us to conclude that the current fidelity metrics are not reliable enough to be used in real scenarios. From this finding, we deemed it necessary to develop new metrics that avoid the detected problems, and we recommend the usage of our proposal as a benchmark within the scientific community to address these limitations.

Paper Structure (15 sections, 3 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: Flows of different configurations: AI model, AI with an XAI method, and AI with an XAI method and a fidelity metric. Inside the dash box, the element that must be trusted is shown.
  • Figure 2: Sample of images from the AIXI-Shape dataset [miro2023novel].
  • Figure 3: Sample of images from the $\mathit{TXUXIv3}$ dataset [miro2023txuxi].
  • Figure 4: Examples of images from the AIXI-Shape dataset [miro2023novel] and the $\mathit{TXUXIv3}$ dataset [miro2023txuxi], with their respective explanations from decision trees.
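
The abstract states that a decision tree yields explanations with perfect fidelity, but this section does not describe how those explanations are derived. The following is a minimal sketch of one plausible reading, not the authors' implementation: the features tested along a sample's decision path are taken as the ground-truth explanation, since no other feature can influence the prediction. The `Node` class, the toy `TREE`, and all function names are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set


@dataclass
class Node:
    """Hypothetical binary decision-tree node (leaf when `label` is set)."""
    feature: Optional[int] = None    # index of the feature tested at this node
    threshold: float = 0.0           # go left when x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[int] = None      # predicted class, only set on leaves


def predict(node: Node, x: Sequence[float]) -> int:
    """Follow the decision path for x and return the leaf label."""
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def path_features(node: Node, x: Sequence[float]) -> Set[int]:
    """Collect the indices of all features tested along the path for x."""
    used: Set[int] = set()
    while node.label is None:
        used.add(node.feature)
        node = node.left if x[node.feature] <= node.threshold else node.right
    return used


def explanation_mask(node: Node, x: Sequence[float]) -> List[int]:
    """Binary attribution: 1 for features on the decision path, else 0."""
    used = path_features(node, x)
    return [1 if i in used else 0 for i in range(len(x))]


def unused_features_are_inert(node: Node, x: Sequence[float],
                              mask: Sequence[int],
                              probes: Sequence[float]) -> bool:
    """Check that changing any feature outside the mask keeps the prediction."""
    baseline = predict(node, x)
    for i, relevant in enumerate(mask):
        if relevant:
            continue
        for value in probes:
            perturbed = list(x)
            perturbed[i] = value
            if predict(node, perturbed) != baseline:
                return False
    return True


# Assumed toy tree over four features; only features 0 and 2 are ever tested.
TREE = Node(feature=0, threshold=0.5,
            left=Node(label=0),
            right=Node(feature=2, threshold=0.5,
                       left=Node(label=1),
                       right=Node(label=2)))

sample = [0.9, 0.3, 0.8, 0.1]
mask = explanation_mask(TREE, sample)
inert = unused_features_are_inert(TREE, sample, mask, probes=[0.0, 1.0])
```

In this sketch, a fidelity metric under verification would be applied to `mask`. Because the mask is exact by construction, any score that falls short of the metric's ideal value would reflect a flaw in the metric rather than in the explanation.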