Exploring SAIG Methods for an Objective Evaluation of XAI

Miquel Miró-Nicolau; Gabriel Moyà-Alcover; Anna Arias-Duart

TL;DR

This paper surveys Synthetic Artificial Intelligence Ground truth (SAIG) methods for objective evaluation of XAI, arguing that a ground-truth–based approach is needed because explanations lack a universal truth. It introduces a taxonomy with five design dimensions—GT definition, image features, GT value, generability, and evaluation measures—and applies it to sixteen image-based SAIG proposals to reveal how choices about data, GT, and metrics co-occur and influence results. The analysis uncovers strong interdependencies among design decisions and a persistent lack of consensus on which XAI methods perform best across contexts, highlighting the fragility of comparisons across SAIG studies. The authors advocate for a unified framework that integrates existing SAIG approaches to enable more robust, comparable evaluations of XAI techniques and guide the development of trustworthy explanations.

Abstract

The evaluation of eXplainable Artificial Intelligence (XAI) methods is a rapidly growing field, characterized by a wide variety of approaches. This diversity highlights the complexity of XAI evaluation, which, unlike traditional AI assessment, lacks a universally correct ground truth for explanations, making objective evaluation challenging. One promising direction to address this issue involves the use of what we term Synthetic Artificial Intelligence Ground truth (SAIG) methods, which generate artificial ground truths to enable the direct evaluation of XAI techniques. This paper presents the first review and analysis of SAIG methods. We introduce a novel taxonomy to classify these approaches, identifying seven key features that distinguish different SAIG methods. Our comparative study reveals a concerning lack of consensus on the most effective XAI evaluation techniques, underscoring the need for further research and standardization in this area.

Paper Structure (39 sections, 10 figures, 2 tables)

Introduction
XAI evaluation
SAIG Methods
https://doi.org/10.24963/ijcai.2017/371
https://openreview.net/forum?id=H1ziPjC5Fm
https://arxiv.org/pdf/1907.09701
https://proceedings.neurips.cc/paper_files/paper/2021/file/0fe6a94848e5c68a54010b61b3e94b0e-Paper.pdf
https://doi.org/10.1016/j.artint.2020.103428
https://doi.org/10.1109/FUZZ-IEEE55066.2022.9882821
https://doi.org/10.1016/j.inffus.2021.11.008
https://doi.org/10.1175/AIES-D-22-0012.1
https://doi.org/10.1109/CVPR52688.2022.00998
https://proceedings.mlr.press/v162/kim22h.html
https://doi.org/10.1109/ICCV51070.2023.00368
https://doi.org/10.1016/j.artint.2024.104179
...and 24 more sections

Figures (10)

Figure 1: Distribution of visual elements by type.
Figure 2: Distribution of visual elements by position. Five out of six articles that use predefined positions rely on mosaics.
Figure 3: Types of backgrounds in SAIG datasets.
Figure 4: Representative examples of the different combinations of object type, position, and background used in the analyzed datasets.
Figure 5: Distribution of SAIG methods by GT definition categories. The majority use the Identity definition, followed by A priori.
...and 5 more figures
