Exploring SAIG Methods for an Objective Evaluation of XAI

Miquel Miró-Nicolau, Gabriel Moyà-Alcover, Anna Arias-Duart

TL;DR

This paper surveys Synthetic Artificial Intelligence Ground truth (SAIG) methods for the objective evaluation of XAI, arguing that a ground-truth-based approach is needed because explanations have no universally correct ground truth. It introduces a taxonomy with five design dimensions (GT definition, image features, GT value, generability, and evaluation measures) and applies it to sixteen image-based SAIG proposals to show how choices about data, GT, and metrics co-occur and influence results. The analysis uncovers strong interdependencies among design decisions and a persistent lack of consensus on which XAI methods perform best across contexts, which makes comparisons across SAIG studies fragile. The authors advocate a unified framework that integrates existing SAIG approaches to enable more robust, comparable evaluations of XAI techniques and to guide the development of trustworthy explanations.

Abstract

The evaluation of eXplainable Artificial Intelligence (XAI) methods is a rapidly growing field, characterized by a wide variety of approaches. This diversity highlights the complexity of XAI evaluation, which, unlike traditional AI assessment, lacks a universally correct ground truth for the explanation, making objective evaluation challenging. One promising direction to address this issue involves the use of what we term Synthetic Artificial Intelligence Ground truth (SAIG) methods, which generate artificial ground truths to enable the direct evaluation of XAI techniques. This paper presents the first review and analysis of SAIG methods. We introduce a novel taxonomy to classify these approaches, identifying seven key features that distinguish different SAIG methods. Our comparative study reveals a concerning lack of consensus on the most effective XAI evaluation techniques, underscoring the need for further research and standardization in this area.

Paper Structure (39 sections, 10 figures, 2 tables)

This paper contains 39 sections, 10 figures, and 2 tables.

Figures (10)

  • Figure 1: Distribution of visual elements by type.
  • Figure 2: Distribution of visual elements by position. Five out of six articles that use predefined positions rely on mosaics.
  • Figure 3: Types of backgrounds in SAIG datasets.
  • Figure 4: Representative examples of the different combinations of object type, position, and background used in the analyzed datasets.
  • Figure 5: Distribution of SAIG methods by GT definition categories. The majority use the Identity definition, followed by A priori.
  • ...and 5 more figures