Exploring SAIG Methods for an Objective Evaluation of XAI
Miquel Miró-Nicolau, Gabriel Moyà-Alcover, Anna Arias-Duart
TL;DR
This paper surveys Synthetic Artificial Intelligence Ground truth (SAIG) methods for the objective evaluation of XAI, arguing that a ground-truth-based approach is needed because explanations lack a universally correct ground truth. It introduces a taxonomy with five design dimensions: ground-truth (GT) definition, image features, GT value, generability, and evaluation measures. Applying this taxonomy to sixteen image-based SAIG proposals shows how choices about data, GT, and metrics co-occur and influence results. The analysis uncovers strong interdependencies among design decisions and a persistent lack of consensus on which XAI methods perform best across contexts, which makes comparisons across SAIG studies fragile. The authors advocate a unified framework that integrates existing SAIG approaches to enable more robust, comparable evaluations of XAI techniques and to guide the development of trustworthy explanations.
Abstract
The evaluation of eXplainable Artificial Intelligence (XAI) methods is a rapidly growing field, characterized by a wide variety of approaches. This diversity highlights the complexity of XAI evaluation, which, unlike traditional AI assessment, lacks a universally correct ground truth for the explanation, making objective evaluation challenging. One promising direction to address this issue involves what we term Synthetic Artificial Intelligence Ground truth (SAIG) methods, which generate artificial ground truths to enable the direct evaluation of XAI techniques. This paper presents the first review and analysis of SAIG methods. We introduce a novel taxonomy to classify these approaches, identifying seven key features that distinguish different SAIG methods. Our comparative study reveals a concerning lack of consensus on the most effective XAI evaluation techniques, underscoring the need for further research and standardization in this area.
