From Flexibility to Manipulation: The Slippery Slope of XAI Evaluation

Kristoffer Wickstrøm, Marina Marie-Claire Höhne, Anna Hedström

TL;DR

This paper tackles the challenge of evaluating XAI explanations without ground-truth labels and demonstrates that evaluation outcomes are vulnerable to hyperparameter manipulation. It introduces intra- and inter-manipulation strategies to alter faithfulness assessments and shows substantial effect sizes across multiple datasets and explanation methods. To counter this, it proposes Mean Resilience Rank (MRR), a ranking-based robustness measure that aggregates performance across a feasible hyperparameter set, reducing susceptibility to manipulation. The work highlights important implications for method selection and comparability in XAI and calls for holistic, transparent evaluation pipelines and open benchmarking resources.

Abstract

The lack of ground-truth explanation labels is a fundamental challenge for quantitative evaluation in explainable artificial intelligence (XAI). This challenge becomes especially problematic when evaluation methods have numerous hyperparameters that must be specified by the user, as there is no ground truth to determine an optimal hyperparameter selection. An exhaustive search over hyperparameters is typically infeasible, so researchers usually make a normative choice based on similar studies in the literature, which gives the user great flexibility. In this work, we illustrate how this flexibility can be exploited to manipulate the evaluation outcome. We frame this manipulation as an adversarial attack on the evaluation, where seemingly innocent changes in hyperparameter settings significantly influence the evaluation outcome. We demonstrate the effectiveness of our manipulation on several datasets, with large changes in evaluation outcomes across several explanation methods and models. Lastly, we propose a mitigation strategy based on ranking across hyperparameters that aims to provide robustness against such manipulation. This work highlights the difficulty of conducting reliable XAI evaluation and emphasizes the importance of a holistic and transparent approach to evaluation in XAI.
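
The contrast between picking one convenient hyperparameter configuration and ranking across all of them can be illustrated with a minimal Python sketch. It is not the paper's exact definition of Mean Resilience Rank; it only assumes that a faithfulness score is available for every XAI method under every configuration in a feasible set, and that a higher score is better. The function names and the score table are hypothetical, and the numbers are made up for illustration.

```python
def mean_rank(scores):
    """Average rank of each method across configurations (1 = best).

    scores[c][m] is the score of method m under configuration c;
    higher is assumed to be better. Ties are broken by method order.
    """
    n_methods = len(scores[0])
    totals = [0.0] * n_methods
    for row in scores:
        # Method indices sorted from highest to lowest score in this config.
        order = sorted(range(n_methods), key=lambda m: -row[m])
        for rank, m in enumerate(order, start=1):
            totals[m] += rank
    return [t / len(scores) for t in totals]


def most_favourable_config(scores, target):
    """Index of the configuration where `target` beats its best rival by the
    largest margin, i.e. the configuration a manipulator would report."""
    def margin(row):
        best_other = max(s for m, s in enumerate(row) if m != target)
        return row[target] - best_other
    return max(range(len(scores)), key=lambda c: margin(scores[c]))


# Illustrative, made-up scores: 3 configurations (rows) x 3 methods (columns).
scores = [
    [0.9, 0.5, 0.1],
    [0.2, 0.8, 0.3],
    [0.4, 0.7, 0.6],
]

ranks = mean_rank(scores)                      # one mean rank per method
cherry_picked = most_favourable_config(scores, target=0)
```

In this toy table, method 0 wins only under the single configuration that `most_favourable_config` selects, while the mean rank over all three configurations favours method 1. Aggregating ranks over the whole feasible set is what makes the comparison harder to steer with one hand-picked setting.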

Paper Structure

This paper contains 29 sections, 7 equations, 2 figures, 7 tables.

Figures (2)

  • Figure 1: Example of possible faithfulness curves for digit classification. The leftmost curve illustrates how an "intuitive" faithfulness curve might look, while the remaining curves show that there is a lot of variation in how these curves can appear.
  • Figure 2: Box plot showing faithfulness scores across all hyperparameter configurations in the feasible set for each dataset. The plot illustrates that the average faithfulness score is similar between different XAI methods across datasets. However, the high variance enables targeted manipulation. Note that the scores have been normalized dataset-wise by the highest score to allow for comparison across datasets.

Theorems & Definitions (2)

  • Definition: Intra-Manipulation
  • Definition: Inter-Manipulation