
Quantifying True Robustness: Synonymity-Weighted Similarity for Trustworthy XAI Evaluation

Christopher Burger

TL;DR

The paper tackles the problem that standard similarity measures overstate the success of adversarial perturbations on text-based XAI explanations because they ignore semantic synonymy. It introduces synonymity weighting, formalized via a function $\text{Syn}(a,b)$, to adjust similarity measures (including Jaccard, Kendall's Tau, Spearman's footrule, and Rank-biased Overlap (RBO)) and thereby yield more faithful assessments of XAI stability. Empirical validation on two datasets using GloVe-Twitter-25 embeddings (with sensitivity analyses for fastText and WordNet) shows that accounting for synonymy can dramatically reduce the perceived attack success under Jaccard- and Spearman-based evaluations, while RBO remains relatively robust. The results provide a practical tool for trustworthy XAI evaluation and point to directions for deeper integration of semantic weighting into adversarial processes and contextual embeddings.
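The summary does not spell out how Syn(a, b) is computed or exactly how it enters each measure. As a minimal sketch, assuming Syn(a, b) is the cosine similarity of two word embeddings clipped to [0, 1] (with identical words scoring 1), a synonymity-weighted Jaccard index over two sets of top-ranked explanation words could replace the exact-match intersection with a soft intersection that credits each word with its best-matching counterpart in the other set. The helper names (`syn`, `weighted_jaccard`) and the toy three-dimensional vectors, which stand in for real embeddings such as GloVe-Twitter-25, are hypothetical; this is an illustration, not the paper's formula.

```python
import math

# Hypothetical toy embeddings standing in for real word vectors (e.g. GloVe-Twitter-25).
VECTORS = {
    "good": (1.0, 0.0, 0.0),
    "great": (0.8, 0.6, 0.0),
    "plot": (0.0, 0.0, 1.0),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def syn(a, b, vectors):
    """Assumed Syn(a, b): 1.0 for identical words, else cosine similarity clipped to [0, 1]."""
    if a == b:
        return 1.0
    if a not in vectors or b not in vectors:
        return 0.0  # out-of-vocabulary words get no synonym credit
    return max(0.0, min(1.0, cosine(vectors[a], vectors[b])))


def jaccard(A, B):
    """Standard Jaccard index: only exact word matches count."""
    A, B = set(A), set(B)
    union = A | B
    return len(A & B) / len(union) if union else 1.0


def weighted_jaccard(A, B, vectors):
    """Soft Jaccard: each word is credited with its best Syn match in the other set."""
    A, B = set(A), set(B)
    if not A and not B:
        return 1.0
    if not A or not B:
        return 0.0
    best_a = sum(max(syn(a, b, vectors) for b in B) for a in A)
    best_b = sum(max(syn(b, a, vectors) for a in A) for b in B)
    soft_inter = (best_a + best_b) / 2  # symmetric soft intersection
    return soft_inter / (len(A) + len(B) - soft_inter)


original = {"good", "plot"}    # top words before the attack
perturbed = {"great", "plot"}  # top words after the attack
print(round(jaccard(original, perturbed), 3))                    # 0.333
print(round(weighted_jaccard(original, perturbed, VECTORS), 3))  # 0.818
```

If Syn is replaced by exact matching (1 for identical words, 0 otherwise), the soft intersection equals |A ∩ B| and the function reduces to the ordinary Jaccard index, so the weighting only changes scores where near-synonyms were swapped in.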

Abstract

Adversarial attacks challenge the reliability of Explainable AI (XAI) by altering explanations while the model's output remains unchanged. The success of these attacks on text-based XAI is often judged using standard information retrieval metrics. We argue these measures are poorly suited to evaluating trustworthiness, as they treat all word perturbations equally while ignoring synonymity, which can misrepresent an attack's true impact. To address this, we apply synonymity weighting, a method that amends these measures by incorporating the semantic similarity of perturbed words. This produces more accurate vulnerability assessments and provides an important tool for assessing the robustness of AI systems. Our approach prevents the overestimation of attack success, leading to a more faithful understanding of an XAI system's true resilience against adversarial manipulation.
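Rank-sensitive measures can be amended in a similar way. A minimal sketch for Rank-biased Overlap, assuming a soft overlap at each depth (each word credited with its best Syn match in the other prefix) in place of exact set intersection: it uses the truncated form $(1-p)\sum_{d=1}^{k} p^{d-1} X_d / d$, divided by $1-p^k$ so that identical lists score 1, which is one of several RBO variants in use. The Syn lookup table, its 0.8 score, the persistence $p = 0.9$, and all function names are hypothetical; this is not the paper's formulation.

```python
def make_syn(table):
    """Build a hypothetical Syn(a, b) from a lookup table of word-pair scores in [0, 1]."""
    def syn(a, b):
        if a == b:
            return 1.0
        return table.get((a, b), table.get((b, a), 0.0))
    return syn


def soft_overlap(xs, ys, syn):
    """Symmetric soft overlap; equals the exact overlap when syn is exact matching."""
    if not xs or not ys:
        return 0.0
    fwd = sum(max(syn(x, y) for y in ys) for x in xs)
    bwd = sum(max(syn(y, x) for x in xs) for y in ys)
    return (fwd + bwd) / 2


def weighted_rbo(S, T, syn, p=0.9):
    """Truncated RBO with soft overlap at each depth, normalized so identical lists score 1."""
    k = min(len(S), len(T))
    if k == 0:
        return 0.0
    total = sum(p ** (d - 1) * soft_overlap(S[:d], T[:d], syn) / d
                for d in range(1, k + 1))
    return (1 - p) * total / (1 - p ** k)


original = ["good", "plot", "acting"]    # ranked words before the attack
perturbed = ["great", "plot", "acting"]  # ranked words after the attack
exact = make_syn({})                           # no synonym credit: ordinary overlap
weighted = make_syn({("good", "great"): 0.8})  # illustrative score, not real data
print(round(weighted_rbo(original, perturbed, exact), 3))     # 0.365
print(round(weighted_rbo(original, perturbed, weighted), 3))  # 0.873
```

With the exact-match Syn the soft overlap reduces to ordinary prefix overlap, so the two printed scores isolate the effect of crediting the single good/great substitution at the top rank.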

Paper Structure

This paper contains 12 sections, 11 equations, 2 figures, and 6 tables.

Figures (2)

  • Figure 1: Successful attack rates under threshold $\tau$ for standard and synonymity-weighted explanations
  • Figure 2: Successful attack similarity levels before and after synonymity weighting
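Figure 1 reports attack success rates under a threshold $\tau$. Assuming an attack counts as successful when the similarity between the original and perturbed explanations falls strictly below $\tau$, the rate for standard versus synonymity-weighted scores can be sketched as below. The function names and the similarity values are hypothetical, invented only to illustrate the computation; they are not results from the paper.

```python
def attack_success_rate(similarities, tau):
    """Fraction of attacks whose explanation similarity is below tau (assumed success criterion)."""
    if not similarities:
        return 0.0
    return sum(1 for s in similarities if s < tau) / len(similarities)


def success_curve(similarities, taus):
    """Success rate at each threshold, i.e. one curve of a Figure-1-style plot."""
    return [(tau, attack_success_rate(similarities, tau)) for tau in taus]


# Illustrative similarity scores for the same four attacks, invented for this sketch.
standard = [0.20, 0.35, 0.50, 0.90]
weighted = [0.60, 0.75, 0.80, 0.95]
taus = [0.4, 0.7]
print(success_curve(standard, taus))  # [(0.4, 0.5), (0.7, 0.75)]
print(success_curve(weighted, taus))  # [(0.4, 0.0), (0.7, 0.25)]
```

If weighting never lowers a pair's similarity score, as the toy values assume, the weighted success rate cannot exceed the standard one at any $\tau$, which is the direction of the overestimation the abstract describes.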