Quantifying True Robustness: Synonymity-Weighted Similarity for Trustworthy XAI Evaluation

Christopher Burger

TL;DR

The paper addresses the problem that standard similarity measures overstate the success of adversarial perturbations on text-based XAI explanations because they ignore synonymy between words. It introduces synonymity weighting, formalized via a function $\text{Syn}(a,b)$, to adjust similarity measures (including Jaccard, Kendall's Tau, Spearman's footrule, and Rank-biased Overlap) and thereby yield more faithful assessments of XAI stability. Empirical validation on two datasets using GloVe-Twitter-25 embeddings (with sensitivity analyses for fastText and WordNet) shows that perceived attack success under Jaccard- and Spearman-based evaluation can decrease dramatically once synonymy is accounted for, while RBO remains relatively robust. The results provide a practical tool for trustworthy XAI evaluation and point to directions for deeper integration of semantic weighting into adversarial processes and contextual embeddings.

Abstract

Adversarial attacks challenge the reliability of Explainable AI (XAI) by altering explanations while the model's output remains unchanged. The success of these attacks on text-based XAI is often judged using standard information retrieval metrics. We argue that these measures are poorly suited to evaluating trustworthiness: they treat all word perturbations equally and ignore synonymity, which can misrepresent an attack's true impact. To address this, we apply synonymity weighting, a method that amends these measures by incorporating the semantic similarity of perturbed words. This produces more accurate vulnerability assessments and provides an important tool for assessing the robustness of AI systems. Our approach prevents the overestimation of attack success, leading to a more faithful understanding of an XAI system's true resilience against adversarial manipulation.
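To make the idea concrete, the following is a minimal sketch of how a synonymity-weighted Jaccard index could behave. It is not the paper's formulation: the greedy one-to-one matching, the function names `jaccard`, `syn_weighted_jaccard`, and `toy_syn`, and the hand-written synonymity score of 0.9 for ("movie", "film") are all assumptions made for illustration. In practice $\text{Syn}(a,b)$ would come from an embedding model or a lexical resource.

```python
from typing import Callable, Iterable


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Standard Jaccard index: every non-identical word counts as a full mismatch."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def syn_weighted_jaccard(
    a: Iterable[str],
    b: Iterable[str],
    syn: Callable[[str, str], float],
) -> float:
    """Illustrative weighted Jaccard (an assumption, not the paper's definition).

    Words present in only one explanation are greedily paired one-to-one with
    their most synonymous counterpart in the other; each pair contributes its
    Syn score (in [0, 1]) to the intersection instead of 0.
    """
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    only_a, only_b = sorted(a - b), sorted(b - a)
    # All candidate cross pairs, most synonymous first.
    pairs = sorted(((syn(x, y), x, y) for x in only_a for y in only_b), reverse=True)
    used_a, used_b = set(), set()
    soft = 0.0
    for score, x, y in pairs:
        if score <= 0:
            break  # remaining pairs carry no synonymity
        if x in used_a or y in used_b:
            continue  # each word may be matched at most once
        used_a.add(x)
        used_b.add(y)
        soft += score
    intersection = len(a & b) + soft
    union = len(a) + len(b) - intersection
    return intersection / union


# Hypothetical synonymity table standing in for an embedding-based Syn(a, b).
TOY_SYN = {frozenset(("movie", "film")): 0.9}


def toy_syn(x: str, y: str) -> float:
    if x == y:
        return 1.0
    return TOY_SYN.get(frozenset((x, y)), 0.0)


original = ["good", "movie", "great"]
perturbed = ["good", "film", "great"]
plain = jaccard(original, perturbed)
weighted = syn_weighted_jaccard(original, perturbed, toy_syn)
```

Under these toy assumptions, swapping "movie" for "film" drops the standard Jaccard index to 0.5, while the weighted variant stays near 0.94, so an attack judged successful at a threshold such as 0.6 on the standard measure would not be counted as successful once synonymity is considered.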

Paper Structure (12 sections, 11 equations, 2 figures, 6 tables)

Introduction
Background and Related Work
Adversarial Attack Process
Mappings Between Explanations
Constructing Weighted Similarity
An Example: The Jaccard Index
Empirical Validation
Similarity Measures
Experimental Data
Results & Discussion
Sensitivity Analysis
Limitations and Conclusion

Figures (2)

Figure 1: Successful attack rates under threshold $\tau$ for standard and synonymity-weighted explanations
Figure 2: Similarity levels of successful attacks before and after synonymity weighting
