Are Objective Explanatory Evaluation metrics Trustworthy? An Adversarial Analysis

Prithwijit Chowdhury; Mohit Prabhushankar; Ghassan AlRegib; Mohamed Deriche

TL;DR

This work interrogates the reliability of objective causal metrics used to evaluate visual explanations in Explainable AI. It introduces SHAPE (SHifted Adversaries using Pixel Elimination), an adversarial yet model-faithful explanation grounded in the notions of necessity and sufficiency, and formalizes a region-wise necessity score $N_{I,f}(\lambda) = \mathbb{E}_M[f(I) \sim f(I \odot M) \mid M(\lambda)=0]$. Using Monte Carlo estimation, SHAPE generates explanations by masking pixels and measuring the resulting change in predictions. It is compared against GradCAM, GradCAM++, and RISE on CNNs trained on ImageNet. The results show SHAPE outperforming GradCAM and GradCAM++ on the insertion and deletion metrics and performing comparably to RISE, which exposes potential flaws in purely objective evaluation frameworks and suggests that human-in-the-loop validation remains essential for trustworthy explanations. Overall, the paper argues for reevaluating how we assess XAI explanations and highlights the need for metrics that correlate with human interpretability and trust.
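The Monte Carlo estimate of the necessity score described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the $\sim$ in the score is read here as a plain difference $f(I) - f(I \odot M)$, the image is a 2D grayscale array, and the function name `necessity_map`, the toy model, and the mask parameters (`n_masks`, `p_keep`, `cell`) are hypothetical choices made for the example.

```python
import numpy as np

def necessity_map(image, model, n_masks=200, p_keep=0.5, cell=2, seed=0):
    """Monte Carlo estimate of E_M[f(I) - f(I * M) | M(lambda) = 0] per pixel.

    Assumption: the change in prediction is taken as a simple difference.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape
    base = model(image)                      # f(I) on the unmasked image
    num = np.zeros((h, w))                   # sum of score drops where the pixel was removed
    den = np.zeros((h, w))                   # number of masks that removed the pixel
    gh, gw = -(-h // cell), -(-w // cell)    # ceil division for the coarse grid size
    for _ in range(n_masks):
        # Coarse random binary grid (1 = keep), upsampled by block repetition.
        grid = (rng.random((gh, gw)) < p_keep).astype(float)
        mask = np.kron(grid, np.ones((cell, cell)))[:h, :w]
        drop = base - model(image * mask)    # change in prediction for this mask
        removed = 1.0 - mask                 # indicator of M(lambda) = 0
        num += drop * removed
        den += removed
    return num / np.maximum(den, 1.0)        # conditional mean; 0 where never removed

# Toy model (hypothetical): the score depends only on a 2x2 patch of the image.
def toy_model(x):
    return float(x[2:4, 2:4].mean())

image = np.ones((8, 8))
nmap = necessity_map(image, toy_model)
print(nmap[2, 2], nmap[0, 0])  # the patch pixel should score higher than a background pixel
```

In this toy setting, removing the patch always removes the whole score, so pixels inside the patch receive the maximal necessity, while background pixels only pick up the drops caused by the patch happening to be masked at the same time.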

Abstract

Explainable AI (XAI) has revolutionized the field of deep learning by empowering users to have more trust in neural network models. The field of XAI allows users to probe the inner workings of these algorithms to elucidate their decision-making processes. The rise in popularity of XAI has led to the advent of different strategies to produce explanations, which only occasionally agree with one another. Thus, several objective evaluation metrics have been devised to decide which of these methods gives the best explanation for a specific scenario. The goal of the paper is twofold: (i) we employ the notions of necessity and sufficiency from the causal literature to derive a novel explanatory technique called SHifted Adversaries using Pixel Elimination (SHAPE), which satisfies all the theoretical and mathematical criteria of being a valid explanation; (ii) we show that SHAPE is, in fact, an adversarial explanation that fools the causal metrics employed to measure the robustness and reliability of popular importance-based visual XAI methods. Our analysis shows that SHAPE outperforms popular explanatory techniques like GradCAM and GradCAM++ in these tests and is comparable to RISE, raising questions about the sanity of these metrics and the need for human involvement for an overall better evaluation.

Paper Structure (13 sections, 6 equations, 4 figures, 1 table)

This paper contains 13 sections, 6 equations, 4 figures, 1 table.

Introduction
Background
Causal Definitions of Importance: Necessity and Sufficiency
Visual Explanations: Importance Maps
Causal Metrics for Evaluation of Explanations
Adversarial Explanations: A New Robustness Test
Methodology
SHifted Adversaries using Pixel Elimination (SHAPE)
Experiments and Observations
Experimental Setup for Adversarial Explanation Generation
Faithfulness Test of Causal Evaluation Metrics
Discussion
Conclusion

Figures (4)

Figure 1: Even though the adversarial explanation generated by SHAPE (b) is much less comprehensible than the GradCAM explanation (a), it outperforms the latter in both causal metric tests (c) and (d) by a significant margin, and thus appears to be the more robust and reliable of the two.
Figure 2: Overview of SHAPE for generating adversarial explanations: the input image $I$ is element-wise multiplied by the random masks $M_i$, and the masked images are fed into the model $f$ along with the original image to calculate the change in prediction scores. The importance map is a weighted sum of the masks, where the weight for each mask is its corresponding change in probability score.
Figure 3: SHAPE maps for image (a) for a ResNet101 model. (b) shows the importance map for class Bull mastiff (prediction accuracy = $38.42\%$), (c) shows the importance map for class Tiger cat (prediction accuracy = $9.41\%$), and (d) shows the map for class Tabby (prediction accuracy = $4.97\%$).
Figure 4: CAM for prediction "great white shark" using (a) GradCAM, (b) GradCAM++, (c) RISE and (d) SHAPE (ours). SHAPE significantly outperforms all other methods in both the (e) deletion (lower AUC is better) and (f) insertion (higher AUC is better) games.
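
The deletion and insertion games referred to in the captions above can be sketched as follows. This is a minimal illustration, not the evaluation code used in the paper: it assumes a 2D grayscale image, a zero-valued baseline for removed pixels, and a trapezoidal AUC over a normalized x-axis; the names `causal_curve`, `auc`, and `toy_model` are hypothetical.

```python
import numpy as np

def causal_curve(image, saliency, model, mode="deletion", steps=16):
    """Model scores as pixels are removed (deletion) or revealed (insertion),
    most important first according to the saliency map."""
    order = np.argsort(saliency, axis=None)[::-1]      # flat indices, descending importance
    if mode == "deletion":
        canvas = image.astype(float).copy()            # start from the full image
        source = np.zeros_like(canvas)                 # removed pixels become 0 (assumed baseline)
    else:
        canvas = np.zeros_like(image, dtype=float)     # start from the empty baseline
        source = image.astype(float)                   # revealed pixels come from the image
    scores = [model(canvas)]
    for chunk in np.array_split(order, steps):
        canvas.flat[chunk] = source.flat[chunk]        # alter the next batch of pixels
        scores.append(model(canvas))
    return np.array(scores)

def auc(scores):
    """Trapezoidal area under the curve with the x-axis normalized to [0, 1]."""
    s = np.asarray(scores, dtype=float)
    return float(((s[:-1] + s[1:]) / 2).sum() / (len(s) - 1))

# Toy model (hypothetical): the score depends only on a 2x2 patch of the image.
def toy_model(x):
    return float(x[2:4, 2:4].mean())

image = np.ones((8, 8))
good = np.zeros((8, 8))
good[2:4, 2:4] = 1.0            # saliency that highlights the true evidence
bad = 1.0 - good                # saliency that highlights everything else

del_good = auc(causal_curve(image, good, toy_model, "deletion"))
del_bad = auc(causal_curve(image, bad, toy_model, "deletion"))
ins_good = auc(causal_curve(image, good, toy_model, "insertion"))
ins_bad = auc(causal_curve(image, bad, toy_model, "insertion"))
print(del_good, del_bad, ins_good, ins_bad)
```

A map that ranks the truly decisive pixels first yields a low deletion AUC and a high insertion AUC. Note that these scores reward only the pixel ordering, not how readable the map is to a person, which is the gap the paper's adversarial analysis targets.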
