On Discrepancies between Perturbation Evaluations of Graph Neural Network Attributions

Razieh Rezaei; Alireza Dizaji; Ashkan Khakzar; Anees Kazi; Nassir Navab; Daniel Rueckert

TL;DR

This paper tackles the problem of evaluating graph neural network explanations, where existing attribution methods disagree and no unifying benchmark exists. It introduces a retraining evaluation framework for the graph domain, adapting the ROAR idea into RoMie (retrain on the most important edges) and RoLie (retrain on the least important edges) to test how well explanations identify the edges that actually drive predictions. The study analyzes four state-of-the-art explainers (GradCAM, GNNExplainer, PGExplainer, SubgraphX) across five datasets and two architectures (GCN, GIN). Results vary strongly with dataset and network, and GNNExplainer often performs similarly to a random assignment of edge importance. The authors argue that retraining evaluation should be used as a problem-specific toolset rather than a universal benchmark, and they provide practical guidelines for interpreting RoMie/RoLie results, including the treatment of isolated nodes and considerations for out-of-distribution effects. Overall, the work calls for dataset- and network-aware evaluation of graph explanations, so that attribution quality is not overgeneralized and practitioners know which explanations to trust in a given setting.

Abstract

Neural networks are increasingly finding their way into the realm of graphs and the modeling of relationships between features. Concurrently, graph neural network explanation approaches are being invented to uncover relationships between the nodes of the graphs. However, the existing attribution methods disagree with one another, and it is unclear which attribution to trust. Research has therefore introduced evaluation experiments that assess them from different perspectives. In this work, we assess attribution methods from a perspective not previously explored in the graph domain: retraining. The core idea is to retrain the network on the important (or unimportant) relationships identified by the attributions and to evaluate how well networks generalize from these relationships. We reformulate the retraining framework to sidestep issues lurking in the previous formulation and propose guidelines for correct analysis. We run our analysis on four state-of-the-art GNN attribution methods and five synthetic and real-world graph classification datasets. The analysis reveals that attributions perform variably depending on the dataset and the network. Most importantly, we observe that the famous GNNExplainer performs similarly to an arbitrary designation of edge importance. The study concludes that the retraining evaluation cannot be used as a generalized benchmark and recommends it as a toolset to evaluate attributions on a specifically addressed network, dataset, and sparsity.
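
The edge-selection step behind the retraining idea can be illustrated with a small sketch. It is not the authors' implementation: the function names `select_edges` and `drop_isolated_nodes`, the `keep_fraction` parameter, and the rounding and tie-breaking rules are all assumptions made for illustration. The sketch ranks edges by attribution score, keeps either the most important ones (as in RoMie) or the least important ones (as in RoLie), and then shows one possible way of eliminating nodes that are left without edges.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]


def select_edges(
    edges: Sequence[Edge],
    scores: Sequence[float],
    keep_fraction: float,
    most_important: bool = True,
) -> List[Edge]:
    """Keep a fraction of edges ranked by attribution score.

    most_important=True keeps the highest-scored edges (RoMie-style),
    most_important=False keeps the lowest-scored edges (RoLie-style).
    Hypothetical helper: the names and rounding rule are assumptions.
    """
    if len(edges) != len(scores):
        raise ValueError("edges and scores must have the same length")
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must lie in [0, 1]")

    n_keep = round(keep_fraction * len(edges))
    # Stable sort: ties keep their original relative order.
    order = sorted(
        range(len(edges)), key=lambda i: scores[i], reverse=most_important
    )
    kept = sorted(order[:n_keep])  # restore the original edge order
    return [edges[i] for i in kept]


def drop_isolated_nodes(edges: Sequence[Edge]) -> Tuple[List[int], List[Edge]]:
    """Remove nodes without any remaining edge and relabel the rest.

    Returns the surviving original node ids and the relabeled edge list.
    One possible treatment of isolated nodes, assumed for illustration.
    """
    connected = sorted({node for edge in edges for node in edge})
    relabel: Dict[int, int] = {old: new for new, old in enumerate(connected)}
    return connected, [(relabel[u], relabel[v]) for u, v in edges]


# Toy graph: a 4-cycle with one attribution score per edge.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
scores = [0.9, 0.1, 0.8, 0.2]

romie_edges = select_edges(edges, scores, keep_fraction=0.5, most_important=True)
rolie_edges = select_edges(edges, scores, keep_fraction=0.5, most_important=False)
kept_nodes, compact_edges = drop_isolated_nodes([(1, 2)])
```

In a retraining evaluation, a new network would then be trained on graphs reduced this way at each sparsity level and its test accuracy compared across explainers and against a random edge ranking.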

Paper Structure (77 sections, 4 equations, 4 figures, 2 tables, 1 algorithm)

1. Introduction
2. Background and Setup
Notation:
Attribution:
2.1. Attributions under Evaluation
3. Attribution Evaluation
3.1. Perturbed vs. Unperturbed Test Set
3.2. Elimination of Isolated Nodes
3.3. Implementation Details
4. Discussions
4.1. RoMie and RoLie are Complementary
4.2. Explainers Depend on Datasets and Networks
4.3. Can we Recommend an Explainer in General?
Copyright
Formatting Requirements in Brief
...and 62 more sections

Figures (4)

Figure 1: RoMie and RoLie
Figure 5: Using the trim and clip commands produces fragile layers that can result in disasters (like this one from an actual paper) when the color space is corrected or the PDF combined with others for the final proceedings. Crop your figures properly in a graphics program -- not in LaTeX
Figure 6: Adjusting the bounding box instead of actually removing the unwanted data resulted in multiple layers in this paper. It also needlessly increased the PDF size. In this case, the size of the unwanted layer doubled the paper's size, and produced the following surprising results in final production. Crop your figures properly in a graphics program. Don't just alter the bounding box.
Figure 7: Example listing quicksort.hs
