Graph Neural Network Explanations are Fragile

Jiate Li, Meng Pang, Yun Dong, Jinyuan Jia, Binghui Wang

TL;DR

The paper examines the robustness of perturbation-based GNN explainers against adversarial graph perturbations. It introduces a practical threat model in which the adversary has limited knowledge of the explainer and a small perturbation budget. It proposes two attack strategies, one loss-based and one deduction-based, that aim to maximize the change in the explanatory subgraph while preserving the GNN's prediction accuracy and the similarity of the graph structure. Across multiple synthetic and real-world datasets and three explainers, the results show that explanations are fragile: perturbing only a few edges can change them substantially, and the attacks transfer well to other explainers. The findings underscore the need for robust, possibly provably secure, GNN explainers before they are deployed in safety- and security-critical domains.

Abstract

Explainable Graph Neural Networks (GNNs) have emerged recently to foster trust in the use of GNNs. Existing GNN explainers have been developed from various perspectives to improve explanation performance. We take the first step toward studying GNN explainers under adversarial attack: we find that an adversary who slightly perturbs the graph structure can keep the GNN model's predictions correct while causing the GNN explainer to yield a drastically different explanation on the perturbed graph. Specifically, we first formulate the attack problem under a practical threat model (i.e., the adversary has limited knowledge of the GNN explainer and a restricted perturbation budget). We then design two methods (one loss-based and one deduction-based) to realize the attack. We evaluate our attacks on various GNN explainers, and the results show that these explainers are fragile.
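To make the threat model concrete, here is a minimal sketch of a generic greedy attack under the two constraints named in the abstract: at most `budget` edge flips, and the model's prediction must not change. It is NOT the paper's loss-based or deduction-based method; it is a swapped-in, brute-force greedy search for illustration only. The GNN and the explainer are treated as black-box callables (`predict`, `explain`), and all names here (`greedy_explanation_attack`, `top_k_edges`, the toy model and explainer) are hypothetical. The structure-similarity constraint from the TL;DR is omitted for brevity.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Optional, Set, Tuple

Edge = Tuple[int, int]
EdgeSet = FrozenSet[Edge]


def top_k_edges(scores: Dict[Edge, float], k: int) -> Set[Edge]:
    """Return the k highest-scoring edges (ties broken by edge id)."""
    return set(sorted(scores, key=lambda e: (-scores[e], e))[:k])


def greedy_explanation_attack(
    edges: Iterable[Edge],
    candidates: Iterable[Edge],
    predict: Callable[[EdgeSet], int],
    explain: Callable[[EdgeSet], Dict[Edge, float]],
    budget: int,
    k: int,
) -> EdgeSet:
    """Hypothetical greedy attack: flip at most `budget` candidate edges so that
    the prediction is unchanged while as many clean top-k explanatory edges as
    possible drop out of the explanation."""
    clean: EdgeSet = frozenset(edges)
    clean_pred = predict(clean)
    clean_expl = top_k_edges(explain(clean), k)

    current = set(clean)
    remaining = sorted(set(candidates))  # deterministic order; each edge flipped at most once
    for _ in range(budget):
        best_edge: Optional[Edge] = None
        best_change = -1
        for e in remaining:
            trial = frozenset(current ^ {e})  # remove e if present, add it otherwise
            if predict(trial) != clean_pred:
                continue  # constraint: the prediction must stay the same
            change = len(clean_expl - top_k_edges(explain(trial), k))
            if change > best_change:
                best_edge, best_change = e, change
        if best_edge is None:
            break  # no remaining flip preserves the prediction
        current ^= {best_edge}
        remaining.remove(best_edge)
    return frozenset(current)


# Toy stand-ins (hypothetical): "GNN" predicts 1 iff the graph has >= 3 edges,
# and the "explainer" scores each edge by the sum of its node ids.
def predict(graph: EdgeSet) -> int:
    return int(len(graph) >= 3)


def explain(graph: EdgeSet) -> Dict[Edge, float]:
    return {e: float(e[0] + e[1]) for e in graph}


edges = {(0, 1), (1, 2), (2, 3), (3, 4)}
perturbed = greedy_explanation_attack(edges, edges | {(0, 2)}, predict, explain, budget=1, k=2)
print(sorted(perturbed))
```

With a budget of one flip, the sketch removes edge (2, 3), which keeps the toy prediction at 1 but pushes one of the two original top-2 explanatory edges out of the explanation. Exhaustively scoring every candidate per step is only feasible for tiny graphs, which is one reason more targeted strategies like the paper's are needed in practice.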

Paper Structure (20 sections, 35 equations, 7 figures, 10 tables, 2 algorithms)

Introduction
Related Work
Background
  Perturbation-based GNN Explainers
  Power-Law Likelihood Ratio Test
Our Attack Design
  Attack Formulation
  Attack Methodology
    Loss-based Attack
    Deduction-based Attack
Experiment
  Experiment Setup
  Experimental Results
Discussion
Conclusion
...and 5 more sections
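The outline lists a power-law likelihood ratio test in the Background, and the TL;DR says the attacks preserve graph structure similarity. The sketch below implements the degree-distribution test in the form popularized by Nettack (Zügner et al., 2018): fit a power-law exponent to the clean and perturbed degree sequences separately and jointly, and accept a perturbation only if the likelihood ratio statistic stays below a small threshold. That the paper uses exactly this form, and the defaults `d_min = 2` and `tau = 0.004`, are assumptions; the function names are hypothetical.

```python
import numpy as np


def powerlaw_alpha(degrees, d_min=2):
    """Approximate MLE of the power-law exponent, using only degrees >= d_min."""
    d = np.asarray(degrees, dtype=float)
    d = d[d >= d_min]
    return 1.0 + d.size / np.sum(np.log(d / (d_min - 0.5)))


def powerlaw_loglik(degrees, alpha, d_min=2):
    """Log-likelihood of the degrees >= d_min under a power law with exponent alpha."""
    d = np.asarray(degrees, dtype=float)
    d = d[d >= d_min]
    n = d.size
    return n * np.log(alpha) + n * alpha * np.log(d_min) - (alpha + 1) * np.sum(np.log(d))


def likelihood_ratio(deg_clean, deg_perturbed, d_min=2):
    """Statistic Lambda; the null hypothesis is that both sequences share one exponent."""
    deg_comb = np.concatenate([np.asarray(deg_clean), np.asarray(deg_perturbed)])
    l_clean = powerlaw_loglik(deg_clean, powerlaw_alpha(deg_clean, d_min), d_min)
    l_pert = powerlaw_loglik(deg_perturbed, powerlaw_alpha(deg_perturbed, d_min), d_min)
    l_comb = powerlaw_loglik(deg_comb, powerlaw_alpha(deg_comb, d_min), d_min)
    return -2.0 * l_comb + 2.0 * (l_clean + l_pert)


def is_unnoticeable(deg_clean, deg_perturbed, tau=0.004, d_min=2):
    """Accept the perturbation only if Lambda stays below the threshold tau."""
    return bool(likelihood_ratio(deg_clean, deg_perturbed, d_min) < tau)


clean_degrees = [2, 2, 3, 3, 4, 5, 8, 13]
print(is_unnoticeable(clean_degrees, clean_degrees))
print(is_unnoticeable(clean_degrees, [20] * 8))
```

For a real graph, the degree sequences would come from the row sums of the clean and perturbed adjacency matrices. An identical sequence yields a statistic of (numerically) zero and is accepted, while a drastically shifted degree distribution is rejected.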

Figures (7)

Figure 1: GNN explanation for (a) node classification and (b) graph classification. The explainer identifies the subgraph that yields the best prediction for the target node and the target graph, respectively.
Figure 2: Overview of our attacks. Taking node classification as an example, we find perturbations that maximize the difference between $E_{S}$ and $\tilde{E}_{S}$ while satisfying the attack constraints.
Figure 3: Visualization of the explanation results before and after our deduction-based attack against PGExplainer. We do not show OGBN-P because it is too large and dense to visualize.
Figure 4: Impact of $k$ on our attack in three real-world datasets.
Figure 5: (a) Impact of $N$ on our deduction-based attack; (b) Impact of $\gamma$ and $\beta$ on our attack performance.
...and 2 more figures
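Figure 2 frames the attack as maximizing the difference between the clean explanation $E_{S}$ and the explanation $\tilde{E}_{S}$ on the perturbed graph. A minimal sketch of one way to quantify that difference, assuming (as a hypothetical choice, not confirmed by the source) that an explanation is the set of top-k edges ranked by the explainer's importance scores, is the fraction of clean top-k edges that drop out after perturbation. The function names and the toy scores are illustrative only.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]


def top_k_edges(scores: Dict[Edge, float], k: int) -> Set[Edge]:
    """Top-k edges by importance score; ties broken by edge id for determinism."""
    return set(sorted(scores, key=lambda e: (-scores[e], e))[:k])


def explanation_change(clean_scores: Dict[Edge, float],
                       perturbed_scores: Dict[Edge, float],
                       k: int) -> float:
    """Fraction of the clean top-k edges (E_S) missing from the perturbed top-k (E~_S)."""
    e_s = top_k_edges(clean_scores, k)
    e_s_tilde = top_k_edges(perturbed_scores, k)
    if not e_s:
        return 0.0  # nothing to compare against
    return len(e_s - e_s_tilde) / len(e_s)


clean = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.1, (3, 4): 0.05}
perturbed = {(0, 1): 0.2, (1, 2): 0.7, (2, 3): 0.9, (3, 4): 0.1}
print(explanation_change(clean, perturbed, k=2))  # 0.5
```

Here the clean top-2 set is {(0, 1), (1, 2)} and the perturbed top-2 set is {(2, 3), (1, 2)}, so half of the original explanation is lost. A score of 0 means the explanation is unchanged and 1 means it is entirely replaced.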