Graph Neural Network Explanations are Fragile

Jiate Li, Meng Pang, Yun Dong, Jinyuan Jia, Binghui Wang

TL;DR

The paper examines the robustness of perturbation-based GNN explainers under adversarial graph perturbations, introducing a practical threat model in which the adversary has limited knowledge of the explainer and a small perturbation budget. It proposes two attack strategies, one loss-based and one deduction-based, that aim to maximize changes in the explanatory subgraph while preserving GNN prediction accuracy and graph structure similarity. Across multiple synthetic and real-world datasets and three explainers, the results show that explanations are fragile: substantial changes are achievable by perturbing only a few edges, and the attacks transfer well to other explainers. The findings underscore the need for robust, possibly provably secure, GNN explainers before they are deployed in safety- and security-critical domains.
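To make "changes in the explanatory subgraph" and "perturbing only a few edges" concrete, here is a minimal sketch. It assumes an explainer that assigns an importance score to every edge, takes the explanation to be the top-$k$ edges, and measures change as the fraction of clean top-$k$ edges that disappear after perturbation. The helper names `top_k_edges`, `explanation_change`, and `within_budget` and the metric itself are hypothetical illustrations, not necessarily the evaluation metric used in the paper.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: an undirected edge stored as (u, v) with u < v.
Edge = Tuple[int, int]


def top_k_edges(scores: Dict[Edge, float], k: int) -> Set[Edge]:
    """Return the k highest-scoring edges; ties are broken by edge id for determinism."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return {edge for edge, _ in ranked[:k]}


def explanation_change(clean: Dict[Edge, float], perturbed: Dict[Edge, float], k: int) -> float:
    """Fraction of the clean top-k explanatory edges that are absent from the perturbed top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    before = top_k_edges(clean, k)
    after = top_k_edges(perturbed, k)
    return 1.0 - len(before & after) / k


def within_budget(clean_edges: Set[Edge], perturbed_edges: Set[Edge], budget: int) -> bool:
    """Count flipped edges (insertions plus deletions) and compare with the budget."""
    return len(clean_edges ^ perturbed_edges) <= budget
```

Dividing by $k$ keeps the score in $[0, 1]$ and reads directly as "share of the original explanation that was displaced"; a Jaccard distance between the two top-$k$ sets would be an equally reasonable choice.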

Abstract

Explainable Graph Neural Networks (GNNs) have recently emerged to foster trust in the use of GNNs. Existing GNN explainers are developed from various perspectives to improve explanation performance. We take the first step toward studying GNN explainers under adversarial attack: we find that an adversary who slightly perturbs the graph structure can keep the GNN model's predictions correct while causing the GNN explainer to yield a drastically different explanation on the perturbed graph. Specifically, we first formulate the attack problem under a practical threat model (i.e., the adversary has limited knowledge about the GNN explainer and a restricted perturbation budget). We then design two methods (one loss-based, the other deduction-based) to realize the attack. We evaluate our attacks on various GNN explainers, and the results show that these explainers are fragile.
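The attack setting described above (fixed prediction, small edge budget, maximal explanation change) can be illustrated with a plain greedy edge-flip search. This is a swapped-in generic technique, neither the paper's loss-based nor its deduction-based method, and it assumes black-box query access to a model and an explainer (in a limited-knowledge setting this could be a surrogate explainer). The function `greedy_explanation_attack` and the toy stand-ins `toy_predict` and `toy_explain` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, FrozenSet, Optional, Tuple

Edge = Tuple[int, int]  # undirected edge stored as (u, v) with u < v
Graph = FrozenSet[Edge]


def greedy_explanation_attack(
    graph: Graph,
    num_nodes: int,
    predict: Callable[[Graph], int],
    explain: Callable[[Graph], FrozenSet[Edge]],
    budget: int,
) -> Graph:
    """Flip at most `budget` edges, one per round, keeping the prediction fixed and
    greedily maximizing how many clean explanatory edges leave the explanation."""
    target_label = predict(graph)
    clean_expl = explain(graph)
    current = graph
    for _ in range(budget):
        best: Optional[Graph] = None
        best_gain = len(clean_expl - explain(current))  # change achieved so far
        for u, v in combinations(range(num_nodes), 2):
            candidate = current ^ {(u, v)}  # insert the edge if absent, delete it if present
            if predict(candidate) != target_label:
                continue  # the attack must not change the model's prediction
            gain = len(clean_expl - explain(candidate))
            if gain > best_gain:
                best, best_gain = candidate, gain
        if best is None:
            break  # no single flip increases the explanation change
        current = best
    return current


# Hypothetical toy stand-ins for a trained GNN and an explainer.
def toy_predict(graph: Graph) -> int:
    """Label 1 if the graph has at least three edges, else 0."""
    return int(len(graph) >= 3)


def toy_explain(graph: Graph) -> FrozenSet[Edge]:
    """Explain with the edges incident to the highest-degree node (smallest id on ties)."""
    degree = Counter()
    for u, v in graph:
        degree[u] += 1
        degree[v] += 1
    if not degree:
        return frozenset()
    hub = min(degree, key=lambda n: (-degree[n], n))
    return frozenset(e for e in graph if hub in e)
```

On the triangle `frozenset({(0, 1), (0, 2), (1, 2)})` with `num_nodes=4` and `budget=2`, the search first adds `(1, 3)` and then deletes `(0, 1)`: the toy prediction stays 1, yet the explanation moves from `{(0, 1), (0, 2)}` to the disjoint set `{(1, 2), (1, 3)}`. Scanning every node pair each round costs $O(n^2)$ model and explainer queries, so this brute-force version only suits tiny graphs.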

Paper Structure (20 sections, 35 equations, 7 figures, 10 tables, 2 algorithms)

Figures (7)

  • Figure 1: GNN explanation for (a) node classification and (b) graph classification: it identifies the subgraph that yields the best prediction for the target node and the target graph, respectively.
  • Figure 2: Overview of our attacks. Take node classification for instance: we find perturbations that maximize the difference between $E_{S}$ and $\tilde{E}_{S}$ while satisfying the attack constraints.
  • Figure 3: Visualization of the explanation results before and after our deduction-based attack against the PGExplainer. We do not show OGBN-P as it is too large and dense to be visualized.
  • Figure 4: Impact of $k$ on our attack on three real-world datasets.
  • Figure 5: (a) Impact of $N$ on our deduction-based attack; (b) Impact of $\gamma$ and $\beta$ on our attack performance.
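The objective summarized in the Figure 2 caption can be written, as one possible formalization for node classification, as a constrained maximization. Only $E_S$ (the explanation on the clean graph) and $\tilde{E}_S$ (the explanation on the perturbed graph) come from the caption; the remaining symbols are assumptions: $A$ and $\tilde{A}$ are the clean and perturbed adjacency matrices, $X$ the node features, $f$ the GNN, $v$ the target node, $\mathrm{Diff}$ a hypothetical explanation-difference measure, $\mathrm{Sim}$ a hypothetical structural-similarity measure with threshold $\tau$, and $\Delta$ the edge-perturbation budget.

```latex
\max_{\tilde{A}} \; \mathrm{Diff}\big(E_S, \tilde{E}_S\big)
\quad \text{s.t.} \quad
\arg\max_{c} f(\tilde{A}, X)_{v,c} = \arg\max_{c} f(A, X)_{v,c}, \quad
\tfrac{1}{2}\,\lVert \tilde{A} - A \rVert_{0} \le \Delta, \quad
\mathrm{Sim}(A, \tilde{A}) \ge \tau
```

The factor $\tfrac{1}{2}$ counts each undirected edge flip once, since flipping edge $(u, v)$ changes both $A_{uv}$ and $A_{vu}$ in a symmetric adjacency matrix.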