Graph Neural Network Explanations are Fragile
Jiate Li, Meng Pang, Yun Dong, Jinyuan Jia, Binghui Wang
TL;DR
The paper interrogates the robustness of perturbation-based GNN explainers under adversarial graph perturbations, introducing a practical threat model with limited explainer knowledge and a small perturbation budget. It proposes two attack strategies—loss-based and deduction-based—that aim to maximize changes in explanatory subgraphs while preserving GNN accuracy and graph structure similarity. Across multiple synthetic and real-world datasets and three explainers, the results show explanations are fragile, with substantial changes achievable by perturbing only a few edges and with strong transferability to other explainers. The findings underscore the need for robust, possibly provably secure, GNN explainers before deploying them in safety- and security-critical domains.
Abstract
Explainable Graph Neural Networks (GNNs) have emerged recently to foster trust in the use of GNNs. Existing GNN explainers are developed from various perspectives to enhance explanation performance. We take the first step toward studying GNN explainers under adversarial attack: we find that an adversary who slightly perturbs the graph structure can ensure that the GNN model still makes correct predictions, while the GNN explainer yields a drastically different explanation on the perturbed graph. Specifically, we first formulate the attack problem under a practical threat model (i.e., the adversary has limited knowledge about the GNN explainer and a restricted perturbation budget). We then design two methods (one loss-based and the other deduction-based) to realize the attack. We evaluate our attacks on various GNN explainers, and the results show that these explainers are fragile.
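To make the attack goal concrete, the following is a minimal sketch of how one might quantify "a drastically different explanation" under a restricted perturbation budget. It is an illustration only, not the paper's formulation: the helper names (`top_k_edges`, `explanation_change`, `within_budget`), the top-k overlap measure, and the toy edge scores are all assumptions introduced here. The edge scores stand in for the importance values a perturbation-based explainer would assign to each edge.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]


def normalize(edge: Edge) -> Edge:
    """Store an undirected edge with the smaller node id first."""
    u, v = edge
    return (u, v) if u <= v else (v, u)


def top_k_edges(scores: Dict[Edge, float], k: int) -> Set[Edge]:
    """Return the k highest-scoring edges (hypothetical explanatory subgraph)."""
    # Sort by descending score; break ties by edge id so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], normalize(kv[0])))
    return {normalize(edge) for edge, _ in ranked[:k]}


def explanation_change(before: Dict[Edge, float],
                       after: Dict[Edge, float],
                       k: int) -> float:
    """Fraction of the original top-k explanatory edges that were displaced."""
    edges_before = top_k_edges(before, k)
    edges_after = top_k_edges(after, k)
    if not edges_before:
        return 0.0
    return 1.0 - len(edges_before & edges_after) / len(edges_before)


def within_budget(original: Iterable[Edge],
                  perturbed: Iterable[Edge],
                  budget: int) -> bool:
    """Check that the number of added or removed edges does not exceed the budget."""
    original_set = {normalize(e) for e in original}
    perturbed_set = {normalize(e) for e in perturbed}
    return len(original_set ^ perturbed_set) <= budget


# Toy example (made-up scores): the adversary adds the single edge (0, 4).
scores_before = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.1, (3, 4): 0.05}
scores_after = {(0, 1): 0.2, (1, 2): 0.85, (2, 3): 0.1, (3, 4): 0.05, (0, 4): 0.7}

change = explanation_change(scores_before, scores_after, k=2)
allowed = within_budget(scores_before.keys(), scores_after.keys(), budget=1)
print(f"explanation change: {change:.2f}, within budget: {allowed}")
```

In this toy case a single added edge displaces one of the two explanatory edges. A full attack would additionally have to check that the GNN's prediction on the perturbed graph is unchanged, which this sketch omits because it requires a trained model.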
