Provably Robust Explainable Graph Neural Networks against Graph Perturbation Attacks
Jiate Li, Meng Pang, Yun Dong, Jinyuan Jia, Binghui Wang
TL;DR
This work addresses the fragility of explainable graph neural networks (XGNNs) under graph perturbations by introducing XGNNCert, a certifiably robust XGNN. It constructs multiple hybrid subgraphs by hashing edge indices and combining portions of the test graph with edges from the complete graph, then aggregates predictions and explanations via a majority-vote classifier and a majority-vote explainer to obtain deterministic robustness guarantees. The approach yields a bound on the perturbation budget $M_{\lambda}$ under which the predicted label remains unchanged and at least $\lambda$ ground-truth explanation edges are preserved. Empirical results show competitive explanation and prediction accuracy on clean data and strong robustness against adversarial perturbations across multiple datasets and explainers. The work offers a practical robustness framework for safety-critical applications where reliable explanations are essential, and points to future improvements in subgraph design and permutation-invariant guarantees.
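The TL;DR describes the pipeline only at a high level, so this Python sketch illustrates the general idea under stated assumptions; it is not the authors' implementation. Assumptions: MD5 serves as the deterministic edge hash; the rule for mixing test-graph and complete-graph edges in `hybrid_subgraphs` (subgraph t keeps the test edges hashed to group t and is filled with the complete-graph edges hashed to group (t + 1) mod T) is hypothetical; `toy_classifier` and `toy_explainer` are hypothetical stand-ins for a trained GNN and a GNN explainer; and `certified_edge_budget` is the generic majority-vote bound for the predicted label only, not the paper's $M_{\lambda}$.

```python
import hashlib
from collections import Counter
from itertools import combinations

Edge = tuple[int, int]


def edge_group(u: int, v: int, T: int) -> int:
    """Map an undirected edge to one of T groups (MD5 is an assumed hash choice)."""
    a, b = min(u, v), max(u, v)
    return int(hashlib.md5(f"{a}-{b}".encode()).hexdigest(), 16) % T


def hybrid_subgraphs(num_nodes: int, test_edges: set[Edge], T: int) -> list[set[Edge]]:
    """Build T hybrid subgraphs with a HYPOTHETICAL mixing rule:
    subgraph t = test edges hashed to group t, plus complete-graph edges hashed to (t + 1) % T.
    The complete-graph filler never depends on the test graph, so adding or deleting
    one test edge changes at most one subgraph."""
    subgraphs: list[set[Edge]] = [set() for _ in range(T)]
    for u, v in test_edges:
        a, b = min(u, v), max(u, v)
        subgraphs[edge_group(a, b, T)].add((a, b))
    for a, b in combinations(range(num_nodes), 2):
        g = edge_group(a, b, T)
        subgraphs[(g - 1) % T].add((a, b))  # group-g filler goes to subgraph g - 1
    return subgraphs


def vote_predict(subgraphs, base_classifier):
    """Majority-vote classifier: return (winning label, its votes, runner-up votes)."""
    votes = Counter(base_classifier(sg) for sg in subgraphs)
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))  # ties -> smaller label
    label, top = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else 0
    return label, top, second


def certified_edge_budget(top: int, second: int) -> int:
    """Each perturbed edge alters at most one subgraph, i.e. moves at most one vote from
    the winner to a rival, shrinking the gap by at most 2. Generic bound, not M_lambda."""
    return max((top - second - 1) // 2, 0)


def vote_explain(subgraphs, test_edges, base_explainer, k: int) -> list[Edge]:
    """Majority-vote explainer: keep the k test-graph edges reported by the most subgraphs."""
    canon = {(min(u, v), max(u, v)) for u, v in test_edges}
    votes: Counter = Counter()
    for sg in subgraphs:
        for e in base_explainer(sg):
            if e in canon:  # assumption: only edges of the test graph may be reported
                votes[e] += 1
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [e for e, _ in ranked[:k]]


def toy_classifier(edges) -> int:
    """Hypothetical stand-in for a trained GNN classifier."""
    return int(len(edges) % 3 == 0)


def toy_explainer(edges) -> list[Edge]:
    """Hypothetical stand-in for a GNN explainer: reports the two lowest-index edges."""
    return sorted(edges)[:2]


if __name__ == "__main__":
    test_edges = {(0, 1), (1, 2), (2, 3), (3, 4), (0, 4), (1, 3)}
    subs = hybrid_subgraphs(num_nodes=5, test_edges=test_edges, T=4)
    label, top, second = vote_predict(subs, toy_classifier)
    print("label:", label, "certified edge budget:", certified_edge_budget(top, second))
    print("explanation:", vote_explain(subs, test_edges, toy_explainer, k=2))
```

Because each edge hashes to exactly one group and the filler edges are fixed, an attacker who flips m edges can change at most m subgraph votes; that locality is what lets vote gaps be turned into a certified edge budget.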
Abstract
Explainable Graph Neural Networks (XGNNs) have gained growing attention as a way to foster trust in GNNs, the mainstream method for learning on graph data. However, existing XGNNs focus on improving explanation performance, and their robustness under attacks is largely unexplored. We observe that an adversary can slightly perturb the graph structure so that the explanation result of an XGNN changes substantially. Such vulnerability could cause serious issues, particularly in safety- and security-critical applications. In this paper, we take the first step toward studying the robustness of XGNNs against graph perturbation attacks and propose XGNNCert, the first provably robust XGNN. In particular, XGNNCert provably ensures that, when the number of perturbed edges is bounded, the explanation result for a graph under the worst-case graph perturbation attack stays close to that without the attack, while the GNN prediction is unaffected. Evaluation results on multiple graph datasets and GNN explainers show the effectiveness of XGNNCert.
