E-SAGE: Explainability-based Defense Against Backdoor Attacks on Graph Neural Networks
Dingqiang Yuan, Xiaohua Xu, Lei Yu, Tongchang Han, Rongchang Li, Meng Han
TL;DR
Backdoor attacks via subgraph insertion threaten GNNs in node classification. E-SAGE defends against them at prediction time: it uses explainability, specifically integrated gradients, to identify and prune adversarial edges, with neighbor sampling for efficiency. It handles attacks that insert multiple subgraphs as well as adversarial attacks, and across several datasets and models it retains clean accuracy (ACC) while reducing attack success rate (ASR), with scalable runtime. The work offers a practical, explainability-driven defense for GNNs and motivates further study of how explainability tools interact with model robustness.
Abstract
Graph Neural Networks (GNNs) have recently been widely adopted in multiple domains, yet they are notably vulnerable to adversarial and backdoor attacks. In particular, backdoor attacks based on subgraph insertion have been shown to be both effective and stealthy in graph classification tasks, successfully circumventing various existing defense methods. In this paper, we propose E-SAGE, a novel explainability-based approach to defending GNNs against backdoor attacks. We find that malicious edges and benign edges differ significantly in the importance scores that explainability evaluation assigns to them. Accordingly, E-SAGE adaptively applies an iterative edge pruning process to the graph based on these edge scores. Through extensive experiments, we demonstrate the effectiveness of E-SAGE against state-of-the-art graph backdoor attacks in different attack settings. In addition, we investigate the effectiveness of E-SAGE against adversarial attacks.
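To make the idea of pruning edges by importance score concrete, the following is a minimal sketch and not the E-SAGE implementation. It assumes a toy model in which the prediction for one target node is a sigmoid of its own term plus a masked sum of per-neighbor contributions, scores each edge by integrated gradients against an all-zero edge mask, and repeatedly removes the highest-scoring edge while its score exceeds a threshold. The names `predict`, `edge_ig_scores`, `prune_edges`, the threshold `tau`, and the stopping rule are all hypothetical choices for illustration; neighbor sampling is omitted.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(self_term, contribs, mask):
    """Toy score for one target node (an assumption, not a real GNN):
    sigmoid of the node's own term plus masked neighbor contributions."""
    return sigmoid(self_term + sum(m * c for m, c in zip(mask, contribs)))

def edge_ig_scores(self_term, contribs, mask, steps=200):
    """Integrated gradients of the toy score w.r.t. each edge-mask entry,
    using an all-zero mask (no edges) as the baseline."""
    total = sum(m * c for m, c in zip(mask, contribs))
    avg_slope = 0.0
    for k in range(steps):
        alpha = (k + 0.5) / steps            # midpoint rule along the path
        s = sigmoid(self_term + alpha * total)
        avg_slope += s * (1.0 - s)           # derivative of the sigmoid
    avg_slope /= steps
    # IG_j = (mask_j - 0) * mean over the path of d(score)/d(mask_j)
    return [m * c * avg_slope for m, c in zip(mask, contribs)]

def prune_edges(self_term, contribs, tau=0.2, max_rounds=10):
    """Hypothetical pruning rule: drop the edge with the largest absolute
    score while that score exceeds tau, rescoring after every removal."""
    mask = [1.0] * len(contribs)
    for _ in range(max_rounds):
        scores = edge_ig_scores(self_term, contribs, mask)
        j = max(range(len(scores)), key=lambda i: abs(scores[i]))
        if abs(scores[j]) <= tau:
            break
        mask[j] = 0.0
    return mask

# Three benign edges with small contributions, two injected edges with large ones.
contribs = [0.1, -0.2, 0.15, 3.0, 2.5]
print(prune_edges(-1.0, contribs))  # [1.0, 1.0, 1.0, 0.0, 0.0]
```

In this toy setting the two large-contribution edges are removed and the three small ones survive. Rescoring after each removal matters here because removing one dominant edge changes the scores of the remaining ones.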
