A Semantic and Clean-label Backdoor Attack against Graph Convolutional Networks
Jiazhu Dai, Haoyu Sun
TL;DR
This work reveals a semantic and clean-label backdoor vulnerability in Graph Convolutional Networks (GCNs) for graph classification. It introduces SCLBA, which selects a semantic trigger node via degree centrality, injects it into selected target-label graphs without changing their labels, and trains a backdoored model that misclassifies triggered graphs as the attacker-chosen target label while preserving normal performance on benign graphs. Experiments on five real-world datasets show attack success rates near 99% at poisoning rates under 3% with small trigger sizes, and the attack transfers to other GNNs such as GAT and GraphSAGE. The paper also discusses the limitations of existing defenses and suggests interpretability-based defenses as a direction for detecting semantic backdoors in GNNs.
Abstract
Graph Convolutional Networks (GCNs) have shown excellent performance on tasks over graph-structured data, such as node classification and graph classification. However, recent research has shown that GCNs are vulnerable to a new type of threat called the backdoor attack, in which the adversary injects a hidden backdoor into the GCN so that the backdoored model performs well on benign samples, whereas its prediction is maliciously changed to the attacker-specified target label when the hidden backdoor is activated by the attacker-defined trigger. The clean-label backdoor attack and the semantic backdoor attack are two new backdoor attacks on Deep Neural Networks (DNNs); they are more imperceptible than earlier backdoor attacks and pose new and serious threats. The semantic and clean-label backdoor attack has not been fully explored in GCNs. In this paper, we propose a semantic and clean-label backdoor attack against GCNs (SCLBA) in the context of graph classification to reveal the existence of this security vulnerability in GCNs. Specifically, SCLBA conducts an importance analysis on graph samples to select one type of node as the semantic trigger, which is then inserted into graph samples to create poisoning samples without relabeling them to the attacker-specified target label. We evaluate SCLBA on multiple datasets, and the results show that SCLBA can achieve attack success rates close to 99% with poisoning rates of less than 3%, with almost no impact on the model's performance on benign samples.
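
To make the poisoning step concrete, the following is a minimal Python sketch of the idea, not the authors' implementation. It assumes each graph is a plain dictionary with `node_types`, `edges`, and `label` keys, that node importance is measured by mean degree centrality per node type, and that the trigger is appended as new nodes attached to random existing nodes. The function names, the data layout, and the attachment rule are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def degree_centrality(num_nodes, edges):
    """Degree of each node divided by (num_nodes - 1)."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    denom = max(num_nodes - 1, 1)
    return {n: deg[n] / denom for n in range(num_nodes)}


def select_trigger_type(graphs):
    """Assumed importance analysis: the node type with the highest
    mean degree centrality over all graphs becomes the trigger."""
    total = defaultdict(float)
    count = defaultdict(int)
    for g in graphs:
        cent = degree_centrality(len(g["node_types"]), g["edges"])
        for node, node_type in enumerate(g["node_types"]):
            total[node_type] += cent[node]
            count[node_type] += 1
    return max(total, key=lambda t: total[t] / count[t])


def inject_trigger(graph, trigger_type, trigger_size, rng):
    """Return a copy of the graph with trigger_size trigger nodes appended.
    Each new node is linked to a random original node (an assumption).
    The label is copied unchanged, which is the clean-label property."""
    node_types = list(graph["node_types"])
    edges = list(graph["edges"])
    n_orig = len(node_types)
    for _ in range(trigger_size):
        new_node = len(node_types)
        node_types.append(trigger_type)
        edges.append((rng.randrange(n_orig), new_node))
    return {"node_types": node_types, "edges": edges, "label": graph["label"]}


def poison_dataset(graphs, target_label, poisoning_rate, trigger_size, seed=0):
    """Poison a fraction of the training set, drawing only from graphs
    that already carry the target label, so no label is ever modified."""
    rng = random.Random(seed)
    trigger_type = select_trigger_type(graphs)
    candidates = [i for i, g in enumerate(graphs) if g["label"] == target_label]
    k = min(len(candidates), int(poisoning_rate * len(graphs)))
    chosen = set(rng.sample(candidates, k))
    poisoned = [
        inject_trigger(g, trigger_type, trigger_size, rng) if i in chosen else g
        for i, g in enumerate(graphs)
    ]
    return poisoned, trigger_type
```

Under these assumptions, calling `poison_dataset(train_graphs, target_label=1, poisoning_rate=0.03, trigger_size=2)` would return a training set in which at most 3% of the graphs carry the trigger, all of them drawn from class 1 with their labels left as they were. At test time, the same `inject_trigger` call would be applied to a graph of another class to activate the backdoor.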
