Are You Using Reliable Graph Prompts? Trojan Prompt Attacks on Graph Neural Networks
Minhua Lin, Zhiwei Zhang, Enyan Dai, Zongyu Wu, Yilong Wang, Xiang Zhang, Suhang Wang
TL;DR
This work studies backdoor vulnerabilities in graph prompt learning (GPL) and introduces TGPA, a Trojan Graph Prompt Attack that poisons graph prompts rather than pre-trained GNN encoders. TGPA combines a clean-prompt pretraining stage, a feature-aware trigger generator, and a bi-level optimization with a finetuning-resistant loss, so that the backdoor remains effective even when downstream headers are fine-tuned. Across multiple datasets and GPL variants, TGPA achieves high attack success rates on trigger-attached samples while maintaining strong clean accuracy, and it transfers across datasets and header configurations. The findings underscore the need for defenses targeting trojan prompts in GPL and highlight the practical risks of releasing and using shared graph prompts.
Abstract
Graph Prompt Learning (GPL) has been introduced as a promising approach that uses prompts to adapt pre-trained GNN models to specific downstream tasks without requiring fine-tuning of the entire model. Despite the advantages of GPL, little attention has been given to its vulnerability to backdoor attacks, where an adversary can manipulate the model's behavior by embedding hidden triggers. Existing graph backdoor attacks rely on modifying model parameters during training, but this approach is impractical in GPL as GNN encoder parameters are frozen after pre-training. Moreover, downstream users may fine-tune their own task models on clean datasets, further complicating the attack. In this paper, we propose TGPA, a backdoor attack framework designed specifically for GPL. TGPA injects backdoors into graph prompts without modifying pre-trained GNN encoders and ensures high attack success rates and clean accuracy. To address the challenge of model fine-tuning by users, we introduce a finetuning-resistant poisoning approach that maintains the effectiveness of the backdoor even after downstream model adjustments. Extensive experiments on multiple datasets under various settings demonstrate the effectiveness of TGPA in compromising GPL models with fixed GNN encoders.
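The two quantities the abstract reports, attack success rate and clean accuracy, can be made concrete with a small sketch. This is an illustration only, not the authors' evaluation code: the function names, the toy predictions, and the choice to count every trigger-attached sample toward the attack success rate are assumptions (some evaluations exclude samples whose true label already equals the target class).

```python
def clean_accuracy(preds, labels):
    """Fraction of clean (trigger-free) samples classified correctly."""
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)


def attack_success_rate(triggered_preds, target_class):
    """Fraction of trigger-attached samples predicted as the attacker's target class."""
    hits = sum(p == target_class for p in triggered_preds)
    return hits / len(triggered_preds)


# Hypothetical toy data: five test nodes, three classes, target class 2.
labels = [0, 1, 2, 1, 0]
clean_preds = [0, 1, 2, 0, 0]       # model outputs on clean inputs
triggered_preds = [2, 2, 2, 1, 2]   # model outputs after attaching the trigger

ca = clean_accuracy(clean_preds, labels)
asr = attack_success_rate(triggered_preds, target_class=2)
print(f"clean accuracy = {ca:.2f}, attack success rate = {asr:.2f}")
```

A backdoor is considered effective when the attack success rate is high while clean accuracy stays close to that of a model using an unpoisoned prompt, which is why both numbers are reported together.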
