Attack by Unlearning: Unlearning-Induced Adversarial Attacks on Graph Neural Networks

Jiahao Zhang; Yilong Wang; Suhang Wang

Abstract

Graph neural networks (GNNs) are widely used for learning from graph-structured data in domains such as social networks, recommender systems, and financial platforms. To comply with privacy regulations such as the GDPR, CCPA, and PIPEDA, approximate graph unlearning, which aims to remove the influence of specific data points from trained models without full retraining, has become an increasingly important component of trustworthy graph learning. However, approximate unlearning often causes subtle performance degradation, which can have unintended negative side effects. In this work, we show that such degradation can be amplified into adversarial attacks. We introduce the notion of unlearning corruption attacks, in which an adversary injects carefully chosen nodes into the training graph and later requests their deletion. Because deletion requests are legally mandated and cannot be denied, this attack surface is both unavoidable and stealthy: the model performs normally after training, and accuracy collapses only once unlearning is applied. Technically, we formulate this attack as a bi-level optimization problem. To overcome the challenges of black-box unlearning and label scarcity, we approximate the unlearning process via gradient-based updates and employ a surrogate model to generate pseudo-labels for the optimization. Extensive experiments across benchmarks and unlearning algorithms demonstrate that small, carefully designed unlearning requests can induce significant accuracy degradation, raising urgent concerns about the robustness of GNN unlearning under real-world regulatory demands. The source code will be released upon paper acceptance.
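The two approximations named in the abstract can be illustrated on a toy problem. The sketch below is not the paper's method: it replaces the GNN with a small logistic-regression model, and it assumes the black-box unlearner can be stood in for by a single gradient-ascent step on the loss of the deleted points (the abstract only says "gradient-based updates", so this exact form is an assumption). All function names, data, and hyperparameters are hypothetical.

```python
import math

def predict(w, x):
    """Sigmoid of the linear score w . x."""
    return 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))

def log_loss(w, xs, ys):
    """Mean binary cross-entropy of weights w on (xs, ys)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(predict(w, x), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)

def grad(w, xs, ys):
    """Gradient of log_loss with respect to w."""
    g = [0.0] * len(w)
    for x, y in zip(xs, ys):
        err = predict(w, x) - y
        for j, xj in enumerate(x):
            g[j] += err * xj / len(xs)
    return g

def train(xs, ys, steps=200, lr=0.5):
    """Plain gradient descent from zero weights."""
    w = [0.0] * len(xs[0])
    for _ in range(steps):
        w = [wi - lr * gi for wi, gi in zip(w, grad(w, xs, ys))]
    return w

def approx_unlearn(w, del_xs, del_ys, lr=0.5):
    """ASSUMED stand-in for a black-box unlearner: one gradient-ascent
    step on the loss of the points whose deletion was requested."""
    return [wi + lr * gi for wi, gi in zip(w, grad(w, del_xs, del_ys))]

def pseudo_labels(w, xs):
    """Hard labels predicted by a surrogate model for unlabeled points."""
    return [1 if predict(w, x) >= 0.5 else 0 for x in xs]

# Hypothetical toy data; the first feature is a constant bias term.
clean_xs = [[1.0, 2.0], [1.0, 1.5], [1.0, -1.0], [1.0, -2.0]]
clean_ys = [1, 1, 0, 0]
injected_xs = [[1.0, 0.5], [1.0, 0.2]]   # attacker-controlled points
injected_ys = [1, 0]
target_xs = [[1.0, 1.0], [1.0, -1.5]]    # unlabeled points the attacker targets

surrogate_w = train(clean_xs, clean_ys)
victim_w = train(clean_xs + injected_xs, clean_ys + injected_ys)
unlearned_w = approx_unlearn(victim_w, injected_xs, injected_ys)

# The attacker has no true labels for the targets, so the surrogate supplies them.
target_pseudo = pseudo_labels(surrogate_w, target_xs)
attack_objective = log_loss(unlearned_w, target_xs, target_pseudo)

loss_before = log_loss(victim_w, injected_xs, injected_ys)
loss_after = log_loss(unlearned_w, injected_xs, injected_ys)
```

In a bi-level setup of this kind, an outer loop would adjust the injected points to increase `attack_objective`, while the inner step (`train` followed by `approx_unlearn`) plays the role of the victim's training and unlearning. The sketch evaluates the objective once and omits the outer optimization.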

Paper Structure (21 sections, 1 theorem, 11 equations, 2 figures, 2 tables, 1 algorithm)

Introduction
Related Works
Preliminaries
Problem Formulation
Threat Model
Graph Unlearning Corruption Attack
Proposed Method
Approximating Black-box Unlearning
Surrogate Objective via Pseudo Labels
Optimization
Scalability
Experiments
Experiment Settings
Unlearning Corruption Attack Performance
Ablation Studies
...and 6 more sections

Key Result

Proposition B.1

Let $\overline{\mathbf{A}}_{\mathrm{intra}} \in [0,1]^{m\times m}$ and $\overline{\mathbf{A}}_{\mathrm{inter}} \in [0,1]^{m\times n}$ be the relaxed injected adjacency matrices. Let $B$ be a positive integer budget. The Euclidean Projection onto the set $\mathcal{C}$, denoted by $\Pi_\mathcal{C}(\ov

Figures (2)

Figure 1: Attack setting of the proposed unlearning corruption attack on GNNs.
Figure 3: Attack Transferability across Victim Models. Performance of OptimAttack (generated using a GCN surrogate) when applied to different victim model architectures: GCN, SGC, and GAT. The blue bars represent original accuracy (stealthiness), and red bars represent unlearned accuracy (damage).

Theorems & Definitions (9)

Definition 4.1: Injected Graph
Definition 4.2: Graph Unlearning Corruption Attack
Remark 4.3: Difference to Poisoning Attacks
Remark 4.4: Difference to Backdoor Attacks
Definition 5.1: Relaxed problem for graph unlearning corruption
Remark A.1: Difference to Degradation Amplification Attacks
Remark A.2: Difference to Camouflaged Poisoning Attacks
Proposition B.1: Optimality of Greedy Projection
proof
