GraphToxin: Reconstructing Full Unlearned Graphs from Graph Unlearning
Ying Song, Balaji Palanisamy
TL;DR
GraphToxin presents the first full graph reconstruction attack against graph unlearning. It uses a three-module framework (gradient matching, curvature matching via a Fisher-information surrogate, and feature-smoothness regularization) to recover deleted nodes, their neighbors, and the associated topology from gradient differences $\Delta\mathcal{L}(\mathcal{G}_d,\theta,\theta^*)$. The attack extends to multiple node removals and to data-free black-box scenarios through zeroth-order gradient estimation and semantic calibration, and it is assessed with a comprehensive evaluation framework of feature-, global-, and performance-level metrics. Empirical results show that GraphToxin outperforms baselines across diverse datasets, GNN backbones, and unlearning methods, and that defenses such as node-DP and gradient compression largely fail to mitigate the attack. The findings highlight severe privacy risks in current graph unlearning approaches and underscore the need for stronger, more holistic defenses and worst-case evaluations in graph-based privacy research.
Abstract
Graph unlearning has emerged as a promising solution for complying with "the right to be forgotten" regulations by enabling the removal of sensitive information upon request. However, this solution is not foolproof. The involvement of multiple parties creates new attack surfaces, and residual traces of deleted data can remain in the unlearned graph neural networks. Attackers can exploit these vulnerabilities to recover the supposedly erased samples, thereby undermining the inherent functionality of graph unlearning. In this work, we propose GraphToxin, the first graph reconstruction attack against graph unlearning. Specifically, we introduce a novel curvature matching module that provides fine-grained guidance for full unlearned graph recovery. We demonstrate that GraphToxin can successfully subvert the regulatory guarantees expected from graph unlearning: it can recover not only a deleted individual's information and personal links but also sensitive content from their connections, thereby posing substantially more detrimental threats. Furthermore, we extend GraphToxin to multiple node removals under both white-box and black-box settings. We highlight the necessity of a worst-case analysis and propose a comprehensive evaluation framework to systematically assess attack performance under both random and worst-case node removals. This provides a more robust and realistic measure of the vulnerability of graph unlearning methods to graph reconstruction attacks. Our extensive experiments demonstrate the effectiveness and flexibility of GraphToxin. Notably, we show that existing defense mechanisms are largely ineffective against this attack and, in some cases, can even amplify its performance. Given the severe privacy risks posed by GraphToxin, our work underscores the urgent need for more effective and robust defense strategies against this attack.
