OpenGU: A Comprehensive Benchmark for Graph Unlearning
Bowen Fan, Yuming Ai, Xunkai Li, Zhilin Guo, Rong-Hua Li, Guoren Wang
TL;DR
OpenGU addresses the need for fair, scalable benchmarking of graph unlearning (GU) with a unified platform that integrates 16 SOTA GU algorithms and 37 multi-domain datasets, enabling a $3\times3$ cross-task evaluation across 13 GNN backbones. The benchmark provides unified APIs and evaluates GU methods along three dimensions (effectiveness, efficiency, and robustness) under realistic unlearning requests and privacy attacks such as membership inference and poisoning. Across node, edge, and graph tasks, OpenGU reveals strong performance for certain learning-based and influence-function-based (IF-based) methods, highlights memory and time bottlenecks on large graphs, and exposes robustness gaps under noise, sparsity, and varying unlearning intensities. These insights point toward generalized GU frameworks, standardized forgetting metrics, and scalable, privacy-preserving graph learning in real systems.
Abstract
Graph machine learning is essential for understanding and analyzing relational data. However, privacy-sensitive applications demand the ability to efficiently remove sensitive information from trained graph neural networks (GNNs) without the time and space overhead of retraining models from scratch. To address this issue, Graph Unlearning (GU) has emerged as a critical solution, with the potential to support dynamic graph updates in data management systems and enable scalable unlearning in distributed data systems while ensuring privacy compliance. Unlike machine unlearning in computer vision or other fields, GU faces unique difficulties due to the non-Euclidean nature of graph data and the recursive message-passing mechanism of GNNs. The diversity of downstream tasks and the complexity of unlearning requests further amplify these challenges. Despite the proliferation of GU strategies, the absence of a benchmark providing fair comparisons and the limited flexibility in combining downstream tasks with unlearning requests have led to inconsistent evaluations, hindering the development of this domain. To fill this gap, we present OpenGU, the first GU benchmark, which integrates 16 SOTA GU algorithms and 37 multi-domain datasets and supports various downstream tasks with 13 GNN backbones in response to flexible unlearning requests. Based on this unified framework, we provide a comprehensive and fair evaluation of GU. Through extensive experiments, we draw 8 crucial conclusions about existing GU methods and gain insights into their limitations, shedding light on potential avenues for future research.
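To make the combination of downstream tasks and unlearning requests concrete, the following is a minimal sketch of how a $3\times3$ evaluation grid could be enumerated. It is not the OpenGU API: the function name `evaluation_grid` and the dictionary layout are hypothetical, and the request types (node, edge, feature) are an assumption, since the text above names only the task levels (node, edge, graph) and the three evaluation dimensions.

```python
from itertools import product

# Task levels named in the text above.
DOWNSTREAM_TASKS = ("node", "edge", "graph")
# ASSUMPTION: request types are not listed in this section; these are illustrative.
UNLEARNING_REQUESTS = ("node", "edge", "feature")
# Evaluation dimensions named in the TL;DR.
DIMENSIONS = ("effectiveness", "efficiency", "robustness")


def evaluation_grid():
    """Return one configuration per (task, request) pair (hypothetical helper)."""
    return [
        {"task": task, "request": request, "dimensions": DIMENSIONS}
        for task, request in product(DOWNSTREAM_TASKS, UNLEARNING_REQUESTS)
    ]


grid = evaluation_grid()
for config in grid:
    print(f"{config['task']} task / {config['request']} unlearning")
```

Each of the nine configurations would then be run for every method, dataset, and backbone of interest, with all three dimensions reported per cell so that methods are compared under identical settings.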
