Scalable and Certifiable Graph Unlearning: Overcoming the Approximation Error Barrier

Lu Yi, Zhewei Wei

TL;DR

ScaleGUN tackles the bottleneck of certifiable graph unlearning on billion-edge graphs by integrating an approximate, lazy local propagation framework with Generalized PageRank into the certified unlearning pipeline. It derives theoretical bounds showing that approximation-induced model error remains bounded and thus can be masked by noise to preserve $(\epsilon,\delta)$ guarantees for node feature, edge, and node unlearning. The approach achieves remarkable practical speedups (e.g., 20 seconds total for 5,000 random edge removals on ogbn-papers100M, with 5 seconds for embeddings) while maintaining competitive accuracy on large graphs, outperforming retraining in propagation cost. Empirically, ScaleGUN demonstrates strong unlearning efficacy, data-dependent bounds, and robustness under various unlearning settings, and extends to PPR-based models and spectral GNNs as backbones. This work paves the way for scalable privacy-preserving graph learning in industrial-scale networks, balancing privacy, utility, and computational efficiency.
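
The idea of a bounded approximation error in propagation can be made concrete with a small sketch. It uses classic forward push for personalized PageRank, a simpler relative of the lazy local propagation with Generalized PageRank described above, so it illustrates the principle under that substitution and is not the paper's algorithm. The names `forward_push`, `exact_ppr`, `alpha`, and `r_max`, and the toy graph, are assumptions made for this example.

```python
def forward_push(adj, source, alpha=0.15, r_max=1e-3):
    """Approximate personalized PageRank of `source` on an undirected graph.

    adj maps each node to its neighbor list (every node has degree >= 1).
    Returns (estimate, residue); pushing stops once every node satisfies
    residue[u] <= r_max * deg(u).
    """
    estimate = {u: 0.0 for u in adj}
    residue = {u: 0.0 for u in adj}
    residue[source] = 1.0
    stack = [source]
    while stack:
        u = stack.pop()
        deg_u = len(adj[u])
        if residue[u] <= r_max * deg_u:
            continue  # stale entry, nothing left to push
        mass = residue[u]
        residue[u] = 0.0
        estimate[u] += alpha * mass            # keep an alpha fraction locally
        share = (1.0 - alpha) * mass / deg_u   # spread the rest to neighbors
        for v in adj[u]:
            residue[v] += share
            if residue[v] > r_max * len(adj[v]):
                stack.append(v)
    return estimate, residue


def exact_ppr(adj, source, alpha=0.15, iters=500):
    """Reference value by power iteration: pi = alpha * e_s + (1 - alpha) * P^T pi."""
    pi = {u: 0.0 for u in adj}
    for _ in range(iters):
        nxt = {u: 0.0 for u in adj}
        nxt[source] += alpha
        for u in adj:
            share = (1.0 - alpha) * pi[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        pi = nxt
    return pi


# Toy undirected graph (hypothetical): edges 0-1, 0-2, 1-2, 1-3, 2-3, 3-4.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2, 4], 4: [3]}
r_max = 1e-3
estimate, residue = forward_push(adj, source=0, alpha=0.15, r_max=r_max)
exact = exact_ppr(adj, source=0, alpha=0.15)
errors = {v: exact[v] - estimate[v] for v in adj}
for v in adj:
    print(v, round(estimate[v], 5), round(exact[v], 5), errors[v] <= r_max * len(adj[v]))
```

On an undirected graph, each entry of the estimate undershoots the exact value by at most `r_max * deg(v)`. A per-node error that is bounded in this way is the kind of quantity a certified guarantee then has to absorb.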

Abstract

Graph unlearning has emerged as a pivotal research area for ensuring privacy protection, given the widespread adoption of Graph Neural Networks (GNNs) in applications involving sensitive user data. Among existing studies, certified graph unlearning is distinguished by providing robust privacy guarantees. However, current certified graph unlearning methods are impractical for large-scale graphs because they necessitate the costly re-computation of graph propagation for each unlearning request. Although numerous scalable techniques have been developed to accelerate graph propagation for GNNs, their integration into certified graph unlearning remains uncertain as these scalable approaches introduce approximation errors into node embeddings. In contrast, certified graph unlearning demands bounded model error on exact node embeddings to maintain its certified guarantee. To address this challenge, we present ScaleGUN, the first approach to scale certified graph unlearning to billion-edge graphs. ScaleGUN integrates the approximate graph propagation technique into certified graph unlearning, offering certified guarantees for three unlearning scenarios: node feature, edge, and node unlearning. Extensive experiments on real-world datasets demonstrate the efficiency and unlearning efficacy of ScaleGUN. Remarkably, ScaleGUN accomplishes $(ε,δ)=(1,10^{-4})$ certified unlearning on the billion-edge graph ogbn-papers100M in 20 seconds for a 5,000 random edge removal request -- of which only 5 seconds are required for updating the node embeddings -- compared to 1.91 hours for retraining and 1.89 hours for re-propagation. Our code is available at https://github.com/luyi256/ScaleGUN.
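
The speedups implied by these timings follow from simple arithmetic. The sketch below takes the reported figures at face value; the variable names are made up for this example.

```python
SECONDS_PER_HOUR = 3600

# Timings reported in the abstract for a 5,000-edge removal on ogbn-papers100M.
retrain_s = 1.91 * SECONDS_PER_HOUR       # full retraining
repropagate_s = 1.89 * SECONDS_PER_HOUR   # exact re-propagation only
scalegun_total_s = 20                     # ScaleGUN, whole unlearning request
scalegun_embed_s = 5                      # ScaleGUN, embedding update only

total_speedup = retrain_s / scalegun_total_s
propagation_speedup = repropagate_s / scalegun_embed_s

print(f"end-to-end speedup over retraining: {total_speedup:.1f}x")
print(f"embedding-update speedup over re-propagation: {propagation_speedup:.1f}x")
```

The result is roughly a 340-fold end-to-end speedup and a 1,360-fold speedup on the propagation step. Propagation accounts for 1.89 of the 1.91 retraining hours, which is why accelerating it dominates the overall gain.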

Paper Structure

This paper contains 37 sections, 25 theorems, 100 equations, 9 figures, 13 tables, and 4 algorithms.

Key Result

Theorem 2.1

Let $\mathcal{A}$ be the learning algorithm that returns the unique optimum of the loss $\mathcal{L}_{\mathbf{b}}(\mathbf{w} ; \mathcal{D})$. Suppose that a removal mechanism $\mathcal{M}$ returns $\mathbf{w}^-$ with $\left\|\nabla \mathcal{L}\left(\mathbf{w}^{-} ; \mathcal{D}^{\prime}\right)\right\| \dots$
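
The quantity that appears in the theorem, the norm of the gradient of the loss on the remaining data $\mathcal{D}'$ at the returned model $\mathbf{w}^-$, can be computed directly on a toy problem. The sketch below is a minimal illustration under several assumptions that do not come from this page: a one-dimensional L2-regularized logistic regression, a regularizer scaled as $\lambda n / 2$, a single Newton step as the removal mechanism (in the style of the cited guo2019certified), and no noise term $\mathbf{b}$. All function and variable names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad(w, data, lam):
    """d/dw of sum_i log(1 + exp(-y_i * w * x_i)) + (lam * n / 2) * w**2."""
    n = len(data)
    return sum(-y * x * sigmoid(-y * w * x) for x, y in data) + lam * n * w


def hess(w, data, lam):
    """Second derivative of the same loss; always >= lam * n > 0."""
    n = len(data)
    total = 0.0
    for x, y in data:
        s = sigmoid(y * w * x)
        total += x * x * s * (1.0 - s)
    return total + lam * n


def train(data, lam, iters=200):
    """Find the unique optimum by bisection on the increasing gradient."""
    bound = sum(abs(x) for x, _ in data) / (lam * len(data)) + 1.0
    lo, hi = -bound, bound
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if grad(mid, data, lam) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Toy data (x, y); the last point is the one to be unlearned.
data = [(1.0, 1), (-0.5, -1), (0.8, 1), (-1.2, -1), (0.3, 1), (2.0, -1)]
lam = 0.1
remaining = data[:-1]

w_star = train(data, lam)                          # optimum on the full data
before = abs(grad(w_star, remaining, lam))         # residual if nothing is done
w_minus = w_star - grad(w_star, remaining, lam) / hess(w_star, remaining, lam)
residual = abs(grad(w_minus, remaining, lam))      # gradient residual after removal

print(f"w* = {w_star:.4f}, w- = {w_minus:.4f}")
print(f"gradient residual before update: {before:.4f}, after: {residual:.4f}")
```

The Newton step does not land exactly on the optimum for $\mathcal{D}'$, so the residual is small but nonzero. Certified unlearning bounds this residual and relies on noise to mask it.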

Figures (9)

  • Figure 2: Comparison of the bounds of the gradient residual: worst-case bound (Theorems 4.2, 4.3, and 4.4 for node feature, edge, and node unlearning, respectively), data-dependent bound (Theorem 4.5), and the true value on the Cora dataset.
  • Figure 3: Comparison of unlearning efficacy for linear models: Model accuracy vs. the number of removed adversarial edges.
  • Figure 4: Unlearning more than 50% of training nodes/features: Test accuracy on ogbn-arxiv and ogbn-products.
  • Figure 5: Varying $\alpha$ ($\alpha\epsilon=0.1$ fixed): Test accuracy vs. the number of removed nodes on Cora.
  • Figure 6: Comparison of the bounds of the gradient residual: Worst-case bound, data-dependent bound and the true value on Cora dataset for ScaleGUN, CGU and CEU.
  • ...and 4 more figures

Theorems & Definitions (37)

  • Theorem 2.1: Theorem 3 in guo2019certified
  • Remark
  • Lemma 3.1
  • Lemma 3.2
  • Theorem 3.3: Average Cost
  • Theorem 4.2: Worst-case bound of node feature unlearning
  • Theorem 4.3: Worst-case bound of edge unlearning
  • Theorem 4.4: Worst-case bound of node unlearning
  • Theorem 4.5: Data-dependent bound
  • Theorem D.1: Initialization Cost, Update Cost, and Total Cost
  • ...and 27 more