Certified Graph Unlearning

Eli Chien, Chao Pan, Olgica Milenkovic

TL;DR

This work introduces the first certified graph unlearning framework for graph neural networks, addressing node-feature, edge, and node deletion requests. It extends the unlearning paradigm to graph-structured data by deriving Hessian-based update rules and gradient-residual bounds for simple graph convolutions (SGC) and generalized PageRank (GPR) extensions. Theoretical guarantees quantify how unlearning affects the gradient on the retrained dataset, with bounds depending on node degree, propagation depth, and graph topology, and they are complemented by data-dependent refinements. Empirically, the method achieves significant speedups (e.g., around 4x) over full retraining with minimal loss in accuracy on standard benchmarks, while outperforming graph-unlearning baselines that do not exploit graph structure. The results establish a practical, theoretically grounded pathway for privacy-preserving updates in graph-based learning systems.
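
The Hessian-based update mentioned above can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration and is not taken from the paper: the symmetric normalization with self-loops used for SGC propagation, the ridge-regularized squared loss (chosen because the Newton-style step is then exact), the toy random graph, and all function and variable names. The noise term required for certification is omitted. The sketch handles a node feature unlearning request by zeroing one node's features, dropping its label, re-propagating, and applying one update of the form $\mathbf{w}^- = \mathbf{w}^\star + H^{-1}\Delta$.

```python
import numpy as np

def sgc_features(adj, x, k):
    """K-step SGC propagation Z = P^K X (assumed symmetric normalization)."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    p = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    z = x.copy()
    for _ in range(k):
        z = p @ z
    return z

def grad(w, z, y, train, lam):
    """Gradient of the regularized squared loss over the training nodes."""
    zt = z[train]
    return zt.T @ (zt @ w - y[train]) + lam * train.sum() * w

def hessian(z, train, lam):
    """Hessian of the same loss (constant in w because the loss is quadratic)."""
    zt = z[train]
    return zt.T @ zt + lam * train.sum() * np.eye(z.shape[1])

def fit(z, y, train, lam):
    """Exact minimizer of the loss, used for training and as the retraining reference."""
    return np.linalg.solve(hessian(z, train, lam), z[train].T @ y[train])

# Toy graph and data (hypothetical).
rng = np.random.default_rng(0)
n, d, k, lam = 8, 3, 2, 1e-2
upper = np.triu((rng.random((n, n)) < 0.4).astype(float), 1)
adj = upper + upper.T                              # symmetric, no self-loops
x = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)
train = np.ones(n, dtype=bool)

z = sgc_features(adj, x, k)
w_star = fit(z, y, train, lam)

# Node feature unlearning request for node r: zero its features, drop its label.
# Propagation mixes features, so embeddings of nearby nodes change as well.
r = 0
x_new = x.copy()
x_new[r] = 0.0
train_new = train.copy()
train_new[r] = False
z_new = sgc_features(adj, x_new, k)

# One Hessian-based update: w^- = w* + H^{-1} * (grad on old data - grad on new data).
delta = grad(w_star, z, y, train, lam) - grad(w_star, z_new, y, train_new, lam)
w_minus = w_star + np.linalg.solve(hessian(z_new, train_new, lam), delta)

# Gradient residual on the updated dataset, and the full-retraining reference.
residual = np.linalg.norm(grad(w_minus, z_new, y, train_new, lam))
w_retrained = fit(z_new, y, train_new, lam)
```

With a quadratic loss the residual is zero up to floating-point error and `w_minus` coincides with `w_retrained`. For non-quadratic losses such as logistic regression the residual is generally nonzero, which is the quantity the gradient-residual bounds are meant to control.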

Abstract

Graph-structured data is ubiquitous in practice and often processed using graph neural networks (GNNs). With the adoption of recent laws ensuring the ``right to be forgotten'', the problem of graph data removal has become of significant importance. To address the problem, we introduce the first known framework for \emph{certified graph unlearning} of GNNs. In contrast to standard machine unlearning, new analytical and heuristic unlearning challenges arise when dealing with complex graph data. First, three different types of unlearning requests need to be considered, including node feature, edge and node unlearning. Second, to establish provable performance guarantees, one needs to address challenges associated with feature mixing during propagation. The underlying analysis is illustrated on the example of simple graph convolutions (SGC) and their generalized PageRank (GPR) extensions, thereby laying the theoretical foundation for certified unlearning of GNNs. Our empirical studies on six benchmark datasets demonstrate excellent performance-complexity trade-offs when compared to complete retraining methods and approaches that do not leverage graph information. For example, when unlearning $20\%$ of the nodes on the Cora dataset, our approach suffers only a $0.1\%$ loss in test accuracy while offering a $4$-fold speed-up compared to complete retraining. Our scheme also outperforms unlearning methods that do not leverage graph information with a $12\%$ increase in test accuracy for a comparable time complexity.

Paper Structure

This paper contains 28 sections, 26 theorems, 86 equations, 8 figures, 1 table, 2 algorithms.

Key Result

Theorem 4.1

Let $A$ be the learning algorithm that returns the unique optimum of the loss $L_{\mathbf{b}}(\mathbf{w},\mathcal{D})$. Suppose that $\|\nabla L(\mathbf{w}^-,\mathcal{D}^\prime)\|\leq \epsilon^\prime$ for some computable bound $\epsilon^\prime > 0$, independent of $\mathbf{b}$ and achieved by $M$. I

Figures (8)

  • Figure 1: Illustration of three different types of certified graph unlearning problems and a comparison with the case of unlearning without graph information \cite{guo2020certified}. The colors of the nodes capture properties of node features, and the red frame indicates node embeddings affected by $1$-hop propagation. When no graph information is used, the node embeddings are uncorrelated. In graph unlearning problems, however, removing one node or edge can affect the node embeddings of the entire graph for a large enough number of propagation steps.
  • Figure 2: Difference between machine unlearning (as defined in \cite{guo2020certified}) and Differential Privacy (DP).
  • Figure 3: Comparison of proposed SGC node feature unlearning (left column), edge unlearning (middle column) and node unlearning (right column) with baseline methods. The shaded regions in the first row represent the standard deviation of test accuracy. The second row shows the accumulated unlearning time as a function of the number of unlearned points. The time needed for each unlearning procedure is given in Appendix \ref{apx:more_exp_details}.
  • Figure 4: (a), (b) Simulation verification of the results in Theorems \ref{thm:NR_SGCcase} and \ref{thm:NFR_GPRcase} pertaining to node degrees. (c), (d) Accumulated unlearning time as a function of the number of removed points. The unlearning time of Algorithm 2 from \cite{guo2020certified} is often higher than that of our proposed certified graph unlearning algorithms, because the number of retraining steps needed may be larger. (e), (f) Performance of certified graph unlearning methods on different datasets. We set $\alpha=10,\lambda=10^{-4}$ for Computers and $\lambda=10^{-4}$ for ogbn-arxiv. The number of repeated trials is $3$ due to the large amount of removed data. (g), (h) Tradeoff between privacy $\epsilon$ and performance. To match the number of retraining cycles, we set $\alpha\epsilon=0.1$.
  • Figure 5: Additional examination of the degree dependency result from Theorem \ref{thm:NR_SGCcase} (top) and Theorem \ref{thm:NFR_GPRcase} (bottom).
  • ...and 3 more figures

Theorems & Definitions (38)

  • Theorem 4.1: Theorem 3 from \cite{guo2020certified}
  • Theorem 4.3
  • Theorem 4.4
  • Theorem 4.5
  • Theorem 4.6
  • Corollary 5.1: Application of Corollary 1 in \cite{guo2020certified}
  • Theorem
  • Lemma 7.1
  • proof
  • Theorem
  • ...and 28 more