IDEA: A Flexible Framework of Certified Unlearning for Graph Neural Networks

Yushun Dong, Binchi Zhang, Zhenyu Lei, Na Zou, Jundong Li

TL;DR

IDEA introduces a flexible framework for certified unlearning in Graph Neural Networks, enabling removal of diverse privacy-sensitive information stored in GNN parameters without re-training from scratch. It achieves this by modeling an intermediate optimization state between the original and unlearned objectives, deriving a Hessian-based first-order approximation to update parameters, and providing $(\varepsilon-\delta)$-certification via Gaussian noise on the approximation. The approach supports four unlearning request types (node, edge, full attribute, partial attribute) in a graph-aware manner, with formal guarantees that generalize across GNN architectures and loss objectives. Extensive experiments on real-world datasets show IDEA yields tighter bounds than prior work, faster unlearning, and competitive utility while effectively mitigating privacy leakage under modern attacks. The work promises practical, scalable privacy-preserving graph learning applicable to diverse domains beyond node classification, with avenues for distributed and graph-level unlearning.
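The update-then-perturb pattern described above can be illustrated on a toy problem. The following is a minimal sketch, not IDEA's actual update rule: it assumes a ridge-regression model instead of a GNN, uses a single Newton step on the retained-data objective as a stand-in for the Hessian-based approximation, and calibrates the noise with the standard Gaussian-mechanism formula (valid for $\varepsilon < 1$). All function names, the `bound` value, and the data are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form minimizer of 0.5*||Xw - y||^2 + 0.5*lam*||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def newton_unlearn(theta, X_keep, y_keep, lam):
    """One Newton step on the retained-data objective, starting from theta."""
    d = X_keep.shape[1]
    grad = X_keep.T @ (X_keep @ theta - y_keep) + lam * theta
    hess = X_keep.T @ X_keep + lam * np.eye(d)
    return theta - np.linalg.solve(hess, grad)

def gaussian_noise_scale(bound, eps, delta):
    """Standard Gaussian-mechanism calibration (an assumption of this sketch)."""
    return bound * np.sqrt(2.0 * np.log(1.25 / delta)) / eps

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
lam = 1.0

theta_star = fit_ridge(X, y, lam)              # trained on all samples
keep = np.arange(200) >= 10                    # unlearn the first 10 samples
theta_retrain = fit_ridge(X[keep], y[keep], lam)
theta_bar = newton_unlearn(theta_star, X[keep], y[keep], lam)

# Distance between the approximation and the re-trained optimum.
approx_error = np.linalg.norm(theta_retrain - theta_bar)

# `bound` is a hypothetical upper bound on that distance.
sigma = gaussian_noise_scale(bound=1e-3, eps=0.5, delta=1e-4)
theta_released = theta_bar + rng.normal(scale=sigma, size=theta_bar.shape)
```

Because the ridge objective is quadratic, the Newton step recovers the re-trained optimum up to floating-point error. For a non-quadratic GNN loss the approximation is inexact, which is why a bound on the remaining distance is needed to set the noise scale.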

Abstract

Graph Neural Networks (GNNs) have been increasingly deployed in a plethora of applications. However, the graph data used for training may contain sensitive personal information of the involved individuals. Once trained, GNNs typically encode such information in their learnable parameters. As a consequence, privacy leakage may happen when the trained GNNs are deployed and exposed to potential attackers. Facing such a threat, machine unlearning for GNNs has become an emerging technique that aims to remove certain personal information from a trained GNN. Among these techniques, certified unlearning stands out, as it provides a solid theoretical guarantee of the information removal effectiveness. Nevertheless, most of the existing certified unlearning methods for GNNs are only designed to handle node and edge unlearning requests. Meanwhile, these approaches are usually tailored for either a specific design of GNN or a specially designed training objective. These disadvantages significantly jeopardize their flexibility. In this paper, we propose a principled framework named IDEA to achieve flexible and certified unlearning for GNNs. Specifically, we first instantiate four types of unlearning requests on graphs, and then we propose an approximation approach to flexibly handle these unlearning requests over diverse GNNs. We further provide a theoretical guarantee of the effectiveness of the proposed approach as a certification. Unlike existing alternatives, IDEA is not designed for any specific GNNs or optimization objectives to perform certified unlearning, and thus can be easily generalized. Extensive experiments on real-world datasets demonstrate the superiority of IDEA from multiple key perspectives.
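The four request types (node, edge, full attribute, partial attribute) can be pictured as edits to an adjacency matrix and a feature matrix. The sketch below is illustrative only: the helper names are hypothetical, and representing attribute removal by zeroing entries is an assumption of this example, not something the source specifies.

```python
import numpy as np

def unlearn_nodes(A, X, nodes):
    """Drop the given nodes together with all of their incident edges."""
    keep = np.setdiff1d(np.arange(A.shape[0]), nodes)
    return A[np.ix_(keep, keep)], X[keep]

def unlearn_edges(A, X, edges):
    """Remove undirected edges given as (i, j) pairs; attributes are untouched."""
    A = A.copy()
    for i, j in edges:
        A[i, j] = A[j, i] = 0
    return A, X

def unlearn_attributes(A, X, nodes, dims=None):
    """Zero attributes of `nodes`: every dimension (full) or only `dims` (partial)."""
    X = X.copy()
    if dims is None:
        X[nodes, :] = 0.0
    else:
        X[np.ix_(nodes, dims)] = 0.0
    return A, X

# Toy path graph 0-1-2-3 with three attributes per node.
A = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1
X = np.arange(12, dtype=float).reshape(4, 3)

A_node, X_node = unlearn_nodes(A, X, [1])            # node request
A_edge, X_edge = unlearn_edges(A, X, [(0, 1)])       # edge request
_, X_full = unlearn_attributes(A, X, [2])            # full attribute request
_, X_part = unlearn_attributes(A, X, [2], dims=[0])  # partial attribute request
```

Each helper returns new arrays and leaves the original graph intact, so the original and post-request graphs can be compared side by side.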

Paper Structure

This paper contains 31 sections, 14 theorems, 34 equations, 5 figures, and 8 tables.

Key Result

Proposition 1

Localized Equivalence of Training Nodes. Given $\Delta \mathcal{G} = \{\Delta \mathcal{V}, \Delta \mathcal{E}, \Delta \mathcal{X}\}$ to be unlearned and an objective $\mathscr{L}$ computed over $f_{\bm{\theta}}$, $\mathscr{L}\left(\bm{\theta}, v_i, \mathcal{G}\right) = \mathscr{L}\left(\bm{\theta}, \ldots\right)$

Figures (5)

  • Figure 1: An illustration of common unlearning requests.
  • Figure 2: Distances between $\bm{\theta}^{*}$, $\tilde{\bm{\theta}}^*$, and $\bar{\bm{\theta}}^*$. Here, $\bm{\theta}^{*}$ denotes the optimal parameter before unlearning; $\tilde{\bm{\theta}}^*$ is the ideal optimal parameter after unlearning, which is obtained via re-training; $\bar{\bm{\theta}}^*$ is an approximation of $\tilde{\bm{\theta}}^*$ given by Theorem \ref{influence}.
  • Figure 3: Bounds and actual value of the $\ell_2$ distance between $\tilde{\bm{\theta}}^*$ and $\bar{\bm{\theta}}^*$, i.e., $\|\tilde{\bm{\theta}}^*-\bar{\bm{\theta}}^*\|_2$, over Cora, CiteSeer and PubMed datasets. CEU Worst, CEU Data Dependent, IDEA, and Actual represent the worst bound based on CEU, the data-dependent bound based on CEU, the bound based on IDEA, and the actual value of $\|\tilde{\bm{\theta}}^*-\bar{\bm{\theta}}^*\|_2$ derived from re-training, respectively.
  • Figure 4: Efficiency comparison between IDEA and other baselines, including re-training. Running time is measured in seconds and presented on a log scale.
  • Figure 5: Bounds and actual value of the $\ell_2$ distance between $\tilde{\bm{\theta}}^*$ and $\bar{\bm{\theta}}^*$, i.e., $\|\tilde{\bm{\theta}}^*-\bar{\bm{\theta}}^*\|_2$, over Cora, CiteSeer and PubMed datasets. CEU Worst, CEU Data Dependent, IDEA, and Actual represent the worst bound based on CEU, the data-dependent bound based on CEU, the bound based on IDEA, and the actual value of $\|\tilde{\bm{\theta}}^*-\bar{\bm{\theta}}^*\|_2$ derived from re-training, respectively.

Theorems & Definitions (22)

  • Definition 1
  • Proposition 1
  • Lemma 1
  • Theorem 1
  • Proposition 2
  • Theorem 2
  • Proposition 3
  • Theorem 3
  • Proposition 1
  • proof
  • ...and 12 more