Unlearning Inversion Attacks for Graph Neural Networks

Jiahao Zhang, Yilong Wang, Zhiwei Zhang, Xiaorui Liu, Suhang Wang

TL;DR

This paper reveals a privacy vulnerability in graph unlearning by showing that unlearned edges can be reconstructed from black-box GNN outputs. It introduces TrendAttack, which couples a flexible similarity-based edge predictor with confidence-trend features and an adaptive threshold to distinguish unlearned and memorized edges from non-edges. TrendAttack is trained with a shadow victim model to simulate unlearning and guide attack training, achieving superior AUC against multiple baselines across four real-world datasets. The work underscores the need for stronger defenses in graph unlearning and suggests directions for defending against edge- and node-level inversion, including differential privacy and improved unlearning guarantees.

Abstract

Graph unlearning methods aim to efficiently remove the impact of sensitive data from trained GNNs without full retraining, assuming that deleted information cannot be recovered. In this work, we challenge this assumption by introducing the graph unlearning inversion attack: given only black-box access to an unlearned GNN and partial graph knowledge, can an adversary reconstruct the removed edges? We identify two key challenges: varying probability-similarity thresholds for unlearned versus retained edges, and the difficulty of locating unlearned edge endpoints, and address them with TrendAttack. First, we derive and exploit the confidence pitfall, a theoretical and empirical pattern showing that nodes adjacent to unlearned edges exhibit a large drop in model confidence. Second, we design an adaptive prediction mechanism that applies different similarity thresholds to unlearned and other membership edges. Our framework flexibly integrates existing membership inference techniques and extends them with trend features. Experiments on four real-world datasets demonstrate that TrendAttack significantly outperforms state-of-the-art GNN membership inference baselines, exposing a critical privacy vulnerability in current graph unlearning methods.
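The adaptive prediction idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' TrendAttack implementation: the function names, the cosine similarity measure, the use of the maximum class probability as "confidence", and all threshold values are assumptions chosen for illustration. The sketch also assumes that a low-confidence endpoint should trigger a more permissive similarity threshold, which is one plausible way to act on the confidence pitfall.

```python
import numpy as np

def confidence(probs):
    """Per-node confidence, taken here as the largest predicted class probability."""
    return probs.max(axis=1)

def similarity(probs, u, v):
    """Cosine similarity between the black-box output distributions of nodes u and v."""
    pu, pv = probs[u], probs[v]
    return float(pu @ pv / (np.linalg.norm(pu) * np.linalg.norm(pv)))

def predict_edge(probs, u, v, base_thr=0.9, low_conf=0.6, relaxed_thr=0.7):
    """Hypothetical adaptive rule: if either endpoint has low confidence
    (a possible sign of an adjacent unlearned edge), use the relaxed threshold;
    otherwise use the base threshold."""
    conf = confidence(probs)
    thr = relaxed_thr if min(conf[u], conf[v]) < low_conf else base_thr
    return bool(similarity(probs, u, v) >= thr)

# Toy black-box outputs for five nodes and three classes (made-up numbers).
probs = np.array([
    [0.95, 0.03, 0.02],  # node 0: confident
    [0.93, 0.04, 0.03],  # node 1: confident, similar to node 0
    [0.55, 0.30, 0.15],  # node 2: low confidence
    [0.30, 0.55, 0.15],  # node 3: low confidence, moderately similar to node 2
    [0.02, 0.03, 0.95],  # node 4: confident, different class
])

for u, v in [(0, 1), (2, 3), (0, 4)]:
    print((u, v), round(similarity(probs, u, v), 3), predict_edge(probs, u, v))
```

With a single fixed threshold, the pair (2, 3) would be treated like any other candidate edge; the adaptive rule lets the attacker accept a lower similarity only where confidence has dropped.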

Paper Structure

This paper contains 41 sections, 7 theorems, 23 equations, 9 figures, 5 tables, 2 algorithms.

Key Result

Theorem 5.2

Let $f(\mathbf{C}, \mathbf{X}; \mathbf{w}^{\star}) := \mathbf{C} \mathbf{X} \mathbf{w}^{\star}$ be a linear GCN with propagation matrix $\mathbf{C}\in\mathbb{R}^{n\times n}$ and parameters $\mathbf{w}^{\star}$ obtained by least-squares on labels $\mathbf{y}\in\mathbb{R}^d$. The influence of an undire... where $\mathbf{Z} := \mathbf{C}\mathbf{X}$ and $\mathbf{z}_l$ is the $l$-th row of $\mathbf{Z}$, ...
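The linear GCN setting in the theorem can be explored numerically with a small sketch. The theorem's closed-form influence expression is truncated above and is not reproduced here; instead, the code below measures the effect of one undirected edge by removing it and refitting from scratch. The choice of a symmetric-normalized adjacency with self-loops for $\mathbf{C}$, the use of one scalar label per node, the toy path graph, and all function names are assumptions made for illustration.

```python
import numpy as np

def propagation_matrix(A):
    """Symmetric-normalized adjacency with self-loops (an assumed choice of C)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def fit_linear_gcn(C, X, y):
    """Least-squares weights w* for the linear GCN f(C, X; w) = C X w."""
    Z = C @ X
    w_star, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return w_star

def edge_influence(A, X, y, u, v):
    """Change in every node's output after dropping the undirected edge (u, v)
    and refitting exactly (retraining, not an influence-function approximation)."""
    C = propagation_matrix(A)
    before = C @ X @ fit_linear_gcn(C, X, y)
    A_un = A.copy()
    A_un[u, v] = A_un[v, u] = 0.0
    C_un = propagation_matrix(A_un)
    after = C_un @ X @ fit_linear_gcn(C_un, X, y)
    return after - before

# Toy path graph 0-1-2-3-4-5 with random features and labels (made-up data).
rng = np.random.default_rng(0)
n, d = 6, 3
A = np.zeros((n, n))
for a, b in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]:
    A[a, b] = A[b, a] = 1.0
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

delta = edge_influence(A, X, y, 2, 3)
print(np.round(delta, 4))  # per-node output change caused by removing edge (2, 3)
```

Comparing the entries of `delta` at the endpoints of the removed edge with those at distant nodes gives a rough empirical view of how a single edge affects individual outputs in this simplified model.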

Figures (9)

  • Figure 1: Illustration of the unlearning inversion attack. Consider an online social network $\mathcal{G}_{\mathrm{orig}}$ in which a user requests the deletion of sensitive friendship information, resulting in a cleaned graph $\mathcal{G}_{\mathrm{un}}$ and updated model parameters $\boldsymbol{\theta}_{\mathrm{un}}$. The GNN model may be shared with third parties via black-box APIs. If an attacker, leveraging the model API and auxiliary information about $\mathcal{G}_{\mathrm{un}}$, can reconstruct the removed knowledge $\Delta \mathcal{G}$ through an unlearning inversion attack, sensitive relationships may be exposed, severely compromising user privacy.
  • Figure 2: Illustration of the proposed TrendAttack.
  • Figure 3: Ablation study on the impact of victim models.
  • Figure 4: Ablation study on the impact of trend feature order. Overall attack AUC as a function of the trend feature order (0–3) for three unlearning methods across three datasets.
  • Figure 5: Transferability of attack models across different GNN architectures on Cora, Citeseer, and Pubmed. Each heatmap shows attack AUC when the shadow model (rows) and attack model (columns) differ.
  • ...and 4 more figures

Theorems & Definitions (18)

  • Definition 4.1: Query Set
  • Definition 4.2: Partial Knowledge of Query Set
  • Definition 4.3: Graph Unlearning Inversion Problem
  • Remark 4.4: Strictness of Attack Setting
  • Remark 4.5: Difference to GNN MIAs
  • Claim 5.1: Probability Similarity Gap
  • Theorem 5.2: Single‐Edge to Single‐Output Influence, Informal
  • Claim 5.3: Confidence Pitfall
  • Definition D.1: Linear GCN
  • Remark D.2: Universality of Linear GCN
  • ...and 8 more