Distill to Delete: Unlearning in Graph Networks with Knowledge Distillation

Yash Sinha, Murari Mandal, Mohan Kankanhalli

TL;DR

This work tackles graph unlearning by introducing D2DGN, a distillation-based, model-agnostic framework that separates retained and deleted knowledge using a knowledge preserver and a knowledge destroyer. It leverages two knowledge types (response-based soft targets and feature-based embeddings) and optimizes losses via KL divergence and MSE, combined as $Loss = \alpha Loss_r + (1-\alpha) Loss_f$. Across five real-world datasets and three GNN architectures, D2DGN achieves superior consistency (removal of the forget set) and integrity (performance on retained elements), improves membership privacy, and incurs zero partitioning overhead while delivering up to 5.3× faster unlearning than retraining. The results demonstrate strong, scalable unlearning with practical impact for regulatory compliance and dynamic data environments.
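The combined objective above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes $Loss_r$ is the distillation loss toward the knowledge preserver on retained elements and $Loss_f$ the distillation loss toward the knowledge destroyer on deleted elements, that each term adds a KL divergence on temperature-softened outputs to an MSE on embeddings with equal weight, and that the temperature of 2.0 is purely illustrative. The helpers `softmax`, `kl_divergence`, `distill_loss`, and `d2dgn_loss` are hypothetical names.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax of temperature-scaled logits (shifted for numerical stability)."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def kl_divergence(p_teacher, p_student, eps=1e-12):
    """Mean over rows of KL(p_teacher || p_student); eps guards against log(0)."""
    p_t = np.clip(p_teacher, eps, 1.0)
    p_s = np.clip(p_student, eps, 1.0)
    return float(np.mean(np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=1)))


def distill_loss(student_logits, teacher_logits, student_emb, teacher_emb, temperature=2.0):
    """Response-based term (KL on soft targets) plus feature-based term (MSE on embeddings).

    Assumption: the two terms are simply summed; the T^2 scaling used in
    Hinton-style distillation is omitted here.
    """
    response = kl_divergence(softmax(teacher_logits, temperature),
                             softmax(student_logits, temperature))
    feature = float(np.mean((student_emb - teacher_emb) ** 2))
    return response + feature


def d2dgn_loss(loss_r, loss_f, alpha=0.5):
    """Combine the two terms as Loss = alpha * Loss_r + (1 - alpha) * Loss_f."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * loss_r + (1.0 - alpha) * loss_f


# Toy usage: random arrays stand in for student, preserver and destroyer outputs.
# Assumption: Loss_r targets the preserver on retained elements,
# Loss_f targets the destroyer on deleted elements.
rng = np.random.default_rng(0)
num_classes, dim = 4, 16
student_logits_r, preserver_logits_r = rng.normal(size=(2, 8, num_classes))
student_emb_r, preserver_emb_r = rng.normal(size=(2, 8, dim))
student_logits_f, destroyer_logits_f = rng.normal(size=(2, 3, num_classes))
student_emb_f, destroyer_emb_f = rng.normal(size=(2, 3, dim))

loss_r = distill_loss(student_logits_r, preserver_logits_r, student_emb_r, preserver_emb_r)
loss_f = distill_loss(student_logits_f, destroyer_logits_f, student_emb_f, destroyer_emb_f)
total = d2dgn_loss(loss_r, loss_f, alpha=0.7)
print(f"Loss_r={loss_r:.4f}  Loss_f={loss_f:.4f}  Loss={total:.4f}")
```

Setting `alpha` closer to 1 weights preservation of retained knowledge more heavily; closer to 0 weights removal of the deleted elements.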

Abstract

Graph unlearning has emerged as a pivotal method for deleting information from a pre-trained graph neural network (GNN). One may delete nodes, a class of nodes, edges, or a class of edges. Unlearning enables a GNN model to comply with data protection regulations (e.g., the right to be forgotten), adapt to evolving data distributions, and reduce the carbon footprint of GPU-hours by avoiding repeated retraining. Existing partitioning- and aggregation-based methods are limited by their poor handling of local graph dependencies and by additional overhead costs. More recently, GNNDelete offered a model-agnostic approach that alleviates some of these issues. Our work takes a novel approach to these challenges through knowledge distillation: Distill to Delete in GNN (D2DGN). D2DGN is a model-agnostic distillation framework in which the complete graph knowledge is divided and marked for retention or deletion. It performs distillation with response-based soft targets and feature-based node embeddings while minimizing KL divergence. The unlearned model effectively removes the influence of deleted graph elements while preserving knowledge about retained graph elements. When evaluated on various real-world graph datasets, D2DGN outperforms existing methods by up to $43.1\%$ (AUC) in edge and node unlearning tasks. Other notable advantages include better efficiency, more effective removal of target elements, preserved performance on retained elements, and zero overhead costs. Notably, D2DGN surpasses the state-of-the-art GNNDelete in AUC by $2.4\%$, improves the membership inference ratio by $+1.3$, requires $10.2\times10^6$ fewer FLOPs per forward pass, and is up to $\mathbf{3.2}\times$ faster.
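The step of dividing the complete graph knowledge into retained and deleted parts can be illustrated by turning a deletion request into boolean masks. This is a minimal sketch, assuming a PyTorch Geometric-style `edge_index` array of shape 2 × E that stores each undirected edge in both directions; the helpers `split_edges` and `split_nodes` are hypothetical and do not show how D2DGN itself handles classes of nodes or classes of edges.

```python
import numpy as np


def split_edges(edge_index, delete_edges, undirected=True):
    """Mark every edge of edge_index (shape 2 x E) as retained or deleted.

    delete_edges: iterable of (u, v) pairs named in a deletion request.
    Assumption: if undirected is True, (u, v) and (v, u) are the same edge.
    Returns boolean masks (retain_mask, forget_mask), each of length E.
    """
    to_delete = set()
    for u, v in delete_edges:
        to_delete.add((int(u), int(v)))
        if undirected:
            to_delete.add((int(v), int(u)))
    forget_mask = np.array(
        [(int(u), int(v)) in to_delete for u, v in zip(edge_index[0], edge_index[1])],
        dtype=bool,
    )
    return ~forget_mask, forget_mask


def split_nodes(num_nodes, delete_nodes):
    """Boolean masks (retain_mask, forget_mask) for a node-deletion request."""
    forget_mask = np.zeros(num_nodes, dtype=bool)
    forget_mask[np.asarray(list(delete_nodes), dtype=np.intp)] = True
    return ~forget_mask, forget_mask


# Toy path graph 0-1-2-3 stored with both edge directions.
edge_index = np.array([[0, 1, 1, 2, 2, 3],
                       [1, 0, 2, 1, 3, 2]])
retain_e, forget_e = split_edges(edge_index, [(1, 2)])
print(edge_index[:, forget_e])  # the deleted edge, in both directions
retain_n, forget_n = split_nodes(4, [3])
print(forget_n)
```

The masks can then select which graph elements each distillation term sees, without partitioning the graph into shards.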

Paper Structure (21 sections, 15 equations, 4 figures, 13 tables)

Introduction
Related Work
Machine Unlearning in Graph Networks
Knowledge Distillation
Preliminaries
Proposed Method
Experimental Setup
Results
Comparison with SOTA on GNN Architectures
Comparison with SOTA on Different Datasets
Efficiency Analysis
Ablation Studies
Comparison across strategies
Scalability
Unlearning a higher percentage
...and 6 more sections

Figures (4)

Figure 1: Overview of the proposed method. The GNN Model is the original model trained on the complete data. Edge or node deletion requests are carried out by the proposed D2DGN. Strategy 1 and Strategy 2 of our work are shown.
Figure 2: KL divergence between the knowledge destroyer and the knowledge preserver over training epochs. As the number of epochs increases, the knowledge destroyer approaches the prediction distribution of the knowledge preserver on the forget set $\mathcal{D}_f$.
Figure 3: Unlearning time comparison across datasets: D2DGN vs. SOTA and Gold models (↓).
Figure 6: Integrity and consistency as graph size increases.
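The quantity tracked in Figure 2 can be sketched as the mean per-node KL divergence between the two models' predictions, restricted to the forget set $\mathcal{D}_f$. The direction KL(preserver ‖ destroyer), the helper `forget_set_kl`, and the synthetic logit interpolation standing in for training epochs are all assumptions for illustration; the interpolation is not the paper's training procedure.

```python
import numpy as np


def log_softmax(logits):
    """Row-wise log-softmax, shifted for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def forget_set_kl(preserver_logits, destroyer_logits, forget_mask):
    """Mean per-node KL(preserver || destroyer) over rows selected by forget_mask.

    Assumption: this direction of the KL divergence is chosen for illustration.
    """
    log_p = log_softmax(preserver_logits[forget_mask])
    log_q = log_softmax(destroyer_logits[forget_mask])
    return float(np.mean(np.sum(np.exp(log_p) * (log_p - log_q), axis=1)))


rng = np.random.default_rng(1)
num_nodes, num_classes = 10, 5
preserver_logits = rng.normal(size=(num_nodes, num_classes))
start_logits = rng.normal(size=(num_nodes, num_classes))
forget_mask = np.zeros(num_nodes, dtype=bool)
forget_mask[:4] = True  # hypothetical forget set: the first four nodes

# Synthetic stand-in for epochs: linearly move the destroyer's logits
# toward the preserver's logits (not the actual D2DGN training loop).
curve = []
for step in range(6):
    t = step / 5
    destroyer_logits = (1 - t) * start_logits + t * preserver_logits
    curve.append(forget_set_kl(preserver_logits, destroyer_logits, forget_mask))
print([round(v, 4) for v in curve])
```

Because this KL is convex in the destroyer's logits along the interpolation path and reaches zero at the preserver's logits, the printed curve decreases monotonically toward zero, mirroring the qualitative trend described in the Figure 2 caption.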
