Synthetic Forgetting without Access: A Few-shot Zero-glance Framework for Machine Unlearning
Qipeng Song, Nan Yang, Ziqi Xu, Yue Li, Wei Shao, Feng Xia
TL;DR
The paper addresses privacy-compliant unlearning under the Right to be Forgotten by proposing a few-shot zero-glance setting and the GFOES framework. It introduces Optimal Erasure Samples (OES), generated by a Generative Feedback Network (GFN), which induce forgetting of the target classes without access to the forget data, paired with a two-phase fine-tuning strategy that first forgets aggressively and then restores utility. Empirical results on Fashion-MNIST, CIFAR-10, and CIFAR-100 show complete forgetting of the target classes and high accuracy on the retained classes, outperforming baselines on both logit- and representation-based metrics while remaining efficient. The approach offers a practical path to privacy-preserving ML in data-constrained, real-world MLaaS scenarios, with publicly shareable code and ablation evidence supporting each design choice.
Abstract
Machine unlearning aims to eliminate the influence of specific data from trained models to ensure privacy compliance. However, most existing methods assume full access to the original training dataset, which is often impractical. We address a more realistic yet challenging setting: few-shot zero-glance, where only a small subset of the retained data is available and the forget set is entirely inaccessible. We introduce GFOES, a novel framework comprising a Generative Feedback Network (GFN) and a two-phase fine-tuning procedure. GFN synthesises Optimal Erasure Samples (OES), which induce high loss on target classes, enabling the model to forget class-specific knowledge without access to the original forget data, while preserving performance on retained classes. The two-phase fine-tuning procedure enables aggressive forgetting in the first phase, followed by utility restoration in the second. Experiments on three image classification datasets demonstrate that GFOES achieves effective forgetting at both logit and representation levels, while maintaining strong performance using only 5% of the original data. Our framework offers a practical and scalable solution for privacy-preserving machine learning under data-constrained conditions.
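To make the few-shot zero-glance data-access constraint concrete, the following is a minimal sketch, not the authors' implementation. It assumes class-level forgetting on a labelled dataset and interprets the 5% budget as a random 5% of the retained-class samples; the function name `few_shot_zero_glance_split` and its parameters are hypothetical and do not come from the paper.

```python
import random
from collections import Counter


def few_shot_zero_glance_split(labels, forget_classes, fraction=0.05, seed=0):
    """Return the sample indices an unlearning procedure is allowed to see.

    Assumption (not from the paper): the budget is a random `fraction`
    of the retained-class samples. Samples from `forget_classes` are
    never returned, which models the "zero-glance" constraint.
    """
    forget = set(forget_classes)
    # Keep only indices whose label is NOT a forget class.
    retained_idx = [i for i, y in enumerate(labels) if y not in forget]
    # "Few-shot": keep a small random subset of the retained indices.
    k = max(1, round(len(retained_idx) * fraction))
    rng = random.Random(seed)
    return sorted(rng.sample(retained_idx, k))


# Toy dataset: 10 classes with 100 samples each; class 3 is to be forgotten.
labels = [c for c in range(10) for _ in range(100)]
visible = few_shot_zero_glance_split(labels, forget_classes=[3])
visible_labels = Counter(labels[i] for i in visible)
```

In this toy setup, 900 samples belong to retained classes, so `visible` holds 45 indices and none of them has label 3. Any synthetic samples standing in for the forget class would have to be generated separately, since the split never exposes real forget data.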
