Attack and Reset for Unlearning: Exploiting Adversarial Noise toward Machine Unlearning through Parameter Re-initialization
Yoonhwa Jung, Ikhyun Cho, Shun-Hsiang Hsu, Julia Hockenmaier
TL;DR
This work addresses machine unlearning in a task-agnostic setting: forgetting specific instances that contain personal data while preserving the model’s performance on its original task. It introduces Attack-and-Reset for Unlearning (ARU), a three-stage framework that (1) uses crafted adversarial noise to reveal features biased toward the forget set, (2) resets the corresponding convolutional filters through parameter re-initialization, and (3) fine-tunes the model on the retain data. ARU reports state-of-the-art NoMUS scores on the MUFAC and MUCAC benchmarks by combining adversarially guided parameter masking with targeted re-initialization, without sacrificing utility. The results suggest that adversarial perturbations can serve as a tool for selective parameter reconfiguration and for faster convergence during retraining. ARU thus offers a data-driven, computationally cheaper alternative to full retraining or class-specific unlearning, which is relevant to privacy-preserving ML deployments.
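The summary above does not define the NoMUS score. As a rough illustration, the sketch below assumes the definition commonly attributed to the benchmark that introduced MUFAC and MUCAC: a weighted mix of test utility and a forgetting term derived from membership-inference attack (MIA) accuracy. The formula, the default weight `lam = 0.5`, and the function names are assumptions made here, not details taken from this paper.

```python
def forgetting_score(mia_accuracy: float) -> float:
    """Distance of the MIA accuracy from chance level (0.5); 0 means ideal forgetting."""
    return abs(mia_accuracy - 0.5)


def nomus(utility: float, mia_accuracy: float, lam: float = 0.5) -> float:
    """Assumed NoMUS form: lam * utility + (1 - lam) * (1 - 2 * forgetting score).

    Both inputs are fractions in [0, 1]; the result also lies in [0, 1].
    """
    if not (0.0 <= utility <= 1.0 and 0.0 <= mia_accuracy <= 1.0):
        raise ValueError("utility and mia_accuracy must lie in [0, 1]")
    return lam * utility + (1.0 - lam) * (1.0 - 2.0 * forgetting_score(mia_accuracy))
```

Under this assumed definition, a model that keeps perfect test accuracy while an attacker does no better than chance on the forget set scores 1.0, and any drop in utility or any deviation of the attack accuracy from 0.5 lowers the score.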
Abstract
With growing concerns surrounding privacy and regulatory compliance, the concept of machine unlearning has gained prominence, aiming to selectively forget or erase specific learned information from a trained model. In response to this critical need, we introduce a novel approach called Attack-and-Reset for Unlearning (ARU). This algorithm leverages meticulously crafted adversarial noise to generate a parameter mask, effectively resetting certain parameters and rendering them unlearnable. ARU outperforms current state-of-the-art results on two facial machine-unlearning benchmark datasets, MUFAC and MUCAC. In particular, we present the steps involved in attacking and masking that strategically filter and re-initialize network parameters biased towards the forget set. Our work represents a significant advancement in rendering data unexploitable to deep learning models through parameter re-initialization, achieved by harnessing adversarial noise to craft a mask.
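The masking and re-initialization steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the mask is derived from the adversarial noise, so the sketch takes a hypothetical per-filter score as given (for example, how strongly a filter responds to the noise crafted on the forget set), resets a fixed fraction of the highest-scoring filters, and uses a Kaiming-style uniform range for the new weights. The names `build_filter_mask`, `reset_masked_filters`, and `reset_ratio` are all made up for this example, and the final fine-tuning on retain data is omitted.

```python
import math
import random


def build_filter_mask(scores, reset_ratio):
    """Return one boolean per filter; True marks a filter to re-initialize.

    `scores` is a hypothetical per-filter bias measure (higher = more biased
    toward the forget set). The top `reset_ratio` fraction is selected.
    """
    n_reset = int(reset_ratio * len(scores))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    to_reset = set(ranked[:n_reset])
    return [i in to_reset for i in range(len(scores))]


def reset_masked_filters(filters, mask, rng):
    """Re-initialize masked filters and copy the others unchanged.

    Each filter is a flat list of weights. The uniform bound sqrt(6 / fan_in)
    is an assumed initialization choice, not one stated in the source.
    """
    new_filters = []
    for weights, reset in zip(filters, mask):
        if reset:
            bound = math.sqrt(6.0 / len(weights))
            new_filters.append([rng.uniform(-bound, bound) for _ in weights])
        else:
            new_filters.append(list(weights))
    return new_filters


# Toy usage: four 3x3 filters, reset the half with the highest scores.
scores = [0.1, 0.9, 0.3, 0.7]
filters = [[1.0] * 9 for _ in scores]
mask = build_filter_mask(scores, reset_ratio=0.5)
filters_after_reset = reset_masked_filters(filters, mask, random.Random(0))
```

In a real network the same idea would be applied per convolutional layer to the weight tensor along its output-channel axis, after which the model is fine-tuned on the retain set so that the reset filters relearn features without the forget data.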
