AMUN: Adversarial Machine UNlearning
Ali Ebrahimpour-Boroojeny, Hari Sundaram, Varun Chandrasekaran
TL;DR
This work addresses privacy-driven machine unlearning by introducing AMUN, which fine-tunes a model on adversarial examples that lie close to the forget samples. This lowers the model's confidence on the forget set while preserving test accuracy, mimicking the behavior of a model retrained on the remaining data. The authors derive a theoretical bound on the parameter updates that explains why a small Lipschitz constant, adversarial examples close to the original samples, and effective adversarial examples all aid unlearning. They evaluate AMUN against strong baselines on CIFAR-10, both with and without access to the remaining data. Empirically, AMUN outperforms prior state-of-the-art unlearning methods, remains effective on adversarially robust models, and handles multiple sequential unlearning requests; an ablation study confirms the critical role of the adversarial set. The work makes approximate unlearning more practical and efficient, and it opens avenues for extending the approach to other domains and to formal privacy guarantees.
Abstract
Machine unlearning, where users can request the deletion of a forget dataset, is becoming increasingly important because of numerous privacy regulations. Early work on ``exact'' unlearning (e.g., retraining) incurs large computational overheads. ``Approximate'' methods, while computationally inexpensive, have fallen short of the effectiveness of exact unlearning: the resulting models fail to obtain comparable accuracy and prediction confidence on both the forget and test (i.e., unseen) datasets. Exploiting this observation, we propose a new unlearning method, Adversarial Machine UNlearning (AMUN), that outperforms prior state-of-the-art (SOTA) methods for image classification. AMUN lowers the confidence of the model on the forget samples by fine-tuning the model on their corresponding adversarial examples. Adversarial examples naturally belong to the distribution imposed by the model on the input space; fine-tuning the model on the adversarial examples closest to the corresponding forget samples (a) localizes the changes to the decision boundary of the model around each forget sample and (b) avoids drastic changes to the global behavior of the model, thereby preserving the model's accuracy on test samples. Using AMUN to unlearn a random $10\%$ of CIFAR-10 samples, we observe that even SOTA membership inference attacks cannot do better than random guessing.
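The core mechanism described above (find the adversarial example closest to a forget sample, then fine-tune on it) can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a 2-D logistic-regression model in place of an image classifier, uses the closed-form minimum-L2 perturbation that exists for linear models in place of an iterative attack, and labels the adversarial example with the model's own (wrong) prediction. All names, weights, and hyperparameters are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    return 1 if logit(w, b, x) >= 0 else 0

def confidence(w, b, x, y):
    # Probability the model assigns to label y for input x.
    p1 = sigmoid(logit(w, b, x))
    return p1 if y == 1 else 1.0 - p1

def closest_adversarial(w, b, x, overshoot=0.02):
    # For a linear model, the smallest L2 perturbation that flips the
    # prediction moves x along w; the overshoot pushes it just past
    # the decision boundary. (Assumption: stand-in for a real attack.)
    f = logit(w, b, x)
    norm_sq = sum(wi * wi for wi in w)
    scale = (1.0 + overshoot) * f / norm_sq
    return [xi - scale * wi for xi, wi in zip(x, w)]

def fine_tune(w, b, x, y, lr=0.5, steps=20):
    # Plain gradient descent on the logistic loss for one example.
    w = list(w)
    for _ in range(steps):
        err = sigmoid(logit(w, b, x)) - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
        b = b - lr * err
    return w, b

# Hypothetical trained model and a forget sample it classifies confidently.
w, b = [2.0, -1.0], 0.5
x_forget, y_forget = [1.5, -1.0], 1

x_adv = closest_adversarial(w, b, x_forget)
y_adv = predict(w, b, x_adv)  # assumed label: the model's wrong prediction

conf_before = confidence(w, b, x_forget, y_forget)
w_new, b_new = fine_tune(w, b, x_adv, y_adv)
conf_after = confidence(w_new, b_new, x_forget, y_forget)

print("adversarial label:", y_adv)
print("confidence on forget sample before:", round(conf_before, 3))
print("confidence on forget sample after: ", round(conf_after, 3))
```

In this toy setting, fine-tuning on the nearby adversarial example shifts the decision boundary toward the forget sample and lowers the model's confidence on it. Whether accuracy elsewhere is preserved depends on the model and data, which this sketch does not measure.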
