Dataset Condensation Driven Machine Unlearning
Junaid Iqbal Khan
TL;DR
This work investigates privacy-preserving machine unlearning by integrating dataset condensation with a modular, three-part network training scheme. It introduces two condensation techniques (fast distribution matching and model-inversion-based condensation) and a modular unlearning framework with offline and online phases that minimizes data exposure while preserving utility. The authors define two novel metrics (unlearning and overfitting), demonstrate defense against membership inference attacks, and show that unlearning can be performed within condensed representations, enabling rapid training of new architectures. The approach balances privacy, utility, and efficiency across image-classification benchmarks and offers practical pathways for unlearning in condensed datasets and beyond.
Abstract
Current trends in data regulation and privacy-preserving machine learning have emphasized the importance of machine unlearning. The naive approach to unlearning training data, retraining on the complement of the forget samples, is computationally expensive. This cost has been addressed by a collection of techniques that fall under the umbrella of machine unlearning. However, existing techniques still fall short of handling the persistent computational challenges while maintaining the utility and privacy of the unlearned model. We attribute this to the lack of work on improving the computational complexity of approximate unlearning from the perspective of the training dataset. In this paper, we aim to fill this gap by introducing dataset condensation as an essential component of machine unlearning in the context of image classification. To achieve this goal, we propose new dataset condensation techniques and an innovative unlearning scheme that strikes a balance between machine unlearning privacy, utility, and efficiency. Furthermore, we present a novel and effective approach to instrumenting machine unlearning and propose its application in defending against membership inference and model inversion attacks. Additionally, we explore a new application of our approach: removing data from a `condensed model', which can then be used to quickly train any arbitrary model without being influenced by the unlearning samples. The corresponding code is available at \href{https://github.com/algebraicdianuj/DC_U}{URL}.
