Dataset Condensation Driven Machine Unlearning

Junaid Iqbal Khan

TL;DR

This work investigates privacy-preserving machine unlearning by integrating dataset condensation with a modular, three-part network training scheme. It introduces two condensation techniques—fast distribution matching and model-inversion–based condensation—and a modular unlearning framework with offline/online phases to minimize data exposure while preserving utility. The authors define novel metrics (unlearning and overfitting) and demonstrate defense against membership inference attacks, plus the ability to perform unlearning within condensed representations, enabling rapid training of new architectures. The approach achieves a favorable balance between privacy, utility, and efficiency across image-classification benchmarks and offers practical pathways for unlearning in condensed datasets and beyond.

Abstract

The current trend in data regulation requirements and privacy-preserving machine learning has emphasized the importance of machine unlearning. The naive approach of unlearning training data by retraining on the complement of the forget samples is computationally expensive. These challenges have been addressed through a collection of techniques falling under the umbrella of machine unlearning. However, existing techniques still fall short of handling the persistent computational challenges in harmony with the utility and privacy of the unlearned model. We attribute this to the lack of work on improving the computational complexity of approximate unlearning from the perspective of the training dataset. In this paper, we aim to fill this gap by introducing dataset condensation as an essential component of machine unlearning in the context of image classification. To achieve this goal, we propose new dataset condensation techniques and an innovative unlearning scheme that strikes a balance between machine unlearning privacy, utility, and efficiency. Furthermore, we present a novel and effective approach to instrumenting machine unlearning and propose its application in defending against membership inference and model inversion attacks. Additionally, we explore a new application of our approach, which involves removing data from a 'condensed model', which can be employed to quickly train any arbitrary model without being influenced by the unlearning samples. The corresponding code is available at https://github.com/algebraicdianuj/DC_U.
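
As a point of reference for the naive baseline mentioned in the abstract (retraining on the complement of the forget samples), the following is a minimal sketch, not the paper's method. It assumes a toy nearest-centroid "model" in place of a neural network; the helper names `split_retain_forget` and `fit_centroids` and the toy data are hypothetical and do not come from the paper or its repository.

```python
from statistics import fmean


def split_retain_forget(n_samples, forget_indices):
    """Return (retain, forget) index lists; retain is the complement of forget."""
    forget = sorted(set(forget_indices))
    if forget and (forget[0] < 0 or forget[-1] >= n_samples):
        raise IndexError("forget index out of range")
    forget_set = set(forget)
    retain = [i for i in range(n_samples) if i not in forget_set]
    return retain, forget


def fit_centroids(features, labels, indices):
    """Toy stand-in for training: one mean feature vector per class,
    computed only from the samples listed in `indices`."""
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(features[i])
    return {c: [fmean(col) for col in zip(*rows)] for c, rows in by_class.items()}


# Hypothetical toy data: 2-D features, two classes.
features = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0], [4.0, 0.0]]
labels = [0, 0, 1, 1, 0]

# Forget sample 4, then "retrain" from scratch on the retain set only.
retain, forget = split_retain_forget(len(features), forget_indices=[4])
original = fit_centroids(features, labels, range(len(features)))
retrained = fit_centroids(features, labels, retain)
```

In this sketch the retrained centroid of class 0 no longer depends on the forgotten sample, which is the guarantee exact retraining gives. Its cost is a full pass over the retain set, which is the expense that approximate unlearning and dataset condensation aim to reduce.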

Paper Structure (44 sections, 48 equations, 11 figures, 2 tables, 4 algorithms)

Figures (11)

  • Figure 1: Main abstraction of Proposed Scheme
  • Figure 2: Evolution of UM, OM and MIA over the first epochs of modular unlearning and catastrophic forgetting for VGG16 on CIFAR-10
  • Figure 3: Per-class reconstructions of CIFAR-10 images by the proposed model inversion attack, from the original model, a model trained with differentially private Adam optimization, and a model regularized with the proposed unlearning scheme
  • Figure 4: Benchmarking of Unlearning in Condensation setting, where the goal is to unlearn the data from condensed knowledge which can be quickly used to train another model
  • Figure 5: Gradient of loss of CNN trained on CIFAR-10 over layers from shallow (left) to deep (right)
  • ...and 6 more figures