Dataset Condensation Driven Machine Unlearning

Junaid Iqbal Khan

TL;DR

This work investigates privacy-preserving machine unlearning by integrating dataset condensation with a modular, three-part network training scheme. It introduces two condensation techniques (fast distribution matching and model-inversion-based condensation) and a modular unlearning framework with offline and online phases that minimizes data exposure while preserving utility. The authors define two new metrics (an unlearning metric and an overfitting metric), demonstrate a defense against membership inference attacks, and show that unlearning can be performed within condensed representations, which enables rapid training of new architectures. The approach achieves a favorable balance between privacy, utility, and efficiency across image-classification benchmarks and offers practical pathways for unlearning in condensed datasets and beyond.

Abstract

Current trends in data regulation and privacy-preserving machine learning have emphasized the importance of machine unlearning. The naive approach to unlearning training data, retraining on the complement of the forget samples, is computationally expensive. A collection of techniques under the umbrella of machine unlearning has addressed much of this cost. However, existing techniques still struggle to handle the persistent computational challenges while preserving the utility and privacy of the unlearned model. We attribute this to the lack of work on improving the computational complexity of approximate unlearning from the perspective of the training dataset. In this paper, we aim to fill this gap by introducing dataset condensation as an essential component of machine unlearning in the context of image classification. To achieve this goal, we propose new dataset condensation techniques and an innovative unlearning scheme that strikes a balance between machine unlearning privacy, utility, and efficiency. Furthermore, we present a novel and effective approach to instrumenting machine unlearning and propose its application in defending against membership inference and model inversion attacks. Additionally, we explore a new application of our approach: removing data from a "condensed model", which can be employed to quickly train any arbitrary model without being influenced by the samples to be unlearned. The corresponding code is available at https://github.com/algebraicdianuj/DC_U.
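As a minimal illustration of the naive baseline mentioned in the abstract, the sketch below partitions a toy dataset into a retain set (the complement of the forget samples) and a forget set. The function name `split_retain_forget` and the toy `(image_id, label)` pairs are hypothetical and are not taken from the paper or its code repository; retraining on the retain set, which is the expensive step, is deliberately left out.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_retain_forget(
    dataset: Sequence[T], forget_indices: Sequence[int]
) -> Tuple[List[T], List[T]]:
    """Partition a dataset by index into (retain, forget) lists.

    Hypothetical helper for illustration only: the retain list is the
    complement of the forget samples, which naive unlearning would
    retrain on from scratch.
    """
    forget_lookup = set(forget_indices)
    retain = [x for i, x in enumerate(dataset) if i not in forget_lookup]
    forget = [x for i, x in enumerate(dataset) if i in forget_lookup]
    return retain, forget


# Toy (image_id, label) pairs standing in for an image-classification dataset.
toy_dataset = [("img0", 0), ("img1", 1), ("img2", 0), ("img3", 1), ("img4", 0)]

retain_set, forget_set = split_retain_forget(toy_dataset, forget_indices=[1, 3])
print(len(retain_set), len(forget_set))  # prints "3 2"
```

In realistic settings the retain set is nearly as large as the full training set, which is why retraining on it costs about as much as the original training run.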

Paper Structure

This paper contains 44 sections, 48 equations, 11 figures, 2 tables, and 4 algorithms.

Introduction
Related Works
Preliminaries and Notation
Methodology
Retain Dataset Reduction Framework
Offline Phase
Dataset Condensation via Fast Distribution Matching
Dataset Condensation via Model Inversion
Online Phase
Modular Training
Offline Phase
Online Phase
Instrumentation of Unlearning
Applications of Unlearning
Defense Against Membership Inference Attack
...and 29 more sections

Figures (11)

Figure 1: Main abstraction of Proposed Scheme
Figure 2: Evolution of UM, OM and MIA over the first epochs of modular unlearning and catastrophic forgetting for VGG16 on CIFAR-10
Figure 3: Reconstruction of images per class of the CIFAR-10 dataset by the proposed model inversion attack, applied to the original model, a model trained with differentially private Adam-based optimization, and a model regularized with the proposed unlearning
Figure 4: Benchmarking of Unlearning in Condensation setting, where the goal is to unlearn the data from condensed knowledge which can be quickly used to train another model
Figure 5: Gradient of loss of CNN trained on CIFAR-10 over layers from shallow (left) to deep (right)
...and 6 more figures
