An Information Theoretic Approach to Machine Unlearning

Jack Foster, Kyle Fogarty, Stefan Schoepf, Zack Dugue, Cengiz Öztireli, Alexandra Brintrup

TL;DR

This work addresses zero-shot unlearning, in which an algorithm must remove data given only the trained model and the forget set, by proposing JiT, an information-theoretic method that localizes forgetting by smoothing the learned function around forget points. JiT minimizes a loss that penalizes changes in the classifier's output under small perturbations of each forget sample, which increases uncertainty on forgotten data while largely preserving the decision boundary elsewhere. Empirically, the method approximates the behavior of a retrained model across full-class, sub-class, and random forgetting on multiple benchmarks, with a runtime of $O(N|\mathcal{D}_f|)$ and competitive accuracy and membership inference attack (MIA) metrics. The approach enables fast unlearning that needs no data beyond the forget set, and the authors provide code to facilitate adoption and reproducibility.

Abstract

To comply with AI and data regulations, the need to forget private or copyrighted information from trained machine learning models is increasingly important. The key challenge in unlearning is forgetting the necessary data in a timely manner, while preserving model performance. In this work, we address the zero-shot unlearning scenario, whereby an unlearning algorithm must be able to remove data given only a trained model and the data to be forgotten. We explore unlearning from an information theoretic perspective, connecting the influence of a sample to the information gain a model receives by observing it. From this, we derive a simple but principled zero-shot unlearning method based on the geometry of the model. Our approach takes the form of minimising the gradient of a learned function with respect to a small neighbourhood around a target forget point. This induces a smoothing effect, causing forgetting by moving the boundary of the classifier. We explore the intuition behind why this approach can jointly unlearn forget samples while preserving general model performance through a series of low-dimensional experiments. We perform extensive empirical evaluation of our method over a range of contemporary benchmarks, verifying that our method is competitive with state-of-the-art performance under the strict constraints of zero-shot unlearning. Code for the project can be found at https://github.com/jwf40/Information-Theoretic-Unlearning
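
The smoothing idea described in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it assumes a one-dimensional sigmoid classifier, Gaussian perturbations, and a finite-difference gradient in place of backpropagation. The names `smoothing_loss`, `unlearn`, `n_perturb`, and `sigma` are hypothetical and chosen only for illustration. The loss is the average ratio $|f(x) - f(x + \delta)| / |\delta|$ over perturbations $\delta$ of each forget point, so minimising it flattens the function in a small neighbourhood of that point.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(params, x):
    # Toy 1-D classifier f(x) = sigmoid(w * x + b); a stand-in for a real network.
    w, b = params
    return sigmoid(w * x + b)


def smoothing_loss(params, forget_points, deltas):
    # Mean of |f(x) - f(x + d)| / |d| over forget points and perturbations.
    total = 0.0
    for x in forget_points:
        fx = model(params, x)
        for d in deltas:
            total += abs(fx - model(params, x + d)) / abs(d)
    return total / (len(forget_points) * len(deltas))


def unlearn(params, forget_points, n_perturb=8, sigma=0.1, lr=0.1, steps=20, seed=0):
    # Assumption: plain gradient descent with central finite differences,
    # used here only to keep the sketch dependency-free.
    rng = random.Random(seed)
    params = list(params)
    eps = 1e-5
    for _ in range(steps):
        # Fresh perturbations each step; "or sigma" guards against an exact zero.
        deltas = [rng.gauss(0.0, sigma) or sigma for _ in range(n_perturb)]
        grad = []
        for i in range(len(params)):
            up, down = params[:], params[:]
            up[i] += eps
            down[i] -= eps
            diff = (smoothing_loss(up, forget_points, deltas)
                    - smoothing_loss(down, forget_points, deltas))
            grad.append(diff / (2 * eps))
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


if __name__ == "__main__":
    before = [4.0, 0.0]      # steep sigmoid with its boundary at x = 0
    forget = [0.1]           # forget point close to the boundary
    probe = [-0.05, 0.05]    # fixed perturbations used only for evaluation
    after = unlearn(before, forget)
    print("local slope before:", smoothing_loss(before, forget, probe))
    print("local slope after: ", smoothing_loss(after, forget, probe))
```

In this sketch each update evaluates the model on `n_perturb` perturbed copies of every forget point and uses no data outside the forget set, which mirrors the zero-shot constraint described above.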

Paper Structure

This paper contains 17 sections, 1 equation, 8 figures, 10 tables.

Figures (8)

  • Figure 1: Visualization of the zero-shot unlearning scenario. Unlike traditional unlearning, there is no access to, or prior knowledge of, any data other than the forget set, and the model is available only in its current state. These constraints make the problem considerably more challenging.
  • Figure 2: Demonstration of how the boundary of a classifier moves during unlearning. The retrained model is the gold standard. Removing a sample from a low-curvature region has almost no effect on the retrained model, whereas removing a sample from a high-curvature region has a significant impact. In this low-dimensional setting, JiT successfully reconstructs the retrained boundary, whereas naively training to mislabel the forget sample completely destroys the trained model.
  • Figure 3: Change in sigmoid after unlearning with JiT. Red dots are the unlearnt samples; black dots are their locations on the new sigmoid after JiT.
  • Figure 4: Entropy, $\mathcal{H}(x)$, of the $\mathcal{D}_f$ output distributions for full-class unlearning on CIFAR-10, showing JiT exhibits performance similar to the retrained model.
  • Figure 5: Median runtime, in seconds, of each method for ViT full-class forgetting on the class rocket. For visual clarity, we exclude GKT ($\sim 3000$ seconds).
  • ...and 3 more figures
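
Figure 4 reports the entropy $\mathcal{H}(x)$ of the model's output distribution on forget samples. As a point of reference, such a quantity can be computed as below. This is a minimal sketch that assumes Shannon entropy in nats over softmax probabilities; the excerpt does not state the exact definition or log base used in the paper, and the function names are illustrative.

```python
import math


def softmax(logits):
    # Subtract the maximum logit for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    # Shannon entropy in nats; zero-probability terms contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


if __name__ == "__main__":
    confident = softmax([10.0] + [0.0] * 9)   # peaked output over 10 classes
    uncertain = softmax([0.0] * 10)           # uniform output over 10 classes
    print("confident:", entropy(confident))
    print("uncertain:", entropy(uncertain))
```

Higher entropy on forget samples corresponds to a less confident output distribution, with a uniform distribution over ten classes giving the maximum value of $\ln 10$.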

Theorems & Definitions (2)

  • Definition 4.1: Neighbourhood of a sample
  • Definition 4.2