Machine Unlearning of Features and Labels

Alexander Warnecke, Lukas Pirch, Christian Wressnegger, Konrad Rieck

TL;DR

This work addresses privacy leaks in machine learning by moving beyond pointwise data removal to unlearning entire features and labels. It introduces a framework that uses influence functions to map perturbations in training data to closed-form model updates, enabling efficient and flexible unlearning. The authors establish certified unlearning guarantees for strongly convex losses and demonstrate strong empirical performance for non-convex models across three practical scenarios: removing sensitive features, eliminating unintended memorization in language models, and repairing label poisoning in vision. The approach is significantly faster than retraining while reaching competitive fidelity, which makes it relevant for satisfying data-removal requests and mitigating privacy risks in real-world systems. The work also discusses its limitations and its relation to differential privacy.

Abstract

Removing information from a machine learning model is a non-trivial task that requires partially reverting the training process. This task is unavoidable when sensitive data, such as credit card numbers or passwords, accidentally enters the model and needs to be removed afterwards. Recently, different concepts for machine unlearning have been proposed to address this problem. While these approaches are effective in removing individual data points, they do not scale to scenarios where larger groups of features and labels need to be reverted. In this paper, we propose the first method for unlearning features and labels. Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters. It makes it possible to adapt the influence of training data on a learning model retrospectively, thereby correcting data leaks and privacy issues. For learning models with strongly convex loss functions, our method provides certified unlearning with theoretical guarantees. For models with non-convex losses, we empirically show that unlearning features and labels is effective and significantly faster than other strategies.
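
To make the idea of a closed-form update more concrete, the following is a minimal sketch rather than the authors' implementation. It assumes ridge regression as a strongly convex model and a Newton-style update of the form $\theta \leftarrow \theta - H^{-1}(\sum \nabla\ell(\tilde{z}, \theta) - \sum \nabla\ell(z, \theta))$, where $z$ are the affected training points and $\tilde{z}$ their corrected versions. This update form, the helper names, and the synthetic data are all assumptions made for illustration; the text above does not spell them out.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge regression: argmin 0.5*||X theta - y||^2 + 0.5*lam*||theta||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def grad_sum(X, y, theta):
    """Sum of per-example gradients of the squared loss 0.5*(theta^T x - y)^2."""
    return X.T @ (X @ theta - y)

def unlearn_second_order(theta, X_old, y_old, X_new, y_new, H):
    """Hypothetical helper: one Newton-style step that replaces (X_old, y_old) by (X_new, y_new)."""
    delta = grad_sum(X_new, y_new, theta) - grad_sum(X_old, y_old, theta)
    return theta - np.linalg.solve(H, delta)

# Synthetic data (assumption: any regression data set would do).
rng = np.random.default_rng(0)
n, d, lam = 200, 5, 1.0
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
theta = fit_ridge(X, y, lam)

# Pretend the first 10 labels were wrong and need to be corrected.
affected = np.arange(10)
y_fixed = y.copy()
y_fixed[affected] = -y[affected]

# Hessian of the regularized loss on the original data.
H = X.T @ X + lam * np.eye(d)
theta_unlearned = unlearn_second_order(
    theta, X[affected], y[affected], X[affected], y_fixed[affected], H
)

# Reference: retrain from scratch on the corrected labels.
theta_retrained = fit_ridge(X, y_fixed, lam)
gap = np.max(np.abs(theta_unlearned - theta_retrained))
print("max parameter gap to retraining:", gap)
```

For a quadratic loss and label-only corrections the Hessian does not depend on the labels, so this single step matches retraining up to floating-point error. When features change, or the loss is not quadratic, the same step is only an approximation.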

Paper Structure

This paper contains 65 sections, 7 theorems, 54 equations, 13 figures, 5 tables, 1 algorithm.

Key Result

Lemma 1

For learning models processing inputs $x$ using linear transformations of the form $\theta^{T} x$, we have $\theta^*_{-F} \equiv \theta^*_{F=0}$.
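
Lemma 1 can be checked numerically on a small example. The sketch below assumes that $\theta^*_{-F}$ denotes the optimum after dropping the feature set $F$ from the data and $\theta^*_{F=0}$ the optimum after setting those features to zero, and it uses L2-regularized linear regression as the model. The regularizer, the data, and the helper `fit_ridge` are illustrative assumptions, not part of the source.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge regression for a linear model theta^T x."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = rng.normal(size=100)
lam = 0.5
j = 2  # index of the feature to unlearn (the set F has one element here)

# Variant 1: remove the feature column entirely.
theta_removed = fit_ridge(np.delete(X, j, axis=1), y, lam)

# Variant 2: keep the dimension but set the feature to zero everywhere.
X_zero = X.copy()
X_zero[:, j] = 0.0
theta_zeroed = fit_ridge(X_zero, y, lam)

# The weights of the remaining features coincide, and the zeroed
# feature receives weight zero because only the regularizer acts on it.
same_weights = np.allclose(np.delete(theta_zeroed, j), theta_removed)
zero_weight = np.isclose(theta_zeroed[j], 0.0)
print(same_weights, zero_weight)
```

The practical consequence is that a feature can be unlearned by changing its value to zero, which keeps the dimensionality of the model fixed and turns feature removal into a data perturbation.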

Figures (13)

  • Figure 1: Probability of all shards being affected when unlearning for varying number of data points and shards ($S$).
  • Figure 2: Instance-based unlearning vs. unlearning of features and labels. The data to be removed is marked with orange.
  • Figure 3: Affected data points and overall data when removing or changing features in the different datasets.
  • Figure 4: Efficacy (gradient residual) of the certified unlearning methods for varying number of affected features (Lower values are better).
  • Figure 5: Difference in loss between retraining and unlearning with 100 affected features.
  • ...and 8 more figures
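
The quantity plotted in Figure 1 can be approximated with a short calculation. The sketch below assumes that each data point to be unlearned falls into one of $S$ shards independently and uniformly at random; whether the figure uses exactly this model is not stated here, so treat it as an illustration. Under that assumption, inclusion-exclusion gives the probability that every shard contains at least one affected point. The function name is hypothetical.

```python
from math import comb

def prob_all_shards_affected(num_points: int, num_shards: int) -> float:
    """P(every shard holds at least one of num_points uniformly placed points)."""
    s = num_shards
    # Inclusion-exclusion over the number i of shards that stay untouched.
    return sum(
        (-1) ** i * comb(s, i) * ((s - i) / s) ** num_points
        for i in range(s + 1)
    )

for k in (5, 10, 50, 100):
    print(k, round(prob_all_shards_affected(k, 5), 4))
```

The probability approaches one quickly as the number of affected points grows, which is why removing a feature that touches many points tends to force retraining of every shard.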

Theorems & Definitions (15)

  • Lemma 1
  • proof
  • Definition 1
  • Definition 2
  • Theorem 1
  • Theorem 2 (Guo et al., 2020)
  • Theorem 3
  • Theorem 1
  • Lemma 2
  • proof
  • ...and 5 more