Unlearning Personal Data from a Single Image

Thomas De Min, Massimiliano Mancini, Stéphane Lathuilière, Subhankar Roy, Elisa Ricci

TL;DR

This work tackles the realistic constraint that training data may not be accessible when a model must forget information. It introduces 1-SHUI, a benchmark for one-shot identity unlearning where a single Support Sample per identity is provided to guide forgetting without access to the full dataset. To solve this, the authors propose MetaUnlearn, a meta-learning-based approach that learns a forget-promoting loss by simulating unlearning on available data and then applies the learned loss with a single gradient step using only Support Samples at test time. Extensive experiments on CelebA, CelebA-HQ, and MUFAC across multiple forget-set sizes show that MetaUnlearn achieves competitive or superior forgetting (ToW) while preserving performance and maintaining low membership inference risk, demonstrating a viable direction for data-absence unlearning. The work also analyzes factors influencing difficulty and ablates the loss components and inputs, outlining future extensions to multiple forget requests and broader data domains.
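The test-time step described above (one gradient step on a learned loss, using only Support Samples) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the logistic model standing in for $f_\theta$, the two-parameter confidence penalty standing in for the learned loss $h_\phi$, the finite-difference gradient, and all function names. It is not the authors' implementation, and the meta-training that would fit `phi` is omitted.

```python
import math

def predict(theta, x):
    """Toy stand-in for f_theta: logistic regression on a feature vector."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))

def learned_loss(phi, theta, support):
    """Hypothetical stand-in for h_phi: a phi-weighted penalty on the
    confidence assigned to the true label, averaged over support samples."""
    total = 0.0
    for x, y in support:
        p = predict(theta, x)
        p_true = p if y == 1 else 1.0 - p
        total += phi[0] * p_true + phi[1] * p_true ** 2
    return total / len(support)

def grad_wrt_theta(phi, theta, support, eps=1e-6):
    """Central finite differences; a real system would use autodiff."""
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += eps
        down[i] -= eps
        diff = learned_loss(phi, up, support) - learned_loss(phi, down, support)
        grad.append(diff / (2 * eps))
    return grad

def unlearn_one_step(theta, phi, support, lr=1.0):
    """Single gradient step on the learned loss, using only support samples."""
    grad = grad_wrt_theta(phi, theta, support)
    return [t - lr * g for t, g in zip(theta, grad)]

# Illustrative values only: one support sample (features, label).
theta = [2.0, -1.0]
phi = [1.0, 0.5]
support = [([1.0, 0.5], 1)]
theta_u = unlearn_one_step(theta, phi, support)
```

With these illustrative values, the model's confidence on the support sample is lower after the step than before it. In the method as summarized, the parameters of the loss would be learned beforehand by simulating unlearning requests while training data are still available.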

Abstract

Machine unlearning aims to erase data from a model as if the latter never saw them during training. While existing approaches unlearn information from complete or partial access to the training data, this access can be limited over time due to privacy regulations. Currently, no setting or benchmark exists to probe the effectiveness of unlearning methods in such scenarios. To fill this gap, we propose a novel task we call One-Shot Unlearning of Personal Identities (1-SHUI) that evaluates unlearning models when the training data is not available. We focus on unlearning identity data, which is particularly relevant due to current regulations requiring personal data deletion after training. To cope with data absence, we expect users to provide a portrait picture to aid unlearning. We design requests on CelebA, CelebA-HQ, and MUFAC with different unlearning set sizes to evaluate applicable methods in 1-SHUI. Moreover, we propose MetaUnlearn, an effective method that meta-learns to forget identities from a single image. Our findings indicate that existing approaches struggle when data availability is limited, especially when there is a dissimilarity between the provided samples and the training data. Source code is available at https://github.com/tdemin16/one-shui.

Paper Structure

This paper contains 30 sections, 9 equations, 8 figures, 8 tables, 1 algorithm.

Figures (8)

  • Figure 1: One-Shot Unlearning of Personal Identities. Standard machine unlearning approaches leverage the entire forget set to forget an identity. In 1-SHUI, the user provides a picture of themselves as the only input to the unlearning algorithm for forgetting their identity. 1-SHUI evaluates unlearning when the entire training set is inaccessible.
  • Figure 2: Benchmark dataset construction. The dataset is split based on identities, dividing them into train $\mathcal{I}_{tr}$, validation $\mathcal{I}_v$, and test $\mathcal{I}_{te}$ IDs. Next, we sample forget identities from the training ones, splitting $\mathcal{I}_{tr}$ into forget $\mathcal{I}_f$ and retain $\mathcal{I}_r$ IDs. From the forget identities, we sample one image per identity to form the Support Set $\mathcal{S}$, which is unavailable at training time.
  • Figure 3: MetaUnlearn pipeline. (i) While training data are available, the original model $f_\theta$ is trained on $\mathcal{D}_{tr}$ using the task loss function (e.g., cross-entropy loss). (ii) Before being discarded, the training data can be used to learn the proposed unlearning loss (MetaUnlearn). MetaUnlearn simulates an unlearning request $\mathcal{S}$ and uses the meta-loss $h_\phi(\cdot)$ to unlearn it in one step. We evaluate the unlearned model $f_{\theta_u}$ using Equation \ref{eq:scaledloss} ($\mathcal{A}$) on the forget and validation data and backpropagate to $h_\phi(\cdot)$. $f_{\theta_u}$ is then discarded, and we simulate another unlearning request, iterating until convergence. (iii) Once MetaUnlearn is trained, we can use it to unlearn identities via Support Samples, without accessing the original training set.
  • Figure 4: Unlearning hardness vs. Support Sample distance. As the Support Sample distance from the identity centroid increases, the accuracy gap with the retrained model grows.
  • Figure 5: Performance drop vs. distance from forget set. As retain identities get closer to forget identities, the mAP on retain samples drops relative to the pretrained model.
  • ...and 3 more figures
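The identity-based split in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the benchmark's code: the function name, the 80/10/10 identity fractions, the seed, and the returned dictionary keys are all assumptions. Only the overall structure (split by identity, sample forget identities from the training ones, hold out one support image per forget identity) comes from the caption.

```python
import random
from collections import defaultdict

def build_one_shot_split(samples, n_forget, frac_train=0.8, frac_val=0.1, seed=0):
    """samples: list of (image_id, identity) pairs. Fractions are assumed."""
    rng = random.Random(seed)
    by_identity = defaultdict(list)
    for image_id, identity in samples:
        by_identity[identity].append(image_id)

    # Split identities (not images) into train / validation / test.
    identities = sorted(by_identity)
    rng.shuffle(identities)
    n_tr = int(frac_train * len(identities))
    n_v = int(frac_val * len(identities))
    ids_tr = identities[:n_tr]
    ids_v = identities[n_tr:n_tr + n_v]
    ids_te = identities[n_tr + n_v:]

    # Sample forget identities from the training ones; the rest are retained.
    ids_f = sorted(rng.sample(ids_tr, n_forget))
    forget_set = set(ids_f)
    ids_r = [i for i in ids_tr if i not in forget_set]

    # Hold out one image per forget identity as its Support Sample,
    # so that it is never part of the training data.
    support, forget_train = {}, {}
    for identity in ids_f:
        images = by_identity[identity]
        chosen = rng.choice(images)
        support[identity] = chosen
        forget_train[identity] = [im for im in images if im != chosen]

    return {
        "val_ids": ids_v,
        "test_ids": ids_te,
        "forget_ids": ids_f,
        "retain_ids": ids_r,
        "support": support,
        "forget_train": forget_train,
    }
```

Calling `build_one_shot_split(samples, n_forget=4)` on a list of `(image_id, identity)` pairs returns disjoint identity groups plus one held-out support image for each forget identity.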