Rethinking Machine Unlearning: Models Designed to Forget via Key Deletion

Sonia Laguna, Jorge da Silva Goncalves, Moritz Vandenhirtz, Alain Ryser, Irene Cannistraci, Julia E. Vogt

Abstract

Machine unlearning is rapidly becoming a practical requirement, driven by privacy regulations, data errors, and the need to remove harmful or corrupted training samples. Despite this, most existing methods tackle the problem purely from a post-hoc perspective. They attempt to erase the influence of targeted training samples through parameter updates that typically require access to the full training data. This creates a mismatch with real deployment scenarios where unlearning requests can be anticipated, revealing a fundamental limitation of post-hoc approaches. We propose unlearning by design, a novel paradigm in which models are directly trained to support forgetting as an inherent capability. We instantiate this idea with Machine UNlearning via KEY deletion (MUNKEY), a memory-augmented transformer that decouples instance-specific memorization from model weights. Here, unlearning corresponds to removing the instance-identifying key, enabling direct zero-shot forgetting without weight updates or access to the original samples or labels. Across natural image benchmarks, fine-grained recognition, and medical datasets, MUNKEY outperforms all post-hoc baselines. Our results establish that unlearning by design enables fast, deployment-oriented unlearning while preserving predictive performance.

Paper Structure (47 sections, 3 equations, 6 figures, 14 tables, 1 algorithm)

Figures (6)

  • Figure 1: Overview of MUNKEY. (1) Training: The learnable exemplar token ${\bm{v}}_i$ from the memory bank $\mathcal{M}$ is concatenated to the image tokens ${\bm{z}}_i$ to predict the label ${\bm{y}}_i$. (2) Unlearning: Forgetting an instance reduces to deleting its instance-specific key in $\mathcal{M}$. (3) Inference: For a query $x_q$, the model retrieves the $K$-nearest neighbor tokens from the updated memory $\mathcal{M}_u$ and ensembles their predictions to produce the final output.
  • Figure 2: UMAP visualization of exemplar tokens and $[CLS]$ tokens on (top) DermaMNIST and (bottom) CIFAR-10 shown at epochs 0 (start), 50, and 100 (end) of training.
  • Figure 3: KNN-based retrieval visualizations on DermaMNIST for a test sample (top) and a forget sample (bottom). Rows correspond to the original and unlearning behavior of each sample and their retrieved counterparts. Color labels indicate retrieval and prediction outcomes.
  • Figure 4: Performance sensitivity to the number of neighbors on DermaMNIST and CIFAR-10. We report Test Accuracy (%) (left) and Avg Gap (right) across different neighborhood sizes.
  • Figure 5: Overview of the performance metrics under varying image and token dropout probabilities in MUNKEY training in CIFAR-10 for visualization of model selection.
  • ...and 1 more figure
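
The three stages in the Figure 1 caption (store a per-instance entry, delete it on an unlearning request, retrieve the $K$ nearest remaining entries and ensemble their predictions) can be illustrated with a toy sketch. This is not the authors' implementation: the `MemoryBank` class, the Euclidean retrieval metric, the choice to drop the exemplar token together with its key, and the `predict_fn` callable standing in for the transformer are all assumptions made for illustration.

```python
import numpy as np


class MemoryBank:
    """Toy per-instance memory (hypothetical; not the paper's code)."""

    def __init__(self):
        self.keys = {}    # instance id -> retrieval key
        self.tokens = {}  # instance id -> exemplar token

    def add(self, instance_id, key, token):
        self.keys[instance_id] = np.asarray(key, dtype=float)
        self.tokens[instance_id] = np.asarray(token, dtype=float)

    def forget(self, instance_id):
        # Unlearning step: delete the entry. No model weights are touched.
        self.keys.pop(instance_id, None)
        self.tokens.pop(instance_id, None)

    def retrieve(self, query_key, k):
        # Ids of the k stored keys closest to the query (Euclidean; assumed).
        ids = list(self.keys)
        dists = [np.linalg.norm(self.keys[i] - query_key) for i in ids]
        order = np.argsort(dists)[:k]
        return [ids[j] for j in order]


def predict(memory, query_key, predict_fn, k=3):
    """Average predict_fn(query, token) over the k retrieved tokens."""
    query_key = np.asarray(query_key, dtype=float)
    ids = memory.retrieve(query_key, k)
    preds = np.stack([predict_fn(query_key, memory.tokens[i]) for i in ids])
    return preds.mean(axis=0), ids


# Toy data: tokens are already class-probability vectors, so the stand-in
# "model" simply returns the retrieved token.
memory = MemoryBank()
memory.add("a", [0.0, 0.0], [1.0, 0.0])
memory.add("b", [0.1, 0.0], [1.0, 0.0])
memory.add("c", [5.0, 5.0], [0.0, 1.0])


def toy_predict(query, token):
    return token


probs_before, ids_before = predict(memory, [0.0, 0.0], toy_predict, k=2)
memory.forget("a")  # unlearning request for instance "a"
probs_after, ids_after = predict(memory, [0.0, 0.0], toy_predict, k=2)
```

After `forget("a")`, instance `"a"` can no longer be retrieved, so the prediction for the same query is built only from the remaining entries. In the actual model the ensemble would presumably run the transformer on the query's image tokens concatenated with each retrieved exemplar token; the toy `toy_predict` skips that step.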