Targeted Unlearning Using Perturbed Sign Gradient Methods With Applications On Medical Images

George R. Nahass, Zhu Wang, Homa Rashidisabet, Won Hwa Kim, Sasha Hubschman, Jeffrey C. Peterson, Chad A. Purnell, Pete Setabutr, Ann Q. Tran, Darvin Yi, Sathya N. Ravi

TL;DR

The paper tackles post-deployment model revision by introducing a bilevel optimization framework for targeted unlearning in deep networks, operationalized via a boundary-based inner search that relabels forget samples and an outer fine-tuning stage. It provides convergence guarantees for the inner perturbation search and offers tunable control over forgetting versus retention, plus a compositional strategy to merge unlearned components from different runs. Across benchmark and clinical imaging datasets, the method achieves strong selective forgetting while preserving utility on the remain set and maintaining reasonable privacy risk profiles. The work demonstrates practical, modular unlearning workflows that can adapt to device deprecation, data shifts, and evolving clinical criteria, reducing the need for full retraining. It also explores model composition as a means to balance competing objectives, enabling flexible deployment of unlearned models in real-world clinical contexts.

Abstract

Machine unlearning aims to remove the influence of specific training samples from a trained model without full retraining. While prior work has largely focused on privacy-motivated settings, we recast unlearning as a general-purpose tool for post-deployment model revision. Specifically, we focus on utilizing unlearning in clinical contexts where data shifts, device deprecation, and policy changes are common. To this end, we propose a bilevel optimization formulation of boundary-based unlearning that can be solved using iterative algorithms. We provide convergence guarantees when first-order algorithms are used to unlearn. Our method introduces tunable loss design for controlling the forgetting-retention tradeoff and supports novel model composition strategies that merge the strengths of distinct unlearning runs. Across benchmark and real-world clinical imaging datasets, our approach outperforms baselines on both forgetting and retention metrics, including scenarios involving imaging devices and anatomical outliers. This work establishes machine unlearning as a modular, practical alternative to retraining for real-world model maintenance in clinical applications.

Paper Structure

This paper contains 36 sections, 4 theorems, 25 equations, 12 figures, 11 tables, 3 algorithms.

Key Result

Lemma 2.1

Assume $(x_i,y_i)\in F$, where $y_i$ is the one-hot vector representation of $x_i$'s class label, and $\mathcal{L}(\cdot,\cdot)$ is a smooth function that decomposes with respect to the coordinates of $y_i$ and is decreasing. The inner minimization problem in eq:biunlearn is equivalent to $\max_{\delta}\mathca

Figures (12)

  • Figure 1: Graphical schematic of our proposed unlearning algorithm. We begin with a pretrained CNN $f_{w_0}$ and a user-defined forget set $F$. In the inner optimization loop (top right), we identify boundary points $x_i^b$ across the decision surface of the original model via our perturbed sign-gradient method. For each forget sample $x_i$, we assign a new label $y_i^b = \arg\max f_{w_0}(x_i^b)$ based on the closest incorrect class, and construct a relabeled forget set $\tilde{F} = \{(x_i, y_i^b)\}$. In the outer optimization loop (bottom), we fine-tune the model on $\tilde{F}$, optionally incorporating remain-set supervision. The result is an unlearned model $f_{w_u}$ whose decision boundaries are shifted to forget the designated samples.
  • Figure 2: Unlearning varying proportions of a target class using our method and standard baselines. R Acc and F Acc denote the accuracy of the unlearned model on the remain and forget sets, respectively. MIA refers to robustness against membership inference attacks. Top row: CIFAR-10; bottom row: FashionMNIST. Arrows indicate the desirable direction for each metric (↑ for higher is better, ↓ for lower is better).
  • Figure 3: Example images from both clinical unlearning scenarios, drawn from the forget set ($F$) and remain set ($R$). All images belong to the same disease class, but forget samples were either collected using the Cirrus 800 FA imaging device (left) or had a vertical palpebral fissure (VPF) $\geq$ 12mm (right), and were explicitly targeted for unlearning.
  • Figure 4: Decision boundaries of the original model (left), the retrained model (center), and the model after unlearning images with VPF $\geq$ 12 mm (right). All points in the test set for $R$ and $F$ were passed through the models and the embeddings were visualized using t-SNE. Stars denote points in $F$, and circles denote points in $R$. Color denotes the predicted label; for misclassified points, the edge color denotes the ground-truth label. The decision space was visualized by training a $k$-nearest neighbors classifier on the 2D t-SNE embeddings using predicted labels, and plotting its decision regions as background contours.
  • Figure 5: Distributions of clinical data and parameters for unlearning used in this study. On the histogram, the dashed line denotes a vertical palpebral fissure (VPF) of 11; images with a VPF greater than this were unlearned. On bar graphs, 'red' denotes the samples that were unlearned in targeted medical unlearning experiments.
  • ...and 7 more figures
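
The inner loop described in the Figure 1 caption can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear softmax classifier in place of the CNN $f_{w_0}$, a cross-entropy loss, and Gaussian noise added to the gradient before taking the sign as one plausible reading of "perturbed sign-gradient". The names `boundary_label`, `step`, and `noise` are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax of a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def boundary_label(W, b, x, y, step=0.05, noise=0.01, max_iter=200, rng=None):
    """Walk x across the decision boundary of a linear classifier (assumed model).

    Each iteration takes a sign-gradient ascent step on the cross-entropy loss,
    with Gaussian noise added to the gradient (an assumption, not the paper's
    exact perturbation). Returns the boundary point x_b and the new label
    y_b = argmax f(x_b).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    onehot = np.eye(W.shape[0])[y]
    x_b = x.astype(float).copy()
    for _ in range(max_iter):
        logits = W @ x_b + b
        if logits.argmax() != y:      # crossed into another class: stop
            break
        # Gradient of cross-entropy w.r.t. the input for logits = W x + b.
        grad = W.T @ (softmax(logits) - onehot)
        grad = grad + noise * rng.standard_normal(grad.shape)
        x_b = x_b + step * np.sign(grad)
    return x_b, int((W @ x_b + b).argmax())

# Toy two-class example (hypothetical data): class 0 lies at positive x[0].
W = np.array([[1.0, 0.0], [-1.0, 0.0]])
b = np.zeros(2)
x = np.array([1.0, 0.0])
x_b, y_b = boundary_label(W, b, x, y=0)
relabeled_forget_set = [(x, y_b)]   # pairs (x_i, y_i^b), as in the caption
```

In the outer loop, the model would then be fine-tuned on `relabeled_forget_set`, optionally together with remain-set samples; that stage is omitted here.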

Theorems & Definitions (4)

  • Lemma 2.1: Unconstrained Unlearning
  • Corollary 2.2: Boundary Shrink Initialization
  • Lemma 2.3: Ascent‑direction guarantee
  • Theorem 2.4: Convergence of perturbed FGSM