Targeted Unlearning Using Perturbed Sign Gradient Methods With Applications On Medical Images
George R. Nahass, Zhu Wang, Homa Rashidisabet, Won Hwa Kim, Sasha Hubschman, Jeffrey C. Peterson, Chad A. Purnell, Pete Setabutr, Ann Q. Tran, Darvin Yi, Sathya N. Ravi
TL;DR
The paper tackles post-deployment model revision with a bilevel optimization framework for targeted unlearning in deep networks: an inner boundary-based search relabels the forget samples, and an outer stage fine-tunes the model. It provides convergence guarantees for the inner perturbation search, offers tunable control over the forgetting-retention tradeoff, and adds a compositional strategy that merges unlearned components from different runs to balance competing objectives. Across benchmark and clinical imaging datasets, the method achieves strong selective forgetting while preserving utility on the remain set and keeping privacy risk at a reasonable level. The resulting modular workflows can adapt to device deprecation, data shifts, and evolving clinical criteria in real-world clinical deployments, reducing the need for full retraining.
Abstract
Machine unlearning aims to remove the influence of specific training samples from a trained model without full retraining. While prior work has largely focused on privacy-motivated settings, we recast unlearning as a general-purpose tool for post-deployment model revision. Specifically, we focus on unlearning in clinical contexts, where data shifts, device deprecation, and policy changes are common. To this end, we propose a bilevel optimization formulation of boundary-based unlearning that can be solved using iterative algorithms. We provide convergence guarantees when first-order algorithms are used to unlearn. Our method introduces a tunable loss design for controlling the forgetting-retention tradeoff and supports novel model composition strategies that merge the strengths of distinct unlearning runs. Across benchmark and real-world clinical imaging datasets, our approach outperforms baselines on both forgetting and retention metrics, including scenarios involving imaging devices and anatomical outliers. This work establishes machine unlearning as a modular, practical alternative to retraining for real-world model maintenance in clinical applications.
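To make the two-stage structure concrete, the following is a minimal sketch of a boundary-based relabeling step followed by fine-tuning. It is not the authors' implementation. It assumes a linear softmax classifier in place of a deep network, and it assumes the outer stage fine-tunes only on the relabeled forget samples. The function names (`inner_relabel`, `outer_finetune`), the step size, the iteration counts, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(W, b, x, y):
    p = softmax(x @ W + b)
    return -np.log(p[np.arange(len(y)), y]).mean()

def inner_relabel(W, b, x_forget, y_forget, step=0.1, n_steps=20):
    """Inner search (assumed form): push each forget sample toward the
    decision boundary with sign-gradient ascent on its own loss, then
    return the strongest class other than the original label."""
    x_adv = x_forget.copy()
    rows = np.arange(len(y_forget))
    for _ in range(n_steps):
        d_logits = softmax(x_adv @ W + b)
        d_logits[rows, y_forget] -= 1.0          # dLoss/dlogits
        grad_x = d_logits @ W.T                  # dLoss/dx for a linear model
        x_adv = x_adv + step * np.sign(grad_x)   # perturbed sign-gradient step
    logits = x_adv @ W + b
    logits[rows, y_forget] = -np.inf             # never return the original label
    return logits.argmax(axis=1)

def outer_finetune(W, b, x, y_new, lr=0.1, n_steps=100):
    """Outer stage (assumed form): plain gradient descent on the
    cross-entropy of the relabeled samples. Returns updated copies."""
    W, b = W.copy(), b.copy()
    rows = np.arange(len(y_new))
    for _ in range(n_steps):
        d_logits = softmax(x @ W + b)
        d_logits[rows, y_new] -= 1.0
        W -= lr * (x.T @ d_logits) / len(y_new)
        b -= lr * d_logits.mean(axis=0)
    return W, b

# Toy setup (hypothetical): 2-D inputs, 3 classes, two samples to forget.
W0 = np.array([[2.0, -2.0, 0.0],
               [0.0, 0.0, 2.0]])
b0 = np.zeros(3)
x_forget = np.array([[1.0, 0.0], [-1.0, 0.2]])
y_forget = np.array([0, 1])

y_new = inner_relabel(W0, b0, x_forget, y_forget)
W1, b1 = outer_finetune(W0, b0, x_forget, y_new)
print("relabeled targets:", y_new)
print("loss on new targets before:", cross_entropy(W0, b0, x_forget, y_new))
print("loss on new targets after: ", cross_entropy(W1, b1, x_forget, y_new))
```

In this sketch the forgetting-retention tradeoff would be controlled by how the outer loss weights forget samples against retained ones. Only the forget term is shown here, so a practical variant would add a retention term over the remain set.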
