Exact Unlearning of Finetuning Data via Model Merging at Scale
Kevin Kuo, Amrith Setlur, Kartik Srinivas, Aditi Raghunathan, Virginia Smith
TL;DR
The paper tackles exact unlearning in large-scale finetuned models by introducing SIFT-Masks, a merging-based approach that uses a global sign vector and per-task masks to preserve utility while enabling exact removal of a task’s influence. Each task is represented by a residual $\tau_c$ aligned to a shared sign vector $v$, and the residuals are merged into $\bar{\tau}$; a task can then be unlearned by simply subtracting its residual, without retraining on the retain set. In experiments with up to 500 tasks, SIFT-Masks improves accuracy over naive merging by 5–80% and reduces unlearning compute by up to 250x compared to other merging baselines. The findings highlight the trade-offs between utility, compute, and storage, and suggest directions for more efficient localization-compatible unlearning and broader adoption in privacy-conscious, federated-like settings.
Abstract
Approximate unlearning has gained popularity as an approach to efficiently update an LLM so that it behaves (roughly) as if it was not trained on a subset of data to begin with. However, existing methods are brittle in practice and can easily be attacked to reveal supposedly unlearned information. To alleviate issues with approximate unlearning, we instead propose SIFT-Masks (SIgn-Fixed Tuning-Masks), an exact unlearning method based on model merging. SIFT-Masks addresses two key limitations of standard model merging: (1) merging a large number of tasks can severely harm utility; and (2) methods that boost utility by sharing extra information across tasks make exact unlearning prohibitively expensive. SIFT-Masks solves these issues by (1) applying local masks to recover task-specific performance; and (2) constraining finetuning to align with a global sign vector as a lightweight approach to determine masks independently before merging. Across four settings where we merge up to 500 models, SIFT-Masks improves accuracy by 5-80% over naive merging and uses up to 250x less compute for exact unlearning compared to other merging baselines.
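The merge-then-subtract idea can be illustrated with a toy sketch. This is not the authors' implementation, and it rests on several assumptions that the text above does not confirm: merging is taken to be a plain average of residuals, the per-task mask is taken to be the nonzero pattern of the task's residual, the local model is taken to be the base weights plus the masked merged residual, and the sign constraint is imitated by a one-shot projection rather than enforced during finetuning. All names (`project_to_sign`, `mask_of`, `MergedResiduals`, `local_model`) are hypothetical.

```python
from typing import List

Vector = List[float]


def project_to_sign(update: Vector, sign: Vector) -> Vector:
    # Stand-in for sign-fixed tuning (assumption): keep only the coordinates
    # whose sign agrees with the global sign vector, zero out the rest.
    return [u if u * s > 0 else 0.0 for u, s in zip(update, sign)]


def mask_of(tau: Vector) -> Vector:
    # Assumed mask rule: 1 where the task's own residual is nonzero.
    # It depends only on this task, so it can be computed before merging.
    return [1.0 if t != 0.0 else 0.0 for t in tau]


class MergedResiduals:
    """Running sum of task residuals; the merged residual is their mean."""

    def __init__(self, dim: int) -> None:
        self.total: Vector = [0.0] * dim
        self.count = 0

    def add(self, tau: Vector) -> None:
        self.total = [a + t for a, t in zip(self.total, tau)]
        self.count += 1

    def remove(self, tau: Vector) -> None:
        # Unlearning a task: subtract its residual from the running sum.
        self.total = [a - t for a, t in zip(self.total, tau)]
        self.count -= 1

    def merged(self) -> Vector:
        if self.count == 0:
            return [0.0] * len(self.total)
        return [a / self.count for a in self.total]


def local_model(theta0: Vector, merged: Vector, mask: Vector) -> Vector:
    # Assumed local model: base weights plus the masked merged residual.
    return [w + m * t for w, t, m in zip(theta0, merged, mask)]


# Toy data: values are dyadic fractions so float arithmetic is exact here.
v = [1.0, -1.0, 1.0, -1.0]
raw_updates = {
    "a": [0.5, -0.25, -1.0, 0.5],
    "b": [0.25, 0.5, 0.75, -0.5],
    "c": [-0.5, -1.0, 0.25, -0.25],
}
residuals = {name: project_to_sign(u, v) for name, u in raw_updates.items()}

store = MergedResiduals(dim=4)
for tau in residuals.values():
    store.add(tau)

# Unlearn task "b" by subtraction.
store.remove(residuals["b"])
unlearned = store.merged()

# Reference: merge only the retained tasks from scratch.
fresh = MergedResiduals(dim=4)
for name in ("a", "c"):
    fresh.add(residuals[name])
retrained = fresh.merged()

theta0 = [1.0, 1.0, 1.0, 1.0]
model_a = local_model(theta0, unlearned, mask_of(residuals["a"]))
```

In this toy setting `unlearned` equals `retrained`, which is the sense in which subtraction gives exact unlearning: the result matches a merge that never saw task "b". With arbitrary floating-point values, an implementation would need to account for rounding (for example by re-summing the stored residuals of the retained tasks) to keep that guarantee bit-for-bit.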
