Exact Unlearning of Finetuning Data via Model Merging at Scale

Kevin Kuo, Amrith Setlur, Kartik Srinivas, Aditi Raghunathan, Virginia Smith

TL;DR

The paper tackles exact unlearning in large-scale finetuned models by introducing SIFT-Masks, a merging-based approach that uses a global sign vector and per-task masks to preserve utility while enabling exact removal of a task’s influence. Each task is represented by a residual $\tau_c$ aligned to a shared sign vector $v$, and the residuals are merged into $\bar{\tau}$; a task can then be unlearned by simply subtracting its residual, which avoids retraining on the retain set. Across experiments with up to 500 tasks, SIFT-Masks improves accuracy over naive merging by 5–80% and reduces unlearning compute by up to 250x compared to other merging baselines. The findings highlight the trade-offs between utility, compute, and storage, and suggest directions for more efficient localization-compatible unlearning and broader adoption in privacy-conscious, federated-like settings.
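The merge-then-subtract idea can be illustrated with a small sketch. It assumes the merged residual is a plain average of the task residuals, which the summary does not state explicitly; the class `MergedResidual` and its method names are hypothetical and are not taken from the paper's code.

```python
class MergedResidual:
    """Running sum of task residuals (assumed merging rule: simple average)."""

    def __init__(self, dim):
        self.total = [0.0] * dim  # elementwise sum of all merged residuals
        self.count = 0            # number of tasks currently merged

    def merge(self, tau_c):
        # Add one task's residual to the running sum.
        self.total = [t + x for t, x in zip(self.total, tau_c)]
        self.count += 1

    def unmerge(self, tau_c):
        # Unlearn a task by subtracting its stored residual; no retraining.
        self.total = [t - x for t, x in zip(self.total, tau_c)]
        self.count -= 1

    def tau_bar(self):
        # Average residual over the tasks that remain merged.
        if self.count == 0:
            return [0.0] * len(self.total)
        return [t / self.count for t in self.total]


# Toy residuals with exactly representable values (illustrative only).
tau_a = [1.0, 0.0, 2.0]
tau_b = [3.0, 1.0, 0.0]
tau_c = [0.0, 4.0, 2.0]

merged = MergedResidual(dim=3)
for tau in (tau_a, tau_b, tau_c):
    merged.merge(tau)
merged.unmerge(tau_b)  # remove task b

# Reference: merge only the retained tasks from scratch.
retained = MergedResidual(dim=3)
for tau in (tau_a, tau_c):
    retained.merge(tau)
```

In this toy setting the unmerged state matches the one built from the retained tasks alone. With real floating-point weights, subtraction may leave small rounding differences, so an actual system would need to decide how to handle that.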

Abstract

Approximate unlearning has gained popularity as an approach to efficiently update an LLM so that it behaves (roughly) as if it was not trained on a subset of data to begin with. However, existing methods are brittle in practice and can easily be attacked to reveal supposedly unlearned information. To alleviate issues with approximate unlearning, we instead propose SIFT-Masks (SIgn-Fixed Tuning-Masks), an exact unlearning method based on model merging. SIFT-Masks addresses two key limitations of standard model merging: (1) merging a large number of tasks can severely harm utility; and (2) methods that boost utility by sharing extra information across tasks make exact unlearning prohibitively expensive. SIFT-Masks solves these issues by (1) applying local masks to recover task-specific performance; and (2) constraining finetuning to align with a global sign vector as a lightweight approach to determine masks independently before merging. Across four settings where we merge up to 500 models, SIFT-Masks improves accuracy by 5-80% over naive merging and uses up to 250x less compute for exact unlearning compared to other merging baselines.

Paper Structure

This paper contains 22 sections, 13 figures, 5 tables, 1 algorithm.

Figures (13)

  • Figure 1: SIFT-Masks is a merging-and-localization method that can match central training accuracy while being as efficient for unlearning as naïve merging.
  • Figure 2: SIFT-Masks starts with (1) Sign-Fixed (Fine)-Tuning, which initializes a random sign vector $v$ and constrains finetuning to match this sign vector or otherwise be sparse, producing a sparse local model $M_i$ with mask $m_i$. We then (2) merge these local models into a global model and only keep the masks $m_i$. When (3) serving task $c_i$, we apply $m_i$ to the merged model. Finally, to unlearn task $c_i$, we simply unmerge $M_i$ from $\overline{M}$ and discard $m_i$.
  • Figure 3: Left: Merging ($x>1$) degrades performance (answer probability) compared to applying local models ($x=1$); this issue becomes more severe as the number of models increases, potentially reducing performance to zeroshot accuracy. Right: Our method SIFT-Masks recovers performance (probability for TOFU; accuracy otherwise) from merging and suffers much less from scale.
  • Figure 4: SIFT (Sign-Fixed Tuning) produces sparse local models (and corresponding masks) which have similar utility as regular FT (finetuning) when applied individually and after merging. However, the main benefit of SIFT is that the sparse masks can be applied to the merged model to obtain strong task-specific models. Despite learning these masks independently from the merged model (which is useful for unlearning efficiency), SIFT-Masks is competitive with existing localization approaches which optimize the mask to minimize distance between the merged and local models.
  • Figure 5: SIFT reduces the distance between the merged and local models, but this does not directly result in improved accuracy due to interference from large-scale merging. Instead, accuracy only improves after applying local masks.
  • ...and 8 more figures
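The pipeline in the Figure 2 caption (sign-fixed tuning, merging, masked serving) can be sketched on toy vectors. This is a rough illustration under stated assumptions, not the authors' implementation: the sign constraint is modeled as zeroing entries that disagree with $v$ after the fact, merging is assumed to be a simple average, and all function names are hypothetical.

```python
def project_to_sign(tau, v):
    # Assumed constraint: keep entries whose sign matches v, zero the rest.
    return [t if t * s > 0 else 0.0 for t, s in zip(tau, v)]


def mask_of(tau):
    # Mask m_i marks the entries the sparse local model actually uses.
    return [1 if t != 0.0 else 0 for t in tau]


def merge(residuals):
    # Assumed merging rule: elementwise average over local models.
    n = len(residuals)
    return [sum(col) / n for col in zip(*residuals)]


def serve(tau_bar, mask):
    # Serving task c_i: apply its mask m_i to the merged model.
    return [t * m for t, m in zip(tau_bar, mask)]


# Shared sign vector v (fixed here for reproducibility; random in the caption).
v = [1.0, -1.0, 1.0, -1.0]

# Hypothetical unconstrained residuals for two tasks.
raw_a = [0.5, 0.2, -0.1, -0.4]
raw_b = [-0.3, -0.6, 0.2, 0.1]

local_a = project_to_sign(raw_a, v)  # sparse local model M_a
local_b = project_to_sign(raw_b, v)  # sparse local model M_b
mask_a, mask_b = mask_of(local_a), mask_of(local_b)

tau_bar = merge([local_a, local_b])
served_a = serve(tau_bar, mask_a)
```

Because each mask depends only on its own local model, it can be computed before merging and discarded when the task is unlearned, which is the property the caption highlights.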