Stable Forgetting: Bounded Parameter-Efficient Unlearning in Foundation Models

Arpit Garg, Hemanth Saratchandran, Ravi Garg, Simon Lucey

Abstract

Machine unlearning in foundation models (e.g., language and vision transformers) is essential for privacy and safety; however, existing approaches are unstable and unreliable. A widely used strategy, the gradient difference method, applies gradient descent to retained data while performing gradient ascent on forgotten data. When combined with cross-entropy, this procedure can trigger the unbounded growth of weights and gradients, degrading both forgetting and retention. We provide a theoretical framework that explains this failure by showing how ascent destabilizes optimization in transformer feedforward MLP layers. Guided by this insight, we propose *Bounded Parameter-Efficient Unlearning*, which stabilizes LoRA-based fine-tuning by applying bounded functions to MLP adapters. This controls the weight dynamics during ascent and enables reliable convergence. We validate the approach on Vision Transformer class deletion on CIFAR-100, where GD+Sine is the only evaluated method to achieve both high forget quality and model utility across ViT-B/16, ViT-L/14, and DeiT-S architectures, and demonstrate generality on language-model benchmarks (TOFU, TDEC, MUSE) across architectures from 22M to 8B parameters, achieving improved forgetting while preserving utility.
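The gradient difference objective described above can be sketched in a few lines of NumPy. This is an illustrative toy rather than the authors' implementation: the function names, the three-class logits, and the scaling loop are all assumptions made for the example. It shows that the objective has no lower bound, because pushing the forget logits further from the forgotten label always decreases it.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy; logits has shape (n, k), labels holds integer class ids."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # stabilise the log-sum-exp
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def gradient_difference_loss(retain_logits, retain_labels, forget_logits, forget_labels):
    """Minimising this descends on the retain loss and ascends on the forget loss."""
    return (cross_entropy(retain_logits, retain_labels)
            - cross_entropy(forget_logits, forget_labels))

# Hypothetical toy data: one retained example and one forgotten example.
retain_logits = np.array([[2.0, 0.0, 0.0]])
retain_labels = np.array([0])
forget_labels = np.array([1])
direction = np.array([[1.0, -1.0, 0.0]])  # moves logit mass away from the forgotten label

values = []
for scale in (1.0, 10.0, 100.0):
    value = gradient_difference_loss(
        retain_logits, retain_labels, scale * direction, forget_labels
    )
    values.append(value)
    print(f"logit scale {scale:>6}: objective {value:.2f}")
```

The objective keeps falling as the forget logits are scaled up, so an optimizer minimising it is rewarded for unbounded logit growth.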

Paper Structure

This paper contains 73 sections, 5 theorems, 60 equations, 9 figures, 21 tables.

Key Result

lemma 1

Let $\mathcal{L}$ denote the cross-entropy loss of an MLP $F$ with $L$ layers trained under gradient ascent, and let $z(t)$ denote the logits at iteration $t$. If $\mathcal{L}(t) \rightarrow \infty$, then $\|z(t)\| \rightarrow \infty$.
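One standard way to see why a diverging cross-entropy loss forces the logits to diverge is the following bound for a single example with $K$ classes, label $y$, and logits $z$. It is a sketch under these simplifying assumptions and is not necessarily the argument used in the paper's proof; $K$, $y$, and $\ell$ are notation introduced here.

```latex
\begin{aligned}
\ell(z, y) &= -z_y + \log \sum_{j=1}^{K} e^{z_j} \\
           &\le -z_y + \log\bigl(K\, e^{\max_j z_j}\bigr) \\
           &= \log K + \max_j z_j - z_y \\
           &\le \log K + 2\,\|z\|_\infty .
\end{aligned}
```

Rearranging gives $\|z\|_\infty \ge (\ell(z, y) - \log K)/2$, so the loss can grow without bound only if the logit norm does. Because a mean over examples is at most the largest per-example loss, the same conclusion holds when $\mathcal{L}$ is averaged over a dataset.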

Figures (9)

  • Figure 1: GD+Sine is the only method to achieve both high Forget Quality (FQ) and Model Utility (MU) across vision architectures. (a) FQ vs. MU on ViT-B/16, ViT-L/14, and DeiT-S: only GD+Sine (red) reaches the ideal zone (green box), while parameter-efficient and full fine-tuning baselines fail on one or both axes. (b) Activation ablation on ViT-B/16: only Sine achieves near-perfect FQ (0.92) and MU (0.97); unbounded activations collapse on both metrics. (c) Across all three architectures, GD+Sine consistently exceeds parameter-efficient baselines by orders of magnitude in forget quality while maintaining model utility.
  • Figure 2: Balancing efficiency and effectiveness in parameter tuning. (Left) On TDEC, our method achieves stronger privacy protection than existing parameter-efficient baselines while requiring fewer parameters than full fine-tuning. (Right) On TOFU, our approach maintains consistently high forget quality across LoRA ranks, outperforming state-of-the-art baselines by orders of magnitude while preserving parameter efficiency.
  • Figure 3: Optimization dynamics and unlearning convergence. (a, top) Gradient and weight Frobenius norms (FFN MLP layers, Phi-1.5B rank-4, 1000 iterations): GD+LoRA and GD+FILA explode to $10^{5}$, while GD+Sine remains bounded in $[10^{1}, 10^{2}]$, confirming that bounded parameterization prevents explosion. (b, bottom) Forget quality and model utility across training iterations (Phi-1.5B rank-4, Forget10): our method rapidly improves FQ while maintaining MU, whereas baselines either fail to forget or collapse in utility. Additional comparisons appear in the sensitivity appendix.
  • Figure 4: Sensitivity analysis of the frequency parameter $\omega$ on TOFU-Forget10 with Phi-1.5B. (Left) Forget quality (FQ $\uparrow$) improves with $\omega$, plateauing for $\omega \geq 100$. (Right) Model utility (MU $\uparrow$) remains stable, with both GD+Sine and IHL+Sine converging to similar levels.
  • Figure 5: Classifier head stability comparison on TOFU-Forget10 with Phi-1.5B over 1000 unlearning iterations. (Left) Per-class logits: GD+LoRA and GD+FILA drift with large variance, while our bounded approach, GD+Sine, remains tightly centered. (Middle) Norm of classifier updates: sine-activated methods converge to stable plateaus, unlike both baselines. (Right) Gradient norm: GD+Sine maintains low, stable values, in contrast to the growing variance of GD+LoRA and GD+FILA.
  • ...and 4 more figures
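The bounded behaviour reported in the Figure 3 and Figure 4 captions can be illustrated with a small NumPy sketch. The exact parameterization is not given in this summary, so the form `sin(omega * B @ A) / gain` used below is an assumption, as are the function names, the matrix sizes, and the `gain` parameter. The point is only that a sine applied to a low-rank product keeps every entry in a fixed interval, however large the factors grow during ascent.

```python
import numpy as np

def lora_update(B, A):
    """Plain low-rank update: its entries scale with the size of the factors."""
    return B @ A

def sine_bounded_update(B, A, omega=100.0, gain=1.0):
    """Hypothetical bounded variant: every entry lies in [-1/gain, 1/gain]."""
    return np.sin(omega * (B @ A)) / gain

rng = np.random.default_rng(0)
B = rng.normal(size=(8, 4))  # rank-4 factors, mirroring the rank quoted for Figure 3
A = rng.normal(size=(4, 8))

norms = {}
for scale in (1.0, 1e2, 1e4):
    # Scaling B mimics factor growth under gradient ascent.
    plain = np.linalg.norm(lora_update(scale * B, A))            # Frobenius norm
    bounded = np.linalg.norm(sine_bounded_update(scale * B, A))  # Frobenius norm
    norms[scale] = (plain, bounded)
    print(f"factor scale {scale:>8}: plain {plain:.3e}, sine-bounded {bounded:.3e}")
```

For an 8 by 8 update with `gain=1.0`, the sine-bounded Frobenius norm can never exceed 8, while the plain update grows linearly with the factor scale.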

Theorems & Definitions (10)

  • lemma 1
  • theorem 1
  • proof: Proof of Lemma 1
  • proof: Proof of Theorem 1
  • lemma 2
  • proof
  • theorem 2
  • proof
  • theorem 3
  • proof