Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate

Zhiqi Bu, Xiaomeng Jin, Bhanukiran Vinzamuri, Anil Ramakrishna, Kai-Wei Chang, Volkan Cevher, Mingyi Hong

TL;DR

Unlearning in LLMs is framed as a two-task optimization between retaining data utility ($L_R$) and forgetting data ($L_F$). The authors propose NGDiff, a dynamic scalarization method using normalized gradients $g_{NGDiff} = \frac{g_R}{\|g_R\|} - \frac{g_F}{\|g_F\|}$ together with an automatic learning-rate adaptation via GeN to ensure stable progress toward forgetting while preserving retention. They formalize the problem with Pareto-optimality guarantees, provide theoretical analysis of NGDiff, and demonstrate, across TOFU and MUSE-NEWS with multiple foundation models, that NGDiff achieves stronger forgetting while maintaining higher utility than state-of-the-art baselines, with AutoLR further enhancing stability. The results suggest a robust, scalable approach to unlearning that generalizes beyond NLP to other modalities, offering a principled link between LLM unlearning and multi-task optimization.
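The normalized gradient difference above can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation: the helper names (`l2_norm`, `dot`, `ngdiff_direction`), the toy gradient values, the fixed learning rate, and the `eps` guard against zero norms are all assumptions made for the example.

```python
import math

def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))

def dot(a, b):
    """Inner product of two equally sized lists."""
    return sum(x * y for x, y in zip(a, b))

def ngdiff_direction(g_retain, g_forget, eps=1e-12):
    """Return g_R/||g_R|| - g_F/||g_F||.

    eps is an added numerical guard (an assumption, not from the source).
    """
    n_r = l2_norm(g_retain) + eps
    n_f = l2_norm(g_forget) + eps
    return [r / n_r - f / n_f for r, f in zip(g_retain, g_forget)]

# Toy gradients (hypothetical values chosen for easy arithmetic).
g_r = [3.0, 4.0]   # retaining gradient, norm 5
g_f = [0.0, 2.0]   # forgetting gradient, norm 2
g = ngdiff_direction(g_r, g_f)

# One plain descent step theta <- theta - lr * g with a fixed, hypothetical lr.
theta = [1.0, 1.0]
lr = 0.1
theta_next = [t - lr * d for t, d in zip(theta, g)]

print(g, dot(g, g_r), dot(g, g_f), theta_next)
```

By the Cauchy-Schwarz inequality, `dot(g, g_r)` is never negative and `dot(g, g_f)` is never positive, so to first order a small step along `-g` does not increase the retaining loss and does not decrease the forgetting loss.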

Abstract

Machine unlearning has been used to remove unwanted knowledge acquired by large language models (LLMs). In this paper, we examine machine unlearning from an optimization perspective, framing it as a regularized multi-task optimization problem, where one task optimizes a forgetting objective and another optimizes the model performance. In particular, we introduce a normalized gradient difference (NGDiff) algorithm, enabling us to have better control over the trade-off between the objectives, while integrating a new, automatic learning rate scheduler. We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets while exhibiting stable training.

Paper Structure

This paper contains 33 sections, 8 theorems, 26 equations, 7 figures, 6 tables, 3 algorithms.

Key Result

Lemma 2

For any $0<c<1$, the model $\bm\theta^*_\text{LSP}(c)\in\mathop{\mathrm{arg\,min}}\limits_\theta \textup{LSP}(\bm\theta; c)$ is Pareto optimal.
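
A short argument shows why the lemma is plausible. This summary does not spell out the objective or the definition of Pareto optimality, so the sketch rests on two assumptions. First, that the linear scalarization weights the retaining loss positively and the forgetting loss negatively. Second, that a model is Pareto optimal when no other model is at least as good on both objectives and strictly better on one.

$$
\textup{LSP}(\bm\theta; c) = c\,L_\textup{R}(\bm\theta) - (1-c)\,L_\textup{F}(\bm\theta), \qquad 0<c<1.
$$

Suppose some $\bm\theta'$ dominates $\bm\theta^*_\text{LSP}(c)$, that is, $L_\textup{R}(\bm\theta') \leq L_\textup{R}(\bm\theta^*_\text{LSP}(c))$ and $L_\textup{F}(\bm\theta') \geq L_\textup{F}(\bm\theta^*_\text{LSP}(c))$ with at least one inequality strict. Because $c>0$ and $1-c>0$,

$$
\textup{LSP}(\bm\theta'; c) - \textup{LSP}(\bm\theta^*_\text{LSP}(c); c) = c\,\big[L_\textup{R}(\bm\theta') - L_\textup{R}(\bm\theta^*_\text{LSP}(c))\big] - (1-c)\,\big[L_\textup{F}(\bm\theta') - L_\textup{F}(\bm\theta^*_\text{LSP}(c))\big] < 0,
$$

which contradicts $\bm\theta^*_\text{LSP}(c)$ being a minimizer. The endpoints $c=0$ and $c=1$ are excluded because one of the two weights would vanish and the strict inequality could fail.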

Figures (7)

  • Figure 1: Loss values and ROUGE scores on the forgetting and retaining data from the TOFU dataset using different unlearning methods on the Phi-1.5 language model. We apply the extended GDiff with various coefficients (see Eq. \ref{eq:static}, $0\leq c\leq 1$) and connect the results with a blue dashed line. MTO methods are shown with different markers, and a grey dashed line marks the loss of random guessing.
  • Figure 2: Gradient space in two dimensions. $\color{red}\bm{g}_\textup{F}$ is the forgetting gradient and $\color{blue}\bm{g}_\textup{R}$ is the retaining gradient, each with a perpendicular dashed line. The yellow area is the linear span (Eq. \ref{eq:static}) obtained by scalarization. The green area is positively correlated with $\bm{g}_\textup{R}$ and negatively correlated with $\bm{g}_\textup{F}$ by Eq. \ref{eq:sanity check}; NGDiff always stays within this green area at each iteration by \ref{fact:gnorm good}.
  • Figure 3: Loss values of retaining and forgetting sets with different learning rates. Markers are $L_\textup{R}(\bm\theta_t-\eta\bm{g}_\textup{R})$ and $L_\textup{F}(\bm\theta_t-\eta\bm{g}_\textup{F})$ estimated by Phi-1.5 on TOFU at step 10. The curves are fitted as quadratic functions.
  • Figure 4: Comparison of unlearning methods on TOFU. The figures show the ROUGE scores and loss terms during the unlearning process for GDiff, LossNorm, and NGDiff. We observe that NGDiff effectively unlearns the forgetting data while maintaining the performance on the retaining data.
  • Figure 5: Comparison between AutoLR and different learning rates on NGDiff. The figures show the ROUGE scores and loss values during the unlearning process on the TOFU dataset using the Phi-1.5 model. We observe that AutoLR outperforms the static learning rates, with better model utility and more stable convergence.
  • ...and 2 more figures
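
The quadratic fits mentioned in the Figure 3 caption suggest one way a learning rate could be picked automatically: sample the loss at a few step sizes, fit a parabola, and take its minimizer. The sketch below illustrates only that idea. It is not the paper's GeN or AutoLR procedure, and `fit_parabola`, `best_learning_rate`, and `toy_loss` are hypothetical names introduced for the example.

```python
def fit_parabola(points):
    """Fit y = a*x^2 + b*x + c exactly through three (x, y) points
    using Lagrange interpolation."""
    (x0, y0), (x1, y1), (x2, y2) = points
    d0 = (x0 - x1) * (x0 - x2)
    d1 = (x1 - x0) * (x1 - x2)
    d2 = (x2 - x0) * (x2 - x1)
    a = y0 / d0 + y1 / d1 + y2 / d2
    b = -(y0 * (x1 + x2) / d0 + y1 * (x0 + x2) / d1 + y2 * (x0 + x1) / d2)
    c = y0 * x1 * x2 / d0 + y1 * x0 * x2 / d1 + y2 * x0 * x1 / d2
    return a, b, c

def best_learning_rate(points):
    """Return the vertex of the fitted parabola, or None when the fit
    is not convex and therefore has no minimizer."""
    a, b, _ = fit_parabola(points)
    if a <= 0:
        return None
    return -b / (2 * a)

def toy_loss(eta):
    """Hypothetical stand-in for L(theta - eta * g); minimized at eta = 0.3."""
    return 2.0 * (eta - 0.3) ** 2 + 1.0

# Probe the loss at three candidate step sizes, then extrapolate the minimizer.
samples = [(eta, toy_loss(eta)) for eta in (0.0, 0.1, 0.2)]
eta_star = best_learning_rate(samples)
print(eta_star)  # close to 0.3 for this toy loss
```

In a real training loop each sample would cost an extra forward pass, and a non-convex fit (the `None` case) would need a fallback such as keeping the previous learning rate; both points are design assumptions of this sketch.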

Theorems & Definitions (18)

  • Definition 1: Pareto optimality in unlearning
  • Remark 3.1
  • Lemma 2: restated from xin2022current
  • Theorem 3
  • Lemma 4
  • Theorem 5
  • Remark 4.1
  • Remark 4.2
  • Lemma 2: restated from xin2022current
  • Proof of \ref{thm:pareto}
  • ...and 8 more