Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate
Zhiqi Bu, Xiaomeng Jin, Bhanukiran Vinzamuri, Anil Ramakrishna, Kai-Wei Chang, Volkan Cevher, Mingyi Hong
TL;DR
Unlearning in LLMs is framed as a two-task optimization problem that balances preserving utility on the retaining data (loss $L_R$) against removing knowledge of the forgetting data (loss $L_F$). The authors propose NGDiff, a dynamic scalarization method that uses the normalized gradient difference $g_{NGDiff} = \frac{g_R}{\|g_R\|} - \frac{g_F}{\|g_F\|}$, where $g_R$ and $g_F$ are the gradients of $L_R$ and $L_F$, together with automatic learning-rate adaptation (AutoLR) via GeN, so that training makes stable progress on forgetting while preserving retention. They formalize the problem in terms of Pareto optimality, provide a theoretical analysis of NGDiff, and show on TOFU and MUSE-NEWS, across multiple foundation models, that NGDiff forgets more effectively while maintaining higher utility than state-of-the-art baselines, with AutoLR further improving stability. The results suggest a robust, scalable approach to unlearning that may extend beyond NLP to other modalities, and they offer a principled link between LLM unlearning and multi-task optimization.
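The normalized gradient difference in the TL;DR can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names (`l2_norm`, `ngdiff_direction`, `dot`), the small `eps` guard against zero-norm gradients, and the toy gradient values are assumptions made for illustration. The sketch also assumes a descent-style update $\theta \leftarrow \theta - \eta \, g_{NGDiff}$, under which a non-negative inner product with $g_R$ means $L_R$ does not increase to first order, and a non-positive inner product with $g_F$ means $L_F$ does not decrease to first order.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def ngdiff_direction(g_retain, g_forget, eps=1e-12):
    """Return g_R/||g_R|| - g_F/||g_F|| for flattened gradients.

    eps is an illustrative guard against division by zero (an assumption,
    not something specified in the summary above).
    """
    n_r = l2_norm(g_retain) + eps
    n_f = l2_norm(g_forget) + eps
    return [r / n_r - f / n_f for r, f in zip(g_retain, g_forget)]


# Toy gradients (hypothetical values, for illustration only).
g_R = [3.0, 4.0]   # gradient of the retaining loss L_R
g_F = [0.0, 2.0]   # gradient of the forgetting loss L_F

d = ngdiff_direction(g_R, g_F)

# By the Cauchy-Schwarz inequality, d . g_R >= 0 and d . g_F <= 0,
# so a step along -d trades off the two losses as described above.
print("direction:", d)
print("d . g_R =", dot(d, g_R))
print("d . g_F =", dot(d, g_F))
```

Because each gradient is rescaled to unit length before the subtraction, neither loss dominates the update purely because its gradient happens to be larger in magnitude, which is the sense in which the scalarization is dynamic.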
Abstract
Machine unlearning has been used to remove unwanted knowledge acquired by large language models (LLMs). In this paper, we examine machine unlearning from an optimization perspective, framing it as a regularized multi-task optimization problem, where one task optimizes a forgetting objective and another optimizes the model performance. In particular, we introduce a normalized gradient difference (NGDiff) algorithm, enabling us to have better control over the trade-off between the objectives, while integrating a new, automatic learning rate scheduler. We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets while exhibiting stable training.
