BLUR: A Bi-Level Optimization Approach for LLM Unlearning

Hadi Reisizadeh, Jinghan Jia, Zhiqi Bu, Bhanukiran Vinzamuri, Anil Ramakrishna, Kai-Wei Chang, Volkan Cevher, Sijia Liu, Mingyi Hong

TL;DR

This paper tackles unlearning in large language models by introducing BLUR, a bi-level optimization framework that gives forgetting priority over retaining utility. Unlearning is modeled as a hierarchical problem: the lower-level forget objective constrains the solution space, and the upper-level retain objective selects the most utility-preserving solution within it. The resulting gradient-based update orthogonally projects retain information away from the forget direction. The authors prove convergence for nonconvex settings and report experiments on the TOFU, MUSE, and WMDP benchmarks in which BLUR outperforms state-of-the-art baselines on unlearning while preserving model utility, with ablations showing the effect of hyperparameters. BLUR is also a flexible meta-algorithm that can accommodate various forget and retain losses, offering a principled path toward compliant and ethical LLM unlearning that may inform future data-removal and privacy-preserving practice.
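To make the projection idea concrete, here is a minimal sketch in plain Python. It is not the paper's update rule, whose exact form is not reproduced on this page. It only shows the generic operation of removing from a retain gradient its component along a forget gradient. The function names and the `weight` parameter are hypothetical.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_out(g_retain: Sequence[float], g_forget: Sequence[float],
                eps: float = 1e-12) -> List[float]:
    """Return g_retain minus its component along g_forget.

    The result is orthogonal to g_forget, so a step along it does not
    change the forget loss to first order.
    """
    denom = dot(g_forget, g_forget)
    if denom < eps:
        # Forget gradient is (numerically) zero: nothing to project out.
        return list(g_retain)
    coeff = dot(g_retain, g_forget) / denom
    return [r - coeff * f for r, f in zip(g_retain, g_forget)]


def combined_direction(g_forget: Sequence[float], g_retain: Sequence[float],
                       weight: float = 1.0) -> List[float]:
    """Illustrative direction: forget gradient plus projected retain gradient.

    `weight` is a hypothetical scaling of the retain component, not a
    hyperparameter taken from the paper.
    """
    g_perp = project_out(g_retain, g_forget)
    return [f + weight * p for f, p in zip(g_forget, g_perp)]


g_f = [1.0, 0.0]
g_r = [1.0, 1.0]
g_perp = project_out(g_r, g_f)
direction = combined_direction(g_f, g_r, weight=0.5)
```

In this toy case the projected retain gradient keeps only the coordinate that does not overlap with the forget gradient, so the retain step cannot undo progress on the forget loss to first order.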

Abstract

Enabling large language models (LLMs) to unlearn knowledge and capabilities acquired during training has proven vital for ensuring compliance with data regulations and promoting ethical practices in generative AI. Although there is growing interest in developing unlearning algorithms, it remains unclear how best to formulate the unlearning problem. The most popular formulation uses a weighted sum of the forget and retain losses, but it often leads to performance degradation because of the inherent trade-off between the two. In this work, we argue that it is important to model the hierarchical structure of the unlearning problem, where the forget problem (which unlearns certain knowledge and/or capabilities) takes priority over the retain problem (which preserves model utility). This hierarchical structure naturally leads to a bi-level optimization formulation in which the lower-level objective minimizes the forget loss, while the upper-level objective maintains the model's utility. Based on this new formulation, we propose a novel algorithm, termed Bi-Level UnleaRning (BLUR), which not only possesses strong theoretical guarantees but, more importantly, delivers superior performance. In particular, our extensive experiments demonstrate that BLUR consistently outperforms all state-of-the-art algorithms across various unlearning tasks, models, and metrics. Code is available at https://github.com/OptimAI-Lab/BLURLLMUnlearning.
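The two formulations contrasted in the abstract can be written schematically as follows. The notation is assumed here for illustration: $\ell_f$ and $\ell_r$ denote the forget and retain losses, $\theta$ the model parameters, and $\lambda$ a trade-off weight, here placed on the retain term. The paper's own equations are not reproduced on this page and may differ in detail.

```latex
% Weighted-sum formulation: one objective, with trade-off weight \lambda
\min_{\theta} \; \ell_f(\theta) + \lambda \, \ell_r(\theta)

% Bi-level formulation: the retain loss is minimized only over
% parameters that already minimize the forget loss
\min_{\theta} \; \ell_r(\theta)
\quad \text{s.t.} \quad
\theta \in \arg\min_{\theta'} \; \ell_f(\theta')
```

In the weighted sum, a single $\lambda$ trades one loss against the other. In the bi-level form, the forget problem defines the feasible set and the retain loss only chooses among its solutions, which is what gives forgetting priority.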

Paper Structure

This paper contains 16 sections, 3 theorems, 28 equations, 11 figures, 8 tables.

Key Result

Theorem 3.2

Under Assumption asm:f, the model generated by the dynamics in eq:u -- eq:forbun satisfies [...]. Further, the following holds: [...] for every $T \geq \frac{4C}{L_f C_1^2 \eta^2}$, where $C_1 := (2+\gamma)C$.
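As a worked reading of the iteration threshold $T \geq \frac{4C}{L_f C_1^2 \eta^2}$ with $C_1 := (2+\gamma)C$, the sketch below evaluates the smallest integer $T$ that satisfies it. The numeric values are hypothetical and chosen only for illustration. The meaning of $C$, $L_f$, $\gamma$, and $\eta$ comes from the paper's assumptions, which are not reproduced on this page.

```python
import math


def min_iterations(C: float, L_f: float, gamma: float, eta: float) -> int:
    """Smallest integer T with T >= 4C / (L_f * C_1^2 * eta^2), C_1 = (2 + gamma) * C."""
    if C <= 0 or L_f <= 0 or eta <= 0:
        raise ValueError("C, L_f and eta are assumed positive in this sketch")
    C_1 = (2.0 + gamma) * C
    return math.ceil(4.0 * C / (L_f * C_1 ** 2 * eta ** 2))


# Hypothetical constants, not values from the paper.
T_min = min_iterations(C=1.0, L_f=1.0, gamma=1.0, eta=0.1)  # -> 45
```

Since $\eta$ enters squared in the denominator, halving the step size roughly quadruples the threshold. With the same hypothetical constants, `eta=0.05` gives 178.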

Figures (11)

  • Figure 1: Trade-off between knowledge memorization on the forget set (vertical axis, lower is better) and on the retain set (horizontal axis, higher is better) for different unlearning methods. Training uses the LLaMA2-7B model, with evaluation on the MUSE-News dataset. We run GradDiff with various values of the regularization term $\lambda$, as defined in \ref{eq:reg_f}.
  • Figure 2: Alignment values of the forget and retain losses in \ref{eq:alig} on MUSE-News using the LLaMA2-7B model vs. training step.
  • Figure 3: Cosine similarity between the gradients of the forget and retain losses using NPO on the MUSE-News dataset with LLaMA2-7B, $\lambda = 1$, and $\eta = 10^{-5}$.
  • Figure 4: Visualization of the update direction in \ref{eq:u_hat} and \ref{eq:theta_hat} with their components.
  • Figure C.1: Verbatim memorization on the forget set $\mathcal{D}_f$ (top) and knowledge memorization on the retain set $\mathcal{D}_r$ (bottom) vs. optimization epochs, using various unlearning methods on the MUSE-News dataset.
  • ...and 6 more figures
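The quantity plotted in Figure 3, the cosine similarity between the forget and retain gradients, can be computed as in the sketch below. It works on plain Python lists standing in for flattened gradients, with made-up values. In practice the gradients would come from the model's backward pass, which is not shown here.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; returns 0.0 if either is zero."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


# Hypothetical flattened gradients of the forget and retain losses.
grad_forget = [1.0, 0.0]
grad_retain = [-1.0, 1.0]

similarity = cosine_similarity(grad_forget, grad_retain)
```

A negative value, as in this toy example, means the two gradients conflict: a step that lowers one loss tends to raise the other. A value near zero means they are close to orthogonal.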

Theorems & Definitions (8)

  • Example 1
  • Remark 2.1
  • Theorem 3.2
  • Remark 3.3
  • Remark 3.4
  • Lemma A.1
  • proof
  • Lemma A.2