Is Gradient Ascent Really Necessary? Memorize to Forget for Machine Unlearning

Zhuo Huang, Qizhou Wang, Ziming Hong, Shanshan Ye, Bo Han, Tongliang Liu

TL;DR

This work tackles machine unlearning by avoiding gradient ascent (GA), which can destabilize training and degrade model utility. It introduces MOdel eXtrapolation (MOX): first create a memorization model $\theta_{mem}$ through gradient descent with a memorization objective and a KL constraint, then compute a forget model via extrapolation toward the reference model $\theta_{ref}$ using $\theta_{for} = (1+\alpha)\theta_{ref} - \alpha\theta_{mem}$, with optional momentum updates. MOX stabilizes unlearning, supports targeted and continual unlearning, and demonstrates superior forgetting performance while preserving utility on TOFU and MUSE benchmarks, outperforming GA-based and other baselines. The method is computationally efficient, adaptable to training phases, and supported by ablations and analyses that connect GA and GD directions, offering practical impact for deploying privacy-preserving unlearning in large-scale language models.
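
The extrapolation formula above can be rearranged as $\theta_{for} - \theta_{ref} = -\alpha(\theta_{mem} - \theta_{ref})$: the forget model moves away from the reference by $\alpha$ times the reverse of the memorization update. The sketch below illustrates only that arithmetic on toy parameters. The function name `extrapolate`, the dict-of-lists parameter layout, and the example values are assumptions made for illustration, not the authors' implementation, and the optional momentum update is not shown because its form is not given here.

```python
# Minimal sketch of weight-space extrapolation; names and data layout are hypothetical.
def extrapolate(theta_ref, theta_mem, alpha):
    """Return theta_for = (1 + alpha) * theta_ref - alpha * theta_mem, per parameter."""
    if theta_ref.keys() != theta_mem.keys():
        raise ValueError("reference and memorization models must share parameter names")
    theta_for = {}
    for name, ref_values in theta_ref.items():
        mem_values = theta_mem[name]
        if len(ref_values) != len(mem_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Step from theta_ref in the direction opposite to (theta_mem - theta_ref).
        theta_for[name] = [
            (1.0 + alpha) * r - alpha * m for r, m in zip(ref_values, mem_values)
        ]
    return theta_for


# Toy parameters (made up): the memorization model shifted "w"[0] up and "b"[0] down.
theta_ref = {"w": [1.0, -2.0], "b": [0.5]}
theta_mem = {"w": [1.5, -2.0], "b": [0.0]}

theta_for = extrapolate(theta_ref, theta_mem, alpha=0.5)
print(theta_for)  # {'w': [0.75, -2.0], 'b': [0.75]}
```

Parameters that memorization left untouched (here `"w"[1]`) are unchanged by extrapolation, and `alpha=0` returns the reference model itself.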

Abstract

For ethical and safe AI, machine unlearning has emerged as a critical topic that aims to protect sensitive, private, and copyrighted knowledge from misuse. To achieve this goal, it is common to conduct gradient ascent (GA) to reverse the training on undesired data. However, such a reversal is prone to catastrophic collapse, which leads to serious performance degradation on general tasks. As a solution, we propose model extrapolation as an alternative to GA, which reaches the counterpart direction in the hypothesis space from one model given another reference model. Specifically, we take the original model as the reference and further train it to memorize the undesired data while keeping its predictions consistent on the remaining retained data, which yields a memorization model. Counterintuitive as it might sound, a forget model can then be obtained via extrapolation from the memorization model to the reference model. Hence, we avoid directly acquiring the forget model using GA and instead use gradient descent to obtain the memorization model, which successfully stabilizes the machine unlearning process. Our model extrapolation is simple and efficient to implement, and it can also effectively converge throughout training to achieve improved unlearning performance.
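
The memorization step described above (fit the undesired data while keeping predictions on retained data consistent with the reference) can be sketched as a two-term loss. Everything specific in this block is an assumption for illustration: the helper names, the per-token logit lists, the `kl_weight` parameter, and the choice of KL(reference || memorization) as the consistency term. The paper's exact objective may differ.

```python
import math


def log_softmax(logits):
    """Numerically stable log-probabilities for one logit vector."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def nll(logits, target):
    """Negative log-likelihood of the target token id."""
    return -log_softmax(logits)[target]


def kl_divergence(p_logits, q_logits):
    """KL(p || q) between the softmax distributions of two logit vectors."""
    log_p, log_q = log_softmax(p_logits), log_softmax(q_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def memorization_loss(mem_forget_logits, forget_targets,
                      ref_retain_logits, mem_retain_logits, kl_weight=1.0):
    """Hypothetical objective: fit the forget set, stay close to the reference on retain data."""
    # Term 1: ordinary gradient-descent target, the mean NLL on forget-set tokens.
    fit = sum(nll(l, t) for l, t in zip(mem_forget_logits, forget_targets))
    fit /= len(forget_targets)
    # Term 2: mean KL between reference and memorization predictions on retain data.
    drift = sum(kl_divergence(r, m) for r, m in zip(ref_retain_logits, mem_retain_logits))
    drift /= len(ref_retain_logits)
    return fit + kl_weight * drift


# Toy check: identical retain predictions give zero KL, so only the fit term remains.
loss = memorization_loss(
    mem_forget_logits=[[0.0, 0.0]], forget_targets=[0],
    ref_retain_logits=[[1.0, -1.0]], mem_retain_logits=[[1.0, -1.0]],
)
print(loss)
```

Both terms are minimized by gradient descent, so this step never needs gradient ascent; the forgetting itself comes from the later extrapolation.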

Paper Structure (31 sections, 1 theorem, 18 equations, 8 figures, 6 tables)

Key Result

Theorem 2.4

By avoiding the irreversible gradient via GD on $D_{sub}$, the model converges without collapse.

Figures (8)

  • Figure 1: (a) Effect of gradient ascent and gradient descent on model utility under various reweighting levels. (b) Effect of gradient ascent and gradient descent on divergence between training and reference models under various reweighting levels. (c) Comparison of forget quality between the forget model and the memorize model.
  • Figure 2: Illustration of MOX. Color intensity indicates dataset fit, colored arrows denote learning directions, and black arrows indicate model extrapolation. Directly deriving the forget model $\theta_{for}$ from the reference model $\theta_{ref}$ via gradient ascent is infeasible, as it reverses pre-training and leads to optimization failures. Instead, we apply gradient descent to memorize $\mathcal{D}_F$, obtaining a memorization model $\theta_{mem}$, and then extrapolate to produce $\theta_{for}$ that effectively forgets $\mathcal{D}_F$. To preserve utility, a KL-divergence constraint enforces consistency between $\theta_{ref}$ and $\theta_{mem}$, improving the utility of $\theta_{for}$. For targeted unlearning, an additional forgetting loss—compatible with pre-training—is combined with the memorization loss to perform MOX and obtain $\theta_{for}$.
  • Figure 3: Parameter sensitivity analyses on $\alpha$ (top row) and $\eta$ (bottom row).
  • Figure 4: Performance stability under different extrapolation strengths and weight values.
  • Figure 5: Performance comparison under various forget sizes.
  • ...and 3 more figures

Theorems & Definitions (4)

  • Definition 2.1: irreversible gradient
  • Definition 2.2: Collapse
  • Theorem 2.4
  • proof