Is Gradient Ascent Really Necessary? Memorize to Forget for Machine Unlearning
Zhuo Huang, Qizhou Wang, Ziming Hong, Shanshan Ye, Bo Han, Tongliang Liu
TL;DR
This work tackles machine unlearning without gradient ascent (GA), which can destabilize training and degrade model utility. It introduces MOdel eXtrapolation (MOX): first obtain a memorization model $\theta_{mem}$ by gradient descent on a memorization objective with a KL constraint, then compute a forget model by extrapolating from $\theta_{mem}$ past the reference model $\theta_{ref}$, using $\theta_{for} = (1+\alpha)\theta_{ref} - \alpha\theta_{mem}$, with optional momentum updates. MOX stabilizes unlearning and supports targeted and continual unlearning. On the TOFU and MUSE benchmarks it is reported to forget better than GA-based and other baselines while preserving utility. The method is computationally efficient and is supported by ablations and by analyses that connect the GA and GD directions, which makes it practical for privacy-preserving unlearning in large-scale language models.
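The extrapolation formula above can be illustrated with a minimal sketch. This is not the authors' implementation: the parameter layout (a dict mapping names to flat lists of floats), the function name `extrapolate`, and the toy values are assumptions made for illustration, and the optional momentum update is left out because its exact form is not given here.

```python
from typing import Dict, List

# Hypothetical parameter container: name -> flat list of weights.
Params = Dict[str, List[float]]


def extrapolate(theta_ref: Params, theta_mem: Params, alpha: float) -> Params:
    """Compute theta_for = (1 + alpha) * theta_ref - alpha * theta_mem per parameter."""
    if theta_ref.keys() != theta_mem.keys():
        raise ValueError("reference and memorization models must share parameter names")
    theta_for: Params = {}
    for name, ref in theta_ref.items():
        mem = theta_mem[name]
        if len(ref) != len(mem):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Step away from theta_mem, starting at theta_ref, by a factor alpha.
        theta_for[name] = [(1 + alpha) * r - alpha * m for r, m in zip(ref, mem)]
    return theta_for


# Toy values (assumed, not from the paper).
theta_ref = {"w": [1.0, 2.0], "b": [0.5]}
theta_mem = {"w": [1.5, 1.0], "b": [0.5]}
theta_for = extrapolate(theta_ref, theta_mem, alpha=1.0)
print(theta_for)
```

With `alpha=0.0` the function returns the reference model unchanged, and larger values of `alpha` move the result further from $\theta_{mem}$ along the line through $\theta_{ref}$.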
Abstract
For ethical and safe AI, machine unlearning has emerged as a critical topic that aims to protect sensitive, private, and copyrighted knowledge from misuse. A common approach is to apply gradient ascent (GA) to reverse the training on undesired data. However, such a reversal is prone to catastrophic collapse, which leads to serious performance degradation on general tasks. As a solution, we propose model extrapolation as an alternative to GA: given a reference model and a second model, it moves from the reference in the direction opposite to the second model in the hypothesis space. Specifically, we use the original model as the reference and further train it to memorize the undesired data while keeping its predictions consistent on the remaining retained data, which yields a memorization model. Counterintuitive as it might sound, a forget model can then be obtained by extrapolating from the memorization model through the reference model. Hence, instead of acquiring the forget model directly with GA, we use gradient descent to train the memorization model, which stabilizes the machine unlearning process. Our model extrapolation is simple and efficient to implement, and it converges effectively throughout training to achieve improved unlearning performance.
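As a rough illustration of why extrapolation can stand in for GA, consider a simplified case that is an assumption here rather than the paper's analysis: the memorization model is one gradient descent step from the reference model with learning rate $\eta$ on a memorization loss $\mathcal{L}_{mem}$, and the KL constraint is ignored. The symbols $\eta$ and $\mathcal{L}_{mem}$ are introduced only for this sketch.

$$
\begin{aligned}
\theta_{mem} &= \theta_{ref} - \eta \nabla \mathcal{L}_{mem}(\theta_{ref}) \\
\theta_{for} &= (1+\alpha)\theta_{ref} - \alpha\theta_{mem} \\
&= \theta_{ref} + \alpha(\theta_{ref} - \theta_{mem}) \\
&= \theta_{ref} + \alpha\eta \nabla \mathcal{L}_{mem}(\theta_{ref})
\end{aligned}
$$

Under these assumptions the forget model coincides with a single GA step of size $\alpha\eta$ on the same loss, although no ascent step is ever taken during training. With multiple descent steps or with the KL term included, the correspondence is only approximate.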
