DualOptim: Enhancing Efficacy and Stability in Machine Unlearning with Dual Optimizers

Xuyang Zhong, Haochen Luo, Chen Liu

TL;DR

The paper tackles the instability and hyperparameter sensitivity of approximate machine unlearning (MU) by introducing DualOptim, a plug-and-play framework that uses an adaptive learning rate for the forgetting objective and decoupled momentum for the forgetting and retaining objectives. The approach is supported by a theoretical analysis showing that decoupled momentum reduces parameter variance, and by extensive experiments across image classification, image generation, and large language models, where it improves forgetting efficacy and stability relative to strong baselines. Key contributions include the formalization of MU as a two-objective problem, the demonstration that decoupled momentum reduces worst-case variance, and the empirical validation of a practical, generalizable optimization scheme that advances the state of the art in MU. The method is modular and can be integrated with existing MU algorithms, which matters for deploying trustworthy MU systems in real-world settings. In short, the two-objective formulation $\min_{\theta} \mathcal{L}_f(\mathcal{D}_f, \theta) + \mathcal{L}_r(\mathcal{D}_r, \theta)$, combined with adaptive learning rates and decoupled momentum, enables more stable and effective forgetting across diverse tasks.

Abstract

Existing machine unlearning (MU) approaches exhibit significant sensitivity to hyperparameters, requiring meticulous tuning that limits practical deployment. In this work, we first empirically demonstrate the instability and suboptimal performance of existing popular MU methods when deployed in different scenarios. To address this issue, we propose Dual Optimizer (DualOptim), which incorporates adaptive learning rate and decoupled momentum factors. Empirical and theoretical evidence demonstrates that DualOptim contributes to effective and stable unlearning. Through extensive experiments, we show that DualOptim can significantly boost MU efficacy and stability across diverse tasks, including image classification, image generation, and large language models, making it a versatile approach to empower existing MU algorithms.
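
The idea of giving each objective its own optimizer state can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy quadratic losses standing in for $\mathcal{L}_f$ and $\mathcal{L}_r$, the alternating update schedule, the choice of momentum SGD for the retain step, the hyperparameter values, and the names `AdamOptimizer`, `MomentumSGD`, and `dual_optim_unlearn`. The point is only that the forget step uses an adaptive (Adam-style) update, and that neither optimizer's momentum buffer ever sees the other objective's gradients.

```python
import math

class AdamOptimizer:
    """Adam-style update with its own first and second moment buffers."""
    def __init__(self, dim, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * dim   # first moment (momentum) of forget gradients only
        self.v = [0.0] * dim   # second moment of forget gradients only
        self.t = 0

    def step(self, theta, grad):
        self.t += 1
        for i, g in enumerate(grad):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            theta[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)

class MomentumSGD:
    """SGD with a momentum buffer that only accumulates retain gradients."""
    def __init__(self, dim, lr=0.1, momentum=0.9):
        self.lr, self.momentum = lr, momentum
        self.buf = [0.0] * dim

    def step(self, theta, grad):
        for i, g in enumerate(grad):
            self.buf[i] = self.momentum * self.buf[i] + g
            theta[i] -= self.lr * self.buf[i]

# Toy quadratic stand-ins for the forget and retain losses (hypothetical).
FORGET_TARGET = [1.0, 0.0]
RETAIN_TARGET = [0.0, 1.0]

def grad_forget(theta):
    return [t - c for t, c in zip(theta, FORGET_TARGET)]

def grad_retain(theta):
    return [t - c for t, c in zip(theta, RETAIN_TARGET)]

def total_loss(theta):
    lf = 0.5 * sum((t - c) ** 2 for t, c in zip(theta, FORGET_TARGET))
    lr = 0.5 * sum((t - c) ** 2 for t, c in zip(theta, RETAIN_TARGET))
    return lf + lr

def dual_optim_unlearn(theta, steps=200):
    """Alternate forget and retain updates, each with a separate optimizer."""
    opt_f = AdamOptimizer(len(theta))   # adaptive learning rate for forgetting
    opt_r = MomentumSGD(len(theta))     # separate momentum for retaining
    for _ in range(steps):
        opt_f.step(theta, grad_forget(theta))
        opt_r.step(theta, grad_retain(theta))
    return theta, opt_f, opt_r

theta0 = [5.0, -3.0]
initial = total_loss(theta0)
theta, opt_f, opt_r = dual_optim_unlearn(list(theta0))
final = total_loss(theta)
```

With a single shared optimizer, the same buffer would average forget and retain gradients together; keeping the two states apart is the design choice this sketch isolates.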

Paper Structure

This paper contains 29 sections, 3 theorems, 22 equations, 10 figures, 20 tables, and 1 algorithm.

Key Result

Lemma 3.4

(Variance of Gradients) If the loss function $\mathcal{L}$ is Lipschitz smooth with a constant $L$, and $\mathrm{Var}(\theta) \leq \sigma_{\theta}^2$, then we have $\mathrm{Var}(\nabla_\theta{\mathcal{L}}(\theta)) \leq L^2\sigma_{\theta}^2$.
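
One standard way to see why a bound of this form holds is sketched below; the paper's own proof may proceed differently. The sketch assumes that $\mathrm{Var}(X)$ denotes $\mathbb{E}\|X - \mathbb{E}[X]\|^2$ and that "Lipschitz smooth" means $\nabla_\theta \mathcal{L}$ is $L$-Lipschitz; the symbol $\bar{\theta} = \mathbb{E}[\theta]$ is introduced here for convenience. The first inequality uses the fact that the mean minimizes the expected squared distance to any fixed point.

```latex
\begin{align*}
\mathrm{Var}\big(\nabla_\theta \mathcal{L}(\theta)\big)
  &= \mathbb{E}\big\|\nabla_\theta \mathcal{L}(\theta)
      - \mathbb{E}[\nabla_\theta \mathcal{L}(\theta)]\big\|^2 \\
  &\leq \mathbb{E}\big\|\nabla_\theta \mathcal{L}(\theta)
      - \nabla_\theta \mathcal{L}(\bar{\theta})\big\|^2 \\
  &\leq L^2\, \mathbb{E}\big\|\theta - \bar{\theta}\big\|^2
   = L^2\, \mathrm{Var}(\theta)
   \leq L^2 \sigma_{\theta}^2 .
\end{align*}
```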

Figures (10)

  • Figure 1: Unlearning process of MU baselines. SFRon [huang2025unified], SalUn [fan2024salun], and SCRUB [kurmanji2023towards] are adopted as the baselines. The metrics are those described in the preliminary section (sec:preliminary). All results are obtained from unlearning a random 10% subset of CIFAR-10 on ResNet-18. The solid lines and shaded regions denote the mean and standard deviation across 5 trials with different random data. The hyperparameters of each method are selected to minimize the average gap to retraining across the 5 trials. The red dashed lines denote the final performance of retraining as a reference.
  • Figure 2: Unlearning process with different ablations of the proposed method. All results are obtained from unlearning a random 10% subset of CIFAR-10 by SFRon [huang2025unified] on ResNet-18. Panels (a)-(d) show the metrics described in the preliminary section (sec:preliminary). The red dashed lines denote the final performance of retraining as a reference. The solid lines and shaded regions denote the mean and standard deviation across 5 trials with different random forget sets.
  • Figure 3: Norms of the stochastic forget gradient $\widehat{\bm{g}}_f$ and the stochastic retain gradient $\widehat{\bm{g}}_r$ using different optimizers. All results are obtained from unlearning a random 10% subset of CIFAR-10 by SFRon on ResNet-18. Panels (a)-(c) show the curves obtained using SGD, Adam, and DualOptim, respectively.
  • Figure 4: Cosine similarity between $\widehat{\bm{g}}_f$ and $\widehat{\bm{g}}_r$. The moving-average curve is shown for better visualization.
  • Figure 5: Unlearning process of SalUn. All results are obtained from unlearning a random 10% subset of CIFAR-10 on ResNet-18.
  • ...and 5 more figures
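
The quantities plotted in Figures 3 and 4 (gradient norms, and the cosine similarity between the forget and retain gradients smoothed by a moving average) are simple to compute. The sketch below is a generic illustration, not the authors' plotting code: the function names, the trailing-window form of the moving average, and the convention of returning 0.0 for a zero vector are all assumptions.

```python
import math

def l2_norm(g):
    """Euclidean norm of a gradient given as a flat list of floats."""
    return math.sqrt(sum(x * x for x in g))

def cosine_similarity(g_f, g_r):
    """Cosine of the angle between two flattened gradient vectors."""
    denom = l2_norm(g_f) * l2_norm(g_r)
    if denom == 0.0:
        return 0.0  # assumed convention: undefined similarity reported as 0
    return sum(a * b for a, b in zip(g_f, g_r)) / denom

def moving_average(values, window):
    """Trailing moving average; early entries average over what is available."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# Example: opposing gradients give -1, orthogonal gradients give 0.
sims = [cosine_similarity([1.0, 0.0], [-2.0, 0.0]),
        cosine_similarity([1.0, 0.0], [0.0, 3.0])]
smoothed = moving_average([1.0, 2.0, 3.0, 4.0], window=2)
```

A negative similarity indicates that the forget and retain gradients point in conflicting directions at that step.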

Theorems & Definitions (6)

  • Lemma 3.4
  • Theorem 3.5
  • proof
  • proof
  • Corollary A.2
  • proof