Variance-Reduced $(\varepsilon,\delta)$-Unlearning using Forget Set Gradients

Martin Van Waerebeke, Marco Lorenzi, Kevin Scaman, El Mahdi El Mhamdi, Giovanni Neglia

TL;DR

This paper introduces Variance-Reduced Unlearning (VRU), a first-order algorithm that directly incorporates forget-set gradients into the unlearning update while guaranteeing $(\varepsilon,\delta)$-unlearning for strongly convex losses. VRU builds a variance-reduced stochastic gradient estimator anchored at the pre-trained optimum $\theta^*$, so that a projected SGD phase can quickly reach $\theta^*_r$, and a final Gaussian mechanism provides the privacy guarantee. Theoretical results establish a convergence rate of $\tilde{\mathcal{O}}\left( \kappa_{\ell}^3 \big(1 + d\kappa_{\varepsilon,\delta}^2 \log(1/\delta)\big) \frac{e_0}{e} \left(\frac{r_f}{1-r_f}\right)^2 \right)$, improving on the $\mathcal{O}(1/e^2)$ dependence of prior $(\varepsilon,\delta)$-unlearning methods and outperforming any forget-set-free method in a low-error regime as $r_f\to 0$. Empirical results on logistic regression with the Digit dataset show consistent gains over certified and empirical baselines, with VRU achieving the lowest excess risk for small forget fractions and the best privacy-utility trade-offs under a fixed computational budget. Overall, VRU bridges certified unlearning and forget-set exploitation, offering faster convergence and stronger guarantees in practical unlearning scenarios.
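The three stages named above (an estimator anchored at $\theta^*$, projected SGD, Gaussian noise) can be illustrated on a toy problem. The following is a minimal sketch under stated assumptions, not the paper's algorithm: it uses an SVRG-style estimator as one plausible reading of "variance-reduced", a one-dimensional L2-regularized logistic loss, and hypothetical names (`vr_unlearn`, `make_data`, `LAM`, `sigma`). It relies on one fact that holds whenever $\theta^*$ minimizes the full objective: the full gradient vanishes there, so the retain-set gradient at $\theta^*$ equals $-\frac{r_f}{1-r_f}$ times the forget-set gradient and can be computed from the forget set alone. The noise scale `sigma` is left as an input; the calibration that yields an $(\varepsilon,\delta)$ guarantee is not reproduced here.

```python
import math
import random

LAM = 0.1  # L2 penalty (assumed value); each per-sample loss is LAM-strongly convex


def grad(theta, sample):
    """Gradient in theta of log(1 + exp(-y*x*theta)) + (LAM/2)*theta**2."""
    x, y = sample
    return -y * x / (1.0 + math.exp(y * x * theta)) + LAM * theta


def mean_grad(theta, data):
    """Average per-sample gradient over a dataset."""
    return sum(grad(theta, s) for s in data) / len(data)


def minimise(data, steps=300, lr=1.0):
    """Plain gradient descent, standing in for (re)training to the optimum."""
    theta = 0.0
    for _ in range(steps):
        theta -= lr * mean_grad(theta, data)
    return theta


def make_data(rng, n_retain=950, n_forget=50):
    """Toy data: noisy labels on the retain set, flipped labels on the forget set."""
    retain = []
    for _ in range(n_retain):
        x = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < 1.0 / (1.0 + math.exp(-2.0 * x)) else -1
        retain.append((x, y))
    forget = []
    for _ in range(n_forget):
        x = rng.gauss(0.0, 1.0)
        forget.append((x, -1 if x > 0 else 1))
    return retain, forget


def vr_unlearn(theta_star, retain, forget, rng, steps=2000, lr=0.1, sigma=0.0):
    """Hypothetical sketch: anchored estimator + projected SGD + Gaussian noise."""
    ratio = len(forget) / len(retain)  # equals r_f / (1 - r_f)
    # Retain gradient at theta_star, obtained from the forget set only
    # (valid because the full-data gradient is zero at theta_star).
    anchor = -ratio * mean_grad(theta_star, forget)
    # Strong convexity bounds the distance to the retain optimum by
    # |retain gradient| / LAM, so this interval contains it (illustrative choice).
    radius = abs(anchor) / LAM
    theta = theta_star
    for _ in range(steps):
        s = rng.choice(retain)
        # SVRG-style estimator anchored at theta_star: unbiased for the retain
        # gradient, with variance shrinking as theta approaches theta_star.
        g = grad(theta, s) - grad(theta_star, s) + anchor
        theta -= lr * g
        # Projection onto [theta_star - radius, theta_star + radius].
        theta = min(max(theta, theta_star - radius), theta_star + radius)
    # Final Gaussian perturbation; sigma must be calibrated separately.
    return theta + rng.gauss(0.0, sigma)
```

With `sigma=0.0`, the output should land much closer to `minimise(retain)` than `theta_star` does, since the estimator's noise scales with the distance to the anchor rather than with the raw gradient noise. Setting `sigma > 0` trades some of that accuracy for indistinguishability from retraining.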

Abstract

In machine unlearning, $(\varepsilon,\delta)$-unlearning is a popular framework that provides formal guarantees on the effectiveness of the removal of a subset of training data, the forget set, from a trained model. For strongly convex objectives, existing first-order methods achieve $(\varepsilon,\delta)$-unlearning, but they only use the forget set to calibrate injected noise, never as a direct optimization signal. In contrast, efficient empirical heuristics often exploit the forget samples (e.g., via gradient ascent) but come with no formal unlearning guarantees. We bridge this gap by presenting the Variance-Reduced Unlearning (VRU) algorithm. To the best of our knowledge, VRU is the first first-order algorithm that directly includes forget set gradients in its update rule while provably satisfying $(\varepsilon,\delta)$-unlearning. We establish the convergence of VRU and show that incorporating the forget set yields strictly improved rates, i.e., a better dependence on the achieved error compared to existing first-order $(\varepsilon,\delta)$-unlearning methods. Moreover, we prove that, in a low-error regime, VRU asymptotically outperforms any first-order method that ignores the forget set. Experiments corroborate our theory, showing consistent gains over both state-of-the-art certified unlearning methods and empirical baselines that explicitly leverage the forget set.
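For orientation, one common way the certified-unlearning literature formalizes this guarantee is sketched below. The notation is assumed here for illustration ($\mathcal{A}$ is the learning algorithm, $\mathcal{U}$ the unlearning algorithm, $D$ the training set, $D_f \subset D$ the forget set), and the paper's own Definition 3.4 may differ in its details. The condition asks that, for every measurable set of models $S$, the unlearned model be statistically close to a model retrained without the forget set:

$$
\begin{aligned}
\Pr\big[\mathcal{U}(\mathcal{A}(D), D, D_f) \in S\big] &\le e^{\varepsilon}\, \Pr\big[\mathcal{A}(D \setminus D_f) \in S\big] + \delta, \\
\Pr\big[\mathcal{A}(D \setminus D_f) \in S\big] &\le e^{\varepsilon}\, \Pr\big[\mathcal{U}(\mathcal{A}(D), D, D_f) \in S\big] + \delta.
\end{aligned}
$$

Smaller $\varepsilon$ and $\delta$ mean the two output distributions are harder to tell apart, which is why a final noise-injection step is typically needed.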

Paper Structure (41 sections, 13 theorems, 31 equations, 3 figures, 2 tables, 2 algorithms)


Key Result

Theorem 4.1

Let $\mathcal{F}$ be the set of $\mu$-strongly-convex, $L$-Lipschitz, and $\beta$-smooth loss functions. Then, for any $\ell\in\mathcal{F}$ and any $e>0$,

Figures (3)

  • Figure 1: Excess risk of certified unlearning and retraining methods for varying forget fractions $r_f$, under fixed computational budget (10 epochs) and privacy budget ($\kappa_{\varepsilon,\delta} = 1$). Results are averaged over 30 runs; error bars indicate $\pm 1$ standard deviation. VRU achieves the lowest excess risk across all tested $r_f$ values, with gains exceeding an order of magnitude for $r_f < 10^{-2}$.
  • Figure 2: Privacy-utility trade-off under fixed computational budget (5 epochs). Each point represents one method at a given $r_f$ value. Excess risk (y-axis): lower is better. Empirical privacy risk (x-axis): MIA accuracy, 50% indicates perfect unlearning. The lower-left region represents the ideal trade-off.
  • Figure 3: Ablation study on the projection step. Excess risk versus forget fraction $r_f$ for VRU with and without projection, using $\kappa_{\varepsilon,\delta} = 0.1$. The projection step has minimal impact on convergence, indicating VRU's robustness to this algorithmic choice. Error bars: $\pm 1$ standard deviation over 30 runs.

Theorems & Definitions (22)

  • Definition 3.3: Unlearning Algorithm
  • Definition 3.4: $(\varepsilon, \delta)$-unlearning
  • Theorem 4.1
  • Corollary 4.2
  • Corollary 4.3
  • Theorem 4.4: Fundamental gain from forget set access
  • Corollary 4.5
  • Lemma 1.1: Lemma C.2, van2025forget
  • Lemma 1.2
  • proof
  • ...and 12 more