Ascent Fails to Forget

Ioannis Mavrothalassitis, Pol Puigdemont, Noam Itzhak Levi, Volkan Cevher

TL;DR

Problem: gradient ascent-based unlearning methods frequently fail because of statistical dependencies between the forget and retain data. Approach: combine theory and experiments. The theory proves, among other results, that random forget sets cannot be unlearned without degrading test performance, and analyzes logistic regression with cross-dimensional correlations to show divergence via minimizers expressed with the Lambert $W$ function; neural-network experiments validate the theory using KLoM as the unlearning metric. Findings: descent-ascent (DA) unlearning can degrade forget-set metrics, diverge from retraining solutions, and trap models in poor minima, with instability even in convex-like settings; results are corroborated by neural-net experiments. Significance: these results urge safer unlearning algorithms (e.g., rewinding or noise-based methods) and offer practical evaluation guidelines to detect and avoid ascent-induced harm.

Abstract

Contrary to common belief, we show that gradient ascent-based unconstrained optimization methods frequently fail to perform machine unlearning, a phenomenon we attribute to the inherent statistical dependence between the forget and retain data sets. This dependence, which can manifest itself even as simple correlations, undermines the misconception that these sets can be independently manipulated during unlearning. We provide empirical and theoretical evidence showing these methods often fail precisely due to this overlooked relationship. For random forget sets, this dependence means that degrading forget set metrics (which, for a retrained model, should mirror test set metrics) inevitably harms overall test performance. Going beyond random sets, we consider logistic regression as an instructive example where a critical failure mode emerges: inter-set dependence causes gradient descent-ascent iterations to progressively diverge from the ideal retrained model. Strikingly, these methods can converge to solutions that are not only far from the retrained ideal but are potentially even further from it than the original model itself, rendering the unlearning process actively detrimental. A toy example further illustrates how this dependence can trap models in inferior local minima, inescapable via finetuning. Our findings highlight that the presence of such statistical dependencies, even when manifest only as correlations, can be sufficient for ascent-based unlearning to fail. Our theoretical insights are corroborated by experiments on complex neural networks, demonstrating that these methods do not perform as expected in practice due to this unaddressed statistical interplay.
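To make the descent-ascent iteration discussed in the abstract concrete, here is a minimal sketch on a synthetic logistic regression problem. It is not the authors' setup: the data, the correlation strength, the forget-set choice, the step size, and the `ascent_weight` parameter are all illustrative assumptions. The script only reports how far the original and the descent-ascent models end up from a model retrained on the retain set, so the two distances can be compared.

```python
import numpy as np

def sigmoid(z):
    # Clip the logits to avoid overflow warnings in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def logistic_grad(theta, X, y):
    """Gradient of the mean logistic loss; labels y are in {0, 1}."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

def gradient_descent(theta, X, y, lr=0.5, steps=500):
    for _ in range(steps):
        theta = theta - lr * logistic_grad(theta, X, y)
    return theta

def descent_ascent(theta, X_r, y_r, X_f, y_f, ascent_weight=1.0, lr=0.5, steps=500):
    """Descend on the retain loss while ascending on the forget loss.

    `ascent_weight` is a hypothetical knob introduced for this sketch.
    """
    for _ in range(steps):
        g = logistic_grad(theta, X_r, y_r) - ascent_weight * logistic_grad(theta, X_f, y_f)
        theta = theta - lr * g
    return theta

rng = np.random.default_rng(0)
n, d = 400, 2
X = rng.normal(size=(n, d))
X[:, 1] += 0.5 * X[:, 0]            # assumed cross-dimensional correlation
theta_true = np.array([1.0, -1.0])  # assumed ground-truth parameters
y = (rng.random(n) < sigmoid(X @ theta_true)).astype(float)

forget = np.zeros(n, dtype=bool)
forget[:40] = True                  # assumed forget set: the first 40 samples
X_f, y_f, X_r, y_r = X[forget], y[forget], X[~forget], y[~forget]

theta_orig = gradient_descent(np.zeros(d), X, y)         # trained on all data
theta_retrain = gradient_descent(np.zeros(d), X_r, y_r)  # retrained on retain data only
theta_da = descent_ascent(theta_orig.copy(), X_r, y_r, X_f, y_f)

dist_orig = np.linalg.norm(theta_orig - theta_retrain)
dist_da = np.linalg.norm(theta_da - theta_retrain)
print(f"||original - retrained||       = {dist_orig:.4f}")
print(f"||descent-ascent - retrained|| = {dist_da:.4f}")
```

Varying `ascent_weight`, `steps`, and the correlation coefficient is a simple way to probe when the descent-ascent iterate drifts away from the retrained solution in this toy setting.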

Paper Structure

This paper contains 38 sections, 28 theorems, 76 equations, 7 figures, 1 table.

Key Result

Lemma 1

Given a true distribution of samples $P_\mathcal{T}$, a forget set $\mathcal{F}$ chosen uniformly at random from the dataset, and an oracle model with parameters $\theta$, the probability that the accuracy on the test set $\text{Acc}_{\mathcal{T}}$ and the forget set $\text{Acc}_{\mathcal{F}}$
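The intuition stated in the abstract (for a retrained model, metrics on a random forget set should mirror test-set metrics) can be checked numerically. The sketch below is a simulation under assumed values, not the lemma's proof or its exact bound: it fixes per-sample correctness of a hypothetical oracle with accuracy `p_correct`, repeatedly draws a uniformly random forget set, and measures the gap between accuracy on the forget set and on the remaining samples. The Hoeffding tail is a standard concentration bound included only for reference.

```python
import numpy as np

rng = np.random.default_rng(0)
n_data, n_forget, n_trials = 50_000, 500, 200  # assumed sizes
p_correct = 0.9                                # assumed oracle accuracy

# Per-sample correctness of the oracle. Because the oracle never trained on the
# forget set, correctness does not depend on forget-set membership.
correct = rng.random(n_data) < p_correct

gaps = np.empty(n_trials)
for t in range(n_trials):
    idx = rng.choice(n_data, size=n_forget, replace=False)
    mask = np.zeros(n_data, dtype=bool)
    mask[idx] = True
    acc_forget = correct[mask].mean()
    acc_rest = correct[~mask].mean()
    gaps[t] = abs(acc_forget - acc_rest)

def hoeffding_tail(n, t):
    """Hoeffding bound on P(|Acc_F - Acc_dataset| > t) for a random subset of size n."""
    return 2.0 * np.exp(-2.0 * n * t**2)

mean_gap = gaps.mean()
max_gap = gaps.max()
print(f"mean |Acc_F - Acc_rest| = {mean_gap:.4f}, max = {max_gap:.4f}")
print(f"Hoeffding tail for t = 0.1: {hoeffding_tail(n_forget, 0.1):.2e}")
```

In this simulation, pushing forget-set accuracy well below the accuracy on the rest of the data would make the model look unlike the oracle, which is the tension the random-set result formalizes.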

Figures (7)

  • Figure 1: Ascent Fails to Forget. We apply Gradient Ascent and Gradient Descent/Ascent to Pretrained models to unlearn a selected forget set containing points of the first Principal Component (PC) of the influence matrix from CIFAR-10. KLoM scores (x-axis, y-axis) measure the quality of unlearning on a given set as the distributional distance between unlearned predictions and Oracle predictions (0 means perfect unlearning, $\bigstar$). We compute KLoM for each data point in a set and report the 95th percentile per group. Each point in the plot corresponds to a different setting of the unlearning method's hyper-parameters. Colors indicate the cost of an unlearning method relative to fully retraining the model. A Pretrained model ($\circ$) is similar to an Oracle on the validation set but very different on the forget set. On this set, unlearning with Gradient Ascent or Gradient Descent/Ascent either breaks the model or barely moves it from the Pretrained starting point; we find this behavior to be consistent across most sets. Forget set selection and the KLoM metric follow Georgiev et al. (2024). Further details on method and evaluation hyper-parameters can be found in the Appendix.
  • Figure 2: The Ascent Forgets Illusion. The left plot shows KLoM scores of Gradient Ascent when unlearning just 10 random samples (axes and points follow Fig. 1). Some runs (- - -) seem to achieve unlearning without breaking the model. The right plot shows the average KLoM for the retain, validation, and forget sets (y-axis) over the course of unlearning (x-axis). We observe that for Gradient Ascent to unlearn such (easy) sets in practice, one would need to (i) select a suitable learning rate and (ii) know when to stop fine-tuning.
  • Figure 3: Different Unlearning Difficulties. We present the KLoM scores of Gradient Ascent and Gradient Descent/Ascent when unlearning different forget sets (axes and points follow Fig. 1). In general, the majority of runs either do nothing or break the model. Empirically, we find highly important points (left) to be the hardest to unlearn, with no runs showing any sign of unlearning. For random samples (center), some Gradient Ascent runs improve the forget KLoM, but with significant degradation of the models. Finally, for a set of second-PC points (right), some Gradient Descent/Ascent runs improve the forget KLoM without breaking the model, but at a high cost: around $25\%$ of the cost of retraining an Oracle, to unlearn $0.2\%$ of the data.
  • Figure 4: Cross-dimensional data correlations $\epsilon$ lead DA to fail for a certain range of values. We present the range of $\alpha$, as a function of the correlation $\epsilon$, for which we can guarantee that DA is detrimental. The (- -) lines mark the minimum $\alpha$ for which the coordinates of the original model become larger than those of the DA-unlearned model, and the (--) lines mark the maximum $\alpha$ for which the coordinates of the oracle are larger than those of the original model.
  • Figure 5: Unlearning certain forget sets leads to the wrong decision boundary under GDA. Left: We show the MSE loss landscape for a pretrained model on the problem described in \ref{sec:low_d}. We denote the global minimum by ($\color{black}\boldsymbol{\times}$) and the local minimum by ($\color{black}\boldsymbol{\circ}$). Right: The effective loss landscape observed in the GDA problem (top) and the retraining problem (bottom). Together, these results show that retraining keeps the model in the same global optimum as the pretrained model, while GDA chooses the local minimum. This is clearly manifest in the decision boundaries favored by the different methods, shown as dashed lines. Next to the contour plots we present two-dimensional illustrations of possible decision boundaries between the samples labeled as negative ($\color{darkred}-$) and positive ($\color{darkpowderblue}\boldsymbol{+}$); the forget set consists of the two positive points shaded in gray, as described in \ref{sec:low_d}. We show the decision boundaries for both GDA (right top) and retraining (right bottom). These conclusions concern the minimizers of a fixed objective and thus do not depend on training dynamics (e.g., step size); see \ref{app:discussion} for details.
  • ...and 2 more figures
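The Figure 1 caption describes KLoM as a per-data-point distance between the distribution of unlearned predictions and the distribution of Oracle predictions, summarized by the 95th percentile over a set. The following is one plausible way to compute such a score, not the reference implementation: it assumes each data point has a scalar prediction (for example a margin) collected from several independently trained models, approximates each per-point distribution by a Gaussian, and uses the closed-form Gaussian KL divergence. The function names, the Gaussian approximation, and the KL direction are all assumptions of this sketch.

```python
import numpy as np

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL( N(mu_p, var_p) || N(mu_q, var_q) ) for univariate Gaussians."""
    return 0.5 * (np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def klom_percentile(preds_unlearned, preds_oracle, q=95.0, eps=1e-8):
    """Per-point KL between Oracle and unlearned prediction distributions.

    Both inputs have shape (n_models, n_points): one scalar prediction per
    model and data point. Returns the q-th percentile and the per-point scores.
    """
    mu_u, var_u = preds_unlearned.mean(axis=0), preds_unlearned.var(axis=0) + eps
    mu_o, var_o = preds_oracle.mean(axis=0), preds_oracle.var(axis=0) + eps
    per_point = gaussian_kl(mu_o, var_o, mu_u, var_u)  # assumed direction: Oracle || unlearned
    return np.percentile(per_point, q), per_point

# Synthetic usage: 50 models, 200 data points (illustrative numbers only).
rng = np.random.default_rng(0)
oracle = rng.normal(loc=0.0, scale=1.0, size=(50, 200))
shifted = oracle + 2.0  # "unlearned" predictions with the same spread but shifted mean

score_same, per_point_same = klom_percentile(oracle, oracle)
score_shifted, per_point_shifted = klom_percentile(shifted, oracle)
print(f"95th-percentile score, identical predictions: {score_same:.4f}")
print(f"95th-percentile score, shifted predictions:   {score_shifted:.4f}")
```

Identical prediction distributions give a score of zero, matching the caption's convention that 0 means perfect unlearning; any mismatch in mean or spread increases the score.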

Theorems & Definitions (39)

  • Lemma 1: Random Sets
  • Lemma 2: Closed Form
  • Lemma 3: Divergence Logistic Regression
  • Lemma 4: Distance Growth
  • Lemma 5: Distance Unlearning
  • Corollary 1
  • Lemma 6
  • Lemma 7
  • Lemma 8
  • Lemma 9
  • ...and 29 more