
Less Noise, Same Certificate: Retain Sensitivity for Unlearning

Carolin Heinzler, Kasra Malihi, Amartya Sanyal

TL;DR

While insufficient for DP, retain sensitivity is exactly sufficient for unlearning, allowing for the same certificates with less noise. We also refine the analysis of two widely used certified unlearning algorithms through the lens of retain sensitivity.

Abstract

Certified machine unlearning aims to provably remove the influence of a deletion set $U$ from a model trained on a dataset $S$, by producing an unlearned output that is statistically indistinguishable from retraining on the retain set $R:=S\setminus U$. Many existing certified unlearning methods adapt techniques from Differential Privacy (DP) and add noise calibrated to global sensitivity, i.e., the worst-case output change over all adjacent datasets. We show that this DP-style calibration is often overly conservative for unlearning, based on a key observation: certified unlearning, by definition, does not require protecting the privacy of the retained data $R$. Motivated by this distinction, we define retain sensitivity as the worst-case output change over deletions $U$ while keeping $R$ fixed. While insufficient for DP, retain sensitivity is exactly sufficient for unlearning, allowing for the same certificates with less noise. We validate these reductions in noise theoretically and empirically across several problems, including the weight of minimum spanning trees, PCA, and ERM. Finally, we refine the analysis of two widely used certified unlearning algorithms through the lens of retain sensitivity, leveraging the regularity induced by $R$ to further reduce noise and improve utility.
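The distinction between the two sensitivities can be illustrated with a small brute-force sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function is the empirical mean, the data domain is a grid on $[0,1]$, the deletion set $U$ is a single point, and the helper names `retain_sensitivity` and `global_sensitivity` are hypothetical. Retain sensitivity fixes $R$ and maximises only over the deleted point, while the global quantity also maximises over every possible retain set.

```python
from itertools import combinations_with_replacement

import numpy as np


def retain_sensitivity(f, retain, candidates):
    """Largest |f(R + [u]) - f(R)| over candidate deleted points u, with R fixed."""
    base = f(np.asarray(retain, dtype=float))
    return max(abs(f(np.append(retain, u)) - base) for u in candidates)


def global_sensitivity(f, domain, n_retain):
    """Same quantity, additionally maximised over every retain set of size n_retain.

    Enumerating multisets is enough only because f is assumed to be
    permutation-invariant (true for the mean used below).
    """
    return max(
        retain_sensitivity(f, r, domain)
        for r in combinations_with_replacement(domain, n_retain)
    )


domain = np.linspace(0.0, 1.0, 11)  # hypothetical bounded data domain
retain = [0.4, 0.5, 0.5, 0.6]       # hypothetical retain set R

rs = retain_sensitivity(np.mean, retain, domain)
gs = global_sensitivity(np.mean, domain, len(retain))
print(f"retain sensitivity: {rs:.3f}, global sensitivity: {gs:.3f}")
```

On this toy instance the retain set is centred in the domain, so no single added point can move the mean as far as it could for a worst-case retain set, and noise calibrated to the retain-specific value would be correspondingly smaller.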

Paper Structure (31 sections, 26 theorems, 68 equations, 2 figures, 4 tables, 2 algorithms)


Key Result

Corollary 2.8

For any dataset $R$ and function $f$, we have $\mathrm{RS}_f(R)\leq \mathrm{LS}_f(R)$.
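One way to read this inequality is sketched below. The exact definitions are not reproduced in this summary, so the forms used here are assumptions: both quantities are taken to be suprema of the same output change, the norm $\|\cdot\|$ and the neighbor relation $S'\sim R$ are placeholders, and every dataset $R\cup U$ reachable through an admissible deletion set $U$ is assumed to count as a neighbor of $R$. Under those assumptions, the retain-sensitivity supremum ranges over a subset of the neighbors considered by local sensitivity:

$$\mathrm{RS}_f(R)=\sup_{U}\,\bigl\|f(R\cup U)-f(R)\bigr\|\;\leq\;\sup_{S'\sim R}\,\bigl\|f(S')-f(R)\bigr\|=\mathrm{LS}_f(R).$$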

Figures (2)

  • Figure 1: Retain vs. global sensitivity (Passive). In all cases, smaller is better; the gap is largest when the retained data are well-conditioned (large empirical curvature, large margin, or not concentrated), while the ratios approach $1$ in regimes where worst-case and data-dependent bounds coincide. For details on the experiments, see the experiments appendix.
  • Figure 2: Retain vs. global sensitivity (Active). The Descent-to-Delete (D2D) panels for MSE and log loss show the ratio of iteration counts $I_R/I$ needed to guarantee $(\varepsilon,\delta)$-unlearning for fixed $(\varepsilon=1, \delta=10^{-5}, \sigma=0.1)$. The Newton panel shows the ratio $\mathrm{RS}(R)/\Delta_{\mathrm{GS}}$ for the Newton-step update. In all cases, smaller is better, and the gap is largest for small regularization $\lambda$. The Newton accuracy panel plots $\lambda$ against the accuracy of the Newton-step update; accuracy is highest for small $\lambda$ on lower-dimensional data. For details on the experiments, see the experiments appendix.

Theorems & Definitions (70)

  • Definition 2.1: $(\varepsilon,\delta)$-Indistinguishability
  • Definition 2.2: $(\varepsilon,\delta)$-Unlearning (Sekhari et al., 2021)
  • Definition 2.3: Active vs. passive unlearning
  • Definition 2.4: $(\varepsilon,\delta)$-Differential Privacy (DP) (Dwork et al., 2006)
  • Definition 2.5: Global Sensitivity
  • Definition 2.6: Local Sensitivity
  • Definition 2.7: Retain Sensitivity
  • Corollary 2.8
  • Remark 2.9
  • Definition 2.10: Retain Sensitivity for Unlearning
  • ...and 60 more