Less Noise, Same Certificate: Retain Sensitivity for Unlearning

Carolin Heinzler; Kasra Malihi; Amartya Sanyal

TL;DR

While insufficient for DP, retain sensitivity is exactly sufficient for unlearning, allowing the same certificates with less noise. We also refine the analysis of two widely used certified unlearning algorithms through the lens of retain sensitivity.

Abstract

Certified machine unlearning aims to provably remove the influence of a deletion set $U$ from a model trained on a dataset $S$, by producing an unlearned output that is statistically indistinguishable from retraining on the retain set $R:=S\setminus U$. Many existing certified unlearning methods adapt techniques from Differential Privacy (DP) and add noise calibrated to global sensitivity, i.e., the worst-case output change over all adjacent datasets. We show that this DP-style calibration is often overly conservative for unlearning, based on a key observation: certified unlearning, by definition, does not require protecting the privacy of the retained data $R$. Motivated by this distinction, we define retain sensitivity as the worst-case output change over deletions $U$ while keeping $R$ fixed. While insufficient for DP, retain sensitivity is exactly sufficient for unlearning, allowing for the same certificates with less noise. We validate these reductions in noise theoretically and empirically across several problems, including the weight of minimum spanning trees, PCA, and ERM. Finally, we refine the analysis of two widely used certified unlearning algorithms through the lens of retain sensitivity, leveraging the regularity induced by $R$ to further reduce noise and improve utility.
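
To illustrate the gap between calibrating noise to global sensitivity and to retain sensitivity, here is a minimal Python sketch; it is not the authors' implementation. It assumes the released function is the mean of points in $[0,1]$, a deletion set consisting of a single point, and the classical Gaussian-mechanism noise scale $\sigma = \Delta\sqrt{2\ln(1.25/\delta)}/\varepsilon$ (valid for $\varepsilon<1$). All function names are hypothetical.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def retain_sensitivity_mean(retain, lo=0.0, hi=1.0):
    """Largest change in the mean when one point u in [lo, hi] is added to the
    fixed retain set R (assumption: single-point deletion set)."""
    n = len(retain)
    m = mean(retain)
    # mean(R + {u}) - mean(R) = (u - mean(R)) / (n + 1), so the worst case
    # is attained at whichever endpoint of the domain is farther from mean(R).
    return max(m - lo, hi - m) / (n + 1)


def global_sensitivity_mean(n, lo=0.0, hi=1.0):
    """Worst case of the same quantity over every retain set of size n."""
    return (hi - lo) / (n + 1)


def gaussian_sigma(sensitivity, eps, delta):
    """Classical Gaussian-mechanism noise scale (assumed here; needs eps < 1)."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps


# Illustrative retain set, not data from the paper.
retain = [0.4, 0.5, 0.6, 0.5]
rs = retain_sensitivity_mean(retain)
gs = global_sensitivity_mean(len(retain))

sigma_rs = gaussian_sigma(rs, eps=0.5, delta=1e-5)
sigma_gs = gaussian_sigma(gs, eps=0.5, delta=1e-5)
print(f"retain sensitivity {rs:.3f}, global sensitivity {gs:.3f}")
print(f"noise scale ratio  {sigma_rs / sigma_gs:.3f}")
```

On this retain set the mean sits at the centre of the domain, so the retain-calibrated noise scale is half the global one. As the retained mean approaches an endpoint of the domain, the two scales coincide.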

Paper Structure (31 sections, 26 theorems, 68 equations, 2 figures, 4 tables, 2 algorithms)

Introduction
Retain Sensitivity
Preliminaries
Global, Local, and Smooth Sensitivity
Retain Sensitivity
Unlearning using Retain Sensitivity
Passive Unlearning
Median
Minimum Spanning Tree (MST) Weight
Principal Component Analysis (PCA)
Support Vector Machine (SVM) - Hard Margin
Empirical Risk Minimiser (ERM)
Active Unlearning Algorithms
Descent-to-Delete
Newton Step Update
...and 16 more sections

Key Result

Corollary 2.8

For any dataset $R$ and function $f$, we have $\mathrm{RS}_f(R) \leq \mathrm{LS}_f(R)$.
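
The ordering in this corollary can be checked numerically on a toy example. The sketch below is an assumption-laden illustration, not the paper's definitions verbatim: it takes retain sensitivity to range over adding a single point to $R$, takes local sensitivity in the standard sense of the largest change over neighbouring datasets, and assumes add/remove adjacency. The supremum over the domain $[0,1]$ is approximated on a finite grid, and all helper names are hypothetical.

```python
from statistics import median


def retain_sensitivity(f, retain, candidates):
    """Largest change in f when one candidate point is added to the fixed
    retain set (assumption: single-point deletion set)."""
    base = f(retain)
    return max(abs(f(retain + [u]) - base) for u in candidates)


def local_sensitivity(f, data, candidates):
    """Largest change in f over add/remove neighbours of data
    (assumption: add/remove adjacency)."""
    base = f(data)
    add = max(abs(f(data + [u]) - base) for u in candidates)
    remove = max(
        abs(f(data[:i] + data[i + 1:]) - base) for i in range(len(data))
    )
    return max(add, remove)


# Grid approximation of the domain [0, 1]; illustrative data only.
grid = [i / 100 for i in range(101)]
retain = [0.1, 0.45, 0.5, 0.55, 0.9]

rs = retain_sensitivity(median, retain, grid)
ls = local_sensitivity(median, retain, grid)
print(f"RS = {rs:.4f}, LS = {ls:.4f}, RS <= LS: {rs <= ls}")
```

Under these assumptions the inequality holds by construction: the additions considered for retain sensitivity are a subset of the neighbours considered for local sensitivity.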

Figures (2)

Figure 1: Retain vs. global sensitivity (Passive): In all cases, smaller is better. The gap is largest when the retained data are well-conditioned (large empirical curvature or margin, or data that are not concentrated), while the ratios approach $1$ in regimes where worst-case and data-dependent bounds coincide. For details on the experiments, see the experiments appendix.
Figure 2: Retain vs. global sensitivity (Active): The Descent-to-Delete (D2D) panels (MSE and log loss) show the ratio of iteration counts $I_R/I$ needed to guarantee $(\varepsilon,\delta)$-unlearning for fixed $\varepsilon=1$, $\delta=10^{-5}$, $\sigma=0.1$. The Newton-step panel shows the ratio $\mathrm{RS}(R)/\Delta_{\mathrm{GS}}$ for the Newton-step update. In all cases, smaller is better, and the gap is largest for small regularization $\lambda$. The Newton-step accuracy panel plots $\lambda$ against the accuracy of the Newton-step update; accuracy is highest for small $\lambda$ on lower-dimensional data. For details on the experiments, see the experiments appendix.

Theorems & Definitions (70)

Definition 2.1: $(\varepsilon,\delta)$-Indistinguishability
Definition 2.2: $(\varepsilon,\delta)$-Unlearning (Sekhari et al., 2021)
Definition 2.3: Active vs. passive unlearning
Definition 2.4: $(\varepsilon,\delta)$-Differential Privacy (DP) (Dwork et al., 2006)
Definition 2.5: Global Sensitivity
Definition 2.6: Local Sensitivity
Definition 2.7: Retain Sensitivity
Corollary 2.8
Remark 2.9
Definition 2.10: Retain Sensitivity for Unlearning
...and 60 more
