Leveraging Per-Instance Privacy for Machine Unlearning
Nazanin Mohammadi Sepahvand, Anvith Thudi, Berivan Isik, Ashmita Bhattacharyya, Nicolas Papernot, Eleni Triantafillou, Daniel M. Roy, Gintare Karolina Dziugaite
TL;DR
The paper develops per-instance $D_{\alpha}$-Rényi privacy losses to quantify the difficulty of unlearning individual datapoints in neural networks, addressing the inefficiency of retraining from scratch. By integrating Langevin unlearning theory with per-instance privacy analysis, it derives data-dependent bounds on the number of unlearning steps $k$ needed to forget a datapoint, and shows that $k$ scales with the per-instance loss $P(x,4\alpha)$, while an irreducible term reflects distance to stationarity. Empirically, the authors validate that per-instance privacy losses predict unlearning difficulty across SGLD and standard fine-tuning, and that these losses correlate with, yet outperform, traditional data-difficulty proxies and loss-barrier metrics. They also demonstrate that data points with higher privacy losses correspond to larger loss barriers along linear paths in weight space, providing a geometric interpretation of unlearning difficulty. Overall, the work lays a foundation for adaptive, data-aware unlearning strategies and suggests avenues for integrating per-instance privacy losses into practical unlearning pipelines and proxy metrics.
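For reference, the Rényi divergence of order $\alpha > 1$ between two distributions $P$ and $Q$ is usually defined as below. This is the standard textbook definition and is assumed here to be the one behind $D_{\alpha}$; the summary does not restate the paper's exact form of the per-instance loss $P(x,4\alpha)$, so that quantity is not reconstructed.

$$
D_{\alpha}(P \,\|\, Q) \;=\; \frac{1}{\alpha - 1} \,\log\, \mathbb{E}_{\theta \sim Q}\!\left[\left(\frac{P(\theta)}{Q(\theta)}\right)^{\alpha}\right]
$$

In the unlearning setting, $P$ and $Q$ would play the roles of the distribution over weights after unlearning and the distribution obtained by retraining without the datapoint, so a small divergence means the two are hard to tell apart.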
Abstract
We present a principled, per-instance approach to quantifying the difficulty of unlearning via fine-tuning. We begin by sharpening an analysis of noisy gradient descent for unlearning (Chien et al., 2024), obtaining a better utility-unlearning tradeoff by replacing worst-case privacy loss bounds with per-instance privacy losses (Thudi et al., 2024), each of which bounds the (Rényi) divergence to retraining without an individual data point. To demonstrate the practical applicability of our theory, we present empirical results showing that our theoretical predictions are borne out both for Stochastic Gradient Langevin Dynamics (SGLD) and for standard fine-tuning without explicit noise. We further demonstrate that per-instance privacy losses correlate well with several existing data difficulty metrics, while also identifying harder groups of data points, and introduce novel evaluation methods based on loss barriers. Altogether, our findings provide a foundation for more efficient and adaptive unlearning strategies tailored to the unique properties of individual data points.
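To make the setting concrete, here is a minimal sketch of unlearning by noisy gradient descent: a model is fine-tuned for `steps` iterations on the data that remains after one point is removed, with Gaussian noise added to every update. It is an illustration only, not the authors' implementation. The least-squares model, the function name `noisy_gd_unlearn`, and the parameters `lr` and `noise_std` are all assumptions made for this example.

```python
import numpy as np

def noisy_gd_unlearn(w, X_retain, y_retain, steps, lr=0.1, noise_std=0.01, seed=0):
    """Fine-tune w on the retained data with noisy full-batch gradient descent.

    Hypothetical toy setup: least-squares loss (1 / 2n) * ||X w - y||^2.
    noise_std is the standard deviation of the Gaussian noise added per step.
    """
    rng = np.random.default_rng(seed)
    w = w.copy()
    n = len(y_retain)
    for _ in range(steps):
        grad = X_retain.T @ (X_retain @ w - y_retain) / n  # gradient on retained data
        w = w - lr * grad + noise_std * rng.standard_normal(w.shape)
    return w

# Toy data: 50 points in 3 dimensions (illustrative only).
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(50)

# "Original" model: least-squares fit on all data, including the point to forget.
w_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Forget point 0: fine-tune on the remaining 49 points.
X_retain, y_retain = X[1:], y[1:]
w_unlearned = noisy_gd_unlearn(w_full, X_retain, y_retain, steps=200)

# Reference: retraining from scratch without point 0.
w_retrain = np.linalg.lstsq(X_retain, y_retain, rcond=None)[0]
print("distance to retrained weights:", np.linalg.norm(w_unlearned - w_retrain))
```

In this sketch `steps` plays the role of the number of unlearning steps $k$. The per-instance view asks how large it must be for a given forgotten point, rather than fixing one worst-case value for every point.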
