Hessian-Free Online Certified Unlearning
Xinbao Qiao, Meng Zhang, Ming Tang, Ermin Wei
TL;DR
This work introduces Hessian-Free Online Certified Unlearning (HF-OCU), a method for removing the influence of specific data from trained models with certified guarantees and without Hessian inversion. HF-OCU maintains per-sample impact statistics, computed through an affine stochastic recursion and Hessian-vector products, so that online data removal reduces to simple vector additions and is near-instantaneous, while preserving or improving the unlearning and generalization guarantees of Hessian-based baselines. Theoretical bounds show improved unlearning error and generalization behavior in non-convex settings. Experiments demonstrate millisecond-level unlearning, reduced precomputation and storage costs, and robust performance across convex and non-convex models, and the code is open source. The work also analyzes privacy implications (MIA-L/MIA-U) and shows how calibrated noise can defend against membership inference while maintaining utility, which underlines its practical relevance to the right to be forgotten in modern ML systems.
Abstract
Machine unlearning strives to uphold data owners' right to be forgotten by enabling models to selectively forget specific data. Recent advances suggest pre-computing and storing statistics extracted from second-order information and implementing unlearning through Newton-style updates. However, Hessian matrix operations are extremely costly, and previous works perform unlearning only for the empirical risk minimizer under a convexity assumption, which precludes their application to high-dimensional over-parameterized models and to the non-convergent setting. In this paper, we propose an efficient Hessian-free unlearning approach. The key idea is to maintain a statistical vector for each training sample, computed through an affine stochastic recursion on the difference between the retrained and learned models. We prove that, under the same regularity conditions, our proposed method outperforms the state-of-the-art methods in terms of unlearning and generalization guarantees, deletion capacity, and time/storage complexity. By recollecting the stored statistics of the data to be removed, we develop an online unlearning algorithm that achieves near-instantaneous data removal, as it requires only vector addition. Experiments demonstrate that our proposed scheme improves on existing results by orders of magnitude in time/storage costs, with millisecond-level unlearning execution, while also enhancing test accuracy.
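The abstract's key idea (one stored vector per training sample, updated by an affine recursion with Hessian-vector products, then added to the weights at removal time) can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: the least-squares loss, the function names `grad`, `hvp`, and `train`, the fixed batch schedule, and the specific update a <- (I - (lr/b) H) a + (lr/b) grad_u(w), which comes from unrolling mini-batch SGD with and without one sample while keeping the original batch size b in the normalization. For this quadratic loss the recursion reproduces retraining exactly; for general losses it would only be an approximation.

```python
import numpy as np

def grad(w, X, y):
    """Summed gradient of 0.5 * (x @ w - y)**2 over the rows of X."""
    return X.T @ (X @ w - y)

def hvp(v, X):
    """Hessian-vector product (X^T X) v, computed without forming X^T X."""
    return X.T @ (X @ v)

def train(X, y, batches, lr, skip=None, track=None):
    """Mini-batch SGD over a fixed batch schedule.

    skip:  sample index to leave out (only used to build a retraining reference).
    track: sample index whose impact vector `a` is accumulated during training.
    Returns the final weights and the impact vector (zeros if track is None).
    """
    d = X.shape[1]
    w = np.zeros(d)
    a = np.zeros(d)
    for idx in batches:
        b = len(idx)  # original batch size is kept as the normalizer (assumption)
        keep = idx[idx != skip] if skip is not None else idx
        if track is not None:
            rest = idx[idx != track]
            # Affine recursion: contract `a` with a Hessian-vector product ...
            a = a - (lr / b) * hvp(a, X[rest])
            # ... and add the tracked sample's gradient when it is in the batch.
            if track in idx:
                a = a + (lr / b) * grad(w, X[[track]], y[[track]])
        w = w - (lr / b) * grad(w, X[keep], y[keep])
    return w, a

# Hypothetical toy data and schedule.
rng = np.random.default_rng(0)
n, d, u = 20, 3, 7
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
batches = [rng.choice(n, size=5, replace=False) for _ in range(50)]

w_learned, a_u = train(X, y, batches, lr=0.05, track=u)
w_unlearned = w_learned + a_u  # removal is a single vector addition
w_retrained, _ = train(X, y, batches, lr=0.05, skip=u)
print(np.max(np.abs(w_unlearned - w_retrained)))
```

In this sketch the cost of removal is one addition of a length-d vector, and the precomputation needs only Hessian-vector products, never a Hessian inverse. Tracking every sample would mean storing one such vector per training point, which is the storage trade-off the abstract refers to.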
