When unlearning is free: leveraging low influence points to reduce computational costs
Anat Kleiman, Robert Fisher, Ben Deaner, Udi Wieder
TL;DR
This work tackles data privacy by reducing the cost of unlearning through pre-filtering the forget and retain sets using influence scores. It develops a theory-backed, algorithm-agnostic framework that identifies low-impact training points via approximate influence functions (Hessian-based, LESS, and Lowest Gradients) and validates it across vision and language tasks. Empirically, filtering out these low-influence points before unlearning preserves privacy, as measured by membership inference attacks (MIA), and accuracy while cutting unlearning time by up to about 50% in real-world scenarios. The approach demonstrates practical, cross-domain applicability and supports efficient, privacy-preserving data removal in deployed ML systems.
Abstract
As concerns around data privacy in machine learning grow, the ability to unlearn, or remove, specific data points from trained models becomes increasingly important. While state-of-the-art unlearning methods have emerged in response, they typically treat all points in the forget set equally. In this work, we challenge this approach by asking whether points that have a negligible impact on the model's learning need to be removed at all. Through a comparative analysis of influence functions across language and vision tasks, we identify subsets of training data with negligible impact on model outputs. Leveraging this insight, we propose an efficient unlearning framework that reduces the size of datasets before unlearning, leading to significant computational savings (up to approximately 50 percent) on real-world empirical examples.
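The pre-filtering idea can be illustrated with a minimal sketch. It assumes a logistic-regression model and uses the per-example gradient norm as a cheap stand-in for an influence score, in the spirit of a lowest-gradients criterion. The function names, the toy data, and the `keep_fraction` parameter are hypothetical and chosen only for illustration; they are not the paper's scoring rule, threshold, or results.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_norm(w, x, y):
    """L2 norm of the per-example logistic-loss gradient with respect to w.

    For logistic loss the gradient is (sigmoid(w.x) - y) * x, so its norm is
    |sigmoid(w.x) - y| * ||x||. Used here as an assumed proxy for influence.
    """
    residual = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
    return abs(residual) * math.sqrt(sum(xi * xi for xi in x))

def prefilter_forget_set(w, forget_set, keep_fraction=0.5):
    """Split the forget set into (to_unlearn, skipped) by descending score.

    keep_fraction is a hypothetical knob: the share of highest-scoring points
    that is still passed to the downstream unlearning algorithm.
    """
    scored = sorted(forget_set, key=lambda ex: grad_norm(w, *ex), reverse=True)
    n_keep = math.ceil(keep_fraction * len(scored))
    return scored[:n_keep], scored[n_keep:]

# Toy example: trained weights and four (features, label) pairs to forget.
w = [2.0, -1.0]
forget_set = [
    ([1.0, 0.0], 1),  # confidently correct, small gradient
    ([3.0, 0.0], 1),  # very confidently correct, smallest gradient
    ([1.0, 1.0], 0),  # misclassified, large gradient
    ([0.0, 2.0], 1),  # misclassified, largest gradient
]
to_unlearn, skipped = prefilter_forget_set(w, forget_set, keep_fraction=0.5)
```

Only `to_unlearn` would then be handed to whichever unlearning algorithm is in use, which is where the computational savings would come from. Whether the skipped points can safely be left untouched is the empirical question the work studies, so the split should be validated (for example with a privacy and accuracy check) rather than assumed.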
