FUNU: Boosting Machine Unlearning Efficiency by Filtering Unnecessary Unlearning
Zitong Li, Qingqing Ye, Haibo Hu
TL;DR
FUNU tackles the inefficiency of machine unlearning by filtering out removal requests that are unlikely to change the retrained model. It builds a distance matrix from a pretraining feature space, uses a one-epoch reference model to derive a data-driven similarity threshold, and prunes $D_u$ to form $D_u^+$ so that unlearning on $D_u^+$ approximates retraining within a provable bound $\epsilon$. The approach comes with theoretical privacy guarantees and demonstrates substantial time savings and robust model privacy on MNIST, CIFAR-10, and CIFAR-100, including effective integration with SISA, and it adapts to both random and class removal scenarios. Overall, FUNU provides a practical, parameter-tunable, and adaptable framework for accelerating unlearning without compromising the right to be forgotten.
Abstract
Machine unlearning is an emerging field that selectively removes specific data samples from a trained model. This capability is crucial for addressing privacy concerns, complying with data protection regulations, and correcting errors or biases introduced by certain data. Unlike traditional machine learning, where models are typically static once trained, machine unlearning facilitates dynamic updates that enable the model to "forget" information without requiring complete retraining from scratch. Various machine unlearning methods exist, and some of them are more time-efficient when there are fewer data removal requests. To decrease the execution time of such methods, we aim to reduce the size of the removal request set, based on the fundamental assumption that removing certain data does not result in a distinguishable retrained model. We first propose the concept of unnecessary unlearning, which describes cases where the model would not change noticeably after some data points are removed. Subsequently, we review existing solutions that can be applied to this problem and highlight their limitations: poor adaptability to different unlearning scenarios and reliance on manually selected parameters. We therefore put forward FUNU, a method that identifies data points leading to unnecessary unlearning while circumventing these limitations. The idea is to discover data points within the removal requests that have similar neighbors in the remaining dataset. Inspired by work on model memorization, we use a reference model to set the parameters for finding neighbors. We provide a theoretical analysis of the privacy guarantee offered by FUNU and conduct extensive experiments to validate its efficacy.
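The core filtering idea, finding removal requests that have a close neighbor in the remaining data, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `filter_unnecessary`, the use of Euclidean distance, and the fixed threshold `tau` are assumptions made for illustration. In FUNU the threshold is derived from a reference model, a step that is not reproduced here.

```python
import numpy as np

def filter_unnecessary(feats_u, feats_r, tau):
    """Split removal requests by whether they have a close neighbor in the remaining data.

    feats_u: (n_u, d) array of feature vectors for the removal requests D_u.
    feats_r: (n_r, d) array of feature vectors for the remaining data.
    tau:     hypothetical distance threshold (assumed given; FUNU derives its
             threshold from a reference model, which is not modeled here).

    Returns (keep, filtered): indices into feats_u. `keep` plays the role of
    D_u^+; `filtered` are requests treated as unnecessary unlearning.
    """
    feats_u = np.asarray(feats_u, dtype=float)
    feats_r = np.asarray(feats_r, dtype=float)
    if feats_r.shape[0] == 0:
        # No remaining data means no neighbors, so nothing can be filtered.
        return np.arange(feats_u.shape[0]), np.array([], dtype=int)

    # Pairwise Euclidean distance matrix of shape (n_u, n_r), via broadcasting.
    diff = feats_u[:, None, :] - feats_r[None, :, :]
    dist = np.linalg.norm(diff, axis=2)

    # A request is filtered if its nearest remaining point is within tau.
    has_neighbor = dist.min(axis=1) <= tau
    keep = np.flatnonzero(~has_neighbor)
    filtered = np.flatnonzero(has_neighbor)
    return keep, filtered

# Toy usage: the first request nearly duplicates a remaining point, the second does not.
remaining = np.array([[0.0, 0.0], [1.0, 1.0]])
requests = np.array([[0.05, 0.0], [5.0, 5.0]])
keep, filtered = filter_unnecessary(requests, remaining, tau=0.1)
```

In this toy case only the second request stays in the pruned set, so an unlearning method would run on one point instead of two. The full distance matrix costs O(n_u * n_r * d) memory in this form, so a real pipeline would likely compute it in batches.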
