Silver Linings in the Shadows: Harnessing Membership Inference for Machine Unlearning
Nexhi Sula, Abhinav Kumar, Jie Hou, Han Wang, Reza Tourani
TL;DR
The paper addresses the challenge of removing a data subject's influence from trained neural networks under GDPR without full retraining. It introduces ReMI, a machine unlearning framework that uses a privacy approximation function, such as membership inference or membership fingerprinting, to guide weight refinement and minimize leakage from forgotten data while preserving primary-task performance. A novel unlearning loss combines the target-model loss with a leakage term and is augmented by a KL-divergence-based objective to reduce distinguishability between forgotten data and out-of-sample data; a Gaussian-based upper bound is provided for tractability. Empirically, ReMI demonstrates strong unlearning efficacy and latency advantages across four datasets and four architectures, outperforming naive retraining and Fisher unlearning in several settings and enabling rapid, privacy-preserving forgetting with maintained accuracy.
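The KL-divergence term described above can be illustrated with a small sketch. The paper's exact objective and upper bound are not reproduced here. As an assumption, this example fits a univariate Gaussian to per-sample losses on the forget set and on held-out (out-of-sample) data. It then uses the closed-form KL divergence between the two Gaussians as a distinguishability score. The helper names `fit_gaussian`, `gaussian_kl`, and `distinguishability` are hypothetical.

```python
import math
from statistics import fmean, pvariance


def fit_gaussian(values):
    """Fit a univariate Gaussian (mean, variance) by maximum likelihood."""
    mu = fmean(values)
    var = pvariance(values, mu)
    return mu, max(var, 1e-12)  # floor the variance to avoid division by zero


def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(N(mu_p, var_p) || N(mu_q, var_q)) for univariate Gaussians."""
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


def distinguishability(forget_losses, heldout_losses):
    """Hypothetical proxy: KL between Gaussians fitted to per-sample losses
    on the forget set and on out-of-sample (held-out) data."""
    mu_f, var_f = fit_gaussian(forget_losses)
    mu_h, var_h = fit_gaussian(heldout_losses)
    return gaussian_kl(mu_f, var_f, mu_h, var_h)


# Usage: memorized forget samples (very low loss) vs. samples that look unseen.
heldout = [0.9, 1.1, 0.7, 1.3]
before = distinguishability([0.05, 0.10, 0.08, 0.12], heldout)
after = distinguishability([0.8, 1.0, 0.9, 1.2], heldout)
```

A score near zero means the forget-set losses look statistically like losses on unseen data, which is the behavior an unlearned model should exhibit. A large score indicates that the forgotten samples are still distinguishable. Modeling both distributions as Gaussians makes the divergence available in closed form, so no density estimation is required.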
Abstract
With the continued advancement and widespread adoption of machine learning (ML) models across various domains, ensuring user privacy and data security has become a paramount concern. To comply with data privacy regulations such as GDPR, a secure machine learning framework should not only grant users the right to request the removal of the data they contributed for model training but also facilitate the elimination of sensitive data fingerprints from machine learning models to mitigate potential attacks, a process referred to as machine unlearning. In this study, we present a novel unlearning mechanism designed to effectively remove the impact of specific data samples from a neural network while preserving the performance of the unlearned model on the primary task. To achieve this goal, we craft a novel loss function that combines the target classification loss with a membership inference loss to eliminate privacy-sensitive information from the weights and activation values of the target model. Our adaptable framework can easily incorporate various privacy leakage approximation mechanisms to guide the unlearning process. Using a membership inference mechanism as a proof of concept, we provide empirical evidence of the effectiveness of our unlearning approach together with a theoretical upper-bound analysis. Our results show the superior performance of our approach in terms of unlearning efficacy, latency, and fidelity on the primary task across four datasets and four deep learning architectures.
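The following is a rough illustration of how a classification loss and a membership inference loss might be combined into one objective. It computes a scalar: mean cross-entropy on retained samples plus a weighted mean membership score on the samples to forget. Several choices are assumptions for illustration and are not the authors' attack model or loss: the split into retained and forget batches, the weight `lam`, and the confidence-threshold attack in the hypothetical `membership_score` helper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-likelihood of the true label."""
    return -math.log(softmax(logits)[label])


def membership_score(logits, label, threshold=0.9, sharpness=10.0):
    """Hypothetical confidence-based attack: estimated probability that (x, y)
    was a training member, rising as true-label confidence exceeds `threshold`."""
    confidence = softmax(logits)[label]
    return 1.0 / (1.0 + math.exp(-sharpness * (confidence - threshold)))


def unlearning_loss(retain_batch, forget_batch, lam=1.0):
    """Sketch of a combined objective (assumed form): mean classification loss
    on retained samples plus `lam` times mean membership score on forget samples."""
    task = sum(cross_entropy(z, y) for z, y in retain_batch) / len(retain_batch)
    leakage = sum(membership_score(z, y) for z, y in forget_batch) / len(forget_batch)
    return task + lam * leakage


# Usage: each sample is (logits, true_label).
retain = [([2.0, 0.0, 0.0], 0), ([0.0, 2.0, 0.0], 1)]
confident_forget = [([8.0, 0.0, 0.0], 0)]  # model still "remembers" this sample
hesitant_forget = [([1.0, 0.0, 0.0], 0)]   # confidence closer to an unseen sample
```

Lowering the true-label confidence on a forget sample lowers the leakage term. A gradient-based optimizer would therefore push the weights in that direction while the cross-entropy term anchors primary-task accuracy. In practice this would run in an autodiff framework rather than as this pure-Python scalar version. Driving membership scores all the way to zero may itself make forgotten samples stand out relative to genuinely unseen data. That risk is one motivation for matching the out-of-sample distribution instead of simply minimizing the score.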
