From Machine Learning to Machine Unlearning: Complying with GDPR's Right to be Forgotten while Maintaining Business Value of Predictive Models
Yuncong Yang, Xiao Han, Yidong Chai, Reza Ebrahimi, Rouzbeh Behnia, Balaji Padmanabhan
TL;DR
This work tackles GDPR’s Right to Be Forgotten by proposing ETID, a holistic framework that jointly addresses predictive model construction and responses to data-erasure requests. ETID combines a Reference-Oriented Ensemble Learning (ROEL) backbone with iTerative Information Distillation (TID) to erase the data targeted for unlearning efficiently while preserving accuracy and ensuring verifiability. Empirical results on consumer profiling and image classification show that ETID outperforms state-of-the-art unlearning methods in consistency, accuracy, and efficiency while remaining verifiable. The approach offers practical benefits for GDPR compliance and supports a trustworthy, data-driven service market, with broader applicability to privacy management, fairness, and robustness in predictive analytics.
Abstract
Recent privacy regulations (e.g., GDPR) grant data subjects the 'Right to Be Forgotten' (RTBF) and require companies to fulfill their data erasure requests. However, companies face significant challenges in complying with RTBF regulations, particularly when asked to erase specific training data from already-trained predictive models. While researchers have introduced machine unlearning methods aimed at fast data erasure, these approaches often neglect model performance (e.g., accuracy), which can lead to financial losses and non-compliance with RTBF obligations. This work develops a holistic machine learning-to-unlearning framework, called Ensemble-based iTerative Information Distillation (ETID), to achieve efficient data erasure while preserving the business value of predictive models. ETID incorporates a new ensemble learning method that builds an accurate predictive model designed to make data erasure requests easier to handle. ETID also introduces a distillation-based unlearning method tailored to the constructed ensemble model to enable efficient and effective data erasure. Extensive experiments demonstrate that ETID outperforms various state-of-the-art methods and efficiently delivers high-quality unlearned models. We also highlight ETID's potential as a crucial tool for fostering a legitimate and thriving market for data and predictive services.
