Fair Machine Unlearning: Data Removal while Mitigating Disparities
Alex Oesterling, Jiaqi Ma, Flavio P. Calmon, Hima Lakkaraju
TL;DR
This paper tackles data deletion in machine learning by introducing the first fair unlearning method: one that efficiently removes data while preserving popular group fairness notions. It couples a convex fair loss, which incorporates a pairwise fairness regularizer targeting Equalized Odds, with an unlearning procedure that updates the model parameters using a Newton-like step and a Gaussian loss perturbation to achieve statistical indistinguishability from retraining. The authors prove theoretical guarantees on both unlearning, via $(\epsilon,\delta)$-indistinguishability, and fairness, via a bounded change in AEOD. They validate the approach on three real-world datasets under random and subgroup deletions, where it closely matches retraining in both fairness and accuracy. The work shows that unlearning and fairness can be satisfied simultaneously, with provable guarantees and scalable performance, which supports compliant data-removal systems and the regulatory and ethical deployment of ML in sensitive domains.
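The unlearning step described above can be illustrated with a small sketch. It is not the authors' implementation and makes several labeled assumptions: the fairness term is a hypothetical Equalized-Odds-style pairwise penalty $\gamma (w^\top d)^2$, where $d$ is the mean of $x_i - x_j$ over cross-group pairs with equal labels; the loss is swapped to squared loss so the objective is quadratic and a single Newton step from the old optimum lands exactly on the retrained minimizer; and the perturbation vector `b` uses an arbitrary `sigma = 0.5` that is not calibrated to any $(\epsilon,\delta)$. All function names (`pair_direction`, `train`, `unlearn`, and so on) are hypothetical.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pair_direction(X, y, s):
    """Mean of (x_i - x_j) over pairs with s_i = 0, s_j = 1 and y_i == y_j.

    Hypothetical EO-style penalty P(w) = (w . d)^2: the square of a linear
    function of w, hence convex. Deleting a point changes every pair it is in.
    """
    dim = len(X[0])
    d, count = [0.0] * dim, 0
    for i in range(len(X)):
        for j in range(len(X)):
            if s[i] == 0 and s[j] == 1 and y[i] == y[j]:
                for k in range(dim):
                    d[k] += X[i][k] - X[j][k]
                count += 1
    return [v / count for v in d] if count else d

def hessian(X, d, lam, gamma):
    """Hessian of the objective: sum_i x_i x_i^T + lam*I + 2*gamma*d d^T."""
    dim = len(d)
    H = [[(lam if r == c else 0.0) + 2 * gamma * d[r] * d[c] for c in range(dim)]
         for r in range(dim)]
    for x in X:
        for r in range(dim):
            for c in range(dim):
                H[r][c] += x[r] * x[c]
    return H

def gradient(w, X, y, d, lam, gamma, b):
    """Gradient of sum_i 0.5*(w.x_i - y_i)^2 + lam/2*|w|^2 + gamma*(w.d)^2 + b.w."""
    wd = dot(w, d)
    g = [lam * w[k] + 2 * gamma * wd * d[k] + b[k] for k in range(len(w))]
    for x, t in zip(X, y):
        r = dot(w, x) - t
        for k in range(len(w)):
            g[k] += r * x[k]
    return g

def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def train(X, y, s, lam, gamma, b):
    """Exact minimizer: the gradient vanishes when H w = sum_i y_i x_i - b."""
    d = pair_direction(X, y, s)
    rhs = [sum(t * x[k] for x, t in zip(X, y)) - b[k] for k in range(len(b))]
    return solve(hessian(X, d, lam, gamma), rhs)

def unlearn(w, X, y, s, removed, lam, gamma, b):
    """One Newton step from the old optimum on the remaining-data objective."""
    keep = [i for i in range(len(X)) if i not in removed]
    Xr, yr, sr = [X[i] for i in keep], [y[i] for i in keep], [s[i] for i in keep]
    d = pair_direction(Xr, yr, sr)  # pairwise term recomputed without removed points
    step = solve(hessian(Xr, d, lam, gamma), gradient(w, Xr, yr, d, lam, gamma, b))
    return [wk - sk for wk, sk in zip(w, step)]

# Toy data: a bias feature plus one real feature; s is the sensitive group.
X = [[1.0, 0.5], [1.0, -0.3], [1.0, 1.2], [1.0, 0.1], [1.0, -0.8], [1.0, 0.7]]
y = [1, 0, 1, 0, 0, 1]
s = [0, 0, 0, 1, 1, 1]
lam, gamma = 0.1, 1.0
rng = random.Random(0)
b = [rng.gauss(0.0, 0.5) for _ in range(2)]  # drawn once at training, reused at deletion

w = train(X, y, s, lam, gamma, b)
w_unlearned = unlearn(w, X, y, s, {2}, lam, gamma, b)
keep = [0, 1, 3, 4, 5]
w_retrained = train([X[i] for i in keep], [y[i] for i in keep],
                    [s[i] for i in keep], lam, gamma, b)
print(w_unlearned, w_retrained)
```

The key design choice is that the perturbation `b` is sampled once and kept: unlearning reuses the same noisy objective, and only the data-dependent terms change. With a non-quadratic loss such as the logistic loss, the Newton step would only approximate retraining, and in loss-perturbation schemes the noise is what is meant to mask that remaining gap.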
Abstract
The Right to be Forgotten is a core principle outlined by regulatory frameworks such as the EU's General Data Protection Regulation (GDPR). This principle allows individuals to request that their personal data be deleted from deployed machine learning models. While "forgetting" can be naively achieved by retraining on the remaining dataset, doing so for each new request is computationally expensive. As such, several machine unlearning methods have been proposed as efficient alternatives to retraining. These methods aim to approximate the predictive performance of retraining but fail to consider how unlearning impacts other properties critical to real-world applications, such as fairness. In this work, we demonstrate that most efficient unlearning methods cannot accommodate popular fairness interventions, and we propose the first fair machine unlearning method that can efficiently unlearn data instances from a fair objective. We derive theoretical results demonstrating that our method provably unlearns data and provably maintains fairness performance. Extensive experiments on real-world datasets highlight the efficacy of our method at unlearning data instances while preserving fairness.
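"Preserving fairness" here is measured with Equalized Odds, and the TL;DR states the guarantee as a bounded change in AEOD. As a minimal sketch, assuming AEOD is the average of the absolute true-positive-rate and false-positive-rate gaps between two groups (a common definition; the paper's exact formula may differ), the metric and the change between a retrained and an unlearned model can be computed as follows. The function names and the predictions are made-up toy values, not results from the paper.

```python
def positive_rate(preds, labels, groups, group, label):
    """Share of examples with this group and true label that are predicted positive."""
    idx = [i for i in range(len(labels)) if groups[i] == group and labels[i] == label]
    if not idx:
        raise ValueError(f"no examples with group={group}, label={label}")
    return sum(preds[i] for i in idx) / len(idx)

def equalized_odds_gap(preds, labels, groups):
    """Assumed AEOD: mean of the absolute TPR gap and FPR gap between groups 0 and 1."""
    tpr_gap = abs(positive_rate(preds, labels, groups, 0, 1)
                  - positive_rate(preds, labels, groups, 1, 1))
    fpr_gap = abs(positive_rate(preds, labels, groups, 0, 0)
                  - positive_rate(preds, labels, groups, 1, 0))
    return 0.5 * (tpr_gap + fpr_gap)

labels = [1, 1, 0, 0, 1, 1, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
preds_retrained = [1, 0, 1, 0, 1, 1, 0, 0]  # hypothetical predictions
preds_unlearned = [1, 1, 1, 0, 1, 1, 0, 0]  # hypothetical predictions

gap_retrained = equalized_odds_gap(preds_retrained, labels, groups)
gap_unlearned = equalized_odds_gap(preds_unlearned, labels, groups)
print(gap_retrained, gap_unlearned, abs(gap_unlearned - gap_retrained))
```

A fairness guarantee of the kind described bounds the last printed quantity, the gap between the unlearned model's fairness and the retrained model's fairness, rather than requiring either gap to be zero.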
