Fair Machine Unlearning: Data Removal while Mitigating Disparities

Alex Oesterling; Jiaqi Ma; Flavio P. Calmon; Hima Lakkaraju

TL;DR

This paper tackles data deletion in machine learning by introducing the first fair unlearning method that preserves popular group fairness notions while efficiently removing data. It couples a convex fair loss, incorporating a pairwise fairness regularizer targeting Equalized Odds, with an unlearning procedure that updates model parameters using a Newton-like step and a Gaussian loss perturbation to achieve statistical indistinguishability. The authors prove theoretical guarantees on both unlearning (via $(\epsilon,\delta)$-indistinguishability) and fairness (bounded AEOD change) and validate the approach on three real-world datasets under random and subgroup deletions, showing strong alignment with retraining in both fairness and accuracy. The work demonstrates that unlearning and fairness can be satisfied simultaneously, enabling practical deployment of compliant data-removal systems. This advances the regulatory and ethical deployment of ML in sensitive domains by providing provable guarantees and scalable performance.
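The update described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain L2-regularized logistic regression, omits the pairwise fairness regularizer, and uses hypothetical names (`fit`, `newton_unlearn`, `lam`, `sigma`). The Gaussian loss perturbation is modeled as a linear term $\mathbf{b}^\top \theta$ added to the objective, and removal is a single Newton step on the remaining data, starting from the original optimum.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(theta, X, y, lam, b):
    # Gradient of: sum_i logloss_i(theta) + (lam / 2) * ||theta||^2 + b^T theta
    return X.T @ (sigmoid(X @ theta) - y) + lam * theta + b

def hessian(theta, X, lam):
    # Hessian of the same objective; the linear noise term b^T theta contributes nothing
    p = sigmoid(X @ theta)
    return (X * (p * (1.0 - p))[:, None]).T @ X + lam * np.eye(X.shape[1])

def fit(X, y, lam, b, iters=50):
    # Full (re)training by Newton's method on the perturbed convex objective
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta = theta - np.linalg.solve(hessian(theta, X, lam),
                                        gradient(theta, X, y, lam, b))
    return theta

def newton_unlearn(theta_star, X_keep, y_keep, lam, b):
    # One Newton step on the objective restricted to the remaining data
    g = gradient(theta_star, X_keep, y_keep, lam, b)
    H = hessian(theta_star, X_keep, lam)
    return theta_star - np.linalg.solve(H, g)

# Synthetic data (assumption: not one of the paper's datasets)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) + 2.0 * rng.normal(size=200) > 0).astype(float)

lam, sigma = 1.0, 0.1                  # hypothetical regularization and noise scale
b = rng.normal(scale=sigma, size=3)    # Gaussian perturbation, drawn once and reused

theta_star = fit(X, y, lam, b)
keep = np.arange(200) >= 5             # unlearn the first five samples
theta_minus = newton_unlearn(theta_star, X[keep], y[keep], lam, b)
theta_retrain = fit(X[keep], y[keep], lam, b)

err_before = np.linalg.norm(theta_star - theta_retrain)
err_after = np.linalg.norm(theta_minus - theta_retrain)
print(err_before, err_after)
```

Comparing `err_before` with `err_after` shows how much closer a single Newton step brings the parameters to full retraining. In the paper, the noise term is what turns this residual gap into an $(\epsilon,\delta)$ guarantee; the noise scale used here is arbitrary.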

Abstract

The Right to be Forgotten is a core principle outlined by regulatory frameworks such as the EU's General Data Protection Regulation (GDPR). This principle allows individuals to request that their personal data be deleted from deployed machine learning models. While "forgetting" can be naively achieved by retraining on the remaining dataset, it is computationally expensive to do so with each new request. As such, several machine unlearning methods have been proposed as efficient alternatives to retraining. These methods aim to approximate the predictive performance of retraining, but fail to consider how unlearning impacts other properties critical to real-world applications such as fairness. In this work, we demonstrate that most efficient unlearning methods cannot accommodate popular fairness interventions, and we propose the first fair machine unlearning method that can efficiently unlearn data instances from a fair objective. We derive theoretical results which demonstrate that our method can provably unlearn data and provably maintain fairness performance. Extensive experimentation with real-world datasets highlights the efficacy of our method at unlearning data instances while preserving fairness.

Paper Structure (44 sections, 7 theorems, 60 equations, 7 figures, 6 tables)

INTRODUCTION
RELATED WORK
Unlearning.
Fairness.
Intersections.
PRELIMINARIES
Unlearning.
Fairness.
FAIR UNLEARNING
A Convex Fair Loss Function
The Fair Unlearning Algorithm
Efficient Model Update.
Noisy Loss Perturbation.
Runtime Complexity.
Trade-off between fairness and unlearning.
...and 29 more sections

Key Result

Theorem 1

Let $\theta_{D}^*$ be the output of algorithm $A$ trained on $\mathcal{L}^{\mathbf{b}}$. Assume the loss $\ell$ is $\psi$-Lipschitz in its second derivative ($\ell''$ is $\psi$-Lipschitz), and bounded in its first derivative by a constant $\|\ell'(\theta, x, y)\|_2 \leq g$. Assume the data is bounded s and if $\mathbf{b} \sim N(0, k\epsilon'/\epsilon)^d$ with $k > 0$ and $||\nabla \mathcal{L}(\theta_

Figures (7)

Figure 1: Absolute equalized odds difference (lower is better) for unlearning methods over random requests on COMPAS, Adult, and HSLS. Our method well-approximates fair retraining.
Figure 2: Unlearning performance and leakage for various levels of noise added while unlearning 100 samples according to Theorem 1. We set $\delta = 0.0001$ and report corresponding $\epsilon$ values in the legend.
Figure 3: Absolute equalized odds difference (lower is better) for unlearning methods when unlearning from the minority (top) and majority (bottom) subgroup on COMPAS, Adult, and HSLS.
Figure 4: Absolute demographic parity (top), equality of opportunity (middle), and subgroup test accuracy (bottom) differences (lower is better) for unlearning methods over random requests on COMPAS, Adult, and HSLS.
Figure 5: Absolute demographic parity (top), equality of opportunity (middle), and subgroup test accuracy (bottom) differences (lower is better) for unlearning methods when unlearning from the minority subgroup on COMPAS, Adult, and HSLS.
...and 2 more figures

Theorems & Definitions (9)

Definition 1: $(\epsilon, \delta)$-Statistical Indistinguishability (Guo et al., 2020; Neel et al., 2020).
Definition 2: Equalized Odds (Hardt et al., 2016).
Theorem 1: $(\epsilon, \delta)$-unlearning.
Theorem 2: AEOD is bounded.
Lemma 1
Lemma 2 (Guo et al., 2020)
Corollary 1: $||\theta_{D'}^- - \theta_{D'}^*||_2$ is bounded.
Lemma 3: The interior angle between $\theta_{D'}^*$ and $\theta_{D'}^-$ is bounded.
Lemma 4: Volume of internal segment of a d-sphere
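
The fairness metric reported in the figures, absolute equalized odds difference (AEOD), can be sketched as follows. This is a hedged illustration rather than the paper's code: it assumes binary labels, binary predictions, and two groups, and it assumes AEOD is the average of the absolute true-positive-rate gap and the absolute false-positive-rate gap. The exact normalization is not stated in this summary, and the function name `aeod` is hypothetical.

```python
import numpy as np

def aeod(y_true, y_pred, group):
    """Average of |TPR gap| and |FPR gap| between group 0 and group 1.

    Assumes every (group, label) cell contains at least one sample.
    """
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    gaps = []
    for label in (1, 0):  # label 1 gives the TPR gap, label 0 the FPR gap
        rate_a = y_pred[(group == 0) & (y_true == label)].mean()
        rate_b = y_pred[(group == 1) & (y_true == label)].mean()
        gaps.append(abs(rate_a - rate_b))
    return 0.5 * sum(gaps)

# Toy example: group 0 has TPR 0.5 and FPR 0.0, group 1 has TPR 1.0 and FPR 0.5
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
group  = [0, 0, 0, 0, 1, 1, 1, 1]
print(aeod(y_true, y_pred, group))
```

A classifier satisfying Equalized Odds exactly has equal true-positive and false-positive rates across groups, so its AEOD under this definition is zero.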
