FaiREE: Fair Classification with Finite-Sample and Distribution-Free Guarantee

Puheng Li; James Zou; Linjun Zhang

TL;DR

FaiREE is a post-processing method for enforcing group fairness in classification with guarantees that hold in finite samples and without distributional assumptions. Starting from the scores of a given classifier, it uses order statistics and Beta-distributed bounds to construct a candidate set of thresholds for which the fairness constraint (e.g., $|DEOO(\phi)|\leq \alpha$) holds with high probability, then selects the candidate with the smallest misclassification error. The method comes with theoretical guarantees for fairness and near-optimal accuracy, extends to several fairness notions including Equality of Opportunity and Equalized Odds, and performs well on synthetic and real datasets (e.g., Adult Census) against state-of-the-art baselines. The paper also gives guidance on sample-size requirements.
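The threshold-selection idea can be sketched roughly as follows. This is a simplified illustration, not the paper's algorithm: it assumes $DEOO$ is the difference in true positive rates between the two groups (the usual Equality of Opportunity gap, with an assumed sign convention), and it replaces the Beta-based high-probability check with a plain empirical check $|DEOO|\leq\alpha$ on the calibration data, so it carries no finite-sample guarantee. The names `tpr`, `error_rate`, `select_thresholds`, and the `(score, group, label)` data layout are hypothetical.

```python
from itertools import product


def tpr(scores_pos, threshold):
    """Fraction of positive-label scores strictly above the threshold."""
    return sum(s > threshold for s in scores_pos) / len(scores_pos)


def error_rate(data, thresholds):
    """Empirical misclassification error; data holds (score, group, label)."""
    wrong = sum((score > thresholds[group]) != bool(label)
                for score, group, label in data)
    return wrong / len(data)


def select_thresholds(data, alpha):
    """Pick one threshold per group from the order statistics of the
    positive-label scores, keep pairs whose empirical TPR gap is at most
    alpha, and return the pair with the lowest empirical error.

    Simplification (assumption): the feasibility test is a plug-in check,
    not the Beta-based bound described in the text.
    """
    pos = {a: sorted(s for s, g, y in data if g == a and y == 1)
           for a in (0, 1)}
    best = None
    for t0, t1 in product(pos[0], pos[1]):
        gap = tpr(pos[1], t1) - tpr(pos[0], t0)  # empirical DEOO (assumed sign)
        if abs(gap) > alpha:
            continue  # candidate pair violates the fairness check
        err = error_rate(data, {0: t0, 1: t1})
        if best is None or err < best[0]:
            best = (err, t0, t1)
    return best  # None if no candidate pair is feasible
```

Calling `select_thresholds(data, alpha)` on a list of `(score, group, label)` triples returns `(error, threshold_group_0, threshold_group_1)`. A faithful implementation would swap the plug-in feasibility check for the high-probability bound, which is what gives the finite-sample guarantee.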

Abstract

Algorithmic fairness plays an increasingly critical role in machine learning research. Several group fairness notions and algorithms have been proposed. However, the fairness guarantee of existing fair classification methods mainly depends on specific data distributional assumptions, often requiring large sample sizes, and fairness could be violated when there is a modest number of samples, which is often the case in practice. In this paper, we propose FaiREE, a fair classification algorithm that can satisfy group fairness constraints with finite-sample and distribution-free theoretical guarantees. FaiREE can be adapted to satisfy various group fairness notions (e.g., Equality of Opportunity, Equalized Odds, Demographic Parity, etc.) and achieve the optimal accuracy. These theoretical guarantees are further supported by experiments on both synthetic and real data. FaiREE is shown to have favorable performance over state-of-the-art algorithms.

Paper Structure (30 sections, 25 theorems, 84 equations, 9 figures, 10 tables, 5 algorithms)

Introduction
Additional Related Works
Preliminary
FaiREE: A Finite Sample Based Algorithm
The general pipeline of FaiREE
Application to Equality of Opportunity
Candidate Set Construction
Candidate Selection
Application to More Fairness Notions
Equalized Odds
On comparing different fairness constraints
Experiments
Synthetic data
Real Data analysis
Conclusion and discussion
...and 15 more sections

Key Result

Proposition 3.1

Consider $k^{1,a} \in \{1,\ldots,n^{1,a}\}$ for $a\in\{0,1\}$, and the score-based classifier $\phi(x,a)=\mathbbm{1}\{f(x,a)>t^{1,a}_{(k^{1,a})}\}$. Let $g_{1}(k, a)=\mathbb{E}\left[\sum\limits^{n^{1,a}}_{j=k}{n^{1,a}\choose j}(Q^{1,1-a}-\alpha)^{j}(1-(Q^{1,1-a}-\alpha))^{n^{1,a}-j}\right]$ with $Q^{1,a}\sim$ [...] Additionally, if $t^{1,a}_{(k^{1,a})}$ is a continuous random variable, the inequality above becomes [...]
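The inner sum in $g_1$ is a binomial upper-tail probability, so $g_1$ can be approximated by averaging that tail over draws of $Q^{1,1-a}$. The sketch below assumes the caller supplies those draws, because the distribution of $Q$ is cut off in the statement above. Clipping $Q-\alpha$ to $[0,1]$ is a further assumption made here so that the tail is a valid probability. The function names `binomial_upper_tail` and `g1_estimate` are hypothetical.

```python
from math import comb


def binomial_upper_tail(n, k, p):
    """Sum_{j=k}^{n} C(n, j) p^j (1-p)^(n-j), i.e. P(Binomial(n, p) >= k)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j)
               for j in range(k, n + 1))


def g1_estimate(k, n, alpha, q_draws):
    """Monte Carlo average of the binomial tail over draws of Q.

    q_draws: samples from the distribution of Q^{1,1-a}, supplied by the
             caller (the distribution is not specified in this sketch).
    Assumption: Q - alpha is clipped to [0, 1].
    """
    total = 0.0
    for q in q_draws:
        p = min(1.0, max(0.0, q - alpha))
        total += binomial_upper_tail(n, k, p)
    return total / len(q_draws)
```

Here `n` plays the role of $n^{1,a}$ and `k` the role of the order-statistic index; the estimate converges to the expectation as the number of draws grows.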

Figures (9)

Figure 1: Comparison of FairBayes and FaiREE on the synthetic data with sample size 1000. See Table \ref{table1} for detailed numerical results. Left: $DEOO$ vs. $\alpha$. Right: $DEOO$ vs. test accuracy. Here, $DEOO$ is the degree of violation of the Equality of Opportunity fairness constraint, and $\alpha$ is the pre-specified level that upper-bounds $DEOO$ for both methods. See Eq. (\ref{deoo}) in Section \ref{section2} for a more detailed definition.
Figure 2: A concrete pipeline of FaiREE for Equality of Opportunity. Edges in Step 2 represent the selected candidate pair and the red edge in Step 3 represents the final optimal candidate selected from all the edges. Each pair represents two different thresholds of a single classifier.
Figure 3: DEOO vs. accuracy, a complementary figure to Figure \ref{compare}
Figure 4: DEOO vs. accuracy and DPE vs. accuracy for Model 1
Figure 5: DEOO vs. accuracy and DPE vs. accuracy for Model 2
...and 4 more figures

Theorems & Definitions (41)

Definition 2.1
Definition 2.2
Definition 2.3
Proposition 3.1
Theorem 3.2
Lemma 3.3: Adapted from Theorem E.4 in zeng2022bayes
Proposition 3.4
Lemma 3.5
Theorem 3.6
Proposition 4.1
...and 31 more
