Increasing Fairness via Combination with Learning Guarantees

Yijun Bian; Kun Zhang

TL;DR

This work addresses hidden discrimination in ML by introducing discriminative risk ($\mathrm{DR}$), a fairness quality measure that integrates individual and group perspectives. It develops first- and second-order oracle bounds and PAC bounds showing that ensemble voting can reduce discrimination through a cancellation-of-biases effect, and then proposes POAF, a Pareto-optimal ensemble pruning method that improves fairness with limited accuracy loss. The authors validate DR and the theoretical bounds on binary and multi-class tasks, and show that POAF yields fairer ensembles with competitive or superior performance compared with baseline fairness-aware methods and state-of-the-art pruning approaches. The study provides principles with learning guarantees for boosting fairness in ensembles, with practical implications for deploying fair classifiers in multi-attribute and multi-class settings.
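A minimal sketch of how DR might be estimated empirically, assuming (from the comparisons between raw and disturbed data in the figures) that DR measures how often a classifier's prediction changes when only the sensitive attributes of an instance are perturbed. The disturbance scheme used here (replacing each sensitive value with a different observed value of the same column) and the function names `disturb_sensitive` and `discriminative_risk` are hypothetical choices for illustration, not the paper's exact procedure.

```python
import numpy as np


def disturb_sensitive(X, sensitive_cols, rng):
    """Copy X, replacing each sensitive attribute value with a different
    value drawn uniformly from that column's other observed values.
    (Hypothetical disturbance scheme, chosen for illustration.)"""
    Xd = np.array(X, copy=True)
    for c in sensitive_cols:
        values = np.unique(Xd[:, c])  # observed values, computed before any change
        for i in range(Xd.shape[0]):
            others = values[values != Xd[i, c]]
            if others.size:  # a constant column cannot be disturbed
                Xd[i, c] = rng.choice(others)
    return Xd


def discriminative_risk(predict, X, sensitive_cols, rng=None):
    """Empirical DR: fraction of instances whose predicted label changes
    when only their sensitive attributes are disturbed."""
    rng = np.random.default_rng(0) if rng is None else rng
    y_raw = predict(X)
    y_dist = predict(disturb_sensitive(X, sensitive_cols, rng))
    return float(np.mean(y_raw != y_dist))


# Toy data: column 0 is a binary sensitive attribute, column 1 a score.
X = np.array([[0, 0.2], [1, 0.2], [0, 0.7], [1, 0.7]])


def fair(X):
    return (X[:, 1] > 0.5).astype(int)  # ignores the sensitive column


def unfair(X):
    return (X[:, 1] + 0.5 * X[:, 0] > 0.5).astype(int)  # uses the sensitive column


print(discriminative_risk(fair, X, [0]))    # 0.0
print(discriminative_risk(unfair, X, [0]))  # 0.5
```

Because the quantity is averaged over individual instances, each instance contributes its own "does the prediction flip?" outcome; this is the individual-level view that group-level rate comparisons do not capture.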

Abstract

The concern about hidden discrimination in ML models is growing, as their widespread real-world application increasingly impacts human lives. Various techniques, including commonly used group fairness measures and several fairness-aware ensemble-based methods, have been developed to enhance fairness. However, existing fairness measures typically focus on only one aspect, either group or individual fairness, and because the two are hard to satisfy simultaneously, biases may remain even when one of them is satisfied. Moreover, existing mechanisms to boost fairness usually demonstrate their validity empirically, yet few of them discuss whether fairness can be boosted with theoretical guarantees. To address these issues, we propose a fairness quality measure named 'discriminative risk (DR)' that reflects both individual and group fairness aspects. Furthermore, we investigate its properties and establish first- and second-order oracle bounds showing that fairness can be boosted via ensemble combination with theoretical learning guarantees. The analysis applies to both binary and multi-class classification. We also propose a pruning method that utilises DR, and we conduct comprehensive experiments to evaluate the effectiveness of the proposed methods.
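To make the cancellation-of-biases intuition concrete, here is a hand-built toy case, assuming voters are combined by weighted plurality vote and that DR is estimated as the fraction of instances whose prediction changes between raw and disturbed inputs. The predictions, uniform weights, and helper names (`weighted_vote`, `empirical_dr`) are hypothetical; each voter is deliberately biased on different instances, so the vote outvotes every individual flip. The sketch illustrates the effect only and does not reproduce the paper's oracle or PAC bounds.

```python
import numpy as np


def weighted_vote(preds, weights):
    """Weighted plurality vote.

    preds:   (n_voters, n_samples) array of integer class labels
    weights: (n_voters,) non-negative voter weights (the distribution rho)
    Returns an (n_samples,) array; ties go to the smallest label.
    """
    preds = np.asarray(preds)
    n_samples = preds.shape[1]
    scores = np.zeros((int(preds.max()) + 1, n_samples))
    cols = np.arange(n_samples)
    for p, w in zip(preds, weights):
        scores[p, cols] += w  # add this voter's weight to the class it predicts
    return scores.argmax(axis=0)


def empirical_dr(pred_raw, pred_disturbed):
    """Fraction of instances whose prediction differs between raw and disturbed inputs."""
    return float(np.mean(np.asarray(pred_raw) != np.asarray(pred_disturbed)))


# Hypothetical predictions of three voters on six instances (raw data).
raw = np.tile([0, 1, 0, 1, 0, 1], (3, 1))
# After disturbing sensitive attributes, each voter flips on a different pair.
disturbed = raw.copy()
for v, flip in enumerate([[0, 1], [2, 3], [4, 5]]):
    disturbed[v, flip] = 1 - disturbed[v, flip]

weights = np.full(3, 1 / 3)
individual_dr = [empirical_dr(raw[v], disturbed[v]) for v in range(3)]
ensemble_dr = empirical_dr(weighted_vote(raw, weights),
                           weighted_vote(disturbed, weights))
print(individual_dr)  # each voter: 2 of 6 predictions change
print(ensemble_dr)    # 0.0: every flip is outvoted by the other two voters
```

The cancellation is not automatic: if the voters' flips coincided on the same instances, the vote would flip there too, which is why guarantees of this kind need to account for how voters' biases are correlated.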

Paper Structure

This paper contains 29 sections, 7 theorems, 23 equations, 13 figures, 8 tables, and 3 algorithms.

Introduction
Related Work
Mechanisms to enhance fairness
Types of fairness measures
Fairness-aware ensemble-based methods
Methodology
Fairness quality from both individual and group fairness aspects
The distinction of DR compared with existing fairness measures
Properties of DR and bounds regarding fairness for weighted vote
Oracle bounds regarding fairness for weighted vote
PAC bounds for the weighted vote
Application: Constructing fairer ensembles without much accuracy degradation
Empirical Results
Experimental setups
RQ1: Validating the proposed fairness quality measure
...and 14 more sections

Key Result

Theorem 1: First-order oracle bound

Figures (13)

Figure 1: Comparison of the proposed discriminative risk (DR) with three group fairness measures (DP, EO, and PQP). (a) Scatter diagrams with the degree of correlation, where the axes show different fairness measures and the change in accuracy between the raw and disturbed data. (b) Correlation among multiple criteria. Correlations are computed over the results from all datasets.
Figure 2: Example: law school success. (a) Test MSE of different models, where 'undisturbed' and 'disturbed' denote the results obtained from the original and disturbed data, respectively. (b) Comparison between the change in MSE and DR, suggesting that $\mathrm{DR} \approx 0$ when the corresponding model satisfies or nearly satisfies counterfactual fairness.
Figure 3: Correlation for the oracle bounds in \ref{method:1} and the generalisation bounds in \ref{method:2}. (a--c) Correlation between $\mathcal{L}_\text{bias}(\mathbf{wv}_\rho)$ and the oracle bounds, where $\mathcal{L}_\text{bias}(\mathbf{wv}_\rho)$ is on the vertical axis and the horizontal axes show the right-hand sides of inequalities \eqref{eq:2}, \eqref{eq:5}, and \eqref{eq:6}, respectively. (d) The horizontal and vertical axes show the right- and left-hand sides of \eqref{eq:4}, respectively. (e--f) Correlation between $\mathcal{L}_\text{bias}(\cdot)$ and the generalisation bounds, where $\mathcal{L}_\text{bias}(\cdot)$ is on the vertical axis and the horizontal axes show the right-hand sides of inequalities \eqref{eq:n8} and \eqref{eq:8}, respectively. Correlations are computed over the results from all datasets.
Figure 4: Comparison between POAF and fairness-aware ensemble-based methods. (a--d) Scatter plots (cruz2022fairgbm) showing the fairness and accuracy of each algorithm on the test data. (e--h) Best test-set fairness-accuracy trade-offs per algorithm (cruz2022fairgbm), where fairness is measured by DP, EO, PQP, and DR, respectively. Lines show mean values and shaded regions show 95% confidence intervals; for the fairness measures, smaller values are better.
Figure 5: Comparison of POAF with state-of-the-art pruning methods, using bagging to build homogeneous ensembles. (a--c) Friedman test charts for test accuracy, precision, and $\mathcal{L}_\text{bias}(\mathbf{wv}_\rho)$, respectively; each test rejects the null hypothesis at the 5% significance level. (d--h) Aggregated rank of each pruning method on $\mathcal{L}_\text{bias}(\mathbf{wv}_\rho)$, test accuracy, precision, recall, and $\mathcal{L}(\mathbf{wv}_\rho)$, respectively.
...and 8 more figures
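For comparison with DR in Figures 1 and 4, the sketch below computes the three group fairness measures, assuming the common definitions for a binary label and a binary sensitive attribute: DP as the gap in positive-prediction rates, EO as the gap in true-positive rates (equality of opportunity), and PQP as the gap in P(y = 1 | ŷ = 1) between groups. The paper's exact definitions (for example, for multi-class labels or multi-valued attributes) may differ, and the names `group_fairness` and `_rate` and the toy labels are hypothetical.

```python
import numpy as np


def _rate(event, cond):
    """Empirical P(event | cond); 0.0 when the condition never holds."""
    return float(event[cond].mean()) if cond.any() else 0.0


def group_fairness(y_true, y_pred, a):
    """Absolute between-group gaps for binary labels and a binary attribute a."""
    y_true, y_pred, a = (np.asarray(v) for v in (y_true, y_pred, a))
    g1, g0 = a == 1, a == 0
    pred_pos, true_pos = y_pred == 1, y_true == 1
    return {
        # DP: gap in positive-prediction rates P(y_hat=1 | a)
        "DP": abs(_rate(pred_pos, g1) - _rate(pred_pos, g0)),
        # EO: gap in true-positive rates P(y_hat=1 | y=1, a)
        "EO": abs(_rate(pred_pos, g1 & true_pos) - _rate(pred_pos, g0 & true_pos)),
        # PQP: gap in precision P(y=1 | y_hat=1, a)
        "PQP": abs(_rate(true_pos, g1 & pred_pos) - _rate(true_pos, g0 & pred_pos)),
    }


y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
a = [1, 1, 1, 1, 0, 0, 0, 0]
print(group_fairness(y_true, y_pred, a))  # DP = 0.5, EO = 0.5, PQP = 1/3 (up to rounding)
```

Unlike DR, these measures compare aggregate rates between groups and never check whether an individual's prediction responds to a change in their sensitive attribute, which is one reason a model can score well on one kind of measure and poorly on the other.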

Theorems & Definitions (14)

Theorem 1: First-order oracle bound
Lemma 2
Theorem 3: Second-order oracle bound
Theorem 4: C-tandem oracle bound
Theorem 5
Theorem 6
Definition 1: Domination
Proof of Theorem 1
Proof of Lemma 2
Proof of Theorem 3
...and 4 more
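POAF is described as Pareto-optimal ensemble pruning that balances accuracy and fairness. The sketch below shows the domination test and non-dominated filtering that such a method relies on, assuming the standard Pareto notion with both objectives (for example, the error rate and DR of a candidate sub-ensemble) minimised; Definition 1 in the paper may differ in detail. The candidate values are made up for illustration, and this is not the POAF search procedure itself.

```python
from typing import List, Sequence, Tuple


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if u Pareto-dominates v when every objective is minimised:
    u is no worse than v in all objectives and strictly better in at least one."""
    return (all(a <= b for a, b in zip(u, v))
            and any(a < b for a, b in zip(u, v)))


def pareto_front(points: List[Tuple[float, float]]) -> List[int]:
    """Indices of the points that no other point dominates."""
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]


# Hypothetical (error rate, DR) pairs for five candidate sub-ensembles.
candidates = [(0.20, 0.10), (0.18, 0.15), (0.25, 0.05), (0.22, 0.12), (0.18, 0.16)]
print(pareto_front(candidates))  # [0, 1, 2]
```

Candidate 3 is dominated by candidate 0 (worse on both objectives), and candidate 4 by candidate 1 (equal error, higher DR). Choosing a single sub-ensemble from the remaining front, for example the lowest DR within an accuracy budget, is a separate design decision this sketch leaves open.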
