Cauchy-Schwarz Fairness Regularizer

Yezi Liu, Hanning Chen, Wenjun Huang, Yang Ni, Mohsen Imani

TL;DR

This work addresses group fairness in binary classification by introducing a Cauchy–Schwarz divergence-based regularizer that penalizes differences between group-conditional prediction distributions. It argues, with Gaussian-scenario analysis, that CS divergence yields tighter bounds than KL, MMD, and DP gaps, and provides a kernel-based, distribution-free estimator that extends to multiple sensitive attributes. Empirically, CS improves Demographic Parity and Equal Opportunity across five datasets (four tabular, one image) while maintaining competitive accuracy and offering a more stable utility–fairness trade-off across hyperparameters. The results support CS as a robust, generalizable debiasing mechanism and point to future work on broader fairness notions and structured data domains.

Abstract

Group fairness in machine learning is often enforced by adding a regularizer that reduces the dependence between model predictions and sensitive attributes. However, existing regularizers are built on heterogeneous distance measures and design choices, which makes their behavior hard to reason about and their performance inconsistent across tasks. This raises a basic question: what properties make a good fairness regularizer? We address this question by first organizing existing in-process methods into three families: (i) matching prediction statistics across sensitive groups, (ii) aligning latent representations, and (iii) directly minimizing dependence between predictions and sensitive attributes. Through this lens, we identify desirable properties of the underlying distance measure, including tight generalization bounds, robustness to scale differences, and the ability to handle arbitrary prediction distributions. Motivated by these properties, we propose a Cauchy-Schwarz (CS) fairness regularizer that penalizes the empirical CS divergence between prediction distributions conditioned on sensitive groups. Under a Gaussian comparison, we show that CS divergence yields a tighter bound than Kullback-Leibler divergence, Maximum Mean Discrepancy, and the mean disparity used in Demographic Parity, and we discuss how these advantages translate to a distribution-free, kernel-based estimator that naturally extends to multiple sensitive attributes. Extensive experiments on four tabular benchmarks and one image dataset demonstrate that the proposed CS regularizer consistently improves Demographic Parity and Equal Opportunity metrics while maintaining competitive accuracy, and achieves a more stable utility-fairness trade-off across hyperparameter settings compared to prior regularizers.

Paper Structure

This paper contains 68 sections, 2 theorems, 54 equations, 26 figures, and 14 tables.

Key Result

Proposition 4.1

Given two sets of observations $\{{\bf x}^{p}_i\}_{i=1}^{N_{1}}$ and $\{{\bf x}^{q}_j\}_{j=1}^{N_{2}}$, let $p$ and $q$ denote the distributions of the two groups. The empirical estimator of the CS divergence $D_{\text{CS}}(p;q)$ is then given by:
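The estimator's formula is not reproduced in this summary. As a hedged illustration, the form commonly used in the CS-divergence literature (assumed here, not confirmed against the paper's own equation) replaces each density with a kernel density estimate under a kernel $\kappa$:

$$\widehat{D}_{\text{CS}}(p;q) = \log\Big(\frac{1}{N_1^2}\sum_{i,j=1}^{N_1}\kappa({\bf x}^{p}_i,{\bf x}^{p}_j)\Big) + \log\Big(\frac{1}{N_2^2}\sum_{i,j=1}^{N_2}\kappa({\bf x}^{q}_i,{\bf x}^{q}_j)\Big) - 2\log\Big(\frac{1}{N_1 N_2}\sum_{i=1}^{N_1}\sum_{j=1}^{N_2}\kappa({\bf x}^{p}_i,{\bf x}^{q}_j)\Big)$$

A minimal NumPy sketch of that assumed form follows. The Gaussian kernel, the bandwidth `sigma`, and the function names `gaussian_gram` and `cs_divergence` are illustrative choices, not taken from the paper.

```python
import numpy as np


def gaussian_gram(a, b, sigma):
    """Pairwise Gaussian kernel values exp(-||a_i - b_j||^2 / (2 * sigma^2))."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Treat 1-D inputs (e.g. scalar predictions) as N samples of dimension 1.
    a = a.reshape(len(a), -1)
    b = b.reshape(len(b), -1)
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def cs_divergence(xp, xq, sigma=1.0):
    """Empirical CS divergence between two samples (assumed standard form).

    xp, xq: samples from the two groups, shape (N1,) / (N2,) or (N, d).
    sigma:  hypothetical kernel bandwidth; the paper's choice is not shown here.
    """
    # .mean() over an N x M Gram matrix equals (1 / (N * M)) * sum of entries.
    within_p = gaussian_gram(xp, xp, sigma).mean()
    within_q = gaussian_gram(xq, xq, sigma).mean()
    cross = gaussian_gram(xp, xq, sigma).mean()
    return np.log(within_p) + np.log(within_q) - 2.0 * np.log(cross)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    preds_group0 = rng.uniform(0.0, 0.6, size=200)  # synthetic predictions
    preds_group1 = rng.uniform(0.4, 1.0, size=150)
    print("CS divergence:", cs_divergence(preds_group0, preds_group1, sigma=0.2))
```

With a positive-definite kernel this quantity is non-negative by the Cauchy-Schwarz inequality and is symmetric in its two arguments. It equals zero when both samples are identical, which is the behavior a fairness penalty on group-conditional predictions would rely on.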

Figures (26)

  • Figure 1: From left to right: (1) Prediction distribution of all classes; (2) t-SNE plot of embeddings for samples from all classes; (3) Prediction distribution of class 1; (4) t-SNE plot of embeddings for samples from Adult, where the sensitive attribute is gender. The blue points represent samples with sensitive attribute $0$, while the red points represent samples with sensitive attribute $1$.
  • Figure 2: Fairness loss landscapes for three regularizers, from left to right: Kullback-Leibler (KL) divergence, Hilbert-Schmidt Independence Criterion (HSIC), and Cauchy-Schwarz (CS) divergence. A smaller inner circle indicates greater robustness. CS divergence achieves the smallest inner circle, ranging from $-2$ to $1$, while the inner circles of KL and HSIC both span from $-2$ to $2$.
  • Figure 3: Fairness-accuracy trade-off curves on the test sets for (left) Adult, (middle) COMPAS, and (right) ACS-I. Ideally, results should be positioned in the bottom-right corner.
  • Figure 4: Prediction distributions for female and male groups in the Adult dataset. The top row shows kernel density estimates of the raw predictions $\hat{Y}$ for all target labels, grouped by gender, while the bottom row shows the prediction densities for the positive class, $\hat{Y}=1$, for the two gender groups. Each column corresponds to a different fairness regularizer. A larger overlap between the blue and red curves indicates better group fairness, and the reported values above each panel give the corresponding gaps in $\Delta_{\mathrm{DP}}$ (top row) and $\Delta_{\mathrm{EO}}$ (bottom row).
  • Figure 5: Parameter sensitivity of the CS regularizer on Adult: heatmaps show test accuracy (left) and $\Delta_{\mathrm{DP}}$ (right) as the fairness weight $\alpha$ and $\ell_2$ weight $\beta$ vary over the cross-validated ranges. Overall, CS exhibits a smooth utility–fairness trade-off, remaining stable over a broad range of $\beta$ and becoming noticeably more sensitive only when $\alpha$ is very large.
  • ...and 21 more figures
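
The captions above report gaps $\Delta_{\mathrm{DP}}$ and $\Delta_{\mathrm{EO}}$ without defining them. The sketch below uses the definitions that are common in the group-fairness literature, which are an assumption here rather than the paper's stated formulas: $\Delta_{\mathrm{DP}}$ is the absolute difference in positive-prediction rates between the two sensitive groups, and $\Delta_{\mathrm{EO}}$ is the same difference restricted to samples whose true label is positive. The function names `dp_gap` and `eo_gap` are hypothetical.

```python
import numpy as np


def dp_gap(y_pred, s):
    """Assumed Delta_DP: |mean(y_pred | s=0) - mean(y_pred | s=1)|."""
    y_pred = np.asarray(y_pred, dtype=float)
    s = np.asarray(s)
    return abs(y_pred[s == 0].mean() - y_pred[s == 1].mean())


def eo_gap(y_pred, y_true, s):
    """Assumed Delta_EO: the DP gap computed only on samples with y_true == 1."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true)
    s = np.asarray(s)
    positives = y_true == 1
    return dp_gap(y_pred[positives], s[positives])


if __name__ == "__main__":
    # Toy binary predictions for eight samples, four per sensitive group.
    y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
    y_true = [1, 0, 1, 0, 1, 0, 1, 1]
    s = [0, 0, 0, 0, 1, 1, 1, 1]
    print("DP gap:", dp_gap(y_pred, s))
    print("EO gap:", eo_gap(y_pred, y_true, s))
```

Both functions assume a binary sensitive attribute coded as 0 and 1, with each group non-empty. Extending them to multiple sensitive attributes would need a pairwise or max-over-groups convention that this summary does not specify.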

Theorems & Definitions (6)

  • Proposition 4.1: Empirical CS divergence estimator, cf. jenssen2006cauchy, principe2000information
  • Proposition 4.2
  • Remark A.1
  • proof
  • proof
  • proof