Cauchy-Schwarz Fairness Regularizer
Yezi Liu, Hanning Chen, Wenjun Huang, Yang Ni, Mohsen Imani
TL;DR
This work addresses group fairness in binary classification by introducing a regularizer based on the Cauchy-Schwarz (CS) divergence, which penalizes differences between the group-conditional distributions of model predictions. Using an analysis of the Gaussian case, it argues that CS divergence yields tighter bounds than Kullback-Leibler (KL) divergence, Maximum Mean Discrepancy (MMD), and the Demographic Parity (DP) gap. It also provides a kernel-based, distribution-free estimator that extends to multiple sensitive attributes. Empirically, the CS regularizer improves Demographic Parity and Equal Opportunity on five datasets (four tabular, one image) while maintaining competitive accuracy, and it offers a more stable utility-fairness trade-off across hyperparameters. The results support CS as a robust, generalizable debiasing mechanism and point to future work on broader fairness notions and structured data domains.
Abstract
Group fairness in machine learning is often enforced by adding a regularizer that reduces the dependence between model predictions and sensitive attributes. However, existing regularizers are built on heterogeneous distance measures and design choices, which makes their behavior hard to reason about and their performance inconsistent across tasks. This raises a basic question: what properties make a good fairness regularizer? We address this question by first organizing existing in-process methods into three families: (i) matching prediction statistics across sensitive groups, (ii) aligning latent representations, and (iii) directly minimizing dependence between predictions and sensitive attributes. Through this lens, we identify desirable properties of the underlying distance measure, including tight generalization bounds, robustness to scale differences, and the ability to handle arbitrary prediction distributions. Motivated by these properties, we propose a Cauchy-Schwarz (CS) fairness regularizer that penalizes the empirical CS divergence between prediction distributions conditioned on sensitive groups. In a comparison under Gaussian assumptions, we show that CS divergence yields a tighter bound than Kullback-Leibler divergence, Maximum Mean Discrepancy, and the mean disparity used in Demographic Parity, and we discuss how these advantages carry over to a distribution-free, kernel-based estimator that naturally extends to multiple sensitive attributes. Extensive experiments on four tabular benchmarks and one image dataset demonstrate that the proposed CS regularizer consistently improves Demographic Parity and Equal Opportunity metrics while maintaining competitive accuracy, and that it achieves a more stable utility-fairness trade-off across hyperparameter settings than prior regularizers.
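The regularizer penalizes a divergence between the distributions of predicted scores in different sensitive groups. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the common definition D_CS(p, q) = -log((∫pq)^2 / (∫p^2 ∫q^2)) and a Gaussian-kernel (Parzen-window) plug-in estimate. Under those assumptions, the empirical divergence between score samples x_1..x_N (group a) and y_1..y_M (group b) reduces to three averaged kernel Gram matrices:

D_CS_hat = -2 log(mean_ij k(x_i, y_j)) + log(mean_ii' k(x_i, x_i')) + log(mean_jj' k(y_j, y_j'))

The kernel normalizing constants cancel in this expression. The bandwidth sigma = 0.5 and all function names are hypothetical, including `regularized_loss`, which adds lam times the divergence to a task loss. A real training loop would compute the same quantity in an autodiff framework so that gradients reach the model.

```python
import math
from typing import Sequence


def gaussian_gram_mean(a: Sequence[float], b: Sequence[float], sigma: float) -> float:
    """Average of k(x, y) = exp(-(x - y)^2 / (2 sigma^2)) over all pairs (x in a, y in b)."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    total = 0.0
    for x in a:
        for y in b:
            total += math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))
    return total / (len(a) * len(b))


def cs_divergence(p_samples: Sequence[float], q_samples: Sequence[float],
                  sigma: float = 0.5) -> float:
    """Plug-in estimate of D_CS(p, q) = -log((int pq)^2 / (int p^2 * int q^2)).

    sigma = 0.5 is an illustrative bandwidth for scores in [0, 1] (an assumption).
    """
    cross = gaussian_gram_mean(p_samples, q_samples, sigma)   # estimates int p*q (up to a constant)
    self_p = gaussian_gram_mean(p_samples, p_samples, sigma)  # estimates int p^2 (same constant)
    self_q = gaussian_gram_mean(q_samples, q_samples, sigma)  # estimates int q^2 (same constant)
    return -2.0 * math.log(cross) + math.log(self_p) + math.log(self_q)


def demographic_parity_gap(p_samples: Sequence[float], q_samples: Sequence[float]) -> float:
    """Mean disparity |E[score | group a] - E[score | group b]|, for comparison."""
    return abs(sum(p_samples) / len(p_samples) - sum(q_samples) / len(q_samples))


def regularized_loss(task_loss: float, preds_group0: Sequence[float],
                     preds_group1: Sequence[float], lam: float = 1.0,
                     sigma: float = 0.5) -> float:
    """Hypothetical fairness-regularized objective: task loss + lam * D_CS between groups."""
    return task_loss + lam * cs_divergence(preds_group0, preds_group1, sigma)


if __name__ == "__main__":
    group_a = [0.9, 0.8, 0.85, 0.7]  # predicted P(y=1) for sensitive group a
    group_b = [0.2, 0.3, 0.25, 0.4]  # predicted P(y=1) for sensitive group b
    print("DP gap:", demographic_parity_gap(group_a, group_b))
    print("CS divergence:", cs_divergence(group_a, group_b))
    # Same mean, different spread: the DP gap is about 0, but the CS divergence is not.
    print(demographic_parity_gap([0.5, 0.5], [0.1, 0.9]),
          cs_divergence([0.5, 0.5], [0.1, 0.9]))
```

The Gaussian kernel is positive definite, so the Cauchy-Schwarz inequality keeps this estimate non-negative. The estimate is zero when both groups contribute the same score sample. Unlike the DP gap, which compares only means, the estimate also responds to differences in the spread or shape of the group-conditional score distributions.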
