Fairness via Independence: A (Conditional) Distance Covariance Framework
Ruifan Huang, Haixia Liu
TL;DR
This work frames fairness as statistical independence between predictions and sensitive attributes, using empirical distance covariance (DC) and conditional distance covariance (CDC) as training-time penalties. It introduces matrix forms for efficient parallel computation and proves that empirical (conditional) distance covariance converges to its population counterpart, which justifies computing the penalty on mini-batches. A Lagrangian dual approach dynamically balances accuracy and fairness, and is demonstrated on tabular and image datasets (including CelebA and UTKFace) with competitive trade-offs on demographic parity (DP) and equalized odds (EO). The results highlight the method's versatility, scalability, and applicability to high-dimensional, tensor-valued data without relying on strong distributional assumptions.
Abstract
We explore fairness from a statistical perspective, using either distance covariance or conditional distance covariance to measure the dependence between predictions and sensitive attributes. We promote fairness through independence by adding a distance covariance-based penalty to the model's training objective. Additionally, we present the matrix form of empirical (conditional) distance covariance, which allows parallel computation and improves computational efficiency. Theoretically, we prove that empirical (conditional) distance covariance converges to its population counterpart, establishing the guarantees needed for batch computation. Experiments on a range of real-world datasets demonstrate that our method effectively narrows the fairness gap in machine learning. Our code is available at https://github.com/liuhaixias1/Fair_dc/.
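To make the matrix form concrete, the following is a minimal sketch of the standard double-centering estimator of squared distance covariance (the V-statistic form), written with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the function names `pairwise_dist` and `dcov_sq` are hypothetical, Euclidean distance is assumed, and the matrix form in the paper and its repository may differ in detail.

```python
import numpy as np

def pairwise_dist(x):
    """Euclidean distance matrix between samples (first axis indexes samples)."""
    x = np.asarray(x, dtype=float)
    x = x.reshape(len(x), -1)              # flatten tensor-valued samples to vectors
    diff = x[:, None, :] - x[None, :, :]   # shape (n, n, d)
    return np.sqrt((diff ** 2).sum(axis=-1))

def dcov_sq(x, y):
    """Empirical squared distance covariance via double centering.

    Hypothetical helper: computes (1/n^2) * sum_ij A_ij * B_ij, where
    A = H a H and B = H b H are the double-centered distance matrices
    and H = I - (1/n) 11^T is the centering matrix.
    """
    a, b = pairwise_dist(x), pairwise_dist(y)
    n = a.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    A, B = h @ a @ h, h @ b @ h
    return (A * B).sum() / n ** 2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    preds = rng.normal(size=(64, 1))       # stand-in for model predictions
    sens = rng.integers(0, 2, size=64)     # stand-in for a binary sensitive attribute
    print(dcov_sq(preds, sens))            # small when the two are close to independent
```

In a training loop, a quantity of this kind would be added to the task loss with a weight, for example `loss = task_loss + lam * dcov_sq(preds, sens)`, where `lam` is an assumed name for the multiplier that the Lagrangian dual approach would update. Because every step is a dense matrix operation, the same computation maps directly onto a GPU array library.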
