
FairContrast: Enhancing Fairness through Contrastive learning and Customized Augmenting Methods on Tabular Data

Aida Tayebi, Ali Khodabandeh Yalabadi, Mehdi Yazdani-Jahromi, Ozlem Ozmen Garibay

TL;DR

The paper tackles bias in tabular-data models by introducing FairContrast, a fairness-aware contrastive-learning framework that employs a specialized positive-pair sampling strategy and a hybrid loss combining supervised/self-supervised contrastive objectives with binary cross-entropy. The theoretical analysis shows that the approach implicitly balances label-relevant information against leakage of sensitive attributes, enabling a data-driven fairness-utility trade-off without extra adversaries or estimators. Empirically, FairContrast reduces bias (lower demographic parity gap) with minimal accuracy loss across three datasets (Adult, German, Heritage Health) in both supervised and unsupervised modes, outperforming several state-of-the-art tabular fairness baselines. The work highlights the potential of contrastive learning for fair representations in tabular domains and points to extensions to other fairness notions and data modalities.
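The hybrid objective described above can be sketched as a supervised contrastive term plus binary cross-entropy. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the SupCon-style form (same-label samples as positives, temperature `tau`), the cosine-similarity critic, and the weighting `con_weight * L_con + L_BCE` are all assumptions, and `con_weight` is a hypothetical name, not necessarily the paper's α.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    # L2-normalise so that dot products become cosine similarities
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def supcon_loss(z: Sequence[Sequence[float]], labels: Sequence[int], tau: float = 0.5) -> float:
    """Supervised contrastive loss (assumed form): for each anchor, samples
    sharing its label are positives; all other samples enter the denominator."""
    zs = [_normalize(v) for v in z]
    n = len(zs)
    if n < 2:
        return 0.0  # no pairs to contrast
    total, anchors = 0.0, 0
    for i in range(n):
        sims = [sum(a * b for a, b in zip(zs[i], zs[j])) / tau for j in range(n)]
        others = [sims[j] for j in range(n) if j != i]
        m = max(others)  # numerically stable log-sum-exp over non-anchor samples
        log_denom = m + math.log(sum(math.exp(s - m) for s in others))
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no positive contributes nothing
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / max(anchors, 1)


def bce_loss(probs: Sequence[float], targets: Sequence[int], eps: float = 1e-7) -> float:
    # Mean binary cross-entropy; probabilities are clipped away from 0 and 1
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def hybrid_loss(z, labels, probs, con_weight: float = 1.0, tau: float = 0.5) -> float:
    # Hypothetical combination: L = con_weight * L_SupCon + L_BCE
    return con_weight * supcon_loss(z, labels, tau) + bce_loss(probs, labels)
```

With embeddings that already cluster by label, the contrastive term is smaller than when the labels are shuffled, which is the behaviour the hybrid loss relies on to shape the representation while BCE keeps it predictive.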

Abstract

As AI systems become more embedded in everyday life, the development of fair and unbiased models becomes more critical. Considering the social impact of AI systems is not merely a technical challenge but a moral imperative. As evidenced in numerous research studies, learning fair and robust representations has proven to be a powerful approach to debiasing algorithms and improving fairness while maintaining essential information for prediction tasks. Representation learning frameworks, particularly those that utilize self-supervised and contrastive learning, have demonstrated superior robustness and generalizability across various domains. Despite the growing interest in applying these approaches to tabular data, the issue of fairness in these learned representations remains underexplored. In this study, we introduce a contrastive learning framework specifically designed to address bias and learn fair representations in tabular datasets. By strategically selecting positive pair samples and employing supervised and self-supervised contrastive learning, we significantly reduce bias compared to existing state-of-the-art contrastive learning models for tabular data. Our results demonstrate the efficacy of our approach in mitigating bias with minimal loss of accuracy, and in leveraging the learned fair representations across various downstream tasks.
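The abstract mentions strategically selected positive pairs, and the Figure 1 caption says the loss brings positive-class instances from different sensitive groups closer. One hypothetical reading of that rule is sketched below: for each anchor, prefer a partner with the same label but a different sensitive-attribute value, falling back to any same-label partner. The function names are illustrative, and applying the cross-group rule to both classes (rather than only the positive class) is a simplifying assumption.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sample_fair_positive(
    i: int,
    labels: Sequence[int],
    sensitive: Sequence[int],
    rng: random.Random,
) -> Optional[int]:
    """Hypothetical rule: same label, different sensitive group if possible."""
    cross_group = [
        j for j in range(len(labels))
        if j != i and labels[j] == labels[i] and sensitive[j] != sensitive[i]
    ]
    if cross_group:
        return rng.choice(cross_group)
    # Fallback: any other sample with the same label, or None if there is none
    same_label = [j for j in range(len(labels)) if j != i and labels[j] == labels[i]]
    return rng.choice(same_label) if same_label else None


def sample_positive_pairs(
    labels: Sequence[int], sensitive: Sequence[int], seed: int = 0
) -> List[Tuple[int, Optional[int]]]:
    # One (anchor, positive) pair per sample; a fixed seed keeps it reproducible
    rng = random.Random(seed)
    return [(i, sample_fair_positive(i, labels, sensitive, rng)) for i in range(len(labels))]
```

Pulling cross-group, same-label pairs together is what would discourage the encoder from separating sensitive groups in the representation space.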

Paper Structure

This paper contains 13 sections, 4 theorems, 15 equations, 3 figures, 2 tables.

Key Result

Lemma 3.1

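For any encoder $f_\theta$ and any positive-pair distribution, $I(Z;Z^{+}) \ge \log N - \mathcal{L}_{\mathrm{NCE}}$, where $N$ is the number of candidates (one positive and $N-1$ negatives) scored for each anchor. Thus minimising $\mathcal{L}_{\mathrm{NCE}}$ maximises a lower bound on the mutual information $I(Z;Z^{+})$.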
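The standard InfoNCE bound of van den Oord et al. (2018) reads $I(Z;Z^{+}) \ge \log N - \mathcal{L}_{\mathrm{NCE}}$ for $N$ candidates per anchor. The sketch below computes the single-anchor InfoNCE loss and that bound, assuming a temperature-scaled dot-product critic (an assumption; any critic works in the bound). In training, the loss would be averaged over a batch.

```python
import math
from typing import Sequence


def info_nce(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    tau: float = 1.0,
) -> float:
    """-log softmax score of the positive among N = 1 + len(negatives) candidates."""
    def score(u: Sequence[float], v: Sequence[float]) -> float:
        # Assumed critic: dot product divided by a temperature
        return sum(a * b for a, b in zip(u, v)) / tau

    logits = [score(anchor, positive)] + [score(anchor, n) for n in negatives]
    m = max(logits)  # stable log-sum-exp
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]


def mi_lower_bound(loss: float, n_candidates: int) -> float:
    # I(Z; Z+) >= log N - L_NCE
    return math.log(n_candidates) - loss
```

When the critic cannot tell the positive from the negatives, the loss equals $\log N$ and the bound is zero; as the positive's score dominates, the loss approaches zero and the bound approaches $\log N$, which is why larger batches can certify more mutual information.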

Figures (3)

  • Figure 1: The schematic diagram illustrates the proposed fairness-aware contrastive learning framework. Our approach involves selectively sampling positive pairs based on specific criteria and integrating them into the training process with a contrastive loss in an end-to-end manner. Although combining supervised contrastive learning with cross-entropy loss improves model robustness, contrastive loss without explicit bias mitigation can unintentionally separate instances across sensitive attributes in the representation space. Our proposed fairness-aware contrastive loss, together with cross-entropy, reduces this separation by bringing positive-class instances from different sensitive groups closer, thereby improving fairness without requiring additional fairness-specific constraint loss functions.
  • Figure 2: Accuracy-fairness trade-off and comparison to various benchmark models across three benchmark datasets: (a) UCI Adult dataset, (b) Heritage Health dataset, and (c) German Credit. The optimal region on the graph is the lower right corner, representing high accuracy and low demographic parity. Our model demonstrates a superior fairness-accuracy trade-off.
  • Figure 3: Effect of varying $\alpha$ on the Area Over the Fairness–Accuracy Pareto Curve (AOC) for supervised and unsupervised settings. Each point represents the AOC score at a specific $\alpha$ value. The trade-off stabilizes for $\alpha > 1$, indicating consistent fairness–accuracy performance in both learning modes.
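Figures 2 and 3 compare models by accuracy against demographic parity. A minimal sketch, assuming the common definition of the demographic parity gap as the absolute difference in positive-prediction rates between two sensitive groups (the paper's exact formulation is not reproduced here), plus a helper that keeps the non-dominated (accuracy, gap) points, i.e. those toward the high-accuracy, low-gap corner. The exact AOC computation from Figure 3 is not stated here, so it is not implemented.

```python
from typing import List, Sequence, Tuple


def demographic_parity_gap(y_pred: Sequence[int], sensitive: Sequence[int]) -> float:
    """|P(Y_hat=1 | S=0) - P(Y_hat=1 | S=1)| for binary predictions and groups."""
    rates = []
    for group in (0, 1):
        preds = [p for p, s in zip(y_pred, sensitive) if s == group]
        if not preds:
            raise ValueError(f"no samples with sensitive value {group}")
        rates.append(sum(preds) / len(preds))  # positive-prediction rate of the group
    return abs(rates[0] - rates[1])


def pareto_front(points: Sequence[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Keep (accuracy, dp_gap) points that no other point beats, where beating
    means accuracy >= and gap <= with at least one strict improvement."""
    front = []
    for acc, dp in points:
        dominated = any(
            a >= acc and d <= dp and (a > acc or d < dp) for a, d in points
        )
        if not dominated:
            front.append((acc, dp))
    return front
```

For example, among the points (0.85, 0.10), (0.84, 0.05) and (0.80, 0.12), the last is dominated by the first, so only the first two form the trade-off curve.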

Theorems & Definitions (6)

  • Lemma 3.1: InfoNCE lower bound (van den Oord et al., 2018)
  • Proposition 3.2: Mutual-information decomposition
  • Proof (chain rule only)
  • Theorem 3.3: InfoNCE $\Longleftrightarrow$ information bottleneck
  • Proof
  • Corollary 3.3.1