TriCon-Fair: Triplet Contrastive Learning for Mitigating Social Bias in Pre-trained Language Models

Chong Lyu, Lin Li, Shiqing Wu, Jingling Yuan

TL;DR

TriCon-Fair tackles social bias in pre-trained language models with a decoupled triplet contrastive learning framework: each anchor is paired with an explicitly biased negative and an unbiased positive (obtained via counterfactuals), and a language modeling objective is optimized jointly to preserve utility. The method builds biased-unbiased triplets, applies a decoupled loss that separates positive and negative gradients, and trains with $L_{total} = L_{Triplet} + \lambda L_{LM}$, where $\lambda$ weights the language modeling term, to reduce bias with minimal degradation on downstream tasks. Across BERT, ALBERT, GPT-2, and LLaMA-2 backbones, TriCon-Fair reduces bias on StereoSet (lower SS) and raises ICAT with only marginal changes in LM score, and it largely preserves MNLI and SST-2 accuracy while outperforming several baselines. The findings indicate that decoupling contrastive forces is an effective strategy for fairness-oriented representation learning in NLP. Limitations include English-only evaluation and reliance on static bias benchmarks; future work targets multilinguality, inference-time efficiency, and alignment with user preferences.

Abstract

The increasing utilization of large language models raises significant concerns about the propagation of social biases, which may result in harmful and unfair outcomes. However, existing debiasing methods treat the biased and unbiased samples independently, thus ignoring their mutual relationship. This oversight enables a hidden negative-positive coupling, where improvements for one group inadvertently compromise the other, allowing residual social bias to persist. In this paper, we introduce TriCon-Fair, a contrastive learning framework that employs a decoupled loss that combines triplet and language modeling terms to eliminate positive-negative coupling. Our TriCon-Fair assigns each anchor an explicitly biased negative and an unbiased positive, decoupling the push-pull dynamics and avoiding positive-negative coupling, and jointly optimizes a language modeling (LM) objective to preserve general capability. Experimental results demonstrate that TriCon-Fair reduces discriminatory output beyond existing debiasing baselines while maintaining strong downstream performance. This suggests that our proposed TriCon-Fair offers a practical and ethical solution for sensitive NLP applications.
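
The combined objective described above (a triplet term plus a weighted language modeling term) can be illustrated with a minimal sketch. The exact decoupled triplet loss is not given in this summary, so the code below substitutes a standard margin-based triplet loss over cosine distance as a stand-in. The function names, the `margin` value, and the toy embeddings are assumptions for illustration only, not the authors' implementation.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Standard margin triplet loss (a stand-in, NOT the paper's decoupled form).

    Pulls the anchor toward the unbiased positive and pushes it away
    from the biased negative until the gap exceeds `margin`.
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def total_loss(l_triplet, l_lm, lam=1.0):
    """L_total = L_Triplet + lambda * L_LM, as stated in the summary."""
    return l_triplet + lam * l_lm


# Toy 2-d embeddings (hypothetical values).
anchor = [1.0, 0.0]
unbiased_positive = [1.0, 0.0]   # same direction as the anchor
biased_negative = [0.0, 1.0]     # orthogonal to the anchor

well_separated = triplet_loss(anchor, unbiased_positive, biased_negative)
badly_separated = triplet_loss(anchor, biased_negative, unbiased_positive)
combined = total_loss(badly_separated, l_lm=2.0, lam=0.5)
```

In this toy setup the triplet term is zero once the anchor already sits closer to the unbiased positive than to the biased negative by at least the margin, and it grows when the roles are reversed. The weight `lam` then trades debiasing pressure against preserving language modeling quality.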

Paper Structure

This paper contains 12 sections, 4 equations, 1 figure, 2 tables.

Figures (1)

  • Figure 1: Overview of TriCon-Fair. Stage 1 builds counterfactual triplets aligned on protected attributes; Stage 2 performs decoupled contrastive learning with a task-agnostic LM loss to reduce bias in the PLM.