Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information

Shadi Iskander; Kira Radinsky; Yonatan Belinkov

TL;DR

DAFair mitigates social bias in language models without demographic labels by using predefined prototypical demographic texts and a KL-divergence regularizer during fine-tuning. The method defines a representation for each social attribute, uses an ensemble of representation pairs, and optimizes the total loss $L_{total} = L_{ce} + \lambda L_{kl}$, where the KL term encourages each example to be equally similar to all demographic prototypes. It supports both a no-label setting and a limited-label setting (Semi-DAFair), and it reduces bias on occupation prediction and Twitter sentiment analysis with BERT and DeBERTa-V3 at a modest cost in accuracy. The framework can be extended to other bias types; the authors note ethical considerations and limitations, including the reliance on predefined texts and the focus on binary gender.

Abstract

Mitigating social biases typically requires identifying the social groups associated with each data sample. In this paper, we present DAFair, a novel approach to address social bias in language models. Unlike traditional methods that rely on explicit demographic labels, our approach does not require any such information. Instead, we leverage predefined prototypical demographic texts and incorporate a regularization term during the fine-tuning process to mitigate bias in the model's representations. Our empirical results across two tasks and two models demonstrate the effectiveness of our method compared to previous approaches that do not rely on labeled data. Moreover, with limited demographic-annotated data, our approach outperforms common debiasing approaches.

Paper Structure (39 sections, 8 equations, 4 figures, 8 tables)

Introduction and Background
Methodology
Demographic-Agnostic Fairness Approach
Social Attribute Representations
Pre-defined Representations (No Labels).
Data-driven Representations (Few Labels).
Ensemble of Representations
Calculating KL Loss
Experimental Setup
Tasks
Occupation Prediction.
Twitter Sentiment Analysis.
Models
Metrics
Performance Evaluation.
...and 24 more sections

Figures (4)

Figure 1: Our debiasing method consists of defining task-specific representations for each social attribute, measuring similarity in the representation space for each example, and utilizing the KL loss to encourage uniform probabilities across social groups.
Figure 2: Effect of bias mitigation methods on TPR-GAP with varying labeled data sizes. In scenarios with limited demographic-annotated data, our approach outperforms common debiasing approaches.
Figure 3: Effect of bias mitigation methods on accuracy with varying labeled data sizes.
Figure 4: The prompt for generating prototypical text pairs.
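
The pipeline described in the Figure 1 caption, combined with the total loss $L_{total} = L_{ce} + \lambda L_{kl}$ from the TL;DR, can be illustrated with a small sketch. This is not the authors' implementation. It assumes dot-product similarity, a softmax over the similarities, the direction KL(p || uniform), and a plain average over the ensemble; all function names (`kl_loss`, `total_loss`, and the helpers) are hypothetical, and the vectors stand in for encoder representations.

```python
import math


def dot(u, v):
    """Dot product, used here as the similarity measure (an assumption)."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of similarity scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_to_uniform(probs):
    """KL(p || uniform) over len(probs) groups; zero when p is uniform."""
    k = len(probs)
    return sum(p * math.log(p * k) for p in probs if p > 0.0)


def kl_loss(example, prototype_sets):
    """Average KL term over an ensemble of prototype sets.

    example: representation of one training example (list of floats).
    prototype_sets: list of sets; each set holds one prototype vector
    per social group (e.g. a pair for binary gender).
    """
    losses = []
    for prototypes in prototype_sets:
        sims = [dot(example, proto) for proto in prototypes]
        losses.append(kl_to_uniform(softmax(sims)))
    return sum(losses) / len(losses)


def total_loss(ce_loss, example, prototype_sets, lam=1.0):
    """L_total = L_ce + lambda * L_kl, with ce_loss computed elsewhere."""
    return ce_loss + lam * kl_loss(example, prototype_sets)


if __name__ == "__main__":
    example = [1.0, 0.0]
    neutral_pair = [[0.0, 1.0], [0.0, -1.0]]   # equally similar to example
    skewed_pair = [[2.0, 0.0], [-2.0, 0.0]]    # example leans to one group
    print(kl_loss(example, [neutral_pair]))    # no penalty
    print(kl_loss(example, [skewed_pair]))     # positive penalty
```

In this sketch the regularizer vanishes when an example is equally similar to every group prototype and grows as the similarities diverge, which is the behavior the caption describes as encouraging uniform probabilities across social groups.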
