Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information

Shadi Iskander, Kira Radinsky, Yonatan Belinkov

TL;DR

DAFair addresses social bias in language models without relying on demographic labels. It uses prototypical demographic texts and adds a KL-divergence regularization term during fine-tuning. The method defines representations for multiple social attributes, uses an ensemble of representation pairs, and optimizes the total loss $L_{total} = L_{ce} + \lambda L_{kl}$, which encourages each example to be equally similar to all demographic prototypes. The approach works with no demographic labels and with limited labels (Semi-DAFair). It reduces bias on occupation prediction and Twitter sentiment tasks with both BERT and DeBERTa-V3, at a modest cost in accuracy. The work offers a scalable, practical framework for fairness in NLP that could be extended to other bias types. It also discusses ethical considerations and limitations, including the reliance on predefined texts and the focus on binary gender.
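As a worked illustration of the regularizer, this sketch makes several assumptions. Each example's representation $h$ is compared with $K$ group representations $r_1, \dots, r_K$ through a similarity function $\mathrm{sim}$ (for instance cosine similarity). The scores are turned into a distribution $p$ with a softmax. $L_{kl}$ is the KL divergence from $p$ to the uniform distribution $u$, averaged over $M$ representation pairs. The softmax, the direction of the KL term, and the averaging are assumptions of this sketch rather than details stated above.

$$
p_k^{(m)} = \frac{\exp\big(\mathrm{sim}(h, r_k^{(m)})\big)}{\sum_{j=1}^{K} \exp\big(\mathrm{sim}(h, r_j^{(m)})\big)}, \qquad
L_{kl} = \frac{1}{M} \sum_{m=1}^{M} \mathrm{KL}\big(p^{(m)} \,\|\, u\big)
$$

$$
\mathrm{KL}(p \,\|\, u) = \sum_{k=1}^{K} p_k \log \frac{p_k}{1/K} = \log K - H(p)
$$

Because $\log K - H(p)$ is zero exactly when $p$ is uniform, minimizing $L_{total} = L_{ce} + \lambda L_{kl}$ pushes the representation toward equal similarity with every group. For binary gender ($K = 2$), $p = (0.8, 0.2)$ gives $\mathrm{KL}(p \,\|\, u) = 0.8 \ln 1.6 + 0.2 \ln 0.4 \approx 0.193$ nats, while $p = (0.5, 0.5)$ gives $0$.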

Abstract

Mitigating social biases typically requires identifying the social groups associated with each data sample. In this paper, we present DAFair, a novel approach to address social bias in language models. Unlike traditional methods that rely on explicit demographic labels, our approach does not require any such information. Instead, we leverage predefined prototypical demographic texts and incorporate a regularization term during the fine-tuning process to mitigate bias in the model's representations. Our empirical results across two tasks and two models demonstrate the effectiveness of our method compared to previous approaches that do not rely on labeled data. Moreover, with limited demographic-annotated data, our approach outperforms common debiasing approaches.
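One plausible way to obtain group representations is to average embeddings per group. In the limited-label setting, these would be the embeddings of the few demographic-annotated examples. The same averaging could be applied to embeddings of the predefined prototypical texts. The helper `group_prototypes` is hypothetical and mean pooling is an assumption. The plain Python lists stand in for real encoder outputs.

```python
from typing import List, Sequence


def group_prototypes(
    embeddings: Sequence[Sequence[float]],
    group_ids: Sequence[int],
    num_groups: int,
) -> List[List[float]]:
    """Return one prototype per group by averaging that group's embeddings.

    Hypothetical helper: mean pooling is an assumption of this sketch.
    embeddings: N vectors of equal length (stand-ins for encoder outputs of
        prototypical texts or of a few demographic-annotated examples).
    group_ids: N integers in [0, num_groups), the group of each vector.
    """
    dim = len(embeddings[0])
    sums = [[0.0] * dim for _ in range(num_groups)]
    counts = [0] * num_groups
    # Accumulate per-group sums and counts.
    for vec, g in zip(embeddings, group_ids):
        counts[g] += 1
        for i, x in enumerate(vec):
            sums[g][i] += x
    # Every group needs at least one embedding to define a prototype.
    for g, c in enumerate(counts):
        if c == 0:
            raise ValueError(f"no embeddings for group {g}")
    return [[s / c for s in row] for row, c in zip(sums, counts)]


# Toy usage with two groups in a 2-dimensional space.
toy = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
prototypes = group_prototypes(toy, [0, 0, 1, 1], num_groups=2)
# prototypes is approximately [[0.9, 0.1], [0.1, 0.9]]
```

Averaging keeps the sketch simple. A group with no examples raises an error rather than silently producing a zero vector, which would otherwise distort any similarity computed against it.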

Paper Structure

This paper contains 39 sections, 8 equations, 4 figures, and 8 tables.

Figures (4)

  • Figure 1: Our debiasing method consists of defining task-specific representations for each social attribute, measuring similarity in the representation space for each example, and utilizing the KL loss to encourage uniform probabilities across social groups.
  • Figure 2: Effect of bias mitigation methods on TPR-GAP with varying labeled data sizes. In scenarios with limited demographic-annotated data, our approach outperforms common debiasing approaches.
  • Figure 3: Effect of bias mitigation methods on accuracy with varying labeled data sizes.
  • Figure 4: The prompt for generating prototypical text pairs.
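The three steps in Figure 1 can be sketched as a forward-only loss for a single example. Everything here is hypothetical: the function names and the choice of cosine similarity. The softmax over similarities, the direction KL(p || uniform), and plain averaging over representation pairs are also assumptions. A real implementation would compute the same quantities in an autodiff framework so that gradients reach the encoder.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_to_uniform(p: Sequence[float]) -> float:
    """KL(p || u) with u uniform over len(p) groups; zero iff p is uniform."""
    k = len(p)
    return sum(pi * math.log(pi * k) for pi in p if pi > 0)


def regularized_loss(
    h: Vector,
    prototype_sets: Sequence[Sequence[Vector]],
    class_logits: Sequence[float],
    label: int,
    lam: float,
) -> float:
    """Hypothetical L_total = L_ce + lam * L_kl for one example.

    h: the example's representation.
    prototype_sets: M ensemble members, each holding one vector per social
        group (step 1 of Figure 1); plain averaging over members is assumed.
    class_logits, label: task-head scores and gold label for L_ce.
    lam: the regularization weight lambda.
    """
    # Task loss: cross-entropy of the gold label.
    l_ce = -math.log(softmax(class_logits)[label])
    # Steps 2-3: similarity to each group -> probabilities -> KL to uniform.
    kl_terms = [
        kl_to_uniform(softmax([cosine(h, r) for r in protos]))
        for protos in prototype_sets
    ]
    l_kl = sum(kl_terms) / len(kl_terms)
    return l_ce + lam * l_kl
```

For example, a representation `h` that is equally similar to both group vectors yields a zero KL term, so the loss reduces to the task cross-entropy. A representation closer to one group adds a positive penalty scaled by `lam`.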