
Exploring and Mitigating Gender Bias in Encoder-Based Transformer Models

Ariyan Hossain, Khondokar Mohammad Ahanaf Hannan, Rakinul Haque, Nowreen Tarannum Rafa, Humayra Musarrat, Shoaib Ahmed Dipu, Farig Yousuf Sadeque

TL;DR

This work examines gender bias in encoder-based transformer models, focusing on the contextualized word embeddings produced by architectures such as BERT, RoBERTa, ALBERT, and DistilBERT. It introduces MALoR, a model-agnostic metric based on the mean absolute log-ratio of masked-language-model (MLM) token probabilities for gendered terms across professions, and validates it in three experiments: he–she, his–her, and male–female names. To mitigate bias, the authors use Counterfactual Data Augmentation to build gender-balanced corpora and continue pretraining the models on them, reporting substantial reductions in MALoR scores with no significant degradation in SST-2 performance. The study identifies model size and vocabulary as factors in debiasing effectiveness and provides a reproducible methodology, including detailed sentence templates and datasets, for evaluating and reducing gender bias in contextualized embeddings. The approach offers a practical, data-efficient pathway to fairer transformer-based systems in downstream NLP tasks.
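
The TL;DR describes MALoR only as a mean absolute log-ratio of MLM token probabilities. The sketch below is one possible reading of that description, assuming natural logarithms, one (male, female) probability pair per template/profession combination, and a plain unweighted mean; the paper's exact aggregation and log base may differ. The function name `malor` and all probabilities in the example are hypothetical, not taken from the paper.

```python
import math
from statistics import mean
from typing import Iterable, Tuple


def malor(prob_pairs: Iterable[Tuple[float, float]]) -> float:
    """Mean absolute log-ratio over (p_male, p_female) MLM probability pairs.

    Assumption: natural log and an unweighted mean over all pairs.
    """
    pairs = list(prob_pairs)
    if not pairs:
        raise ValueError("need at least one probability pair")
    scores = []
    for p_male, p_female in pairs:
        if p_male <= 0 or p_female <= 0:
            raise ValueError("probabilities must be strictly positive")
        # 0 means the model is indifferent; larger values mean a stronger skew
        # toward either gender (the absolute value discards the direction).
        scores.append(abs(math.log(p_male / p_female)))
    return mean(scores)


# Hypothetical probabilities of "he" vs "she" filling [MASK] in
# "[MASK] is a <profession>." (illustrative numbers, not from the paper).
example = {
    "nurse": (0.10, 0.60),
    "engineer": (0.55, 0.05),
    "teacher": (0.30, 0.35),
}
score = malor(example.values())
print(f"MALoR = {score:.3f}")
```

Taking the absolute value before averaging means opposite skews (for example, "she" favored for nurse and "he" for engineer) add up rather than cancel, so a model cannot look unbiased merely because its stereotypes point in both directions.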

Abstract

Gender bias in language models has gained increasing attention in the field of natural language processing. Encoder-based transformer models, which have achieved state-of-the-art performance in various language tasks, have been shown to exhibit strong gender biases inherited from their training data. This paper investigates gender bias in contextualized word embeddings, a crucial component of transformer-based models. We focus on prominent architectures such as BERT, ALBERT, RoBERTa, and DistilBERT to examine their vulnerability to gender bias. To quantify the degree of bias, we introduce a novel metric, MALoR, which assesses bias based on model probabilities for filling masked tokens. We further propose a mitigation approach involving continued pre-training on a gender-balanced dataset generated via Counterfactual Data Augmentation. Our experiments reveal significant reductions in gender bias scores across different pronoun pairs. For instance, in BERT-base, bias scores for "he-she" dropped from 1.27 to 0.08, and for "his-her" from 2.51 to 0.36 following our mitigation approach. We observe similar improvements in other models; for example, "male-female" bias in BERT-large decreased from 1.82 to 0.10. Our approach effectively reduces gender bias without compromising model performance on downstream tasks.
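
The abstract names Counterfactual Data Augmentation (CDA) without spelling out the word list or procedure. As a rough illustration, a minimal two-sided CDA pass can be sketched as follows: every gendered word found in a small hypothetical swap table is replaced with its counterpart, and the swapped sentence is added alongside the original so the corpus contains both gendered variants. The table, the function names `counterfactual` and `augment`, and the choice to keep the originals are assumptions for illustration, not the authors' implementation.

```python
import re
from typing import Dict, List

# Hypothetical, deliberately tiny swap table. Real CDA word lists are far larger,
# and ambiguous forms need context: "her" can map to "his" or "him", and this
# sketch always picks "his". Name swapping (as in the paper's male-female name
# experiment) is not handled here.
SWAP: Dict[str, str] = {
    "he": "she", "she": "he",
    "his": "her", "her": "his",
    "him": "her",
    "man": "woman", "woman": "man",
    "men": "women", "women": "men",
    "father": "mother", "mother": "father",
    "son": "daughter", "daughter": "son",
}

# Whole alphabetic runs, so "shed" is never mistaken for "she".
_WORD = re.compile(r"[A-Za-z]+")


def _match_case(source: str, replacement: str) -> str:
    """Copy the capitalisation pattern of `source` onto `replacement`."""
    if source.isupper() and len(source) > 1:
        return replacement.upper()
    if source[0].isupper():
        return replacement.capitalize()
    return replacement


def counterfactual(sentence: str) -> str:
    """Swap every gendered word found in SWAP with its counterpart."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        target = SWAP.get(word.lower())
        return _match_case(word, target) if target else word
    return _WORD.sub(swap, sentence)


def augment(corpus: List[str]) -> List[str]:
    """Two-sided CDA: keep each sentence and add its swapped version if it differs."""
    augmented = []
    for sentence in corpus:
        augmented.append(sentence)
        swapped = counterfactual(sentence)
        if swapped != sentence:
            augmented.append(swapped)
    return augmented


for line in augment(["He said his father is a nurse.", "The sky is blue."]):
    print(line)
```

The augmented corpus would then serve as input to the continued masked-language-model pretraining the abstract describes; that training step is not shown here.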

Paper Structure

This paper contains 30 sections, 6 equations, 16 figures, 9 tables.

Figures (16)

  • Figure 1: Methodology Pipeline
  • Figure 2: Bias Detection Methodology
  • Figure 3: Bias Mitigation Methodology
  • Figure 4: BERT-base: Probabilities of gendered terms before (left figure) vs after (right figure) debiasing
  • Figure 5: BERT-large: Probabilities of gendered terms before (left figure) vs after (right figure) debiasing
  • ...and 11 more figures