Intersectional Bias in Causal Language Models

Liam Magee, Lida Ghahremanlou, Karen Soldatic, Shanthi Robertson

TL;DR

This study investigates intersectional bias in eight causal language models from the GPT-2 and GPT-NEO families. The authors build 280 prompts combining gender, religion, and disability terms, generate 100 sentences per prompt, and score each sentence for sentiment. They find bias within single categories, and intersectional prompts often yield lower sentiment; neither larger model size nor more diverse training data reliably eliminates bias. Using ANOVA, t-tests, topic modeling, and regression, the authors show that some patterns emerge (e.g., Muslim and disability terms often correlate with negative sentiment), but the effects are context-dependent and not fully predictable. They also explore mitigation avenues, including prompt calibration and community-informed data practices, and conclude that addressing intersectional bias requires a combination of technical strategies and sociocultural engagement. The work highlights the nuanced ways in which language models reflect and propagate social biases and emphasizes the need for targeted, context-aware mitigation in real-world applications.

Abstract

To test whether intersectional bias can be observed in language generation, we examine GPT-2 and GPT-NEO models ranging in size from 124 million to approximately 2.7 billion parameters. We conduct an experiment that combines up to three social categories (gender, religion and disability) into unconditional or zero-shot prompts; these prompts are used to generate sentences, which are then analysed for sentiment. Our results confirm earlier tests conducted with auto-regressive causal models, including the GPT family of models. We also illustrate why bias may be resistant to techniques that target single categories (e.g. gender, religion and race): it can also manifest, often in subtle ways, in texts prompted by concatenated social categories. To address these difficulties, we suggest that technical and community-based approaches need to be combined to acknowledge and address complex and intersectional language model bias.
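
The prompt construction described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the term lists are hypothetical and contain only the terms visible in the figure captions plus a neutral default, so the sketch does not reproduce the full set of 280 prompts. The helper names `build_prompt` and `build_prompts` are assumptions, and the generation and sentiment-scoring steps are left out.

```python
from itertools import product

# Hypothetical term lists for illustration only; an empty string means
# "category not specified". The paper's actual lists are not given here.
RELIGIONS = ["", "Muslim", "Jewish", "Buddhist"]
GENDERS = ["person", "woman", "man"]
DISABILITIES = ["", "blind", "with quadriplegia", "with Down Syndrome"]


def build_prompt(religion: str, gender: str, disability: str) -> str:
    """Concatenate up to three social categories into one prompt.

    Disability terms starting with "with " follow the gender noun;
    other disability terms precede it as adjectives.
    """
    before, after = [], []
    if religion:
        before.append(religion)
    if disability.startswith("with "):
        after.append(disability)
    elif disability:
        before.append(disability)
    # Every term in the lists above takes the article "A".
    return " ".join(["A"] + before + [gender] + after)


def build_prompts() -> list[str]:
    """Return one prompt per combination of the three category lists."""
    return [
        build_prompt(religion, gender, disability)
        for religion, gender, disability in product(RELIGIONS, GENDERS, DISABILITIES)
    ]


if __name__ == "__main__":
    for prompt in build_prompts():
        print(prompt)
```

Each resulting prompt would then be passed to a language model to generate sentences, and each sentence scored with a sentiment classifier, so that scores can be compared across single-category and intersectional prompts.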

Paper Structure

This paper contains 10 sections, 4 figures, 11 tables.

Figures (4)

  • Figure 1: Highest and lowest 10 sentiment scores.
  • Figure 2: Word cloud for 'A Muslim blind man'.
  • Figure 3: Word cloud for 'A Jewish woman with quadriplegia'.
  • Figure 4: Word cloud for 'A Buddhist person with Down Syndrome'.