Sports and Women's Sports: Gender Bias in Text Generation with Olympic Data

Laura Biester

TL;DR

This study introduces an Olympic-data framework for probing gender bias in large language models, using parallel men’s and women’s events and two prompt styles (specified and underspecified). It defines three bias metrics (knowledge-based, explicit-ambiguous, and implicit-ambiguous) and evaluates six instruction-tuned models. The models show roughly equal knowledge of men’s and women’s events but clear bias when gender is ambiguous: they often default to men’s results, either explicitly or implicitly. The work provides a data source, an annotation protocol, and public prompts to enable further bias analysis, and it shows variability across disciplines (artistic gymnastics is a notable outlier). Overall, the approach highlights how ambiguity can trigger gender bias in factual generation and offers a pathway for targeted bias mitigation in NLP systems. Findings are substantiated with permutation tests and false-discovery-rate (FDR) control, underscoring the practical risk of a default-man bias in sports coverage by LLMs. $p$-values and FDR adjustments are reported to support the conclusions, and the results are robust to the exclusion of ambiguous cases.
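To make the significance-testing step concrete, here is a minimal sketch of a permutation test followed by FDR adjustment. The summary does not say which permutation scheme or FDR procedure the paper uses, so this sketch assumes a paired sign-flip test on per-event scores and the Benjamini-Hochberg procedure. The function names and the toy scores are hypothetical and do not come from the paper.

```python
import random


def paired_permutation_pvalue(men, women, n_perm=10000, seed=0):
    """Two-sided sign-flip permutation test on paired per-event scores.

    Assumption: men[i] and women[i] are scores for the same parallel event.
    """
    rng = random.Random(seed)
    diffs = [m - w for m, w in zip(men, women)]
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(n_perm):
        # Under the null, the sign of each paired difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy data for illustration only (not results from the paper):
    # 1 = the response covered that event's results, 0 = it did not.
    toy_scores = {
        "hypothetical_model_a": ([1] * 12, [0] * 12),
        "hypothetical_model_b": ([1, 0, 1, 0, 1, 0], [1, 0, 1, 0, 1, 0]),
    }
    names = list(toy_scores)
    raw = [paired_permutation_pvalue(*toy_scores[n]) for n in names]
    for name, p, q in zip(names, raw, benjamini_hochberg(raw)):
        print(f"{name}: p={p:.4f}, BH-adjusted={q:.4f}")
```

A paired test fits here because each men's event has a parallel women's event, so the natural unit of comparison is the per-event difference. An unpaired label-shuffling test would be the alternative if events were not matched.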

Abstract

Large Language Models (LLMs) have been shown to be biased in prior work, as they generate text that is in line with stereotypical views of the world or that is not representative of the viewpoints and values of historically marginalized demographic groups. In this work, we propose using data from parallel men's and women's events at the Olympic Games to investigate different forms of gender bias in language models. We define three metrics to measure bias, and find that models are consistently biased against women when the gender is ambiguous in the prompt. In this case, the model frequently retrieves only the results of the men's event with or without acknowledging them as such, revealing pervasive gender bias in LLMs in the context of athletics.

Paper Structure

This paper contains 33 sections, 1 equation, 2 figures, 6 tables.

Figures (2)

  • Figure 1: Overview of how the three bias metrics are computed for a single event.
  • Figure 2: An example annotation for the specified task.