Rethinking LLM Bias Probing Using Lessons from the Social Sciences

Kirsten N. Morehouse, Siddharth Swaroop, Weiwei Pan

TL;DR

The paper tackles the challenges of bias probing in large language models by proposing EcoLevels, a framework that uses ecological validity and abstraction level to select and interpret probes. Grounded in social science theory, it surveys human-bias research and existing probes, and demonstrates how to apply EcoLevels to gender-occupation bias to reveal boundary conditions and generalization considerations. It argues for translating domain knowledge from psychology to ML contexts, addresses conflicting results across probes, and advocates for standardized reporting and no-lose experimental designs. The work aims to improve rigor, comparability, and real-world relevance in LLM bias research, guiding the next generation of bias probes and interpretable mitigation strategies.

Abstract

The proliferation of LLM bias probes introduces three significant challenges: (1) we lack principled criteria for choosing appropriate probes, (2) we lack a system for reconciling conflicting results across probes, and (3) we lack formal frameworks for reasoning about when (and why) probe results will generalize to real user behavior. We address these challenges by systematizing LLM social bias probing using actionable insights from the social sciences. We then introduce EcoLevels, a framework that helps (a) determine appropriate bias probes, (b) reconcile conflicting findings across probes, and (c) generate predictions about bias generalization. Overall, we ground our analysis in social science research because many LLM probes are direct applications of human probes, and these fields have faced similar challenges when studying social bias in humans. Based on our work, we suggest how the next generation of LLM bias probing can (and should) benefit from decades of social science research.

Paper Structure

This paper contains 15 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Construct schematic. Starting from the bottom, the blue and green circles represent probes used to study implicit and explicit cognition, respectively. The rectangles in the center represent the constructs or the latent concept under investigation. The gray horizontal lines emphasize that constructs are interconnected rather than isolated phenomena. Finally, the colored squares represent the ideas underlying each construct (constituent ideas).
  • Figure 2: Establishing task-probe alignment through example research questions. Ecologically valid probes (a) measure the construct defined by the research question (RQ) and (b) possess strong task-probe alignment. This figure demonstrates how distinct RQs can target the same construct, highlighting the differences between constructs and tasks. Once the construct(s) are identified, the task associated with the RQ ('task|RQ') should be specified. With the research question, construct, and task defined, researchers can more effectively identify probes that align with the task.
  • Figure 3: Borderline Prompts and Features that Distinguish Levels. As discussed in Section 4.4, sentence completion probes can be difficult to categorize. Here, we show how the inclusion of (a) an implied task, (b) a defined task, and/or (c) real-world context changes the EcoLevels categorization. Responses were obtained via the browser version of GPT-4o and are included for demonstration purposes only.
  • Figure 4: Increasing the Ecological Validity of a Probe, Given a Research Question. In this figure, we return to one of the research questions introduced in Section 4.4. In the main text, we argued that naturalistic probes would be most appropriate for this research question, given its focus on disparate outcomes. Here, however, we show how small tweaks to an association-level probe, the LLM IB (Bai et al., 2025), can increase its ecological validity for this research question. Specifically, we replace the context-neutral language ("pick a word") with a specific context/task ("pick a person to hire").