Rethinking LLM Bias Probing Using Lessons from the Social Sciences
Kirsten N. Morehouse, Siddharth Swaroop, Weiwei Pan
TL;DR
The paper tackles the challenges of bias probing in large language models (LLMs) by proposing EcoLevels, a framework that uses ecological validity and abstraction level to select and interpret probes. Grounded in social science theory, it surveys human-bias research and existing probes, and demonstrates how to apply EcoLevels to gender-occupation bias to reveal boundary conditions and generalization considerations. It argues for translating domain knowledge from psychology to ML contexts, addresses conflicting results across probes, and advocates for standardized reporting and no-lose experimental designs. The work aims to improve rigor, comparability, and real-world relevance in LLM bias research, guiding the next generation of bias probes and interpretable mitigation strategies.
Abstract
The proliferation of LLM bias probes introduces three significant challenges: (1) we lack principled criteria for choosing appropriate probes, (2) we lack a system for reconciling conflicting results across probes, and (3) we lack formal frameworks for reasoning about when (and why) probe results will generalize to real user behavior. We address these challenges by systematizing LLM social bias probing using actionable insights from the social sciences. We then introduce EcoLevels, a framework that helps (a) determine appropriate bias probes, (b) reconcile conflicting findings across probes, and (c) generate predictions about bias generalization. Overall, we ground our analysis in social science research because many LLM probes are direct applications of human probes, and these fields have faced similar challenges when studying social bias in humans. Based on our work, we suggest how the next generation of LLM bias probing can (and should) benefit from decades of social science research.
