Bias Association Discovery Framework for Open-Ended LLM Generations
Jinhao Pan, Chahat Raj, Ziwei Zhu
TL;DR
The paper tackles the challenge of surfacing biases in open-ended LLM generations by moving beyond predefined identity–bias term pairs. It introduces the Bias Association Discovery Framework (BADF), a three-stage pipeline that extracts, filters, and validates bias-associated concepts from freely generated narratives across multiple demographic axes and locations. Through extensive experiments with sentiment-constrained prompts, open-box perturbations, and cross-model analyses, BADF reveals that two-character prompts and open-box methods uncover richer and more diverse bias associations than traditional single-character or black-box approaches. The work evaluates the robustness of each pipeline stage and presents a scalable tool for auditing representational harms, with implications for bias mitigation and responsible AI deployment.
Abstract
Social biases embedded in Large Language Models (LLMs) raise critical concerns, resulting in representational harms (unfair or distorted portrayals of demographic groups) that may be expressed in subtle ways through generated language. Existing evaluation methods often depend on predefined identity-concept associations, limiting their ability to surface new or unexpected forms of bias. In this work, we present the Bias Association Discovery Framework (BADF), a systematic approach for extracting both known and previously unrecognized associations between demographic identities and descriptive concepts from open-ended LLM outputs. Through comprehensive experiments spanning multiple models and diverse real-world contexts, BADF enables robust mapping and analysis of the varied concepts that characterize demographic identities. Our findings advance the understanding of biases in open-ended generation and provide a scalable tool for identifying and analyzing bias associations in LLMs.
