Bias Association Discovery Framework for Open-Ended LLM Generations

Jinhao Pan, Chahat Raj, Ziwei Zhu

TL;DR

The paper tackles the challenge of surfacing biases in open-ended LLM generations by moving beyond predefined identity–bias term pairs. It introduces the Bias Association Discovery Framework (BADF), a three-stage pipeline that extracts, filters, and validates bias-associated concepts from freely generated narratives across multiple demographic axes and locations. Through experiments with sentiment-constrained prompts, open-box perturbations, and cross-model analyses, BADF reveals that two-character prompts and open-box methods uncover richer and more diverse bias associations than traditional single-character or black-box approaches. The work evaluates each pipeline stage and presents a scalable tool for auditing representational harms, with implications for bias mitigation and responsible AI deployment.

Abstract

Social biases embedded in Large Language Models (LLMs) raise critical concerns, resulting in representational harms -- unfair or distorted portrayals of demographic groups -- that may be expressed in subtle ways through generated language. Existing evaluation methods often depend on predefined identity-concept associations, limiting their ability to surface new or unexpected forms of bias. In this work, we present the Bias Association Discovery Framework (BADF), a systematic approach for extracting both known and previously unrecognized associations between demographic identities and descriptive concepts from open-ended LLM outputs. Through comprehensive experiments spanning multiple models and diverse real-world contexts, BADF enables robust mapping and analysis of the varied concepts that characterize demographic identities. Our findings advance the understanding of biases in open-ended generation and provide a scalable tool for identifying and analyzing bias associations in LLMs.

Paper Structure

This paper contains 71 sections, 1 equation, 9 figures, 20 tables.

Figures (9)

  • Figure 1: Bias Association Discovery Framework (BADF) extracts multiple bias concepts from open-ended generations, while prior benchmarks are limited to a single predefined concept per evaluation instance.
  • Figure 2: Bias Association Discovery Framework (BADF) workflow (see Table 3 in the Appendix for a sample generation).
  • Figure 3: Number of bias associations (gender) per location.
  • Figure 4: Top 3 bias associations of Single-Character Base (SCB), Two-Character Base (TCB), and Open-Box (OB). (For the gender category, f: female, m: male; for the race category, A: Asian, B: Black, ME: Middle-East, W: White; for the religion category, Bu: Buddhism, C: Christian, J: Judaism, Mu: Muslim.) The complete versions of the top 10 bias associations are in Tables 15 and 16.
  • Figure 5: Numbers of bias associations of all locations for Single-Character Base and Two-Character Base.
  • ...and 4 more figures