Subtle Biases Need Subtler Measures: Dual Metrics for Evaluating Representative and Affinity Bias in Large Language Models
Abhishek Kumar, Sarfaroz Yunusov, Ali Emami
TL;DR
This work addresses subtle biases in LLM outputs by introducing two metrics, the Representative Bias Score (RBS) and the Affinity Bias Score (ABS), and by deploying the Creativity-Oriented Generation Suite (CoGS) to quantify biases in both generation and evaluation. Evaluating GPT-4, LLaMA-2, and Mixtral on 3,240 CoGS problem instances, the authors uncover pronounced representative biases toward white, straight, and male identities, along with model-specific affinity-bias fingerprints; human evaluators show related patterns. The methodology combines semantic-similarity analysis of identity-modulated outputs with evaluator preferences to produce cross-model bias profiles, enabling scalable benchmarking of subtle biases in creative generation and evaluation contexts. These findings have practical implications for fairness in AI-assisted storytelling and evaluation, and they motivate bias-awareness tools and the inclusion of additional identity axes in future studies.
Abstract
Research on Large Language Models (LLMs) has often neglected subtle biases that, although less apparent, can significantly influence the models' outputs toward particular social narratives. This study addresses two such biases within LLMs: representative bias, which denotes a tendency of LLMs to generate outputs that mirror the experiences of certain identity groups, and affinity bias, which reflects the models' evaluative preferences for specific narratives or viewpoints. We introduce two novel metrics to measure these biases, the Representative Bias Score (RBS) and the Affinity Bias Score (ABS), and present the Creativity-Oriented Generation Suite (CoGS), a collection of open-ended tasks such as short story writing and poetry composition, designed with customized rubrics to detect these subtle biases. Our analysis uncovers marked representative biases in prominent LLMs, with a preference for identities associated with being white, straight, and male. Furthermore, our investigation of affinity bias reveals distinctive evaluative patterns within each model, akin to "bias fingerprints". This trend is also seen in human evaluators, highlighting a complex interplay between human and machine bias perceptions.
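To make the idea of representative bias concrete, the sketch below compares an output generated without any identity cue against outputs generated with an identity specified, and reports which identity-conditioned output the default most resembles. This is an illustration only, not the paper's RBS formula: the function names, the bag-of-words cosine similarity (a stand-in for a proper sentence-embedding model), and the placeholder group labels are all assumptions introduced here.

```python
import math
from collections import Counter


def bow_vector(text):
    """Toy text representation: lowercase whitespace tokens with counts.
    Assumption: a real pipeline would use a sentence-embedding model instead."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identity_similarity_profile(default_output, identity_outputs):
    """Map each (hypothetical) identity label to the similarity between the
    identity-unspecified output and that identity's conditioned output.
    A default output that sits consistently closer to one group's outputs
    would hint at representative bias toward that group."""
    base = bow_vector(default_output)
    return {
        identity: cosine_similarity(base, bow_vector(text))
        for identity, text in identity_outputs.items()
    }
```

For example, calling `identity_similarity_profile(default_text, {"group_a": text_a, "group_b": text_b})` returns one similarity per label; aggregating such profiles over many prompts is one plausible way to build a per-model bias profile, under the assumptions stated above.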
