Where Fact Ends and Fairness Begins: Redefining AI Bias Evaluation through Cognitive Biases

Jen-tse Huang, Yuhang Yan, Linqi Liu, Yixin Wan, Wenxuan Wang, Kai-Wei Chang, Michael R. Lyu

TL;DR

This work reframes AI fairness by separating factual accuracy from normative fairness and introduces Fact-or-Fair, a benchmark that pairs objective queries built on 19 real-world statistics, which test world knowledge, with subjective prompts designed around three cognitive biases. It defines two core metrics, $S_{fact}$ and $S_{fair}$, with subcomponents $S_E$ and $S_{KLD}$, and demonstrates a formal trade-off between factuality and fairness across six LLMs and four T2I models. Empirical results show that models vary in their ability to recall factual world knowledge (especially for race-related statistics) and in how they respond to bias-inducing contexts, with LLMs generally more robust to cognitive-error prompts than T2I models. The findings provide a practical, theoretically grounded framework for responsible model evaluation and highlight the need for broader demographic coverage and more nuanced evaluation prompts to advance fairer AI systems.

Abstract

Recent failures such as Google Gemini generating people of color in Nazi-era uniforms illustrate how AI outputs can be factually plausible yet socially harmful. AI models are increasingly evaluated for "fairness," yet existing benchmarks often conflate two fundamentally different dimensions: factual correctness and normative fairness. A model may generate responses that are factually accurate but socially unfair, or conversely, appear fair while distorting factual reality. We argue that identifying the boundary between fact and fair is essential for meaningful fairness evaluation. We introduce Fact-or-Fair, a benchmark with (i) objective queries aligned with descriptive, fact-based judgments, and (ii) subjective queries aligned with normative, fairness-based judgments. Our queries are constructed from 19 statistics and are grounded in cognitive psychology, drawing on representativeness bias, attribution bias, and ingroup-outgroup bias to explain why models often misalign fact and fairness. Experiments across ten frontier models reveal different levels of fact-fair trade-offs. By reframing fairness evaluation, we provide both a new theoretical lens and a practical benchmark to advance responsible model assessment. Our test suite is publicly available at https://github.com/uclanlp/Fact-or-Fair.

Paper Structure

This paper contains 46 sections, 6 equations, 5 figures, 9 tables.

Figures (5)

  • Figure 1: Fact-or-Fair is a benchmark comprising objective queries derived from real-world statistics and subjective queries designed using three cognitive errors that contribute to stereotypes. It includes queries designed for LLMs and T2I models.
  • Figure 2: Visualization of two functions.
  • Figure 3: Fact-or-Fair offers diverse scenarios in subjective queries to evaluate models' fairness.
  • Figure 4: $S_{fair}$ and $S_{fact}$ of six LLMs and four T2I models using Fact-or-Fair.
  • Figure 5: $S_{fair}$ and $S_{fact}$ of six LLMs using subjective queries with different contexts.

Theorems & Definitions (4)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4