Veracity Bias and Beyond: Uncovering LLMs' Hidden Beliefs in Problem-Solving Reasoning
Yue Zhou, Barbara Di Eugenio
TL;DR
The paper investigates veracity bias in LLMs, defined as systematic associations between solution correctness and demographic attributes, in two forms: Attribution Bias (A) and Evaluation Bias (E). It analyzes five human value-aligned LLMs across math, coding, commonsense, and writing tasks, using two prompt designs (Direct Labels and Name Proxies) and deterministic settings to measure the metrics AB_cor, AB_inc, EI, and EP. Key findings include pervasive attribution biases against Black groups in multiple domains, White/Asian preferences that vary by task, and notable evaluation biases, strongest in writing, where demographic identity alters the scoring of identical solutions. Additional studies reveal racially stereotypical color assignments in visualization code. These results suggest that demographic biases are deeply embedded in LLM reasoning, raising concerns for deployment in education and evaluation and underscoring the need for targeted debiasing and robust evaluation protocols.
Abstract
Although LLMs are explicitly aligned against demographic stereotypes, they have been shown to exhibit biases in various social contexts. In this work, we find that LLMs exhibit concerning biases in how they associate solution veracity with demographics. Through experiments across five human value-aligned LLMs on mathematics, coding, commonsense, and writing problems, we reveal two forms of such veracity bias: Attribution Bias, where models disproportionately attribute correct solutions to certain demographic groups, and Evaluation Bias, where models' assessment of identical solutions varies with perceived demographic authorship. Our results show pervasive biases: LLMs consistently attribute fewer correct solutions and more incorrect ones to African-American groups in math and coding, while solutions attributed to Asian authors are least preferred in writing evaluation. In additional studies, we show that LLMs automatically assign racially stereotypical colors to demographic groups in visualization code, suggesting these biases are deeply embedded in models' reasoning processes. Our findings indicate that demographic bias extends beyond surface-level stereotypes and social context provocations, raising concerns about the deployment of LLMs in educational and evaluation settings.
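To make the idea of Attribution Bias concrete, the following is a minimal sketch of one way a disproportionate attribution could be quantified: each group's share of "who wrote the correct solution" choices is compared against a uniform share. The function name `attribution_gap`, the placeholder group labels, and the uniform baseline are assumptions for illustration only; this is not necessarily how the paper defines AB_cor or AB_inc, and the responses are made up.

```python
from collections import Counter


def attribution_gap(choices, groups):
    """Return each group's share of attributions minus the uniform share.

    Hypothetical helper: a positive value means the group was chosen more
    often than chance, a negative value means less often.
    """
    if not choices:
        raise ValueError("choices must be non-empty")
    counts = Counter(choices)
    uniform = 1 / len(groups)
    return {g: counts[g] / len(choices) - uniform for g in groups}


# Made-up model responses with placeholder group labels (not data from the paper).
groups = ["A", "B", "C", "D"]
choices = ["A", "A", "B", "C", "A", "D", "A", "B"]
gaps = attribution_gap(choices, groups)

for group, gap in gaps.items():
    print(f"{group}: {gap:+.3f}")
```

Under this sketch, gaps near zero for every group would indicate no measurable attribution preference, while a consistently negative gap for one group across many problems would be the kind of pattern the abstract describes.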
