Who Do LLMs Trust? Human Experts Matter More Than Other LLMs
Anooshka Bajaj, Zoran Tiganj
TL;DR
This work investigates how LLMs weigh social information and whether they privilege feedback from human experts over feedback from other sources. Two experiments, spanning three binary tasks and four instruction-tuned LLMs, show a robust credibility-weighted influence: responses framed as coming from experts drive stronger conformity than responses framed as coming from friends or other LLMs, and larger groups amplify the effect. When a human and an LLM directly disagree, belief revision is heavily biased toward the expert-framed human, and token-level evidence corroborates this human-centric shift. The findings point to a potential alignment and safety concern: social framing can substantially bias LLM judgments, so multi-agent prompting and human-in-the-loop systems should be designed to mitigate over-reliance on authority signals.
Abstract
Large language models (LLMs) increasingly operate in environments where they encounter social information such as other agents' answers, tool outputs, or human recommendations. In humans, such inputs influence judgments in ways that depend on the source's credibility and the strength of consensus. This paper investigates whether LLMs exhibit analogous patterns of influence and whether they privilege feedback from humans over feedback from other LLMs. Across three binary decision-making tasks (reading comprehension, multi-step reasoning, and moral judgment), we present four instruction-tuned LLMs with prior responses attributed to friends, to human experts, or to other LLMs. We manipulate whether the group is correct and vary the group size. In a second experiment, we introduce direct disagreement between a single human and a single LLM. Across tasks, models conform significantly more to responses labeled as coming from human experts, even when those responses are incorrect, and revise their answers toward experts more readily than toward other LLMs. These results reveal that expert framing acts as a strong prior for contemporary LLMs, suggesting a form of credibility-sensitive social influence that generalizes across decision domains.
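To make the design described above concrete, the following is a minimal sketch, not the authors' code, of how the first experiment's conditions could be enumerated and how conformity could be scored. The source labels come from the abstract. The group sizes, the function names `build_conditions` and `conformity_rate`, and the trial dictionary layout are all assumptions introduced for illustration. Conformity is taken here to mean that the model's final answer matches the answer attributed to the group, which is one plausible reading of the abstract.

```python
from collections import defaultdict
from itertools import product

# Source labels follow the abstract.
SOURCES = ("friends", "human experts", "other LLMs")
GROUP_CORRECT = (True, False)
# Assumption: the abstract does not state which group sizes were used.
GROUP_SIZES = (1, 3, 5)


def build_conditions():
    """Enumerate every (source, group_correct, group_size) cell of the design."""
    return [
        {"source": s, "group_correct": c, "group_size": n}
        for s, c, n in product(SOURCES, GROUP_CORRECT, GROUP_SIZES)
    ]


def conformity_rate(trials):
    """Return, per source, the fraction of trials where the model matched the group.

    Each trial is a dict with the hypothetical keys
    'source', 'group_answer', and 'model_answer'.
    """
    matches = defaultdict(int)
    totals = defaultdict(int)
    for trial in trials:
        totals[trial["source"]] += 1
        if trial["model_answer"] == trial["group_answer"]:
            matches[trial["source"]] += 1
    return {source: matches[source] / totals[source] for source in totals}
```

Under this reading, restricting `trials` to those where the group is incorrect gives the rate at which a model adopts a wrong answer, which is the case the abstract highlights for expert-labeled responses.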
