Who Do LLMs Trust? Human Experts Matter More Than Other LLMs

Anooshka Bajaj, Zoran Tiganj

TL;DR

This work investigates how LLMs weigh social information and whether human expert feedback is privileged over other sources. Through two experiments across three binary tasks and four instruction-tuned LLMs, it demonstrates a robust credibility-weighted influence: expert framing drives stronger conformity than attribution to friends or other LLMs, and larger group sizes amplify the effect. In direct human–LLM conflict, belief revision is heavily biased toward expert-framed humans, and token-level evidence corroborates these human-centric shifts. The findings highlight a potential alignment and safety concern: social framing can substantially bias LLM judgments, which should inform how multi-agent prompting and human-in-the-loop systems are designed to mitigate over-reliance on authority signals.

Abstract

Large language models (LLMs) increasingly operate in environments where they encounter social information such as other agents' answers, tool outputs, or human recommendations. In humans, such inputs influence judgments in ways that depend on the source's credibility and the strength of consensus. This paper investigates whether LLMs exhibit analogous patterns of influence and whether they privilege feedback from humans over feedback from other LLMs. Across three binary decision-making tasks (reading comprehension, multi-step reasoning, and moral judgment), we present four instruction-tuned LLMs with prior responses attributed either to friends, to human experts, or to other LLMs. We manipulate whether the group is correct and vary the group size. In a second experiment, we introduce direct disagreement between a single human and a single LLM. Across tasks, models conform significantly more to responses labeled as coming from human experts, including when that signal is incorrect, and revise their answers toward experts more readily than toward other LLMs. These results reveal that expert framing acts as a strong prior for contemporary LLMs, suggesting a form of credibility-sensitive social influence that generalizes across decision domains.

Paper Structure (16 sections, 4 figures)

Figures (4)

  • Figure 1: Paradigm and example prompts. Each panel combines a schematic (left) and a representative prompt excerpt (right). In all conditions, a prompt is provided to the LLM, which produces a forced-choice YES/NO answer. (a) Baseline: the prompt contains only the question and context. (b) Experiment 1: a single social prior (friends, experts, or other LLMs) summarizes $k$ prior answers (unanimous), enabling measurement of conformity. (c) Experiment 2: two sources (a human friend/expert and another LLM) provide disagreeing answers, enabling analysis of switching toward one source versus the other.
  • Figure 2: Experiment 1: Effects of homogeneous social priors across datasets. Rows correspond to datasets (BoolQ, StrategyQA, ETHICS) and columns to models. Within each dataset block, the top row shows accuracy and the bottom row shows conformity (probability of matching the unanimous prior) as a function of group size. Solid vs. dotted lines indicate whether the prior answer is correct or incorrect (agrees vs. disagrees with the dataset label). The dashed black line (and point at $k{=}0$) shows the no-prior baseline accuracy. Error bars show 95% Wilson confidence intervals.
  • Figure 3: Token-level belief shifts (Llama-3.3 70B; BoolQ). (a) Experiment 1: Mean change in log-odds toward the unanimous prior (relative to no-prior baseline, $k{=}0$) as a function of group size. Colors and line styles match Fig. 2. (b) Experiment 2: Mean shift toward the human source under conflict ($k{=}2$) relative to baseline. Error bars show 95% bootstrap confidence intervals.
  • Figure 4: Experiment 2: Belief revision under human–LLM conflict across datasets. Rows correspond to datasets and columns to models. Considering only trials where the model’s final answer differs from its no-prior baseline (Experiment 1, $k{=}0$), bars show the probability that the switch follows the human source (Friends vs. Experts) rather than the opposing LLM. The dashed horizontal line marks 0.5 (no directional preference). Annotations report the number of switch trials ($n$) in each condition. Error bars show 95% Wilson confidence intervals.
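
The quantities named in the captions above (conformity, Wilson confidence intervals, and log-odds shift toward a prior) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names are hypothetical, and the log-odds helper assumes the model's belief is summarized as a single probability of answering YES, which the captions do not specify.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to guard against floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)


def conformity_rate(answers, prior):
    """Fraction of model answers that match the unanimous prior answer."""
    if not answers:
        raise ValueError("answers must be non-empty")
    return sum(a == prior for a in answers) / len(answers)


def log_odds_shift_toward(p_yes_baseline, p_yes_with_prior, prior):
    """Change in log-odds relative to the no-prior baseline, signed so that
    positive values mean movement toward the prior answer.

    Assumption: probabilities are P(YES), strictly between 0 and 1.
    """
    def logit(p):
        return math.log(p / (1 - p))

    delta = logit(p_yes_with_prior) - logit(p_yes_baseline)
    return delta if prior == "YES" else -delta


if __name__ == "__main__":
    # Made-up illustrative answers, not data from the paper.
    answers = ["YES", "NO", "YES", "YES"]
    rate = conformity_rate(answers, prior="YES")
    low, high = wilson_interval(successes=3, n=4)
    print(f"conformity={rate:.2f}, 95% Wilson CI=({low:.3f}, {high:.3f})")
    print(f"shift={log_odds_shift_toward(0.5, 0.8, 'YES'):.3f}")
```

The Wilson interval is a common choice for proportions like these because it stays within [0, 1] and behaves sensibly when the number of trials is small, as can happen when only switch trials are counted.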