An Empirical Study of the Anchoring Effect in LLMs: Existence, Mechanism, and Potential Mitigations

Yiming Huang, Biquan Bie, Zuqiu Na, Weilin Ruan, Songxin Lei, Yutao Yue, Xinlei He

TL;DR

This paper investigates the anchoring effect in LLMs, introducing the SynAnchors dataset to study its existence, mechanisms, and mitigations. It finds that anchoring is prevalent across modern models, though the effect acts in shallow layers and is more muted in advanced and reasoning models. Causal tracing shows that anchor-sensitive signals are primarily active in early layers and involve specific ROI tokens. Reasoning can modestly mitigate the bias but does not eliminate it. The work evaluates Anti-DP and other strategies, highlighting the need for cognition-aware evaluation and for methods that align LLM behavior with trustworthy AI principles.

Abstract

The rise of Large Language Models (LLMs) like ChatGPT has advanced natural language processing, yet concerns about their cognitive biases are growing. In this paper, we investigate the anchoring effect, a cognitive bias in which the mind relies heavily on the first piece of information it receives (the anchor) when making subsequent judgments. We explore whether LLMs are affected by anchoring, the underlying mechanisms, and potential mitigation strategies. To facilitate studies of the anchoring effect at scale, we introduce a new dataset, SynAnchors. Using refined evaluation metrics, we benchmark widely used current LLMs. Our findings show that anchoring bias is common in LLMs, acts in shallow layers, and is not eliminated by conventional strategies, while reasoning offers some mitigation. This recontextualization via cognitive psychology urges that LLM evaluations focus not on standard benchmarks or over-optimized robustness tests, but on cognitive-bias-aware trustworthy evaluation.

Paper Structure

This paper contains 39 sections, 3 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Detailed illustration of our research on the anchoring effect in LLMs from three key aspects: (1) Existence: showing significant biases toward different anchor values for identical questions. (2) Mechanism: using causal tracing and statistics to explore underlying patterns. (3) Mitigation: evaluating varied mitigation strategies. 'Q: "...?"' refers to asking the same question again.
  • Figure 2: Causal tracing on the attention (red) and FFN (green) modules of Llama-3.1-8B-Instruct for semantic anchoring questions. The X-axis shows the layer index of the model (32 layers). The Y-axis shows the ROI tokens.
  • Figure 3: Percentages of sufficient anchor information mentions in DeepSeek-R1 reasoning contents. Legend: "Anchored" refers to the percentage of questions judged as anchored based on the metrics introduced in Section 4.1; "All" and "Non-anchored" indicate the percentages over all questions and over those judged non-anchored, respectively. We employ an LLM-as-a-Judge approach to automatically detect explicit mentions of anchor-influenced features in reasoning contents, guided by detailed criteria defining what counts as a sufficient mention (see Appendix C).
  • Figure 4: Causal tracing on the attention (red) and FFN (green) modules of Llama-3.1-8B-Instruct for numerical anchoring questions. The X-axis shows the layer index of the model (32 layers). The Y-axis shows the ROI tokens.
  • Figure 5: Topic categories in semantic anchoring questions.
  • ...and 3 more figures