Will the Real Linda Please Stand up...to Large Language Models? Examining the Representativeness Heuristic in LLMs

Pengda Wang, Zilin Xiao, Hanjie Chen, Frederick L. Oswald

TL;DR

This work investigates whether large language models exhibit the representativeness heuristic (RH) in reasoning. It introduces ReHeAT, a dataset of 202 RH questions across six RH types to probe RH biases in four prominent LLMs under varied prompting strategies, including standard prompts, zero-shot Chain-of-Thought, self-consistency, and in-context learning. The study finds that LLMs display RH-like biases similar to humans, with only modest gains from advanced prompts; prompting hints that encourage the model to recall its knowledge can improve accuracy, suggesting a potential mitigation pathway. The results underscore the need for deeper understanding and targeted interventions to address cognitive biases in model reasoning, with implications for the design of prompts and evaluation frameworks in AI systems that rely on probabilistic reasoning.
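Of the prompting strategies listed above, self-consistency is the most mechanical: sample several answers to the same prompt and keep the most frequent one. The following is a minimal sketch of that voting step only, not the authors' evaluation code. The names `self_consistent_answer` and `make_fake_sampler` are hypothetical, and the sampler is a stand-in for a real model call.

```python
from collections import Counter
from typing import Callable, Iterable, List


def self_consistent_answer(
    sample_answer: Callable[[str], str],
    prompt: str,
    n_samples: int = 5,
) -> str:
    """Return the majority answer over n_samples sampled completions.

    sample_answer is an assumed callable that wraps one stochastic
    model call and returns a final answer label such as "A" or "B".
    """
    answers: List[str] = [sample_answer(prompt) for _ in range(n_samples)]
    # most_common(1) yields [(answer, count)]; ties go to the answer seen first.
    return Counter(answers).most_common(1)[0][0]


def make_fake_sampler(canned: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a model: replays canned answers in order."""
    it = iter(canned)
    return lambda prompt: next(it)


if __name__ == "__main__":
    sampler = make_fake_sampler(["B", "A", "A", "B", "A"])
    print(self_consistent_answer(sampler, "Which option is more probable?"))
```

Voting only helps when the heuristic answer is not also the majority answer across samples, which is consistent with the modest gains reported for advanced prompts.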

Abstract

Although large language models (LLMs) have demonstrated remarkable proficiency in modeling text and generating human-like text, they may exhibit biases acquired from training data in doing so. Specifically, LLMs may be susceptible to a common cognitive trap in human decision-making called the representativeness heuristic. This is a concept in psychology that refers to judging the likelihood of an event based on how closely it resembles a well-known prototype or typical example, rather than considering broader facts or statistical evidence. This research investigates the impact of the representativeness heuristic on LLM reasoning. We created ReHeAT (Representativeness Heuristic AI Testing), a dataset containing a series of problems spanning six common types of representativeness heuristics. Experiments reveal that four LLMs applied to ReHeAT all exhibited representativeness heuristic biases. We further identify that the model's reasoning steps are often incorrectly based on a stereotype rather than on the problem's description. Interestingly, performance improves when a hint is added to the prompt to remind the model to use its knowledge. This suggests that the representativeness heuristic differs from traditional biases: it can occur even when an LLM possesses the correct knowledge yet still falls into a cognitive trap. This highlights the importance of future research focusing on the representativeness heuristic in model reasoning and decision-making and on developing solutions to address it.
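The Linda problem named in the title is the classic illustration of one such trap, the conjunction fallacy. The statistical fact that the heuristic overrides can be stated in one line. The numbers below are illustrative assumptions, not values from the paper.

```latex
% Conjunction rule: for any events A and B,
\[
  P(A \land B) \;=\; P(A)\,P(B \mid A) \;\le\; P(A),
  \qquad \text{since } 0 \le P(B \mid A) \le 1 .
\]
% Illustrative (assumed) numbers: let A = "is a bank teller" and
% B = "is active in the feminist movement", with P(A) = 0.05 and P(B | A) = 0.8.
\[
  P(A \land B) = 0.05 \times 0.8 = 0.04 \;\le\; 0.05 = P(A).
\]
```

However well a description fits the stereotype for B, the conjunction can never be more probable than A alone.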

Paper Structure (22 sections, 1 equation, 4 figures, 27 tables)

This paper contains 22 sections, 1 equation, 4 figures, 27 tables.

Figures (4)

  • Figure 1: Illustration of the representativeness heuristic problem. The model possesses the knowledge to answer the statistical prototype question, yet fails to use it to solve the representativeness heuristic question. Providing an appropriate hint in the prompt can guide the model in making a correct prediction.
  • Figure 2: Illustrations of the six types of representativeness heuristic.
  • Figure 3: In-context learning accuracy of four selected LLMs on ReHeAT with few-shot demonstrations.
  • Figure 4: Combinations of reasoning correctness and outcome correctness for GPT-4 and LLaMA2-70B on ReHeAT under CoT prompts. Detailed results are in Table \ref{tab:main_results_reason_outcome} in the Appendix.