
Redefining "Hallucination" in LLMs: Towards a psychology-informed framework for mitigating misinformation

Elijah Berberette, Jack Hutchins, Amir Sadovnik

TL;DR

This work critiques the conventional use of the term 'hallucination' for erroneous LLM outputs. It proposes a psychology-informed taxonomy that maps these errors to human cognitive biases and related psychological phenomena, such as source amnesia, recency effects, the availability heuristic, suggestibility, cognitive dissonance, and confabulation. By linking AI misbehavior to well-studied psychological phenomena, the authors outline targeted mitigation strategies, including enhanced source attribution, source monitoring, and processing inspired by artificial metacognition. They discuss how metacognition-inspired approaches such as self-reflection and self-inquiry could improve reliability, while acknowledging the limits on how feasible true machine metacognition is. The proposed framework aims to reduce the risk of misinformation at scale and offers a structured direction for future research at the intersection of psychology and AI safety.

Abstract

In recent years, large language models (LLMs) have become enormously popular; ChatGPT alone is used by over a billion users. While these models exhibit remarkable language understanding and logical reasoning, a notable challenge arises in the form of "hallucinations": LLMs confidently output misinformation, which can have devastating consequences given such a large user base. We question whether "hallucination" is the appropriate term for this behavior and instead propose a psychological taxonomy based on cognitive biases and other psychological phenomena. This finer-grained understanding allows for targeted solutions: by drawing on how humans internally resolve similar challenges, we aim to develop strategies to mitigate LLM hallucinations. This interdisciplinary approach moves beyond conventional terminology, offering a nuanced understanding of the problem and actionable pathways toward more reliable LLMs.

Paper Structure (13 sections, 7 figures)

Figures (7)

  • Figure 1: Conversation with ChatGPT that contains a hallucination. We asked the model, "What is two times the derivative of 3x squared?", and it responded with the incorrect answer of 12.
  • Figure 2: An overview of psychological phenomena and cognitive biases in humans and their parallels in LLMs.
  • Figure 3: Conversation with LLaMA-2 7B in which we ask it to describe detritivores. When asked to cite a source for its answer, LLaMA-2 responded with a fabricated article, demonstrating source amnesia.
  • Figure 4: Q&A with GPT-3, taken from biasExample, showing bias and the availability heuristic (bold added for emphasis).
  • Figure 5: We introduced suggestibility into a conversation with Google's Bard. This exposure led Bard to output an incorrect answer along with incorrect intermediate steps.
  • ...and 2 more figures
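For reference, the correct answer to the question in Figure 1 follows directly from the power rule. The model's answer of 12 drops the variable and is a constant, so it coincides with the true result only at x = 1:

```latex
2 \cdot \frac{d}{dx}\left(3x^{2}\right) = 2 \cdot 6x = 12x
```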